LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-06-12 14:20 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <20180612093812.GC1820@MiWiFi-R3L-srv>

On Tue, Jun 12, 2018 at 12:38 PM, Baoquan He <bhe@redhat.com> wrote:
> On 06/12/18 at 11:29am, Andy Shevchenko wrote:
>> On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:

>> > +{
>>
>> > +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
>> > +               if (p->end < res->start)
>> > +                       continue;
>> > +               if (res->end < p->start)
>> > +                       break;
>>
>> > +               if (p->start < res->start || p->end > res->end)
>> > +                       return -1;      /* not completely contained */
>>
>> Usually we are expecting real eeror codes.
>
> Hmm, I just copied it from arch/powerpc/kernel/pci-common.c. The
> function interface expects an integer returned value, not sure what a
> real error codes look like, could you give more hints? Will change
> accordingly.

I briefly looked at the code and error codes we have, so, my proposal
is one of the following
 - use -ECANCELED (not the best choice for first occurrence here,
though I can't find better)
 - use positive integers (or enum), like
  #define RES_REPARENTED 0
  #define RES_OVERLAPPED 1
  #define RES_NOCONFLICT 2


>> > +               if (firstpp == NULL)
>> > +                       firstpp = pp;
>> > +       }
>>
>> > +       if (firstpp == NULL)
>> > +               return -1;      /* didn't find any conflicting entries? */
>>
>> Ditto.

Ditto.

>>
>> > +}
>> > +EXPORT_SYMBOL(reparent_resources);

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
From: Aneesh Kumar K.V @ 2018-06-12 13:53 UTC (permalink / raw)
  To: Nicholas Piggin, linux-mm
  Cc: linuxppc-dev, linux-arch, Aneesh Kumar K . V, Minchan Kim,
	Mel Gorman, Nadav Amit, Andrew Morton, Linus Torvalds
In-Reply-To: <20180612071621.26775-2-npiggin@gmail.com>

On 06/12/2018 12:46 PM, Nicholas Piggin wrote:
> This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.
> 
> Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
> problem") provides a superset of the TLB flush coverage of this
> commit, and even includes in the changelog "this patch supersedes
> 'mm: Always flush VMA ranges affected by zap_page_range v2'".
> 
> Reverting this avoids double flushing the TLB range, and the less
> efficient flush_tlb_range() call (the mmu_gather API is more precise
> about what ranges it invalidates).
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   mm/memory.c | 14 +-------------
>   1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 7206a634270b..9d472e00fc2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>   	tlb_gather_mmu(&tlb, mm, start, end);
>   	update_hiwater_rss(mm);
>   	mmu_notifier_invalidate_range_start(mm, start, end);
> -	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
> +	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
>   		unmap_single_vma(&tlb, vma, start, end, NULL);
> -
> -		/*
> -		 * zap_page_range does not specify whether mmap_sem should be
> -		 * held for read or write. That allows parallel zap_page_range
> -		 * operations to unmap a PTE and defer a flush meaning that
> -		 * this call observes pte_none and fails to flush the TLB.
> -		 * Rather than adding a complex API, ensure that no stale
> -		 * TLB entries exist when this call returns.
> -		 */
> -		flush_tlb_range(vma, start, end);
> -	}
> -
>   	mmu_notifier_invalidate_range_end(mm, start, end);
>   	tlb_finish_mmu(&tlb, start, end);
>   }
> 

No really related to this patch, but does 99baac21e4585 do the right 
thing if the range start - end covers pages with multiple page sizes?

-aneesh

^ permalink raw reply

* Re: [v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
From: Michael Ellerman @ 2018-06-12 13:47 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Aneesh Kumar K.V, Aneesh Kumar K.V, Laurent Dufour,
	Nicholas Piggin
In-Reply-To: <152839253238.25118.3114450844744290470.stgit@jupiter.in.ibm.com>

Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 2edc673be137..e56759d92356 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>  	return 0; /* need to perform reset */
>  }
>  
> +static int mce_handle_error(struct rtas_error_log *errp)
> +{
> +	struct pseries_errorlog *pseries_log;
> +	struct pseries_mc_errorlog *mce_log;
> +	int disposition = rtas_error_disposition(errp);
> +	uint8_t error_type;
> +
> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
> +	if (pseries_log == NULL)
> +		goto out;
> +
> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
> +	error_type = rtas_mc_error_type(mce_log);
> +
> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
> +		slb_dump_contents();
> +		slb_flush_and_rebolt();

Aren't we back in virtual mode here?

Don't we need to do the flush in real mode before turning the MMU back
on. Otherwise we'll just take another multi-hit?

cheers

^ permalink raw reply

* Re: [PATCH 5/5] powerpc/pkeys: make protection key 0 less special
From: Florian Weimer @ 2018-06-12 12:18 UTC (permalink / raw)
  To: Ram Pai, mpe
  Cc: Ulrich.Weigand, mhocko, dave.hansen, aneesh.kumar, luto, bauerman,
	linuxppc-dev
In-Reply-To: <1528164594-23823-6-git-send-email-linuxram@us.ibm.com>

On 06/05/2018 04:09 AM, Ram Pai wrote:
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 0409c80..d349e22 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -13,7 +13,10 @@
>   
>   DECLARE_STATIC_KEY_TRUE(pkey_disabled);
>   extern int pkeys_total; /* total pkeys as per device tree */
> -extern u32 initial_allocation_mask; /* bits set for reserved keys */
> +extern u32 initial_allocation_mask; /*  bits set for the initially allocated keys */
> +extern u32 reserved_allocation_mask; /* bits set for reserved keys */
> +
> +#define PKEY_0	0
>   
>   /*
>    * Define these here temporarily so we're not dependent on patching linux/mm.h.

This part no longer applies cleanly.

Thanks,
Florian

^ permalink raw reply

* Re: pkeys on POWER: Access rights not reset on execve
From: Florian Weimer @ 2018-06-12 12:17 UTC (permalink / raw)
  To: Ram Pai; +Cc: Linux-MM, linuxppc-dev, Andy Lutomirski, Dave Hansen
In-Reply-To: <20180611200807.GA5773@ram.oc3035372033.ibm.com>

On 06/11/2018 10:08 PM, Ram Pai wrote:
> Ok. try this patch. This patch is on top of the 5 patches that I had
> sent last week i.e  "[PATCH  0/5] powerpc/pkeys: fixes to pkeys"
> 
> The following is a draft patch though to check if it meets your
> expectations.
> 
> commit fe53b5fe2dcb3139ea27ade3ae7cbbe43c4af3be
> Author: Ram Pai<linuxram@us.ibm.com>
> Date:   Mon Jun 11 14:57:34 2018 -0500
> 
>      powerpc/pkeys: Deny read/write/execute by default

With this patch, my existing misc/tst-pkey test in glibc passes.  The 
in-tree version still has some incorrect assumptions on implementation 
behavior, but those are test bugs.  The kernel behavior with your patch 
look good to me.  Thanks.

Florian

^ permalink raw reply

* Re: [powerpc/powervm]kernel BUG at mm/memory_hotplug.c:1864!
From: Balbir Singh @ 2018-06-12 10:28 UTC (permalink / raw)
  To: vrbagal1, Oscar Salvador
  Cc: sachinp, linux-mm, linuxppc-dev, nfont, Linuxppc-dev, linux-next
In-Reply-To: <0aac625ee724d877b87c69bba5ac9a0e@linux.vnet.ibm.com>



On 11/06/18 17:41, vrbagal1 wrote:
> On 2018-06-08 17:45, Oscar Salvador wrote:
>> On Fri, Jun 08, 2018 at 05:11:24PM +0530, vrbagal1 wrote:
>>> On 2018-06-08 16:58, Oscar Salvador wrote:
>>> >On Fri, Jun 08, 2018 at 04:44:24PM +0530, vrbagal1 wrote:
>>> >>Greetings!!!
>>> >>
>>> >>I am seeing kernel bug followed by oops message and system reboots,
>>> >>while
>>> >>running dlpar memory hotplug test.
>>> >>
>>> >>Machine Details: Power6 PowerVM Platform
>>> >>GCC version: (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC))
>>> >>Test case: dlpar memory hotplug test (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py)
>>> >>Kernel Version: Linux version 4.17.0-autotest
>>> >>
>>> >>I am seeing this bug on rc7 as well.
> 
> Observing similar traces on linux next kernel: 4.17.0-next-20180608-autotest
> 
>  Block size [0x4000000] unaligned hotplug range: start 0x220000000, size 0x1000000

size < block_size in this case, why? how? Could you confirm that the block size is 64MB and your trying to remove 16MB

Balbir Singh.

^ permalink raw reply

* Re: [PATCH v2 3/3] powerpc/fsl: Implement cpu_show_spectre_v1/v2 for NXP PowerPC Book3E
From: Michal Suchánek @ 2018-06-12 10:07 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: Diana Madalina Craciun, linuxppc-dev@lists.ozlabs.org,
	oss@buserror.net, Leo Li
In-Reply-To: <AM5PR0401MB2545BC30FB194E4D4E42ADA89A7F0@AM5PR0401MB2545.eurprd04.prod.outlook.com>

On Tue, 12 Jun 2018 02:59:11 +0000
Bharat Bhushan <bharat.bhushan@nxp.com> wrote:

> Hi Diana,
> 
> > -----Original Message-----
> > From: Diana Craciun [mailto:diana.craciun@nxp.com]
> > Sent: Monday, June 11, 2018 6:23 PM
> > To: linuxppc-dev@lists.ozlabs.org
> > Cc: mpe@ellerman.id.au; oss@buserror.net; Leo Li
> > <leoyang.li@nxp.com>; Bharat Bhushan <bharat.bhushan@nxp.com>;
> > Diana Madalina Craciun <diana.craciun@nxp.com>
> > Subject: [PATCH v2 3/3] powerpc/fsl: Implement
> > cpu_show_spectre_v1/v2 for NXP PowerPC Book3E  
> 
> Please add some description

To me the subject is self-explanatory. It implements a kernel interface
that was already described elsewhere.

What are you missing here?

Thanks

Michal

> 
> > 
> > Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
> > ---
> >  arch/powerpc/Kconfig           |  2 +-
> >  arch/powerpc/kernel/security.c | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index
> > 940c955..a781d60 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -170,7 +170,7 @@ config PPC
> >  	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
> >  	select GENERIC_CMOS_UPDATE
> >  	select GENERIC_CPU_AUTOPROBE
> > -	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64
> > +	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64
> > || PPC_FSL_BOOK3E
> >  	select GENERIC_IRQ_SHOW
> >  	select GENERIC_IRQ_SHOW_LEVEL
> >  	select GENERIC_SMP_IDLE_THREAD
> > diff --git a/arch/powerpc/kernel/security.c
> > b/arch/powerpc/kernel/security.c index 797c975..aceaadc 100644
> > --- a/arch/powerpc/kernel/security.c
> > +++ b/arch/powerpc/kernel/security.c
> > @@ -183,3 +183,18 @@ ssize_t cpu_show_spectre_v2(struct device *dev,
> > struct device_attribute *attr, c  }  #endif /* CONFIG_PPC_BOOK3S_64
> > */
> > 
> > +#ifdef CONFIG_PPC_FSL_BOOK3E
> > +ssize_t cpu_show_spectre_v1(struct device *dev, struct
> > device_attribute +*attr, char *buf) {
> > +	if (barrier_nospec_enabled)
> > +		return sprintf(buf, "Mitigation: __user pointer
> > sanitization\n"); +
> > +	return sprintf(buf, "Vulnerable\n");
> > +}
> > +
> > +ssize_t cpu_show_spectre_v2(struct device *dev, struct
> > device_attribute +*attr, char *buf) {
> > +	return sprintf(buf, "Vulnerable\n");
> > +}
> > +#endif /* CONFIG_PPC_FSL_BOOK3E */
> > +
> > --
> > 2.5.5  
> 

^ permalink raw reply

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Baoquan He @ 2018-06-12  9:49 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <CAHp75Vd6v4+RTq-5uw_BJpT6=w2KGjTfU+rEst6+8igs2cyntw@mail.gmail.com>

On 06/12/18 at 11:29am, Andy Shevchenko wrote:
> On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:
> > reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> > and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> > so that it's shared. Later its code also need be updated using list_head
> > to replace singly linked list.
> 
> While this is a good deduplication of the code, some requirements for
> public functions would be good to satisfy.
> 
> > +/*
> > + * Reparent resource children of pr that conflict with res
> > + * under res, and make res replace those children.
> > + */
> 
> kernel doc format, though...

Will rewrite it, thanks.

> 
> > +static int reparent_resources(struct resource *parent,
> > +                                    struct resource *res)
> 
> ...is it really public with static keyword?!
> 
> 
> 
> > +{
> 
> > +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> > +               if (p->end < res->start)
> > +                       continue;
> > +               if (res->end < p->start)
> > +                       break;
> 
> > +               if (p->start < res->start || p->end > res->end)
> > +                       return -1;      /* not completely contained */
> 
> Usually we are expecting real eeror codes.
> 
> > +               if (firstpp == NULL)
> > +                       firstpp = pp;
> > +       }
> 
> > +       if (firstpp == NULL)
> > +               return -1;      /* didn't find any conflicting entries? */
> 
> Ditto.
> 
> > +}
> > +EXPORT_SYMBOL(reparent_resources);
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Baoquan He @ 2018-06-12  9:38 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <CAHp75Vd6v4+RTq-5uw_BJpT6=w2KGjTfU+rEst6+8igs2cyntw@mail.gmail.com>

On 06/12/18 at 11:29am, Andy Shevchenko wrote:
> On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:
> > reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> > and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> > so that it's shared. Later its code also need be updated using list_head
> > to replace singly linked list.
> 
> While this is a good deduplication of the code, some requirements for
> public functions would be good to satisfy.
> 
> > +/*
> > + * Reparent resource children of pr that conflict with res
> > + * under res, and make res replace those children.
> > + */
> 
> kernel doc format, though...

> 
> > +static int reparent_resources(struct resource *parent,
> > +                                    struct resource *res)
> 
> ...is it really public with static keyword?!

Thanks for looking into this. This is a code bug, I copied and changed,
but forgot merging the changing to local commit. And the error reported
by test robot in patch 2 was changed too locally, forgot merging it to
patch. Will repost to address this.

> 
> 
> 
> > +{
> 
> > +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> > +               if (p->end < res->start)
> > +                       continue;
> > +               if (res->end < p->start)
> > +                       break;
> 
> > +               if (p->start < res->start || p->end > res->end)
> > +                       return -1;      /* not completely contained */
> 
> Usually we are expecting real eeror codes.

Hmm, I just copied it from arch/powerpc/kernel/pci-common.c. The
function interface expects an integer returned value, not sure what a
real error codes look like, could you give more hints? Will change
accordingly.

> 
> > +               if (firstpp == NULL)
> > +                       firstpp = pp;
> > +       }
> 
> > +       if (firstpp == NULL)
> > +               return -1;      /* didn't find any conflicting entries? */
> 
> Ditto.
> 
> > +}
> > +EXPORT_SYMBOL(reparent_resources);
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply

* [PATCH] powerpc/64s/radix: Fix radix_kvm_prefetch_workaround paca access of not possible CPU
From: Nicholas Piggin @ 2018-06-12  9:38 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, Pridhiviraj Paidipeddi, stable

If possible CPUs are limited (e.g., by kexec), then the kvm prefetch
workaround function can access the paca pointer for a !possible CPU.

Fixes: d2e60075a3d44 ("powerpc/64: Use array of paca pointers and allocate pacas individually")
Cc: stable@kernel.org
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

---
[kdump still seems to have a problem upstream, but this solves one crash]

 arch/powerpc/mm/tlb-radix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 67a6e86d3e7e..06ea845d8033 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -1043,6 +1043,8 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct *mm)
 		for (; sib <= cpu_last_thread_sibling(cpu) && !flush; sib++) {
 			if (sib == cpu)
 				continue;
+			if (!cpu_possible(sib))
+				continue;
 			if (paca_ptrs[sib]->kvm_hstate.kvm_vcpu)
 				flush = true;
 		}
-- 
2.17.0

^ permalink raw reply related

* [PATCH v6 4/4] selftests/powerpc: update strlen() test to test the new assembly function
From: Christophe Leroy @ 2018-06-12  9:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, segher
  Cc: linux-kernel, linuxppc-dev
In-Reply-To: <85de16f5629ac9f4a815230cced361908758b53a.1528791416.git.christophe.leroy@c-s.fr>

This patch modifies the test for testing the new assembly strlen() instead
of the generic strlen()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v6: added additional necessary defines in ppc_asm.h
 v5: no change
 v4: new

 .../testing/selftests/powerpc/stringloops/Makefile |  3 +--
 .../selftests/powerpc/stringloops/asm/cache.h      |  1 +
 .../selftests/powerpc/stringloops/asm/ppc_asm.h    | 30 ++++++++++++++++++++++
 .../testing/selftests/powerpc/stringloops/string.S |  1 +
 4 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/cache.h
 create mode 120000 tools/testing/selftests/powerpc/stringloops/string.S

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index df663ee9ddb3..0c088a6d0369 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,8 +10,7 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
 $(OUTPUT)/memcmp_32: memcmp.c
 $(OUTPUT)/memcmp_32: CFLAGS += -m32
 
-$(OUTPUT)/strlen: strlen.c string.o
-$(OUTPUT)/string.o: string.c
+$(OUTPUT)/strlen: strlen.c string.S
 
 ASFLAGS = $(CFLAGS)
 
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/cache.h b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
new file mode 100644
index 000000000000..8a2840831122
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/asm/cache.h
@@ -0,0 +1 @@
+#define	IFETCH_ALIGN_BYTES 4
diff --git a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
index 136242ec4b0e..5226bd8bc39f 100644
--- a/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
+++ b/tools/testing/selftests/powerpc/stringloops/asm/ppc_asm.h
@@ -1,4 +1,18 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(CONFIG_PPC64) && !defined(CONFIG_PPC32)
+#ifdef __powerpc64__
+#define CONFIG_PPC64
+#else
+#define CONFIG_PPC32
+#endif
+#endif
+
+#ifdef __LITTLE_ENDIAN__
+#define CONFIG_CPU_LITTLE_ENDIAN
+#else
+#define CONFIG_CPU_BIG_ENDIAN
+#endif
+
 #include <ppc-asm.h>
 
 #ifndef r1
@@ -6,3 +20,19 @@
 #endif
 
 #define _GLOBAL(A) FUNC_START(test_ ## A)
+
+#ifdef __powerpc64__
+#define SZL		8
+#define PPC_LLU		ldu
+#define PPC_LCMPI	cmpldi
+#define PPC_ROTLI	rotldi
+#define PPC_CNTLZL	cntlzd
+#define PPC_SRLI	srdi
+#else
+#define SZL		4
+#define PPC_LLU		lwzu
+#define PPC_LCMPI	cmplwi
+#define PPC_ROTLI	rotlwi
+#define PPC_CNTLZL	cntlzw
+#define PPC_SRLI	srwi
+#endif
diff --git a/tools/testing/selftests/powerpc/stringloops/string.S b/tools/testing/selftests/powerpc/stringloops/string.S
new file mode 120000
index 000000000000..9f5babec7d21
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/string.S
\ No newline at end of file
-- 
2.13.3

^ permalink raw reply related

* [PATCH v6 3/4] powerpc/lib: implement strlen() in assembly
From: Christophe Leroy @ 2018-06-12  9:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, segher
  Cc: linux-kernel, linuxppc-dev
In-Reply-To: <85de16f5629ac9f4a815230cced361908758b53a.1528791416.git.christophe.leroy@c-s.fr>

The generic implementation of strlen() reads strings byte per byte.

This patch implements strlen() in assembly based on a read of entire
words, in the same spirit as what some other arches and glibc do.

On a 8xx the time spent in strlen is reduced by 3/4 for long strings.

strlen() selftest on an 8xx provides the following values:

Before the patch (ie with the generic strlen() in lib/string.c):

len 256 : time = 1.195055
len 016 : time = 0.083745
len 008 : time = 0.046828
len 004 : time = 0.028390

After the patch:

len 256 : time = 0.272185 ==> 78% improvment
len 016 : time = 0.040632 ==> 51% improvment
len 008 : time = 0.033060 ==> 29% improvment
len 004 : time = 0.029149 ==> 2% degradation

On a 832x:

Before the patch:

len 256 : time = 0.236125
len 016 : time = 0.018136
len 008 : time = 0.011000
len 004 : time = 0.007229

After the patch:

len 256 : time = 0.094950 ==> 60% improvment
len 016 : time = 0.013357 ==> 26% improvment
len 008 : time = 0.010586 ==> 4% improvment
len 004 : time = 0.008784

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
Not tested on PPC64.

Changes in v6:
 - Reworked for having branchless conclusion

Changes in v5:
 - Fixed for PPC64 LITTLE ENDIAN

Changes in v4:
 - Added alignment of the loop
 - doing the andc only if still not 0 as it happends only for bytes above 0x7f which is pretty rare in a string

Changes in v3:
 - Made it common to PPC32 and PPC64

Changes in v2:
 - Moved handling of unaligned strings outside of the main path as it is very unlikely.
 - Removed the verification of the fourth byte in case none of the three first ones are NUL.

 arch/powerpc/include/asm/asm-compat.h |  6 +++
 arch/powerpc/include/asm/string.h     |  1 +
 arch/powerpc/lib/string.S             | 81 +++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 7f2a7702596c..fe2b459c8486 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -20,8 +20,11 @@
 
 /* operations for longs and pointers */
 #define PPC_LL		stringify_in_c(ld)
+#define PPC_LLU		stringify_in_c(ldu)
 #define PPC_STL		stringify_in_c(std)
 #define PPC_STLU	stringify_in_c(stdu)
+#define PPC_ROTLI	stringify_in_c(rotldi)
+#define PPC_SRLI	stringify_in_c(srdi)
 #define PPC_LCMPI	stringify_in_c(cmpdi)
 #define PPC_LCMPLI	stringify_in_c(cmpldi)
 #define PPC_LCMP	stringify_in_c(cmpd)
@@ -53,8 +56,11 @@
 
 /* operations for longs and pointers */
 #define PPC_LL		stringify_in_c(lwz)
+#define PPC_LLU		stringify_in_c(lwzu)
 #define PPC_STL		stringify_in_c(stw)
 #define PPC_STLU	stringify_in_c(stwu)
+#define PPC_ROTLI	stringify_in_c(rotlwi)
+#define PPC_SRLI	stringify_in_c(srwi)
 #define PPC_LCMPI	stringify_in_c(cmpwi)
 #define PPC_LCMPLI	stringify_in_c(cmplwi)
 #define PPC_LCMP	stringify_in_c(cmpw)
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..8fdcb532de72 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -13,6 +13,7 @@
 #define __HAVE_ARCH_MEMCHR
 #define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
+#define __HAVE_ARCH_STRLEN
 
 extern char * strcpy(char *,const char *);
 extern char * strncpy(char *,const char *, __kernel_size_t);
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 4b41970e9ed8..1d0593cba9d4 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -67,3 +67,84 @@ _GLOBAL(memchr)
 2:	li	r3,0
 	blr
 EXPORT_SYMBOL(memchr)
+
+/*
+ * Algorigthm:
+ *
+ * 1) Given a word 'x', we can test to see if it contains any 0 bytes
+ *    by subtracting 0x01010101, and seeing if any of the high bits of each
+ *    byte changed from 0 to 1. This works because the least significant
+ *    0 byte must have had no incoming carry (otherwise it's not the least
+ *    significant), so it is 0x00 - 0x01 == 0xff. For all other
+ *    byte values, either they have the high bit set initially, or when
+ *    1 is subtracted you get a value in the range 0x00-0x7f, none of which
+ *    have their high bit set. The expression here is
+ *    (x - 0x01010101) & ~x & 0x80808080), which gives 0x00000000 when
+ *    there were no 0x00 bytes in the word.  You get 0x80 in bytes that
+ *    match, but possibly false 0x80 matches in the next more significant
+ *    byte to a true match due to carries.  For little-endian this is
+ *    of no consequence since the least significant match is the one
+ *    we're interested in, but big-endian needs method 2 to find which
+ *    byte matches.
+ * 2) Given a word 'x', we can test to see _which_ byte was zero by
+ *    calculating ~(((x & ~0x80808080) - 0x80808080 - 1) | x | ~0x80808080).
+ *    This produces 0x80 in each byte that was zero, and 0x00 in all
+ *    the other bytes. The '| ~0x80808080' clears the low 7 bits in each
+ *    byte, and the '| x' part ensures that bytes with the high bit set
+ *    produce 0x00. The addition will carry into the high bit of each byte
+ *    iff that byte had one of its low 7 bits set. We can then just see
+ *    which was the most significant bit set and divide by 8 to find how
+ *    many to add to the index.
+ *    This is from the book 'The PowerPC Compiler Writer's Guide',
+ *    by Steve Hoxey, Faraydon Karim, Bill Hay and Hank Warren.
+ */
+
+_GLOBAL(strlen)
+	andi.   r9, r3, (SZL - 1)
+	lis	r7, 0x0101
+	addi	r10, r3, -SZL
+	addic	r7, r7, 0x0101		/* r7 = 0x01010101 (lomagic) & clr CA */
+#ifdef CONFIG_PPC64
+	rldimi	r7, r7, 32, 0		/* r7 = 0x0101010101010101 (lomagic) */
+#endif
+	bne-	1f
+2:	PPC_ROTLI	r6, r7, 31 	/* r6 = 0x80808080(80808080) (himagic)*/
+	.balign IFETCH_ALIGN_BYTES
+3:	PPC_LLU	r9, SZL(r10)
+	/* ((x - lomagic) & ~x & himagic) == 0 means no byte in x is NUL */
+	subf	r8, r7, r9
+	and.	r8, r8, r6
+	beq+	3b
+	andc.	r8, r8, r9
+	beq+	3b
+#ifdef CONFIG_CPU_BIG_ENDIAN
+	andc	r8, r9, r6
+	orc	r9, r9, r6
+	subfe	r8, r6, r8
+	nor	r8, r8, r9
+	PPC_CNTLZL	r8, r8
+	subf	r3, r3, r10
+	PPC_SRLI	r8, r8, 3
+	add	r3, r3, r8
+#else
+	addi	r9, r8, -1
+	addi	r10, r10, (SZL - 1)
+	andc	r8, r9, r8
+	PPC_CNTLZL	r8, r8
+	subf	r3, r3, r10
+	PPC_SRLI	r8, r8, 3
+	subf	r3, r8, r3
+#endif
+	blr
+
+1:	lbz	r9, SZL(r10)
+	addi	r10, r10, 1
+	cmpwi	cr1, r9, 0
+	andi.	r9, r10, (SZL - 1)
+	beq	cr1, 4f
+	bne	1b
+	b	2b
+4:	addi	r10, r10, (SZL - 1)
+	subf	r3, r3, r10
+	blr
+EXPORT_SYMBOL(strlen)
-- 
2.13.3

^ permalink raw reply related

* [PATCH v6 2/4] selftests/powerpc: Add test for strlen()
From: Christophe Leroy @ 2018-06-12  9:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, segher
  Cc: linux-kernel, linuxppc-dev
In-Reply-To: <85de16f5629ac9f4a815230cced361908758b53a.1528791416.git.christophe.leroy@c-s.fr>

This patch adds a test for strlen()

string.c contains a copy of strlen() from lib/string.c

The test first tests the correctness of strlen() by comparing
the result with libc strlen(). It tests all cases of alignment.

It them tests the duration of an aligned strlen() on a 4 bytes string,
on a 16 bytes string and on a 256 bytes string.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v6: refactorised the benchmark test
 v5: no change
 v4: new

 .../testing/selftests/powerpc/stringloops/Makefile |   5 +-
 .../testing/selftests/powerpc/stringloops/string.c |  36 ++++++
 .../testing/selftests/powerpc/stringloops/strlen.c | 127 +++++++++++++++++++++
 3 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/string.c
 create mode 100644 tools/testing/selftests/powerpc/stringloops/strlen.c

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1e7301d4bac9..df663ee9ddb3 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -10,9 +10,12 @@ $(OUTPUT)/memcmp_64: CFLAGS += -m64
 $(OUTPUT)/memcmp_32: memcmp.c
 $(OUTPUT)/memcmp_32: CFLAGS += -m32
 
+$(OUTPUT)/strlen: strlen.c string.o
+$(OUTPUT)/string.o: string.c
+
 ASFLAGS = $(CFLAGS)
 
-TEST_GEN_PROGS := memcmp_32 memcmp_64
+TEST_GEN_PROGS := memcmp_32 memcmp_64 strlen
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/stringloops/string.c b/tools/testing/selftests/powerpc/stringloops/string.c
new file mode 100644
index 000000000000..d05200481017
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/string.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  linux/lib/string.c
+ *
+ *  Copyright (C) 1991, 1992  Linus Torvalds
+ */
+
+/*
+ * stupid library routines.. The optimized versions should generally be found
+ * as inline code in <asm-xx/string.h>
+ *
+ * These are buggy as well..
+ *
+ * * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
+ * -  Added strsep() which will replace strtok() soon (because strsep() is
+ *    reentrant and should be faster). Use only strsep() in new code, please.
+ *
+ * * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
+ *                    Matthew Hawkins <matt@mh.dropbear.id.au>
+ * -  Kissed strtok() goodbye
+ */
+
+#include <stddef.h>
+
+/**
+ * strlen - Find the length of a string
+ * @s: The string to be sized
+ */
+size_t test_strlen(const char *s)
+{
+	const char *sc;
+
+	for (sc = s; *sc != '\0'; ++sc)
+		/* nothing */;
+	return sc - s;
+}
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
new file mode 100644
index 000000000000..9055ebc484d0
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <malloc.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include "utils.h"
+
+#define SIZE 256
+#define ITERATIONS 1000
+#define ITERATIONS_BENCH 100000
+
+int test_strlen(const void *s);
+
+/* test all offsets and lengths */
+static void test_one(char *s)
+{
+	unsigned long offset;
+
+	for (offset = 0; offset < SIZE; offset++) {
+		int x, y;
+		unsigned long i;
+
+		y = strlen(s + offset);
+		x = test_strlen(s + offset);
+
+		if (x != y) {
+			printf("strlen() returned %d, should have returned %d (%p offset %ld)\n", x, y, s, offset);
+
+			for (i = offset; i < SIZE; i++)
+				printf("%02x ", s[i]);
+			printf("\n");
+		}
+	}
+}
+
+static void bench_test(char *s)
+{
+	struct timespec ts_start, ts_end;
+	int i;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_start);
+
+	for (i = 0; i < ITERATIONS_BENCH; i++)
+		test_strlen(s);
+
+	clock_gettime(CLOCK_MONOTONIC, &ts_end);
+
+	printf("len %3.3d : time = %.6f\n", test_strlen(s), ts_end.tv_sec - ts_start.tv_sec + (ts_end.tv_nsec - ts_start.tv_nsec) / 1e9);
+}
+
+static int testcase(void)
+{
+	char *s;
+	unsigned long i;
+
+	s = memalign(128, SIZE);
+	if (!s) {
+		perror("memalign");
+		exit(1);
+	}
+
+	srandom(1);
+
+	memset(s, 0, SIZE);
+	for (i = 0; i < SIZE; i++) {
+		char c;
+
+		do {
+			c = random() & 0x7f;
+		} while (!c);
+		s[i] = c;
+		test_one(s);
+	}
+
+	for (i = 0; i < ITERATIONS; i++) {
+		unsigned long j;
+
+		for (j = 0; j < SIZE; j++) {
+			char c;
+
+			do {
+				c = random() & 0x7f;
+			} while (!c);
+			s[j] = c;
+		}
+		for (j = 0; j < sizeof(long); j++) {
+			s[SIZE - 1 - j] = 0;
+			test_one(s);
+		}
+	}
+
+	for (i = 0; i < SIZE; i++) {
+		char c;
+
+		do {
+			c = random() & 0x7f;
+		} while (!c);
+		s[i] = c;
+	}
+
+	bench_test(s);
+
+	s[16] = 0;
+	bench_test(s);
+
+	s[8] = 0;
+	bench_test(s);
+
+	s[4] = 0;
+	bench_test(s);
+
+	s[3] = 0;
+	bench_test(s);
+
+	s[2] = 0;
+	bench_test(s);
+
+	s[1] = 0;
+	bench_test(s);
+
+	return 0;
+}
+
+int main(void)
+{
+	return test_harness(testcase, "strlen");
+}
-- 
2.13.3

^ permalink raw reply related

* [PATCH v6 1/4] selftests/powerpc: add test for 32 bits memcmp
From: Christophe Leroy @ 2018-06-12  9:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	wei.guo.simon, segher
  Cc: linux-kernel, linuxppc-dev

This patch renames memcmp test to memcmp_64 and adds
a memcmp_32 test for testing the 32 bits version of memcmp()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v6: no change
 v5: no change
 v4: new

 tools/testing/selftests/powerpc/stringloops/Makefile    | 14 +++++++++++---
 tools/testing/selftests/powerpc/stringloops/memcmp_32.S |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)
 create mode 120000 tools/testing/selftests/powerpc/stringloops/memcmp_32.S

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile
index 1125e489055e..1e7301d4bac9 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -1,10 +1,18 @@
 # SPDX-License-Identifier: GPL-2.0
 # The loops are all 64-bit code
-CFLAGS += -m64
 CFLAGS += -I$(CURDIR)
 
-TEST_GEN_PROGS := memcmp
-EXTRA_SOURCES := memcmp_64.S ../harness.c
+EXTRA_SOURCES := ../harness.c
+
+$(OUTPUT)/memcmp_64: memcmp.c
+$(OUTPUT)/memcmp_64: CFLAGS += -m64
+
+$(OUTPUT)/memcmp_32: memcmp.c
+$(OUTPUT)/memcmp_32: CFLAGS += -m32
+
+ASFLAGS = $(CFLAGS)
+
+TEST_GEN_PROGS := memcmp_32 memcmp_64
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/stringloops/memcmp_32.S b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
new file mode 120000
index 000000000000..056f2b3af789
--- /dev/null
+++ b/tools/testing/selftests/powerpc/stringloops/memcmp_32.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/memcmp_32.S
\ No newline at end of file
-- 
2.13.3

^ permalink raw reply related

* Re: [PATCH v5 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-06-12  8:29 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, kexec,
	Michal Simek, David S. Miller, Chris Zankel, Max Filippov,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <20180612032831.29747-2-bhe@redhat.com>

On Tue, Jun 12, 2018 at 6:28 AM, Baoquan He <bhe@redhat.com> wrote:
> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> so that it's shared. Later its code also need be updated using list_head
> to replace singly linked list.

While this is a good deduplication of the code, some requirements for
public functions would be good to satisfy.

> +/*
> + * Reparent resource children of pr that conflict with res
> + * under res, and make res replace those children.
> + */

kernel doc format, though...

> +static int reparent_resources(struct resource *parent,
> +                                    struct resource *res)

...is it really public with static keyword?!



> +{

> +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> +               if (p->end < res->start)
> +                       continue;
> +               if (res->end < p->start)
> +                       break;

> +               if (p->start < res->start || p->end > res->end)
> +                       return -1;      /* not completely contained */

Usually we are expecting real eeror codes.

> +               if (firstpp == NULL)
> +                       firstpp = pp;
> +       }

> +       if (firstpp == NULL)
> +               return -1;      /* didn't find any conflicting entries? */

Ditto.

> +}
> +EXPORT_SYMBOL(reparent_resources);

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: 4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem
From: Mathieu Malaterre @ 2018-06-12  8:15 UTC (permalink / raw)
  To: Balbir Singh, Christophe LEROY; +Cc: linuxppc-dev
In-Reply-To: <4030b744-f99c-bdfc-5f95-c9311e31ad0c@gmail.com>

Hi Balbir,

On Tue, Jun 12, 2018 at 9:39 AM Balbir Singh <bsingharora@gmail.com> wrote:
>
>
> On 12/06/18 06:20, Mathieu Malaterre wrote:
>
> > Hi Meelis,
> >
> > On Mon, Jun 11, 2018 at 1:21 PM Meelis Roos <mroos@linux.ee> wrote:
> >> I am seeing this on PowerMac G4 with sungem ethernet driver. 4.17 was
> >> OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.
> > Same here.
> >
> >> [  140.518664] eth0: hw csum failure
> >> [  140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 4.17.0-10146-gf0dc7f9c6dd9 #83
> >> [  140.518707] Call Trace:
> >> [  140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc (unreliable)
> >> [  140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
> >> [  140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
> >> [  140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
> >> [  140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
> >> [  140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
> >> [  140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
> >> [  140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
> >> [  140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
> >> [  140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
> >> [  140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
> >> [  140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
> >> [  140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
> >> [  140.518960] --- interrupt: 501 at copy_page+0x40/0x90
> >>                    LR = copy_user_page+0x18/0x30
> >> [  140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
> >> [  140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
> >> [  140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
> >> [  140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
> >> [  140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
> >> [  140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
> >> [  140.519048] --- interrupt: 301 at 0xb78b6864
> >>                    LR = 0xb78b6c54
> >>
> > For some reason if I do a git bisect it returns that:
> >
> > $ git bisect good
> > 3036bc45364f98515a2c446d7fac2c34dcfbeff4 is the first bad commit
> >
> > Could you also check on your side please.
>
> Don't have a G4, but the related commits look like
>
> 373e098e1e788d7b89ec0f31765a6c08e2ea0f7c and e9c4943a107b56696e4872cdffdba6b0c7277c77 Balbir
>

Indeed that makes more sense. I must have messed up during my
git-bisect operation. Will try a simple git-revert on those and report
back.

Thanks

^ permalink raw reply

* Re: [PATCH] pci/shpchp: no claim on pcie port device
From: kbuild test robot @ 2018-06-12  7:37 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: kbuild-all, linux-pci, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <1528785733-19442-1-git-send-email-kernelfans@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2611 bytes --]

Hi Pingfan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.17]
[cannot apply to pci/next next-20180612]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/pci-shpchp-no-claim-on-pcie-port-device/20180612-144419
config: i386-randconfig-a1-201823 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/pci/hotplug/shpchp_core.c: In function 'shpc_probe':
>> drivers/pci/hotplug/shpchp_core.c:291:18: error: 'dev' undeclared (first use in this function)
     if (pci_is_pcie(dev) &&
                     ^
   drivers/pci/hotplug/shpchp_core.c:291:18: note: each undeclared identifier is reported only once for each function it appears in

vim +/dev +291 drivers/pci/hotplug/shpchp_core.c

   284	
   285	static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
   286	{
   287		int rc;
   288		struct controller *ctrl;
   289	
   290		/* do not claim pcie port device */
 > 291		if (pci_is_pcie(dev) &&
   292		    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
   293		     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
   294		     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
   295			return -ENODEV;
   296	
   297		if (!is_shpc_capable(pdev))
   298			return -ENODEV;
   299	
   300		ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
   301		if (!ctrl)
   302			goto err_out_none;
   303	
   304		INIT_LIST_HEAD(&ctrl->slot_list);
   305	
   306		rc = shpc_init(ctrl, pdev);
   307		if (rc) {
   308			ctrl_dbg(ctrl, "Controller initialization failed\n");
   309			goto err_out_free_ctrl;
   310		}
   311	
   312		pci_set_drvdata(pdev, ctrl);
   313	
   314		/* Setup the slot information structures */
   315		rc = init_slots(ctrl);
   316		if (rc) {
   317			ctrl_err(ctrl, "Slot initialization failed\n");
   318			goto err_out_release_ctlr;
   319		}
   320	
   321		rc = shpchp_create_ctrl_files(ctrl);
   322		if (rc)
   323			goto err_cleanup_slots;
   324	
   325		return 0;
   326	
   327	err_cleanup_slots:
   328		cleanup_slots(ctrl);
   329	err_out_release_ctlr:
   330		ctrl->hpc_ops->release_ctlr(ctrl);
   331	err_out_free_ctrl:
   332		kfree(ctrl);
   333	err_out_none:
   334		return -ENODEV;
   335	}
   336	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28049 bytes --]

^ permalink raw reply

* Re: 4.17.0-10146-gf0dc7f9c6dd9: hw csum failure on powerpc+sungem
From: Balbir Singh @ 2018-06-12  7:37 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <CA+7wUsw_8znPDsej=j48kkmqV3vPbOuDociBE4tth53-Zj32qA@mail.gmail.com>


On 12/06/18 06:20, Mathieu Malaterre wrote:

> Hi Meelis,
>
> On Mon, Jun 11, 2018 at 1:21 PM Meelis Roos <mroos@linux.ee> wrote:
>> I am seeing this on PowerMac G4 with sungem ethernet driver. 4.17 was
>> OK, 4.17.0-10146-gf0dc7f9c6dd9 is problematic.
> Same here.
>
>> [  140.518664] eth0: hw csum failure
>> [  140.518699] CPU: 0 PID: 1237 Comm: postconf Not tainted 4.17.0-10146-gf0dc7f9c6dd9 #83
>> [  140.518707] Call Trace:
>> [  140.518734] [effefd90] [c03d6db8] __skb_checksum_complete+0xd8/0xdc (unreliable)
>> [  140.518759] [effefdb0] [c04c1284] icmpv6_rcv+0x248/0x4ec
>> [  140.518775] [effefdd0] [c049a448] ip6_input_finish.constprop.0+0x11c/0x5f4
>> [  140.518786] [effefe10] [c049b1c0] ip6_mc_input+0xcc/0x100
>> [  140.518807] [effefe20] [c03e110c] __netif_receive_skb_core+0x310/0x944
>> [  140.518820] [effefe70] [c03e76ec] napi_gro_receive+0xd0/0xe8
>> [  140.518845] [effefe80] [f3e1f66c] gem_poll+0x618/0x1274 [sungem]
>> [  140.518856] [effeff30] [c03e6f0c] net_rx_action+0x198/0x374
>> [  140.518872] [effeff90] [c0501a88] __do_softirq+0x120/0x278
>> [  140.518890] [effeffe0] [c0036188] irq_exit+0xd8/0xdc
>> [  140.518908] [effefff0] [c000f478] call_do_irq+0x24/0x3c
>> [  140.518925] [d05a5d30] [c0007120] do_IRQ+0x74/0xf0
>> [  140.518941] [d05a5d50] [c0012474] ret_from_except+0x0/0x14
>> [  140.518960] --- interrupt: 501 at copy_page+0x40/0x90
>>                    LR = copy_user_page+0x18/0x30
>> [  140.518973] [d05a5e10] [d058cd80] 0xd058cd80 (unreliable)
>> [  140.518989] [d05a5e20] [c00fa2bc] wp_page_copy+0xec/0x654
>> [  140.519002] [d05a5e60] [c00fd3a4] do_wp_page+0xa8/0x5b4
>> [  140.519013] [d05a5e90] [c00fe934] handle_mm_fault+0x564/0xa84
>> [  140.519025] [d05a5f00] [c0016230] do_page_fault+0x1bc/0x7e8
>> [  140.519037] [d05a5f40] [c0012300] handle_page_fault+0x14/0x40
>> [  140.519048] --- interrupt: 301 at 0xb78b6864
>>                    LR = 0xb78b6c54
>>
> For some reason if I do a git bisect it returns that:
>
> $ git bisect good
> 3036bc45364f98515a2c446d7fac2c34dcfbeff4 is the first bad commit
>
> Could you also check on your side please.

Don't have a G4, but the related commits look like

373e098e1e788d7b89ec0f31765a6c08e2ea0f7c and e9c4943a107b56696e4872cdffdba6b0c7277c77 Balbir

^ permalink raw reply

* Re: [PATCH] pci/shpchp: no claim on pcie port device
From: Christoph Hellwig @ 2018-06-12  6:57 UTC (permalink / raw)
  To: Pingfan Liu; +Cc: linux-pci, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <1528785733-19442-1-git-send-email-kernelfans@gmail.com>

On Tue, Jun 12, 2018 at 02:42:13PM +0800, Pingfan Liu wrote:
> The Linux Device Driver Model allows a physical device to be handled
> by only a single driver. But at present, both shpchp and portdrv_pci
> claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> few problems, one is the wrong shutdown seq of devices, owing to the
> broken devices_kset.

How can they both touch devices_kset?  Once one driver has claimed the
device it should not be available to others.

> +	/* do not claim pcie port device */
> +	if (pci_is_pcie(dev) &&
> +	    ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> +	     (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> +	     (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> +		return -ENODEV;

No need for the inner braces.

^ permalink raw reply

* Re: [PATCH] pci/shpchp: no claim on pcie port device
From: Pingfan Liu @ 2018-06-12  7:20 UTC (permalink / raw)
  To: hch; +Cc: linux-pci, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20180612065727.GA5195@infradead.org>

On Tue, Jun 12, 2018 at 2:57 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jun 12, 2018 at 02:42:13PM +0800, Pingfan Liu wrote:
> > The Linux Device Driver Model allows a physical device to be handled
> > by only a single driver. But at present, both shpchp and portdrv_pci
> > claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> > few problems, one is the wrong shutdown seq of devices, owing to the
> > broken devices_kset.
>
> How can they both touch devices_kset?  Once one driver has claimed the
> device it should not be available to others.
>
Unfortunately, it could be.  There are two code path for do_one_initcall
kernel_init->..->do_one_initcall->pcie_portdrv_init
And
load_module->do_init_module->do_one_initcall->shpcd_init
> > +     /* do not claim pcie port device */
> > +     if (pci_is_pcie(dev) &&
> > +         ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> > +          (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> > +          (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> > +             return -ENODEV;
>
> No need for the inner braces.

OK.

Thanks,
Pingfan

^ permalink raw reply

* [RFC PATCH 3/3] powerpc/64s/radix: optimise TLB flush with precise TLB ranges in mmu_gather
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds
In-Reply-To: <20180612071621.26775-1-npiggin@gmail.com>

Use the page_start and page_end fields of mmu_gather to implement more
precise TLB flushing. (start, end) covers the entire TLB and page
table range that has been invalidated, for architectures that do not
have explicit page walk cache management. page_start and page_end are
just for ranges that may have TLB entries.

A tlb_flush may have no pages in this range, but still requires PWC
to be flushed. That is handled properly.

This brings the number of tlbiel instructions required by a kernel
compile from 33M to 25M, most avoided from exec->shift_arg_pages.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/tlb-radix.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 67a6e86d3e7e..06452ad701cf 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -853,8 +853,11 @@ void radix__tlb_flush(struct mmu_gather *tlb)
 		else
 			radix__flush_all_mm(mm);
 	} else {
-		unsigned long start = tlb->start;
-		unsigned long end = tlb->end;
+		unsigned long start = tlb->page_start;
+		unsigned long end = tlb->page_end;
+
+		if (end < start)
+			end = start;
 
 		if (!tlb->need_flush_all)
 			radix__flush_tlb_range_psize(mm, start, end, psize);
-- 
2.17.0

^ permalink raw reply related

* [RFC PATCH 2/3] mm: mmu_gather track of invalidated TLB ranges explicitly for more precise flushing
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds
In-Reply-To: <20180612071621.26775-1-npiggin@gmail.com>

The mmu_gather APIs keep track of the invalidated address range
including the span covered by invalidated page table pages. Page table
pages with no ptes (and therefore could not have TLB entries) still
need to be involved in the invalidation if the processor caches
intermediate levels of the page table.

This allows a backwards compatible / legacy implementation to cache
page tables without modification, if they invalidate their page table
cache using their existing tlb invalidation instructions.

However this additional flush range is not necessary if the
architecture provides explicit page table cache management, or if it
ensures that page table cache entries will never be instantiated if
they did not reach a valid pte.

This is very noticable on powerpc in the exec path, in shift_arg_pages
where the TLB flushing for the page table teardown is a very large
range that gets implemented as a full process flush. This patch
provides page_start and page_end fields to mmu_gather which
architectures can use to optimise their TLB flushing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/asm-generic/tlb.h | 27 +++++++++++++++++++++++++--
 mm/memory.c               |  4 +++-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index faddde44de8c..a006f702b4c2 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -96,6 +96,8 @@ struct mmu_gather {
 #endif
 	unsigned long		start;
 	unsigned long		end;
+	unsigned long		page_start;
+	unsigned long		page_end;
 	/* we are in the middle of an operation to clear
 	 * a full mm and can make some optimizations */
 	unsigned int		fullmm : 1,
@@ -128,13 +130,25 @@ static inline void __tlb_adjust_range(struct mmu_gather *tlb,
 	tlb->end = max(tlb->end, address + range_size);
 }
 
+static inline void __tlb_adjust_page_range(struct mmu_gather *tlb,
+				      unsigned long address,
+				      unsigned int range_size)
+{
+	tlb->page_start = min(tlb->page_start, address);
+	tlb->page_end = max(tlb->page_end, address + range_size);
+}
+
+
 static inline void __tlb_reset_range(struct mmu_gather *tlb)
 {
 	if (tlb->fullmm) {
 		tlb->start = tlb->end = ~0;
+		tlb->page_start = tlb->page_end = ~0;
 	} else {
 		tlb->start = TASK_SIZE;
 		tlb->end = 0;
+		tlb->page_start = TASK_SIZE;
+		tlb->page_end = 0;
 	}
 }
 
@@ -210,12 +224,14 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_tlb_entry(tlb, ptep, address)		\
 	do {							\
 		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, PAGE_SIZE); \
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	     \
 	do {							     \
 		__tlb_adjust_range(tlb, address, huge_page_size(h)); \
+		__tlb_adjust_page_range(tlb, address, huge_page_size(h)); \
 		__tlb_remove_tlb_entry(tlb, ptep, address);	     \
 	} while (0)
 
@@ -230,6 +246,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, HPAGE_PMD_SIZE);	\
 		__tlb_remove_pmd_tlb_entry(tlb, pmdp, address);		\
 	} while (0)
 
@@ -244,6 +261,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #define tlb_remove_pud_tlb_entry(tlb, pudp, address)			\
 	do {								\
 		__tlb_adjust_range(tlb, address, HPAGE_PUD_SIZE);	\
+		__tlb_adjust_page_range(tlb, address, HPAGE_PUD_SIZE);	\
 		__tlb_remove_pud_tlb_entry(tlb, pudp, address);		\
 	} while (0)
 
@@ -262,6 +280,11 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
  * architecture to do its own odd thing, not cause pain for others
  * http://lkml.kernel.org/r/CA+55aFzBggoXtNXQeng5d_mRoDnaMBE5Y+URs+PHR67nUpMtaw@mail.gmail.com
  *
+ * Powerpc (Book3S 64-bit) with the radix MMU has an architected "page
+ * walk cache" that is invalidated with a specific instruction. It uses
+ * need_flush_all to issue this instruction, which is set by its own
+ * __p??_free_tlb functions.
+ *
  * For now w.r.t page table cache, mark the range_size as PAGE_SIZE
  */
 
@@ -273,7 +296,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 
 #define pmd_free_tlb(tlb, pmdp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__pmd_free_tlb(tlb, pmdp, address);		\
 	} while (0)
 
@@ -288,7 +311,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb,
 #ifndef __ARCH_HAS_5LEVEL_HACK
 #define p4d_free_tlb(tlb, pudp, address)			\
 	do {							\
-		__tlb_adjust_range(tlb, address, PAGE_SIZE);		\
+		__tlb_adjust_range(tlb, address, PAGE_SIZE);	\
 		__p4d_free_tlb(tlb, pudp, address);		\
 	} while (0)
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 9d472e00fc2d..a46896b85e54 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -277,8 +277,10 @@ void arch_tlb_finish_mmu(struct mmu_gather *tlb,
 {
 	struct mmu_gather_batch *batch, *next;
 
-	if (force)
+	if (force) {
 		__tlb_adjust_range(tlb, start, end - start);
+		__tlb_adjust_page_range(tlb, start, end - start);
+	}
 
 	tlb_flush_mmu(tlb);
 
-- 
2.17.0

^ permalink raw reply related

* [RFC PATCH 1/3] Revert "mm: always flush VMA ranges affected by zap_page_range"
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds
In-Reply-To: <20180612071621.26775-1-npiggin@gmail.com>

This reverts commit 4647706ebeee6e50f7b9f922b095f4ec94d581c3.

Patch 99baac21e4585 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem") provides a superset of the TLB flush coverage of this
commit, and even includes in the changelog "this patch supersedes
'mm: Always flush VMA ranges affected by zap_page_range v2'".

Reverting this avoids double flushing the TLB range, and the less
efficient flush_tlb_range() call (the mmu_gather API is more precise
about what ranges it invalidates).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/memory.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7206a634270b..9d472e00fc2d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1603,20 +1603,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
 	tlb_gather_mmu(&tlb, mm, start, end);
 	update_hiwater_rss(mm);
 	mmu_notifier_invalidate_range_start(mm, start, end);
-	for ( ; vma && vma->vm_start < end; vma = vma->vm_next) {
+	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
 		unmap_single_vma(&tlb, vma, start, end, NULL);
-
-		/*
-		 * zap_page_range does not specify whether mmap_sem should be
-		 * held for read or write. That allows parallel zap_page_range
-		 * operations to unmap a PTE and defer a flush meaning that
-		 * this call observes pte_none and fails to flush the TLB.
-		 * Rather than adding a complex API, ensure that no stale
-		 * TLB entries exist when this call returns.
-		 */
-		flush_tlb_range(vma, start, end);
-	}
-
 	mmu_notifier_invalidate_range_end(mm, start, end);
 	tlb_finish_mmu(&tlb, start, end);
 }
-- 
2.17.0

^ permalink raw reply related

* [RFC PATCH 0/3] couple of TLB flush optimisations
From: Nicholas Piggin @ 2018-06-12  7:16 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linuxppc-dev, linux-arch, Aneesh Kumar K . V,
	Minchan Kim, Mel Gorman, Nadav Amit, Andrew Morton,
	Linus Torvalds

I'm just looking around TLB flushing and noticed a few issues with
the core code. The first one seems pretty straightforward, unless I
missed something, but the TLB flush pattern after the revert seems
okay.

The second one might be a bit more interesting for other architectures
and the big comment in include/asm-generic/tlb.h and linked mail from
Linus gives some good context.

I suspect mmu notifiers should use this precise TLB range too, because
I don't see how they could care about the page table structure under
the mapping. Although I only use it in powerpc so far.

Comments?

Thanks,
Nick

Nicholas Piggin (3):
  Revert "mm: always flush VMA ranges affected by zap_page_range"
  mm: mmu_gather track of invalidated TLB ranges explicitly for more
    precise flushing
  powerpc/64s/radix: optimise TLB flush with precise TLB ranges in
    mmu_gather

 arch/powerpc/mm/tlb-radix.c |  7 +++++--
 include/asm-generic/tlb.h   | 27 +++++++++++++++++++++++++--
 mm/memory.c                 | 18 ++++--------------
 3 files changed, 34 insertions(+), 18 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [PATCH] pci/shpchp: no claim on pcie port device
From: Pingfan Liu @ 2018-06-12  7:14 UTC (permalink / raw)
  To: linux-pci; +Cc: Bjorn Helgaas, linuxppc-dev
In-Reply-To: <1528785733-19442-1-git-send-email-kernelfans@gmail.com>

On Tue, Jun 12, 2018 at 2:42 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> The Linux Device Driver Model allows a physical device to be handled
> by only a single driver. But at present, both shpchp and portdrv_pci
> claim PCI_CLASS_BRIDGE_PCI, and touch devices_kset. This causes a
> few problems, one is the wrong shutdown seq of devices, owing to the
> broken devices_kset.
>
> I hit this bug on a Power9 machine, when "kexec -e", and see a ata-disk
> behind a bridge can not write back buffer in flight due to the former
> shutdown of the bridge which clears the BusMaster bit.
>
> Note the device involved:
> 0004:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         NUMA node: 0
>         Bus: primary=00, secondary=01, subordinate=12, sec-latency=0
>         I/O behind bridge: 00000000-00000fff
>         Memory behind bridge: 80000000-ffefffff
>         Prefetchable memory behind bridge: 0006024000000000-0006027f7fffffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [48] Express (v2) Root Port (Slot-), MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0
>                         ExtTag- RBE+
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 256 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
>                         ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
>                 DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis+, LTR-, OBFF Disabled ARIFwd+
>                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>                          EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP+ FCP- CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
>                 UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
>         Capabilities: [148 v1] #19
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
>  drivers/pci/hotplug/shpchp_core.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/pci/hotplug/shpchp_core.c b/drivers/pci/hotplug/shpchp_core.c
> index 1f0f969..2cd8e19 100644
> --- a/drivers/pci/hotplug/shpchp_core.c
> +++ b/drivers/pci/hotplug/shpchp_core.c
> @@ -287,6 +287,13 @@ static int shpc_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         int rc;
>         struct controller *ctrl;
>
> +       /* do not claim pcie port device */
> +       if (pci_is_pcie(dev) &&
> +           ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) ||
> +            (pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) ||
> +            (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)))
> +               return -ENODEV;
> +
Sorry, should s/dev/pdev/.  I will try to loan a power9 to test it.
NACK it, I will send out v2.

Thanks,
Pingfan
>         if (!is_shpc_capable(pdev))
>                 return -ENODEV;
>
> --
> 2.7.4
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox