LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Vaibhav Jain @ 2020-05-21 16:59 UTC (permalink / raw)
  To: Michael Ellerman, Ira Weiny
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Steven Rostedt,
	linux-nvdimm
In-Reply-To: <87k115gy0i.fsf@mpe.ellerman.id.au>

Michael Ellerman <michaele@au1.ibm.com> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>> Thanks for reviewing this this patch Ira. My responses below:
>> Ira Weiny <ira.weiny@intel.com> writes:
>>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>>>> Implement support for fetching nvdimm health information via
>>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>>>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>>>> 'struct papr_scm_priv' and subsequently partially exposed to
>>>> user-space via newly introduced dimm specific attribute
>>>> 'papr/flags'. Since the hcall is costly, the health information is
>>>> cached and only re-queried, 60s after the previous successful hcall.
> ...
>>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> index f35592423380..142636e1a59f 100644
>>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>>>  	struct resource res;
>>>>  	struct nd_region *region;
>>>>  	struct nd_interleave_set nd_set;
>>>> +
>>>> +	/* Protect dimm health data from concurrent read/writes */
>>>> +	struct mutex health_mutex;
>>>> +
>>>> +	/* Last time the health information of the dimm was updated */
>>>> +	unsigned long lasthealth_jiffies;
>>>> +
>>>> +	/* Health information for the dimm */
>>>> +	u64 health_bitmap;
>>>
>>> I wonder if this should be typed big endian as you mention that it is in the
>>> commit message?
>> This was discussed in an earlier review of the patch series at
>> https://lore.kernel.org/linux-nvdimm/878sjetcis.fsf@mpe.ellerman.id.au
>>
>> Even though health bitmap is returned in big endian format (For ex
>> value 0xC00000000000000 indicates bits 0,1 set), its value is never
>> used. Instead only test for specific bits being set in the register is
>> done.
>
> This has already caused a lot of confusion, so let me try and clear it
> up. I will probably fail :)
>
> The value is not big endian.
>
> It's returned in a GPR (a register), from the hypervisor. The ordering
> of bytes in a register is not dependent on what endian we're executing
> in.
>
> It's true that the hypervisor will have been running big endian, and
> when it returns to us we will now be running little endian. But the
> value is unchanged, it was 0xC00000000000000 in the GPR while the HV was
> running and it's still 0xC00000000000000 when we return to Linux. You
> can see this in mambo, see below for an example.
>
>
> _However_, the specification of the bits in the bitmap value uses MSB 0
> ordering, as is traditional for IBM documentation. That means the most
> significant bit, aka. the left most bit, is called "bit 0".
>
> See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering
>
> That is the opposite numbering from what most people use, and in
> particular what most code in Linux uses, which is that bit 0 is the
> least significant bit.
>
> Which is where the confusion comes in. It's not that the bytes are
> returned in a different order, it's that the bits are numbered
> differently in the IBM documentation.
>
> The way to fix this kind of thing is to read the docs, and convert all
> the bits into correct numbering (LSB=0), and then throw away the docs ;)
>
> cheers
Thanks a lot for clarifying this Mpe and for this detailed explaination.

I have removed the term Big-Endian from v8 patch description to avoid
any further confusion.

>
>
>
> In mambo we can set a breakpoint just before the kernel enters skiboot,
> towards the end of __opal_call. The kernel is running LE and skiboot
> runs BE.
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc0000000000c1744
>   breakpoint set at [0:0:0]: 0xc0000000000c1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
>
> Then run:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] c
>   [0:0:0]: 0xC0000000000C1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
>   INFO: 121671618: (121671618): ** Execution stopped: user (tcl),  **
>   121671618: ** finished running 121671618 instructions **
>
> And we stop there, on an hrfid that we haven't executed yet.
> We can print r0, to see the OPAL token:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
>   0x0000000000000019
>
> ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25).
>
> And we can print the MSR:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
>   0x9000000002001033
>   
>                      64-bit mode (SF): 0x1 [64-bit mode]
>                 Hypervisor State (HV): 0x1
>                Vector Available (VEC): 0x1
>   Machine Check Interrupt Enable (ME): 0x1
>             Instruction Relocate (IR): 0x1
>                    Data Relocate (DR): 0x1
>            Recoverable Interrupt (RI): 0x1
>               Little-Endian Mode (LE): 0x1 [little-endian]
>
> ie. we're little endian.
>
> We then step one instruction:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] s
>   [0:0:0]: 0x0000000030002BF0 (0x0000000030002BF0) Enc:0x7D9FFAA6 : mfspr   r12,PIR
>
> Now we're in skiboot. Print the MSR again:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
>   0x9000000002001002
>   
>                      64-bit mode (SF): 0x1 [64-bit mode]
>                 Hypervisor State (HV): 0x1
>                Vector Available (VEC): 0x1
>   Machine Check Interrupt Enable (ME): 0x1
>            Recoverable Interrupt (RI): 0x1
>
> We're big endian.
> Print r0:
>
>   systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
>   0x0000000000000019
>
> r0 is unchanged!
Got it. Thanks again.

-- 
Cheers
~ Vaibhav

^ permalink raw reply

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Aneesh Kumar K.V @ 2020-05-21 17:02 UTC (permalink / raw)
  To: Jeff Moyer, Dan Williams
  Cc: Jan Kara, linux-nvdimm, mpatocka, alistair, linuxppc-dev
In-Reply-To: <x49o8qh9wu5.fsf@segfault.boston.devel.redhat.com>

On 5/21/20 8:08 PM, Jeff Moyer wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
> 
>>> But I agree with your concern that if we have older kernel/applications
>>> that continue to use `dcbf` on future hardware we will end up
>>> having issues w.r.t powerfail consistency. The plan is what you outlined
>>> above as tighter ecosystem control. Considering we don't have a pmem
>>> device generally available, we get both kernel and userspace upgraded
>>> to use these new instructions before such a device is made available.
> 
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
> 

yes. these devices can continue to use 'dcbf + hwsync' as long as we are 
running them on P9.


>> Ok, I think a compile time kernel option with a runtime override
>> satisfies my concern. Does that work for you?
> 
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?
> 
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.
> 


I am currently looking at the possibility of firmware present these 
devices with different device-tree compat values. So that older 
/existing kernel won't initialize the device on newer systems. Is that a 
good compromise? We still can end up with older userspace and newer 
kernel. One of the option suggested by Jan Kara is to use a prctl flag 
to control that? (intead of kernel parameter option I posted before)


> Moving on to the patch itself--Aneesh, have you audited other persistent
> memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> this:
> 
> static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> {
>   	if (WC_MODE_PMEM(wc))
> 	        wmb(); <==========
>          else
>                  ssd_commit_flushed(wc, wait_for_ios);
> }
> 
> I believe you'll need to make modifications there.
> 

Correct. Thanks for catching that.


I don't understand dm much, wondering how this will work with 
non-synchronous DAX device?

-aneesh

^ permalink raw reply

* Re: [PATCH v3 6/7] powerpc/dt_cpu_ftrs: Add MMA feature
From: Paul A. Clarke @ 2020-05-21 17:17 UTC (permalink / raw)
  To: Alistair Popple; +Cc: aneesh.kumar, mikey, linuxppc-dev, npiggin
In-Reply-To: <20200521014341.29095-7-alistair@popple.id.au>

On Thu, May 21, 2020 at 11:43:40AM +1000, Alistair Popple wrote:
> Matrix multiple assist (MMA) is a new feature added to ISAv3.1 and

s/Matrix multiple assist/Matrix-Multiply Assist/

> POWER10. Support on powernv can be selected via a firmware CPU device
> tree feature which enables it via a PCR bit.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/include/asm/reg.h    |  3 ++-
>  arch/powerpc/kernel/dt_cpu_ftrs.c | 17 ++++++++++++++++-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index dd20af367b57..88e6c78100d9 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -481,7 +481,8 @@
>  #define   PCR_VEC_DIS	(__MASK(63-0))	/* Vec. disable (bit NA since POWER8) */
>  #define   PCR_VSX_DIS	(__MASK(63-1))	/* VSX disable (bit NA since POWER8) */
>  #define   PCR_TM_DIS	(__MASK(63-2))	/* Trans. memory disable (POWER8) */
> -#define   PCR_HIGH_BITS	(PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS)
> +#define   PCR_MMA_DIS	(__MASK(63-3)) /* Matrix-Multiply Accelerator */

s/Accelerator/Assist/

PC

> +#define   PCR_HIGH_BITS	(PCR_MMA_DIS | PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS)
>  /*
>   * These bits are used in the function kvmppc_set_arch_compat() to specify and
>   * determine both the compatibility level which we want to emulate and the
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 93c340906aad..0a41fce34165 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -75,6 +75,7 @@ static struct {
>  	u64	lpcr_clear;
>  	u64	hfscr;
>  	u64	fscr;
> +	u64	pcr;
>  } system_registers;
> 
>  static void (*init_pmu_registers)(void);
> @@ -102,7 +103,7 @@ static void __restore_cpu_cpufeatures(void)
>  	if (hv_mode) {
>  		mtspr(SPRN_LPID, 0);
>  		mtspr(SPRN_HFSCR, system_registers.hfscr);
> -		mtspr(SPRN_PCR, PCR_MASK);
> +		mtspr(SPRN_PCR, system_registers.pcr);
>  	}
>  	mtspr(SPRN_FSCR, system_registers.fscr);
> 
> @@ -555,6 +556,18 @@ static int __init feat_enable_large_ci(struct dt_cpu_feature *f)
>  	return 1;
>  }
> 
> +static int __init feat_enable_mma(struct dt_cpu_feature *f)
> +{
> +	u64 pcr;
> +
> +	feat_enable(f);
> +	pcr = mfspr(SPRN_PCR);
> +	pcr &= ~PCR_MMA_DIS;
> +	mtspr(SPRN_PCR, pcr);
> +
> +	return 1;
> +}
> +
>  struct dt_cpu_feature_match {
>  	const char *name;
>  	int (*enable)(struct dt_cpu_feature *f);
> @@ -629,6 +642,7 @@ static struct dt_cpu_feature_match __initdata
>  	{"vector-binary16", feat_enable, 0},
>  	{"wait-v3", feat_enable, 0},
>  	{"prefix-instructions", feat_enable, 0},
> +	{"matrix-multiply-assist", feat_enable_mma, 0},
>  };
> 
>  static bool __initdata using_dt_cpu_ftrs;
> @@ -779,6 +793,7 @@ static void __init cpufeatures_setup_finished(void)
>  	system_registers.lpcr = mfspr(SPRN_LPCR);
>  	system_registers.hfscr = mfspr(SPRN_HFSCR);
>  	system_registers.fscr = mfspr(SPRN_FSCR);
> +	system_registers.pcr = mfspr(SPRN_PCR);
> 
>  	pr_info("final cpu/mmu features = 0x%016lx 0x%08x\n",
>  		cur_cpu_spec->cpu_features, cur_cpu_spec->mmu_features);
> -- 
> 2.20.1
> 

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Al Viro @ 2020-05-21 17:27 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200519165422.GA5838@roeck-us.net>

On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> > enables when vaddr is not in the fixmap.
> > 
> > Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> but sparc32 boot tests with SMP enabled still fail with lots of messages
> such as:

BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
shrink the damn thing enough to have the loader cope with it?  I hadn't
been able to do that for the current mainline ;-/

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Ira Weiny @ 2020-05-21 17:42 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Al Viro, Andy Lutomirski,
	Thomas Gleixner, linux-arm-kernel, Chris Zankel,
	Thomas Bogendoerfer, linux-parisc, linux-kernel, Christian Koenig,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <d86dba19-4f4b-061e-a2c7-4f037e9e2de2@roeck-us.net>

On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
> On 5/19/20 10:13 PM, Ira Weiny wrote:
> > On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
> >> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
> >>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
> >>>>> From: Ira Weiny <ira.weiny@intel.com>
> >>>>>
> >>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> >>>>> enables when vaddr is not in the fixmap.
> >>>>>
> >>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> >>>>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >>>>
> >>>> microblazeel works with this patch,
> >>>
> >>> Awesome...  Andrew in my rush yesterday I should have put a reported by on the
> >>> patch for Guenter as well.
> >>>
> >>> Sorry about that Guenter,
> >>
> >> No worries.
> >>
> >>> Ira
> >>>
> >>>> as do the nosmp sparc32 boot tests,
> >>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
> >>>> such as:
> >>>>
> >>>> BUG: Bad page state in process swapper/0  pfn:006a1
> >>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
> >>>> flags: 0x0()
> >>>> raw: 00000000 00000100 00000122 00000000 00000001 00000000 00000000 00000000
> >>>> page dumped because: nonzero mapcount
> >>>> Modules linked in:
> >>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B             5.7.0-rc6-next-20200518-00002-gb178d2d56f29 #1
> >>>> [f00e7ab8 :
> >>>> bad_page+0xa8/0x108 ]
> >>>> [f00e8b54 :
> >>>> free_pcppages_bulk+0x154/0x52c ]
> >>>> [f00ea024 :
> >>>> free_unref_page+0x54/0x6c ]
> >>>> [f00ed864 :
> >>>> free_reserved_area+0x58/0xec ]
> >>>> [f0527104 :
> >>>> kernel_init+0x14/0x110 ]
> >>>> [f000b77c :
> >>>> ret_from_kernel_thread+0xc/0x38 ]
> >>>> [00000000 :
> >>>> 0x0 ]
> >>>>
> >>>> Code path leading to that message is different but always the same
> >>>> from free_unref_page().
> > 
> > Actually it occurs to me that the patch consolidating kmap_prot is odd for
> > sparc 32 bit...
> > 
> > Its a long shot but could you try reverting this patch?
> > 
> > 4ea7d2419e3f kmap: consolidate kmap_prot definitions
> > 
> 
> That is not easy to revert, unfortunately, due to several follow-up patches.

I have gotten your sparc tests to run and they all pass...

08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh 
Build reference: v5.7-rc4-17-g852b6f2edc0f

Building sparc32:SPARCClassic:nosmp:scsi:hd ... running ......... passed
Building sparc32:SPARCbook:nosmp:scsi:cd ... running ......... passed
Building sparc32:LX:nosmp:noapc:scsi:hd ... running ......... passed
Building sparc32:SS-4:nosmp:initrd ... running ......... passed
Building sparc32:SS-5:nosmp:scsi:hd ... running ......... passed
Building sparc32:SS-10:nosmp:scsi:cd ... running ......... passed
Building sparc32:SS-20:nosmp:scsi:hd ... running ......... passed
Building sparc32:SS-600MP:nosmp:scsi:hd ... running ......... passed
Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running ......... passed
Building sparc32:SS-4:smp:scsi:hd ... running ......... passed
Building sparc32:SS-5:smp:scsi:cd ... running ......... passed
Building sparc32:SS-10:smp:scsi:hd ... running ......... passed
Building sparc32:SS-20:smp:scsi:hd ... running ......... passed
Building sparc32:SS-600MP:smp:scsi:hd ... running ......... passed
Building sparc32:Voyager:smp:noapc:scsi:hd ... running ......... passed

Is there another test I need to run?

Ira


> 
> Guenter
> 
> > Alternately I will need to figure out how to run the sparc on qemu here...
> > 
> > Thanks very much for all the testing though!  :-D
> > 
> > Ira
> > 
> >>>>
> >>>> Still testing ppc images.
> >>>>
> >>
> >> ppc image tests are passing with this patch.
> >>
> >> Guenter
> 

^ permalink raw reply

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Dan Williams @ 2020-05-21 18:25 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Jan Kara, linux-nvdimm, Jeff Moyer, Mikulas Patocka, alistair,
	linuxppc-dev
In-Reply-To: <ba91c061-41ef-5c54-8e9b-7b22e44577cd@linux.ibm.com>

On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> On 5/21/20 8:08 PM, Jeff Moyer wrote:
> > Dan Williams <dan.j.williams@intel.com> writes:
> >
> >>> But I agree with your concern that if we have older kernel/applications
> >>> that continue to use `dcbf` on future hardware we will end up
> >>> having issues w.r.t powerfail consistency. The plan is what you outlined
> >>> above as tighter ecosystem control. Considering we don't have a pmem
> >>> device generally available, we get both kernel and userspace upgraded
> >>> to use these new instructions before such a device is made available.
> >
> > I thought power already supported NVDIMM-N, no?  So are you saying that
> > those devices will continue to work with the existing flushing and
> > fencing mechanisms?
> >
>
> yes. these devices can continue to use 'dcbf + hwsync' as long as we are
> running them on P9.
>
>
> >> Ok, I think a compile time kernel option with a runtime override
> >> satisfies my concern. Does that work for you?
> >
> > The compile time option only helps when running newer kernels.  I'm not
> > sure how you would even begin to audit userspace applications (keep in
> > mind, not every application is open source, and not every application
> > uses pmdk).  I also question the merits of forcing the administrator to
> > make the determination of whether all applications on the system will
> > work properly.  Really, you have to rely on the vendor to tell you the
> > platform is supported, and at that point, why put further hurdles in the
> > way?
> >
> > The decision to require different instructions on ppc is unfortunate,
> > but one I'm sure we have no control over.  I don't see any merit in the
> > kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> > have some way of ensuring older kernels don't work with these new
> > platforms, but I don't think that's possible.
> >
>
>
> I am currently looking at the possibility of firmware present these
> devices with different device-tree compat values. So that older
> /existing kernel won't initialize the device on newer systems. Is that a
> good compromise? We still can end up with older userspace and newer
> kernel. One of the option suggested by Jan Kara is to use a prctl flag
> to control that? (intead of kernel parameter option I posted before)
>
>
> > Moving on to the patch itself--Aneesh, have you audited other persistent
> > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > this:
> >
> > static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> > {
> >       if (WC_MODE_PMEM(wc))
> >               wmb(); <==========
> >          else
> >                  ssd_commit_flushed(wc, wait_for_ios);
> > }
> >
> > I believe you'll need to make modifications there.
> >
>
> Correct. Thanks for catching that.
>
>
> I don't understand dm much, wondering how this will work with
> non-synchronous DAX device?

That's a good point. DM-writecache needs to be cognizant of things
like virtio-pmem that violate the rule that persisent memory writes
can be flushed by CPU functions rather than calling back into the
driver. It seems we need to always make the flush case a dax_operation
callback to account for this.

^ permalink raw reply

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Dan Williams @ 2020-05-21 18:34 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: linux-nvdimm, Aneesh Kumar K.V, Mikulas Patocka, alistair,
	linuxppc-dev
In-Reply-To: <x49o8qh9wu5.fsf@segfault.boston.devel.redhat.com>

On Thu, May 21, 2020 at 7:39 AM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> >> But I agree with your concern that if we have older kernel/applications
> >> that continue to use `dcbf` on future hardware we will end up
> >> having issues w.r.t powerfail consistency. The plan is what you outlined
> >> above as tighter ecosystem control. Considering we don't have a pmem
> >> device generally available, we get both kernel and userspace upgraded
> >> to use these new instructions before such a device is made available.
>
> I thought power already supported NVDIMM-N, no?  So are you saying that
> those devices will continue to work with the existing flushing and
> fencing mechanisms?
>
> > Ok, I think a compile time kernel option with a runtime override
> > satisfies my concern. Does that work for you?
>
> The compile time option only helps when running newer kernels.  I'm not
> sure how you would even begin to audit userspace applications (keep in
> mind, not every application is open source, and not every application
> uses pmdk).  I also question the merits of forcing the administrator to
> make the determination of whether all applications on the system will
> work properly.  Really, you have to rely on the vendor to tell you the
> platform is supported, and at that point, why put further hurdles in the
> way?

I'm thoroughly confused by this. I thought this was exactly the role
of a Linux distribution vendor. ISVs qualify their application on a
hardware-platform + distribution combination and the distribution owns
picking ABI defaults like CONFIG_SYSFS_DEPRECATED regardless of
whether they can guarantee that all apps are updated to the new
semantics.

The administrator is not forced, the administrator if afforded an
override in the extreme case that they find an exception to what was
qualified and need to override the distribution's compile-time choice.

>
> The decision to require different instructions on ppc is unfortunate,
> but one I'm sure we have no control over.  I don't see any merit in the
> kernel disallowing MAP_SYNC access on these platforms.  Ideally, we'd
> have some way of ensuring older kernels don't work with these new
> platforms, but I don't think that's possible.

I see disabling MAP_SYNC as the more targeted form of "ensursing older
kernels don't work.

So I guess we agree that something should break when baseline
assumptions change, we just don't yet agree on where that break should
happen?

^ permalink raw reply

* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Mikulas Patocka @ 2020-05-21 18:52 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, linux-nvdimm, Aneesh Kumar K.V, Jeff Moyer, alistair,
	linuxppc-dev
In-Reply-To: <CAPcyv4iG9GC42s5DaWWegH=Mi7XHgJoUghgOM9qMRrCg4wuMig@mail.gmail.com>



On Thu, 21 May 2020, Dan Williams wrote:

> On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
> >
> > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > memory users in the kernel?  For example, drivers/md/dm-writecache.c does
> > > this:
> > >
> > > static void writecache_commit_flushed(struct dm_writecache *wc, bool wait_for_ios)
> > > {
> > >       if (WC_MODE_PMEM(wc))
> > >               wmb(); <==========
> > >          else
> > >                  ssd_commit_flushed(wc, wait_for_ios);
> > > }
> > >
> > > I believe you'll need to make modifications there.
> > >
> >
> > Correct. Thanks for catching that.
> >
> >
> > I don't understand dm much, wondering how this will work with
> > non-synchronous DAX device?
> 
> That's a good point. DM-writecache needs to be cognizant of things
> like virtio-pmem that violate the rule that persisent memory writes
> can be flushed by CPU functions rather than calling back into the
> driver. It seems we need to always make the flush case a dax_operation
> callback to account for this.

dm-writecache is normally sitting on the top of dm-linear, so it would 
need to pass the wmb() call through the dm core and dm-linear target ... 
that would slow it down ... I remember that you already did it this way 
some times ago and then removed it.

What's the exact problem with POWER? Could the POWER system have two types 
of persistent memory that need two different ways of flushing?

Mikulas


^ permalink raw reply

* Re: linux-next: manual merge of the rcu tree with the powerpc tree
From: Thomas Gleixner @ 2020-05-21 20:38 UTC (permalink / raw)
  To: paulmck, Stephen Rothwell
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Nicholas Piggin,
	Linux Next Mailing List, H. Peter Anvin, Ingo Molnar, PowerPC
In-Reply-To: <20200521133543.GX2869@paulmck-ThinkPad-P72>

"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Thu, May 21, 2020 at 02:51:24PM +1000, Stephen Rothwell wrote:
>> Hi all,
>> 
>> On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> >
>> > Today's linux-next merge of the rcu tree got a conflict in:
>> > 
>> >   arch/powerpc/kernel/traps.c
>> > 
>> > between commit:
>> > 
>> >   116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting")
>> > 
>> > from the powerpc tree and commit:
>> > 
>> >   187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()")
>> > 
>> > from the rcu tree.
>> > 
>> > I fixed it up (I used the powerpc tree version for now) and can carry the
>> > fix as necessary. This is now fixed as far as linux-next is concerned,
>> > but any non trivial conflicts should be mentioned to your upstream
>> > maintainer when your tree is submitted for merging.  You may also want
>> > to consider cooperating with the maintainer of the conflicting tree to
>> > minimise any particularly complex conflicts.
>> 
>> This is now a conflict between the powerpc commit and commit
>> 
>>   69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()")
>> 
>> from the tip tree.  I assume that the rcu and tip trees are sharing
>> some patches (but not commits) :-(
>
> We are sharing commits, and in fact 187416eeb388 in the rcu tree came
> from the tip tree.  My guess is version skew, and that I probably have
> another rebase coming up.
>
> Why is this happening?  There are sets of conflicting commits in different
> efforts, and we are trying to resolve them.  But we are getting feedback
> on some of those commits, which is probably what is causing the skew.

Correct. We had to rebase that. I don't think we do it again. The
changes I just sent out are carefully crafted to avoid that.

Thanks,

        tglx

^ permalink raw reply

* Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()
From: Daniel Jordan @ 2020-05-21 21:15 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, Michal Hocko,
	linux-mm, Steven Sistare, Pavel Machek, Alexander Duyck,
	Steffen Klassert, linux-s390, Herbert Xu, Jonathan Corbet,
	Daniel Jordan, Jason Gunthorpe, Zi Yan, Robert Elliott,
	Pavel Tatashin, Shile Zhang, Josh Triplett, Alex Williamson,
	Kirill Tkhai, Dan Williams, Randy Dunlap, LKML, linux-crypto,
	Tejun Heo, Andrew Morton,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAKgT0Uc_LNe+KuyYxFnQ44GAfygEOQNubxwzxmTDVBvFA=WZkA@mail.gmail.com>

On Thu, May 21, 2020 at 09:46:35AM -0700, Alexander Duyck wrote:
> It is more about not bothering with the extra tracking. We don't
> really need it and having it doesn't really add much in the way of
> value.

Yeah, it can probably go.

> > > > @@ -1863,11 +1892,32 @@ static int __init deferred_init_memmap(void *data)
> > > >                 goto zone_empty;
> > > >
> > > >         /*
> > > > -        * Initialize and free pages in MAX_ORDER sized increments so
> > > > -        * that we can avoid introducing any issues with the buddy
> > > > -        * allocator.
> > > > +        * More CPUs always led to greater speedups on tested systems, up to
> > > > +        * all the nodes' CPUs.  Use all since the system is otherwise idle now.
> > > >          */
> > > > +       max_threads = max(cpumask_weight(cpumask), 1u);
> > > > +
> > > >         while (spfn < epfn) {
> > > > +               epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION);
> > > > +
> > > > +               if (IS_ALIGNED(spfn, PAGES_PER_SECTION) &&
> > > > +                   epfn_align - spfn >= PAGES_PER_SECTION) {
> > > > +                       struct definit_args arg = { zone, ATOMIC_LONG_INIT(0) };
> > > > +                       struct padata_mt_job job = {
> > > > +                               .thread_fn   = deferred_init_memmap_chunk,
> > > > +                               .fn_arg      = &arg,
> > > > +                               .start       = spfn,
> > > > +                               .size        = epfn_align - spfn,
> > > > +                               .align       = PAGES_PER_SECTION,
> > > > +                               .min_chunk   = PAGES_PER_SECTION,
> > > > +                               .max_threads = max_threads,
> > > > +                       };
> > > > +
> > > > +                       padata_do_multithreaded(&job);
> > > > +                       nr_pages += atomic_long_read(&arg.nr_pages);
> > > > +                       spfn = epfn_align;
> > > > +               }
> > > > +
> > > >                 nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> > > >                 cond_resched();
> > > >         }
> > >
> > > This doesn't look right. You are basically adding threads in addition
> > > to calls to deferred_init_maxorder.
> >
> > The deferred_init_maxorder call is there to do the remaining, non-section
> > aligned part of a range.  It doesn't have to be done this way.
> 
> It is also doing the advancing though isn't it?

Yes.  Not sure what you're getting at.  There's the 'spfn = epfn_align' before
so nothing is skipped.  It's true that the nonaligned part is done outside of
padata when it could be done by a thread that'd otherwise be waiting or idle,
which should be addressed in the next version.

> I think I resolved this with the fix for it I described in the other
> email. We just need to swap out spfn for epfn and make sure we align
> spfn with epfn_align. Then I think that takes care of possible skips.

Right, though your fix looks a lot like deferred_init_mem_pfn_range_in_zone().
Seems better to just use that and not repeat ourselves.  Lame that it's
starting at the beginning of the ranges every time, maybe it could be
generalized somehow, but I think it should be fast enough.

> > We could use deferred_init_mem_pfn_range_in_zone() instead of the for_each
> > loop.
> >
> > What I was trying to avoid by aligning down is creating a discontiguous pfn
> > range that get passed to padata.  We already discussed how those are handled
> > by the zone iterator in the thread function, but job->size can be exaggerated
> > to include parts of the range that are never touched.  Thinking more about it
> > though, it's a small fraction of the total work and shouldn't matter.
> 
> So the problem with aligning down is that you are going to be slowed
> up as you have to go single threaded to initialize whatever remains.
> So worst case scenario is that you have a section aligned block and
> you will process all but 1 section in parallel, and then have to
> process the remaining section one max order block at a time.

Yes, aligning up is better.

> > > This should accomplish the same thing, but much more efficiently.
> >
> > Well, more cleanly.  I'll give it a try.
> 
> I agree I am not sure if it will make a big difference on x86, however
> the more ranges you have to process the faster this approach should be
> as it stays parallel the entire time rather than having to drop out
> and process the last section one max order block at a time.

Right.

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Guenter Roeck @ 2020-05-21 22:20 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200521172704.GF23230@ZenIV.linux.org.uk>

On 5/21/20 10:27 AM, Al Viro wrote:
> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
>>> From: Ira Weiny <ira.weiny@intel.com>
>>>
>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>> enables when vaddr is not in the fixmap.
>>>
>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>>
>> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>> such as:
> 
> BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> shrink the damn thing enough to have the loader cope with it?  I hadn't
> been able to do that for the current mainline ;-/
> 

defconfig seems to work just fine, even after enabling various debug
and file system options.

Guenter

^ permalink raw reply

* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master - 0aceafc)
From: Nick Desaulniers @ 2020-05-21 22:23 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: clang-built-linux, Nathan Chancellor, linuxppc-dev
In-Reply-To: <87r1vdh28z.fsf@mpe.ellerman.id.au>

On Thu, May 21, 2020 at 6:00 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Nathan Chancellor <natechancellor@gmail.com> writes:
> > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> >> Looks like our CI is still red from this:
> >>
> >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> >>
> >> Filing a bug to follow up on:
> >> https://github.com/ClangBuiltLinux/linux/issues/1031
> >>
> >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >> >
> >> > Nick Desaulniers <ndesaulniers@google.com> writes:
> >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> >> > > torture tests, then locks up?
> >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> >> > > Any recent changes related here in -next?  I believe this is the first
> >> > > failure, so I'll report back if we see this again.
> >> >
> >> > Thanks for the report.
> >> >
> >> > There's nothing newly in next-20200507 that seems related.
> ...
> >
> > This is probably still a manifestation of
> > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > because rekicking the tests usually fixes it.

I thought we had upgraded our version of QEMU in response to this already?
https://github.com/ClangBuiltLinux/dockerimage/pull/44
https://github.com/ClangBuiltLinux/dockerimage/pull/46

>
> Oh yep.
>
> I was looking at the RCU warning, which I still don't understand, but
> the lockup is presumably the same problem you hit with interrupts being
> lost.
>
> > We should probably just disable the torture tests like we do for x86_64
> > for CI because we do not have access to QEMU 5.0.0 where this should be
> > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > wait for that to be updated and packaged in Ubuntu.
>
> You just need to start building Qemu HEAD as part of your CI ;)

LOL
https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
Yeah I think the hard part for all these dependendencies is the risk
of living on the edge of "top of tree" for all of them, and trying to
control for some by using stable releases.  May not always be
possible.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Guenter Roeck @ 2020-05-21 22:27 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Al Viro, Andy Lutomirski,
	Thomas Gleixner, linux-arm-kernel, Chris Zankel,
	Thomas Bogendoerfer, linux-parisc, linux-kernel, Christian Koenig,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200521174250.GB176262@iweiny-DESK2.sc.intel.com>

On 5/21/20 10:42 AM, Ira Weiny wrote:
> On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
>> On 5/19/20 10:13 PM, Ira Weiny wrote:
>>> On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
>>>> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
>>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
>>>>>>> From: Ira Weiny <ira.weiny@intel.com>
>>>>>>>
>>>>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>>>>>> enables when vaddr is not in the fixmap.
>>>>>>>
>>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>>>>>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>>>>>>
>>>>>> microblazeel works with this patch,
>>>>>
>>>>> Awesome...  Andrew in my rush yesterday I should have put a reported by on the
>>>>> patch for Guenter as well.
>>>>>
>>>>> Sorry about that Guenter,
>>>>
>>>> No worries.
>>>>
>>>>> Ira
>>>>>
>>>>>> as do the nosmp sparc32 boot tests,
>>>>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>>>>>> such as:
>>>>>>
>>>>>> BUG: Bad page state in process swapper/0  pfn:006a1
>>>>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
>>>>>> flags: 0x0()
>>>>>> raw: 00000000 00000100 00000122 00000000 00000001 00000000 00000000 00000000
>>>>>> page dumped because: nonzero mapcount
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B             5.7.0-rc6-next-20200518-00002-gb178d2d56f29 #1
>>>>>> [f00e7ab8 :
>>>>>> bad_page+0xa8/0x108 ]
>>>>>> [f00e8b54 :
>>>>>> free_pcppages_bulk+0x154/0x52c ]
>>>>>> [f00ea024 :
>>>>>> free_unref_page+0x54/0x6c ]
>>>>>> [f00ed864 :
>>>>>> free_reserved_area+0x58/0xec ]
>>>>>> [f0527104 :
>>>>>> kernel_init+0x14/0x110 ]
>>>>>> [f000b77c :
>>>>>> ret_from_kernel_thread+0xc/0x38 ]
>>>>>> [00000000 :
>>>>>> 0x0 ]
>>>>>>
>>>>>> Code path leading to that message is different but always the same
>>>>>> from free_unref_page().
>>>
>>> Actually it occurs to me that the patch consolidating kmap_prot is odd for
>>> sparc 32 bit...
>>>
>>> Its a long shot but could you try reverting this patch?
>>>
>>> 4ea7d2419e3f kmap: consolidate kmap_prot definitions
>>>
>>
>> That is not easy to revert, unfortunately, due to several follow-up patches.
> 
> I have gotten your sparc tests to run and they all pass...
> 
> 08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh 
> Build reference: v5.7-rc4-17-g852b6f2edc0f
> 

That doesn't look like it is linux-next, which I guess means that something
else in linux-next breaks it. What is your qemu version ?

Thanks,
Guenter

> Building sparc32:SPARCClassic:nosmp:scsi:hd ... running ......... passed
> Building sparc32:SPARCbook:nosmp:scsi:cd ... running ......... passed
> Building sparc32:LX:nosmp:noapc:scsi:hd ... running ......... passed
> Building sparc32:SS-4:nosmp:initrd ... running ......... passed
> Building sparc32:SS-5:nosmp:scsi:hd ... running ......... passed
> Building sparc32:SS-10:nosmp:scsi:cd ... running ......... passed
> Building sparc32:SS-20:nosmp:scsi:hd ... running ......... passed
> Building sparc32:SS-600MP:nosmp:scsi:hd ... running ......... passed
> Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running ......... passed
> Building sparc32:SS-4:smp:scsi:hd ... running ......... passed
> Building sparc32:SS-5:smp:scsi:cd ... running ......... passed
> Building sparc32:SS-10:smp:scsi:hd ... running ......... passed
> Building sparc32:SS-20:smp:scsi:hd ... running ......... passed
> Building sparc32:SS-600MP:smp:scsi:hd ... running ......... passed
> Building sparc32:Voyager:smp:noapc:scsi:hd ... running ......... passed
> 
> Is there another test I need to run?
> 
> Ira
> 
> 
>>
>> Guenter
>>
>>> Alternately I will need to figure out how to run the sparc on qemu here...
>>>
>>> Thanks very much for all the testing though!  :-D
>>>
>>> Ira
>>>
>>>>>>
>>>>>> Still testing ppc images.
>>>>>>
>>>>
>>>> ppc image tests are passing with this patch.
>>>>
>>>> Guenter
>>


^ permalink raw reply

* RE: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Weiny, Ira @ 2020-05-21 22:36 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel@lists.freedesktop.org,
	linux-mips@vger.kernel.org, James E.J. Bottomley, Max Filippov,
	Paul Mackerras, H. Peter Anvin, sparclinux@vger.kernel.org,
	Williams, Dan J, Helge Deller, x86@kernel.org,
	linux-csky@vger.kernel.org, Christoph Hellwig, Ingo Molnar,
	linux-snps-arc@lists.infradead.org, linux-xtensa@linux-xtensa.org,
	Borislav Petkov, Al Viro, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel@lists.infradead.org, Chris Zankel,
	Thomas Bogendoerfer, linux-parisc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Christian Koenig, Andrew Morton,
	linuxppc-dev@lists.ozlabs.org, David S. Miller
In-Reply-To: <9088585b-b52f-39ad-1651-53cfc0abd714@roeck-us.net>

> On 5/21/20 10:42 AM, Ira Weiny wrote:
> > On Thu, May 21, 2020 at 09:05:41AM -0700, Guenter Roeck wrote:
> >> On 5/19/20 10:13 PM, Ira Weiny wrote:
> >>> On Tue, May 19, 2020 at 12:42:15PM -0700, Guenter Roeck wrote:
> >>>> On Tue, May 19, 2020 at 11:40:32AM -0700, Ira Weiny wrote:
> >>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com
> wrote:
> >>>>>>> From: Ira Weiny <ira.weiny@intel.com>
> >>>>>>>
> >>>>>>> The kunmap_atomic clean up failed to remove one set of
> >>>>>>> pagefault/preempt enables when vaddr is not in the fixmap.
> >>>>>>>
> >>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate
> >>>>>>> code")
> >>>>>>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >>>>>>
> >>>>>> microblazeel works with this patch,
> >>>>>
> >>>>> Awesome...  Andrew in my rush yesterday I should have put a
> >>>>> reported by on the patch for Guenter as well.
> >>>>>
> >>>>> Sorry about that Guenter,
> >>>>
> >>>> No worries.
> >>>>
> >>>>> Ira
> >>>>>
> >>>>>> as do the nosmp sparc32 boot tests, but sparc32 boot tests with
> >>>>>> SMP enabled still fail with lots of messages such as:
> >>>>>>
> >>>>>> BUG: Bad page state in process swapper/0  pfn:006a1
> >>>>>> page:f0933420 refcount:0 mapcount:1 mapping:(ptrval) index:0x1
> >>>>>> flags: 0x0()
> >>>>>> raw: 00000000 00000100 00000122 00000000 00000001 00000000
> >>>>>> 00000000 00000000 page dumped because: nonzero mapcount
> Modules
> >>>>>> linked in:
> >>>>>> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G    B             5.7.0-rc6-next-
> 20200518-00002-gb178d2d56f29 #1
> >>>>>> [f00e7ab8 :
> >>>>>> bad_page+0xa8/0x108 ]
> >>>>>> [f00e8b54 :
> >>>>>> free_pcppages_bulk+0x154/0x52c ]
> >>>>>> [f00ea024 :
> >>>>>> free_unref_page+0x54/0x6c ]
> >>>>>> [f00ed864 :
> >>>>>> free_reserved_area+0x58/0xec ]
> >>>>>> [f0527104 :
> >>>>>> kernel_init+0x14/0x110 ]
> >>>>>> [f000b77c :
> >>>>>> ret_from_kernel_thread+0xc/0x38 ]
> >>>>>> [00000000 :
> >>>>>> 0x0 ]
> >>>>>>
> >>>>>> Code path leading to that message is different but always the
> >>>>>> same from free_unref_page().
> >>>
> >>> Actually it occurs to me that the patch consolidating kmap_prot is
> >>> odd for sparc 32 bit...
> >>>
> >>> Its a long shot but could you try reverting this patch?
> >>>
> >>> 4ea7d2419e3f kmap: consolidate kmap_prot definitions
> >>>
> >>
> >> That is not easy to revert, unfortunately, due to several follow-up
> patches.
> >
> > I have gotten your sparc tests to run and they all pass...
> >
> > 08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh
> > Build reference: v5.7-rc4-17-g852b6f2edc0f
> >
> 
> That doesn't look like it is linux-next, which I guess means that something
> else in linux-next breaks it. What is your qemu version ?

Ah yea that was just 5.7-rc4 with my patch set applied.  Yes must be something else or an interaction with my patch set.

Did I see another email with Mike which may fix this?

Ira

> 
> Thanks,
> Guenter
> 
> > Building sparc32:SPARCClassic:nosmp:scsi:hd ... running ......... passed
> > Building sparc32:SPARCbook:nosmp:scsi:cd ... running ......... passed
> > Building sparc32:LX:nosmp:noapc:scsi:hd ... running ......... passed
> > Building sparc32:SS-4:nosmp:initrd ... running ......... passed
> > Building sparc32:SS-5:nosmp:scsi:hd ... running ......... passed
> > Building sparc32:SS-10:nosmp:scsi:cd ... running ......... passed
> > Building sparc32:SS-20:nosmp:scsi:hd ... running ......... passed
> > Building sparc32:SS-600MP:nosmp:scsi:hd ... running ......... passed
> > Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running ......... passed
> > Building sparc32:SS-4:smp:scsi:hd ... running ......... passed
> > Building sparc32:SS-5:smp:scsi:cd ... running ......... passed
> > Building sparc32:SS-10:smp:scsi:hd ... running ......... passed
> > Building sparc32:SS-20:smp:scsi:hd ... running ......... passed
> > Building sparc32:SS-600MP:smp:scsi:hd ... running ......... passed
> > Building sparc32:Voyager:smp:noapc:scsi:hd ... running ......... passed
> >
> > Is there another test I need to run?
> >
> > Ira
> >
> >
> >>
> >> Guenter
> >>
> >>> Alternately I will need to figure out how to run the sparc on qemu here...
> >>>
> >>> Thanks very much for all the testing though!  :-D
> >>>
> >>> Ira
> >>>
> >>>>>>
> >>>>>> Still testing ppc images.
> >>>>>>
> >>>>
> >>>> ppc image tests are passing with this patch.
> >>>>
> >>>> Guenter
> >>


^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Al Viro @ 2020-05-21 22:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <bdc8dc64-622c-3b0d-1ae1-48222cf34358@roeck-us.net>

On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
> On 5/21/20 10:27 AM, Al Viro wrote:
> > On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> >> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
> >>> From: Ira Weiny <ira.weiny@intel.com>
> >>>
> >>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> >>> enables when vaddr is not in the fixmap.
> >>>
> >>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> >>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> >>
> >> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> >> but sparc32 boot tests with SMP enabled still fail with lots of messages
> >> such as:
> > 
> > BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> > shrink the damn thing enough to have the loader cope with it?  I hadn't
> > been able to do that for the current mainline ;-/
> > 
> 
> defconfig seems to work just fine, even after enabling various debug
> and file system options.

The hell?  How do you manage to get the kernel in?  sparc32_defconfig
ends up with 5316876 bytes unpacked...

^ permalink raw reply

* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#1432 (master - 0aceafc)
From: Nathan Chancellor @ 2020-05-21 22:59 UTC (permalink / raw)
  To: Nick Desaulniers; +Cc: linuxppc-dev, clang-built-linux
In-Reply-To: <CAKwvOdn_rNgPERgUfBgGywbyRBdSoEbQCaBO1o7fgqkMcCYXqQ@mail.gmail.com>

On Thu, May 21, 2020 at 03:23:11PM -0700, Nick Desaulniers wrote:
> On Thu, May 21, 2020 at 6:00 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >
> > Nathan Chancellor <natechancellor@gmail.com> writes:
> > > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang Built Linux wrote:
> > >> Looks like our CI is still red from this:
> > >>
> > >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> > >>
> > >> Filing a bug to follow up on:
> > >> https://github.com/ClangBuiltLinux/linux/issues/1031
> > >>
> > >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> > >> >
> > >> > Nick Desaulniers <ndesaulniers@google.com> writes:
> > >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> > >> > > torture tests, then locks up?
> > >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> > >> > > Any recent changes related here in -next?  I believe this is the first
> > >> > > failure, so I'll report back if we see this again.
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >> > There's nothing newly in next-20200507 that seems related.
> > ...
> > >
> > > This is probably still a manifestation of
> > > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > > because rekicking the tests usually fixes it.
> 
> I thought we had upgraded our version of QEMU in response to this already?
> https://github.com/ClangBuiltLinux/dockerimage/pull/44
> https://github.com/ClangBuiltLinux/dockerimage/pull/46

That was more of a bandaid than an actual fix. It happens a lot less
often with QEMU 4.2.0 but I could still reproduce that hang very
sparingly with the POWER9 machines on it. My machines are way more
powerful than the ones on Travis, which I am sure factors into that.
the hang with the POWER9 machines very sparingly with QEMU 4.2.0 but

The real solution is to upgrade to QEMU 5.0.0, which we could probably
do via a PPA (or through our Docker image), or wait for QEMU 4.2.1,
which should hopefully have that fix since it was CC'd for QEMU stable.

> >
> > Oh yep.
> >
> > I was looking at the RCU warning, which I still don't understand, but
> > the lockup is presumably the same problem you hit with interrupts being
> > lost.
> >
> > > We should probably just disable the torture tests like we do for x86_64
> > > for CI because we do not have access to QEMU 5.0.0 where this should be
> > > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > > wait for that to be updated and packaged in Ubuntu.
> >
> > You just need to start building Qemu HEAD as part of your CI ;)
> 
> LOL
> https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
> Yeah I think the hard part for all these dependendencies is the risk
> of living on the edge of "top of tree" for all of them, and trying to
> control for some by using stable releases.  May not always be
> possible.

Unfortunately, we are at the mercy of a bunch of different parties. If
only we had a ClangBuiltLinux build server that we maintained...

Cheers,
Nathan

^ permalink raw reply

* Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
From: Ira Weiny @ 2020-05-21 23:32 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Aneesh Kumar K . V, linuxppc-dev, linux-kernel, Steven Rostedt,
	linux-nvdimm
In-Reply-To: <87tv0awmr5.fsf@linux.ibm.com>

On Wed, May 20, 2020 at 10:45:58PM +0530, Vaibhav Jain wrote:
...

> > On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:

...

> >> @@ -39,6 +78,15 @@ struct papr_scm_priv {
> >>  	struct resource res;
> >>  	struct nd_region *region;
> >>  	struct nd_interleave_set nd_set;
> >> +
> >> +	/* Protect dimm health data from concurrent read/writes */
> >> +	struct mutex health_mutex;
> >> +
> >> +	/* Last time the health information of the dimm was updated */
> >> +	unsigned long lasthealth_jiffies;
> >> +
> >> +	/* Health information for the dimm */
> >> +	u64 health_bitmap;
> >
> > I wonder if this should be typed big endian as you mention that it is in the
> > commit message?
> This was discussed in an earlier review of the patch series at
> https://lore.kernel.org/linux-nvdimm/878sjetcis.fsf@mpe.ellerman.id.au
> 
> Even though health bitmap is returned in big endian format (For ex
> value 0xC00000000000000 indicates bits 0,1 set), its value is never
> used. Instead only test for specific bits being set in the register is
> done.
> 
> Hence using native cpu type instead of __be64 to store this value.

ok.

> 
> >
> >>  };
> >>  
> >>  static int drc_pmem_bind(struct papr_scm_priv *p)
> >> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> >>  	return drc_pmem_bind(p);
> >>  }
> >>  
> >> +/*
> >> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> >> + * health information.
> >> + */
> >> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> +	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> >
> > Is this exclusive to 64bit?  Why not u64?
> Yes this is specific to 64 bit as the array holds 64 bit register values
> returned from PHYP. Can u64 but here that will be a departure from existing
> practice within arch/powerpc code to use an unsigned long array to fetch
> returned values for PHYP.
> 
> >
> >> +	s64 rc;
> >
> > plpar_hcall() returns long and this function returns int and rc is declared
> > s64?
> >
> > Why not have them all be long to follow plpar_hcall?
> Yes 'long' type is better suited for variable 'rc' and I will get it fixed.
> 
> But the value of variable 'rc' is never directly returned from this
> function, we always return kernel error codes instead. Hence the
> return type of this function is consistent.

Honestly masking the error return of plpar_hcall() seems a problem as well...
but ok.

Ira

> 
> >
> >> +
> >> +	/* issue the hcall */
> >> +	rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> >> +	if (rc != H_SUCCESS) {
> >> +		dev_err(&p->pdev->dev,
> >> +			 "Failed to query health information, Err:%lld\n", rc);
> >> +		rc = -ENXIO;
> >> +		goto out;
> >> +	}
> >> +
> >> +	p->lasthealth_jiffies = jiffies;
> >> +	p->health_bitmap = ret[0] & ret[1];
> >> +
> >> +	dev_dbg(&p->pdev->dev,
> >> +		"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> >> +		ret[0], ret[1]);
> >> +out:
> >> +	return rc;
> >> +}
> >> +
> >> +/* Min interval in seconds for assuming stable dimm health */
> >> +#define MIN_HEALTH_QUERY_INTERVAL 60
> >> +
> >> +/* Query cached health info and if needed call drc_pmem_query_health */
> >> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> +	unsigned long cache_timeout;
> >> +	s64 rc;
> >> +
> >> +	/* Protect concurrent modifications to papr_scm_priv */
> >> +	rc = mutex_lock_interruptible(&p->health_mutex);
> >> +	if (rc)
> >> +		return rc;
> >> +
> >> +	/* Jiffies offset for which the health data is assumed to be same */
> >> +	cache_timeout = p->lasthealth_jiffies +
> >> +		msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> >> +
> >> +	/* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
> >> +	if (time_after(jiffies, cache_timeout))
> >> +		rc = __drc_pmem_query_health(p);
> >
> > And back to s64 after returning int?
> Agree, will change 's64 rc' to 'int rc'.
> 
> >
> >> +	else
> >> +		/* Assume cached health data is valid */
> >> +		rc = 0;
> >> +
> >> +	mutex_unlock(&p->health_mutex);
> >> +	return rc;
> >> +}
> >>  
> >>  static int papr_scm_meta_get(struct papr_scm_priv *p,
> >>  			     struct nd_cmd_get_config_data_hdr *hdr)
> >> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >>  	return 0;
> >>  }
> >>  
> >> +static ssize_t flags_show(struct device *dev,
> >> +				struct device_attribute *attr, char *buf)
> >> +{
> >> +	struct nvdimm *dimm = to_nvdimm(dev);
> >> +	struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> >> +	struct seq_buf s;
> >> +	u64 health;
> >> +	int rc;
> >> +
> >> +	rc = drc_pmem_query_health(p);
> >
> > and back to int...
> >
> drc_pmem_query_health() returns an 'int' so the type of variable 'rc'
> looks correct to me.
> 
> > Just make them long all through...
> I think the return type for above all functions is 'int' with
> an issue in drc_pmem_query_health() that you pointed out.
> 
> With that fixed the usage of 'int' return type for functions will become
> consistent.
> 
> >
> > Ira
> >
> >> +	if (rc)
> >> +		return rc;
> >> +
> >> +	/* Copy health_bitmap locally, check masks & update out buffer */
> >> +	health = READ_ONCE(p->health_bitmap);
> >> +
> >> +	seq_buf_init(&s, buf, PAGE_SIZE);
> >> +	if (health & PAPR_SCM_DIMM_UNARMED_MASK)
> >> +		seq_buf_printf(&s, "not_armed ");
> >> +
> >> +	if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
> >> +		seq_buf_printf(&s, "flush_fail ");
> >> +
> >> +	if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
> >> +		seq_buf_printf(&s, "restore_fail ");
> >> +
> >> +	if (health & PAPR_SCM_DIMM_ENCRYPTED)
> >> +		seq_buf_printf(&s, "encrypted ");
> >> +
> >> +	if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
> >> +		seq_buf_printf(&s, "smart_notify ");
> >> +
> >> +	if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
> >> +		seq_buf_printf(&s, "scrubbed locked ");
> >> +
> >> +	if (seq_buf_used(&s))
> >> +		seq_buf_printf(&s, "\n");
> >> +
> >> +	return seq_buf_used(&s);
> >> +}
> >> +DEVICE_ATTR_RO(flags);
> >> +
> >> +/* papr_scm specific dimm attributes */
> >> +static struct attribute *papr_scm_nd_attributes[] = {
> >> +	&dev_attr_flags.attr,
> >> +	NULL,
> >> +};
> >> +
> >> +static struct attribute_group papr_scm_nd_attribute_group = {
> >> +	.name = "papr",
> >> +	.attrs = papr_scm_nd_attributes,
> >> +};
> >> +
> >> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
> >> +	&papr_scm_nd_attribute_group,
> >> +	NULL,
> >> +};
> >> +
> >>  static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>  {
> >>  	struct device *dev = &p->pdev->dev;
> >> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >>  	dimm_flags = 0;
> >>  	set_bit(NDD_LABELING, &dimm_flags);
> >>  
> >> -	p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> >> -				  PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> >> +	p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
> >> +				  dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> >>  	if (!p->nvdimm) {
> >>  		dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
> >>  		goto err;
> >> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
> >>  	if (!p)
> >>  		return -ENOMEM;
> >>  
> >> +	/* Initialize the dimm mutex */
> >> +	mutex_init(&p->health_mutex);
> >> +
> >>  	/* optional DT properties */
> >>  	of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
> >>  
> >> -- 
> >> 2.26.2
> >> _______________________________________________
> >> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> >> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
> 
> -- 
> Cheers
> ~ Vaibhav

^ permalink raw reply

* Re: [PATCH v2 0/7] padata: parallelize deferred page init
From: Josh Triplett @ 2020-05-21 23:43 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: David Hildenbrand, Peter Zijlstra, Dave Hansen, Michal Hocko,
	linux-mm, Steven Sistare, Pavel Machek, Alexander Duyck,
	Steffen Klassert, linux-s390, Herbert Xu, Jonathan Corbet,
	Jason Gunthorpe, Zi Yan, Robert Elliott, Pavel Tatashin,
	Shile Zhang, Alex Williamson, Kirill Tkhai, Dan Williams,
	Randy Dunlap, linux-kernel, linux-crypto, Tejun Heo,
	Andrew Morton, linuxppc-dev
In-Reply-To: <20200520182645.1658949-1-daniel.m.jordan@oracle.com>

On Wed, May 20, 2020 at 02:26:38PM -0400, Daniel Jordan wrote:
> Please review and test, and thanks to Alex, Andrew, Josh, and Pavel for
> their feedback in the last version.

I re-tested v2:

Tested-by: Josh Triplett <josh@joshtriplett.org>

[    0.231435] node 1 initialised, 24189223 pages in 32ms
[    0.236718] node 0 initialised, 23398907 pages in 36ms

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Al Viro @ 2020-05-22  0:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200521224612.GJ23230@ZenIV.linux.org.uk>

On Thu, May 21, 2020 at 11:46:12PM +0100, Al Viro wrote:
> On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
> > On 5/21/20 10:27 AM, Al Viro wrote:
> > > On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
> > >> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
> > >>> From: Ira Weiny <ira.weiny@intel.com>
> > >>>
> > >>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
> > >>> enables when vaddr is not in the fixmap.
> > >>>
> > >>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
> > >>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > >>
> > >> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
> > >> but sparc32 boot tests with SMP enabled still fail with lots of messages
> > >> such as:
> > > 
> > > BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
> > > shrink the damn thing enough to have the loader cope with it?  I hadn't
> > > been able to do that for the current mainline ;-/
> > > 
> > 
> > defconfig seems to work just fine, even after enabling various debug
> > and file system options.
> 
> The hell?  How do you manage to get the kernel in?  sparc32_defconfig
> ends up with 5316876 bytes unpacked...

Incidentally, trying to load it via -kernel/-initrd leads to
Configuration device id QEMU version 1 machine id 64
Probing SBus slot 0 offset 0
Probing SBus slot 1 offset 0
Probing SBus slot 2 offset 0
Probing SBus slot 3 offset 0
Probing SBus slot 15 offset 0
Invalid FCode start byte
CPUs: 1 x TI,TMS390Z55
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.1 built on Dec 27 2018 19:17
  Type 'help' for detailed information
[sparc] Kernel already loaded
switching to new context:
PROMLIB: obio_ranges 1
PROMLIB: Sun Boot Prom Version 3 Revision 2
Linux version 5.7.0-rc1-00002-gcf51e129b968 (al@duke) (gcc version 6.3.0 20170516 (Debian 6.3.0-18), GNU ld (GNU Binutils for Debian) 2.28) #32 Thu May 21 18:36:07 EDT 2020
printk: bootconsole [earlyprom0] enabled
ARCH: SUN4M
TYPE: Sun4m SparcStation10/20
Ethernet address: 52:54:00:12:34:56
303MB HIGHMEM available.
OF stdout device is: /obio/zs@0,100000:a
PROM: Built device tree with 30051 bytes of memory.
Booting Linux...
Power off control detected.
Kernel panic - not syncing: Failed to allocate memory for percpu areas.
CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc1-00002-gcf51e129b968 #32
[f04f92a8 : 
setup_per_cpu_areas+0x58/0x90 ] 
[f04edbf4 : 
start_kernel+0xc0/0x4a0 ] 
[f04ed43c : 
continue_boot+0x324/0x334 ] 
[00000000 : 
0x0 ] 

Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom
---[ end Kernel panic - not syncing: Failed to allocate memory for percpu areas. ]---

Giving guest more RAM doesn't change the outcome (well, the number HIGHMEM line is
obviously higher, but that's it).

So which sparc32 kernel have you booted with defconfig and how have you done
that?

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Guenter Roeck @ 2020-05-22  1:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200522004618.GA3151350@ZenIV.linux.org.uk>

On 5/21/20 5:46 PM, Al Viro wrote:
> On Thu, May 21, 2020 at 11:46:12PM +0100, Al Viro wrote:
>> On Thu, May 21, 2020 at 03:20:46PM -0700, Guenter Roeck wrote:
>>> On 5/21/20 10:27 AM, Al Viro wrote:
>>>> On Tue, May 19, 2020 at 09:54:22AM -0700, Guenter Roeck wrote:
>>>>> On Mon, May 18, 2020 at 11:48:43AM -0700, ira.weiny@intel.com wrote:
>>>>>> From: Ira Weiny <ira.weiny@intel.com>
>>>>>>
>>>>>> The kunmap_atomic clean up failed to remove one set of pagefault/preempt
>>>>>> enables when vaddr is not in the fixmap.
>>>>>>
>>>>>> Fixes: bee2128a09e6 ("arch/kunmap_atomic: consolidate duplicate code")
>>>>>> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>>>>>
>>>>> microblazeel works with this patch, as do the nosmp sparc32 boot tests,
>>>>> but sparc32 boot tests with SMP enabled still fail with lots of messages
>>>>> such as:
>>>>
>>>> BTW, what's your setup for sparc32 boot tests?  IOW, how do you manage to
>>>> shrink the damn thing enough to have the loader cope with it?  I hadn't
>>>> been able to do that for the current mainline ;-/
>>>>
>>>
>>> defconfig seems to work just fine, even after enabling various debug
>>> and file system options.
>>
>> The hell?  How do you manage to get the kernel in?  sparc32_defconfig
>> ends up with 5316876 bytes unpacked...
> 
> Incidentally, trying to load it via -kernel/-initrd leads to
> Configuration device id QEMU version 1 machine id 64
> Probing SBus slot 0 offset 0
> Probing SBus slot 1 offset 0
> Probing SBus slot 2 offset 0
> Probing SBus slot 3 offset 0
> Probing SBus slot 15 offset 0
> Invalid FCode start byte
> CPUs: 1 x TI,TMS390Z55
> UUID: 00000000-0000-0000-0000-000000000000
> Welcome to OpenBIOS v1.1 built on Dec 27 2018 19:17
>   Type 'help' for detailed information
> [sparc] Kernel already loaded
> switching to new context:
> PROMLIB: obio_ranges 1
> PROMLIB: Sun Boot Prom Version 3 Revision 2
> Linux version 5.7.0-rc1-00002-gcf51e129b968 (al@duke) (gcc version 6.3.0 20170516 (Debian 6.3.0-18), GNU ld (GNU Binutils for Debian) 2.28) #32 Thu May 21 18:36:07 EDT 2020
> printk: bootconsole [earlyprom0] enabled
> ARCH: SUN4M
> TYPE: Sun4m SparcStation10/20
> Ethernet address: 52:54:00:12:34:56
> 303MB HIGHMEM available.
> OF stdout device is: /obio/zs@0,100000:a
> PROM: Built device tree with 30051 bytes of memory.
> Booting Linux...
> Power off control detected.
> Kernel panic - not syncing: Failed to allocate memory for percpu areas.
> CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc1-00002-gcf51e129b968 #32
> [f04f92a8 : 
> setup_per_cpu_areas+0x58/0x90 ] 
> [f04edbf4 : 
> start_kernel+0xc0/0x4a0 ] 
> [f04ed43c : 
> continue_boot+0x324/0x334 ] 
> [00000000 : 
> 0x0 ] 
> 
> Press Stop-A (L1-A) from sun keyboard or send break
> twice on console to return to the boot prom
> ---[ end Kernel panic - not syncing: Failed to allocate memory for percpu areas. ]---
> 
> Giving guest more RAM doesn't change the outcome (well, the number HIGHMEM line is
> obviously higher, but that's it).
> 
> So which sparc32 kernel have you booted with defconfig and how have you done
> that?
> 

Mainline, with:

qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
	-snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
	-append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0"
	-nographic -monitor none

The machine doesn't really matter, though. The root file system is built
with buildroot.

Note that I carry two reverts in my qemu images.

Revert "tcx: switch to load_image_mr() and remove prom_addr hack"
Revert "cg3: switch to load_image_mr() and remove prom-addr hack"

I have been carrying those since ~2017. I didn't check recently
if they are still needed.

If sparc32 is no longer supported in the upstream kernel,
would it possibly make sense remove its support ?

Guenter

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Al Viro @ 2020-05-22  1:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <970857bd-bb56-7b2e-833e-ca74a82fa9b5@roeck-us.net>

On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:

> Mainline, with:
> 
> qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
> 	-snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
> 	-append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0"
> 	-nographic -monitor none
> 
> The machine doesn't really matter, though.

It does, unfortunately - try that with SS-10 and watch what happens ;-/

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Al Viro @ 2020-05-22  1:35 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200522012950.GN23230@ZenIV.linux.org.uk>

On Fri, May 22, 2020 at 02:29:50AM +0100, Al Viro wrote:
> On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:
> 
> > Mainline, with:
> > 
> > qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
> > 	-snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
> > 	-append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0"
> > 	-nographic -monitor none
> > 
> > The machine doesn't really matter, though.
> 
> It does, unfortunately - try that with SS-10 and watch what happens ;-/

Ugh...  It's actually something in -m handling: -m 256 passes, -m 512
leads to that panic.

^ permalink raw reply

* Re: Endless soft-lockups for compiling workload since next-20200519
From: Qian Cai @ 2020-05-22  2:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Frederic Weisbecker, Linux Kernel Mailing List,
	Borislav Petkov, Thomas Gleixner, linuxppc-dev
In-Reply-To: <20200521093938.GG325280@hirez.programming.kicks-ass.net>

On Thu, May 21, 2020 at 11:39:38AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote:
> > On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > > > Just a head up. Repeatedly compiling kernels for a while would trigger
> > > > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > > > .config are in,
> > > 
> > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> > > not seen anything like that myself. Let me go have a look.
> > > 
> > > 
> > > In as far as the logs are readable (they're a wrapped mess, please don't
> > > do that!), they contain very little useful, as is typical with IPIs :/
> > > 
> > > > [ 1167.993773][    C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > > > flush_smp_call_function_queue+0x1fa/0x2e0
> > 
> > So I've tried to think of a race that could produce that and here is
> > the only thing I could come up with. It's a bit complicated unfortunately:
> 
> This:
> 
> >         smp_call_function_single_async() {             smp_call_function_single_async() {
> >             // verified csd->flags != CSD_LOCK             // verified csd->flags != CSD_LOCK
> >             csd->flags = CSD_LOCK                          csd->flags = CSD_LOCK
> 
> concurrent smp_call_function_single_async() using the same csd is what
> I'm looking at as well. Now in the ILB case there is an easy cure:
> 
> (because there is only a single ilb target)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 01f94cf52783..b6d8a7b991f0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags)
>  	 * is idle. And the softirq performing nohz idle load balance
>  	 * will be run before returning from the IPI.
>  	 */
> -	smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
> +	smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd);
>  }
>  
>  /*
> 
> Qian, can you give that a spin?

Running for a few hours now. It works fine.

^ permalink raw reply

* Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
From: Guenter Roeck @ 2020-05-22  2:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Paul Mackerras,
	H. Peter Anvin, sparclinux, ira.weiny, Dan Williams, Helge Deller,
	x86, linux-csky, Christoph Hellwig, Ingo Molnar, linux-snps-arc,
	linux-xtensa, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200522013523.GO23230@ZenIV.linux.org.uk>

On Fri, May 22, 2020 at 02:35:23AM +0100, Al Viro wrote:
> On Fri, May 22, 2020 at 02:29:50AM +0100, Al Viro wrote:
> > On Thu, May 21, 2020 at 06:11:08PM -0700, Guenter Roeck wrote:
> > 
> > > Mainline, with:
> > > 
> > > qemu-system-sparc -M SS-4 -kernel arch/sparc/boot/zImage -no-reboot \
> > > 	-snapshot -drive file=rootfs.ext2,format=raw,if=scsi \
> > > 	-append "panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0"
> > > 	-nographic -monitor none
> > > 
> > > The machine doesn't really matter, though.
> > 
> > It does, unfortunately - try that with SS-10 and watch what happens ;-/
> 
> Ugh...  It's actually something in -m handling: -m 256 passes, -m 512
> leads to that panic.

Default seems to be 128M. Anyway, see below log for SS-10.

Guenter

---
Configuration device id QEMU version 1 machine id 64
Probing SBus slot 0 offset 0
Probing SBus slot 1 offset 0
Probing SBus slot 2 offset 0
Probing SBus slot 3 offset 0
Probing SBus slot 15 offset 0
Invalid FCode start byte
CPUs: 1 x TI,TMS390Z55
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
  Type 'help' for detailed information
[sparc] Kernel already loaded
switching to new context:
PROMLIB: obio_ranges 1
PROMLIB: Sun Boot Prom Version 3 Revision 2
Linux version 5.7.0-rc6-00026-g03fb3acae4be (groeck@server.roeck-us.net) (gcc version 6.5.0 (Buildroot 2018.11-rc2-00071-g4310260), GNU ld (GNU Binutils) 2.31.1) #1 Thu May 21 19:17:48 PDT 2020
printk: bootconsole [earlyprom0] enabled
ARCH: SUN4M
TYPE: Sun4m SparcStation10/20
Ethernet address: 52:54:00:12:34:56
OF stdout device is: /obio/zs@0,100000:a
PROM: Built device tree with 30586 bytes of memory.
Booting Linux...
Power off control detected.
Built 1 zonelists, mobility grouping on.  Total pages: 25012
Kernel command line: panic=-1 slub_debug=FZPUA root=/dev/sda console=ttyS0
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 103428K/100944K available (5050K kernel code, 178K rwdata, 1212K rodata, 176K init, 158K bss, 4294964812K reserved, 0K cma-reserved, 0K highmem)
NR_IRQS: 64
Console: colour dummy device 80x25
------------------------
| Locking API testsuite:
----------------------------------------------------------------------------
                                 | spin |wlock |rlock |mutex | wsem | rsem |
  --------------------------------------------------------------------------
                     A-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
                 A-B-B-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
             A-B-B-C-C-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
             A-B-C-A-B-C deadlock:failed|failed|  ok  |failed|failed|failed|failed|
         A-B-B-C-C-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
         A-B-C-D-B-D-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
         A-B-C-D-B-C-D-A deadlock:failed|failed|  ok  |failed|failed|failed|failed|
                    double unlock:  ok  |  ok  |failed|  ok  |failed|failed|  ok  |
                  initialize held:failed|failed|failed|failed|failed|failed|failed|
  --------------------------------------------------------------------------
              recursive read-lock:             |  ok  |             |failed|
           recursive read-lock #2:             |  ok  |             |failed|
            mixed read-write-lock:             |failed|             |failed|
            mixed write-read-lock:             |failed|             |failed|
  mixed read-lock/lock-write ABBA:             |failed|             |failed|
   mixed read-lock/lock-read ABBA:             |  ok  |             |failed|
 mixed write-lock/lock-write ABBA:             |failed|             |failed|
  --------------------------------------------------------------------------
     hard-irqs-on + irq-safe-A/12:failed|failed|  ok  |
     soft-irqs-on + irq-safe-A/12:failed|failed|  ok  |
     hard-irqs-on + irq-safe-A/21:failed|failed|  ok  |
     soft-irqs-on + irq-safe-A/21:failed|failed|  ok  |
       sirq-safe-A => hirqs-on/12:failed|failed|  ok  |
       sirq-safe-A => hirqs-on/21:failed|failed|  ok  |
         hard-safe-A + irqs-on/12:failed|failed|  ok  |
         soft-safe-A + irqs-on/12:failed|failed|  ok  |
         hard-safe-A + irqs-on/21:failed|failed|  ok  |
         soft-safe-A + irqs-on/21:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/123:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/123:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/132:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/132:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/213:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/213:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/231:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/231:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/312:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/312:failed|failed|  ok  |
    hard-safe-A + unsafe-B #1/321:failed|failed|  ok  |
    soft-safe-A + unsafe-B #1/321:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/123:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/123:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/132:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/132:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/213:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/213:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/231:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/231:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/312:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/312:failed|failed|  ok  |
    hard-safe-A + unsafe-B #2/321:failed|failed|  ok  |
    soft-safe-A + unsafe-B #2/321:failed|failed|  ok  |
      hard-irq lock-inversion/123:failed|failed|  ok  |
      soft-irq lock-inversion/123:failed|failed|  ok  |
      hard-irq lock-inversion/132:failed|failed|  ok  |
      soft-irq lock-inversion/132:failed|failed|  ok  |
      hard-irq lock-inversion/213:failed|failed|  ok  |
      soft-irq lock-inversion/213:failed|failed|  ok  |
      hard-irq lock-inversion/231:failed|failed|  ok  |
      soft-irq lock-inversion/231:failed|failed|  ok  |
      hard-irq lock-inversion/312:failed|failed|  ok  |
      soft-irq lock-inversion/312:failed|failed|  ok  |
      hard-irq lock-inversion/321:failed|failed|  ok  |
      soft-irq lock-inversion/321:failed|failed|  ok  |
      hard-irq read-recursion/123:  ok  |
      soft-irq read-recursion/123:  ok  |
      hard-irq read-recursion/132:  ok  |
      soft-irq read-recursion/132:  ok  |
      hard-irq read-recursion/213:  ok  |
      soft-irq read-recursion/213:  ok  |
      hard-irq read-recursion/231:  ok  |
      soft-irq read-recursion/231:  ok  |
      hard-irq read-recursion/312:  ok  |
      soft-irq read-recursion/312:  ok  |
      hard-irq read-recursion/321:  ok  |
      soft-irq read-recursion/321:  ok  |
  --------------------------------------------------------------------------
  | Wound/wait tests |
  ---------------------
                  ww api failures:  ok  |  ok  |  ok  |
               ww contexts mixing:failed|  ok  |
             finishing ww context:  ok  |  ok  |  ok  |  ok  |
               locking mismatches:  ok  |  ok  |  ok  |
                 EDEADLK handling:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
           spinlock nest unlocked:failed|
  -----------------------------------------------------
                                 |block | try  |context|
  -----------------------------------------------------
                          context:failed|  ok  |  ok  |
                              try:failed|  ok  |failed|
                            block:failed|  ok  |failed|
                         spinlock:failed|  ok  |failed|
--------------------------------------------------------
164 out of 262 testcases failed, as expected. |
----------------------------------------------------
clocksource: timer_cs: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 1763180808480 ns
Calibrating delay loop... 518.14 BogoMIPS (lpj=1036288)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
devtmpfs: initialized
random: get_random_u32 called from bucket_table_alloc.isra.22+0x50/0x1e4 with crng_init=0
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
futex hash table entries: 256 (order: 0, 7168 bytes, linear)
NET: Registered protocol family 16
IOMMU: impl 0 vers 3 table 0x(ptrval)[262144 B] map [65536 b]
vgaarb: loaded
SCSI subsystem initialized
clocksource: Switched to clocksource timer_cs
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 6144 bytes, linear)
TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
TCP bind hash table entries: 1024 (order: 2, 20480 bytes, linear)
TCP: Hash tables configured (established 1024 bind 1024)
UDP hash table entries: 256 (order: 1, 12288 bytes, linear)
UDP-Lite hash table entries: 256 (order: 1, 12288 bytes, linear)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 32
apc: power management initialized
random: fast init done
workingset: timestamp_bits=30 max_order=15 bucket_order=0
squashfs: version 4.0 (2009/01/31) Phillip Lougher
jitterentropy: Initialization failed with host not compliant with requirements: 2
io scheduler mq-deadline registered
io scheduler kyber registered
String selftests succeeded
test_firmware: interface ready
test passed
test_bitmap: loaded.
test_bitmap: parselist: 14: input is '0-2047:128/256' OK, Time: 2000
test_bitmap: parselist_user: 14: input is '0-2047:128/256' OK, Time: 3000
test_bitmap: all 1663 tests passed
test_uuid: all 18 tests passed
crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
crc32: self tests passed, processed 225944 bytes in 450500 nsec
crc32c: CRC_LE_BITS = 64
crc32c: self tests passed, processed 225944 bytes in 249000 nsec
crc32_combine: 8373 self tests passed
crc32c_combine: 8373 self tests passed
rbtree testing
 -> test 1 (latency of nnodes insert+delete): 0 cycles
 -> test 2 (latency of nnodes cached insert+delete): 0 cycles
 -> test 3 (latency of inorder traversal): 0 cycles
 -> test 4 (latency to fetch first node)
        non-cached: 0 cycles
        cached: 0 cycles
augmented rbtree testing
 -> test 1 (latency of nnodes insert+delete): 0 cycles
 -> test 2 (latency of nnodes cached insert+delete): 0 cycles
interval tree insert/remove
 -> 0 cycles
interval tree search
 -> 0 cycles (2692 results)
ffd374f8: ttyS0 at MMIO 0xf1100000 (irq = 5, base_baud = 307200) is a zs
Console: ttyS0 (SunZilog zs0)
printk: console [ttyS0] enabled
printk: console [ttyS0] enabled
printk: bootconsole [earlyprom0] disabled
printk: bootconsole [earlyprom0] disabled
ffd374f8: ttyS1 at MMIO 0xf1100004 (irq = 5, base_baud = 307200) is a zs
ffd37770: Keyboard at MMIO 0xf1000000 (irq = 5) is a zs
ffd37770: Mouse at MMIO 0xf1000004 (irq = 5) is a zs
brd: module loaded
esp ffd392ec: esp0: regs[(ptrval):(ptrval)] irq[2]
esp ffd392ec: esp0: is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
scsi host0: esp
scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
scsi target0:0:0: Beginning Domain Validation
scsi target0:0:0: Domain Validation skipping write tests
scsi target0:0:0: Ending Domain Validation
scsi 0:0:2:0: CD-ROM            QEMU     QEMU CD-ROM      2.5+ PQ: 0 ANSI: 5
scsi target0:0:2: Beginning Domain Validation
scsi target0:0:2: Domain Validation skipping write tests
scsi target0:0:2: Ending Domain Validation
sr 0:0:2:0: [sr0] scsi3-mmc drive: 16x/50x cd/rw xa/form2 cdda tray
cdrom: Uniform CD-ROM driver Revision: 3.20
sd 0:0:0:0: [sda] 40960 512-byte logical blocks: (21.0 MB/20.0 MiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] Attached SCSI disk
ioremap: done with statics, switching to malloc
SunLance: using auto-carrier-detection.
eth0: LANCE 52:54:00:12:34:56
rtc-m48t59 rtc-m48t59.0: IRQ index 0 not found
rtc-m48t59 rtc-m48t59.0: registered as rtc0
rtc-m48t59 rtc-m48t59.0: setting system clock to 2020-05-22T02:19:50 UTC (1590113990)
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
NET: Registered protocol family 10
Segment Routing with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered protocol family 17
TAP version 14
    # Subtest: sysctl_test
    1..10
    ok 1 - sysctl_test_api_dointvec_null_tbl_data
    ok 2 - sysctl_test_api_dointvec_table_maxlen_unset
    ok 3 - sysctl_test_api_dointvec_table_len_is_zero
    ok 4 - sysctl_test_api_dointvec_table_read_but_position_set
    ok 5 - sysctl_test_dointvec_read_happy_single_positive
    ok 6 - sysctl_test_dointvec_read_happy_single_negative
    ok 7 - sysctl_test_dointvec_write_happy_single_positive
    ok 8 - sysctl_test_dointvec_write_happy_single_negative
    ok 9 - sysctl_test_api_dointvec_write_single_less_int_min
    ok 10 - sysctl_test_api_dointvec_write_single_greater_int_max
ok 1 - sysctl_test
    # Subtest: ext4_inode_test
    1..1
    ok 1 - inode_test_xtimestamp_decoding
ok 2 - ext4_inode_test
    # Subtest: kunit-try-catch-test
    1..2
    ok 1 - kunit_test_try_catch_successful_try_no_catch
    ok 2 - kunit_test_try_catch_unsuccessful_try_does_catch
ok 3 - kunit-try-catch-test
    # Subtest: kunit-resource-test
    1..5
    ok 1 - kunit_resource_test_init_resources
    ok 2 - kunit_resource_test_alloc_resource
    ok 3 - kunit_resource_test_destroy_resource
    ok 4 - kunit_resource_test_cleanup_resources
    ok 5 - kunit_resource_test_proper_free_ordering
ok 4 - kunit-resource-test
    # Subtest: kunit-log-test
    1..1
put this in log.
this too.
add to suite log.
along with this.
    ok 1 - kunit_log_test
ok 5 - kunit-log-test
    # Subtest: string-stream-test
    1..3
    ok 1 - string_stream_test_empty_on_creation
    ok 2 - string_stream_test_not_empty_after_add
    ok 3 - string_stream_test_get_string
ok 6 - string-stream-test
    # Subtest: list-kunit-test
    1..36
    ok 1 - list_test_list_init
    ok 2 - list_test_list_add
    ok 3 - list_test_list_add_tail
    ok 4 - list_test_list_del
    ok 5 - list_test_list_replace
    ok 6 - list_test_list_replace_init
    ok 7 - list_test_list_swap
    ok 8 - list_test_list_del_init
    ok 9 - list_test_list_move
    ok 10 - list_test_list_move_tail
    ok 11 - list_test_list_bulk_move_tail
    ok 12 - list_test_list_is_first
    ok 13 - list_test_list_is_last
    ok 14 - list_test_list_empty
    ok 15 - list_test_list_empty_careful
    ok 16 - list_test_list_rotate_left
    ok 17 - list_test_list_rotate_to_front
    ok 18 - list_test_list_is_singular
    ok 19 - list_test_list_cut_position
    ok 20 - list_test_list_cut_before
    ok 21 - list_test_list_splice
    ok 22 - list_test_list_splice_tail
    ok 23 - list_test_list_splice_init
    ok 24 - list_test_list_splice_tail_init
    ok 25 - list_test_list_entry
    ok 26 - list_test_list_first_entry
    ok 27 - list_test_list_last_entry
    ok 28 - list_test_list_first_entry_or_null
    ok 29 - list_test_list_next_entry
    ok 30 - list_test_list_prev_entry
    ok 31 - list_test_list_for_each
    ok 32 - list_test_list_for_each_prev
    ok 33 - list_test_list_for_each_safe
    ok 34 - list_test_list_for_each_prev_safe
    ok 35 - list_test_list_for_each_entry
    ok 36 - list_test_list_for_each_entry_reverse
ok 7 - list-kunit-test
    # Subtest: qos-kunit-test
    1..3
    ok 1 - freq_qos_test_min
    ok 2 - freq_qos_test_maxdef
    ok 3 - freq_qos_test_readd
ok 8 - qos-kunit-test
EXT4-fs (sda): mounted filesystem without journal. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 8:0.
devtmpfs: mounted
Freeing unused kernel memory: 176K
This architecture does not have kernel memory protection.
Run /sbin/init as init process
EXT4-fs (sda): re-mounted. Opts: (null)
ext4 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
Starting syslogd: OK
Starting klogd: OK
Initializing random number generator... random: dd: uninitialized urandom read (512 bytes read)
done.
Starting network: OK
Found console ttyS0

Linux version 5.7.0-rc6-00026-g03fb3acae4be (groeck@server.roeck-us.net) (gcc version 6.5.0 (Buildroot 2018.11-rc2-00071-g4310260), GNU ld (GNU Binutils) 2.31.1) #1 Thu May 21 19:17:48 PDT 2020
Boot successful.
Rebooting
Found console ttyS0
Stopping network: OK
Saving random seed... random: dd: uninitialized urandom read (512 bytes read)
done.
Stopping klogd: OK
Stopping syslogd: OK
umount: devtmpfs busy - remounted read-only
EXT4-fs (sda): re-mounted. Opts: (null)
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
sd 0:0:0:0: [sda] Synchronizing SCSI cache
reboot: Restarting system

^ permalink raw reply

* Re: [PATCH V3 0/3] arm64: Enable vmemmap mapping from device memory
From: Jia He @ 2020-05-22  3:58 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm
  Cc: Mark Rutland, Michal Hocko, Thomas Gleixner, David Hildenbrand,
	Peter Zijlstra, Catalin Marinas, Dave Hansen, Paul Mackerras,
	linux-ia64, linux-riscv, Will Deacon, jgg, aneesh.kumar, x86,
	Matthew Wilcox (Oracle), Mike Rapoport, Kaly Xin, Ingo Molnar,
	Fenghua Yu, rcampbell, Pavel Tatashin, jglisse, Andy Lutomirski,
	Paul Walmsley, dan.j.williams, linux-arm-kernel, Tony Luck,
	linuxppc-dev, linux-kernel, Palmer Dabbelt, Andrew Morton,
	robin.murphy, Kirill A. Shutemov
In-Reply-To: <1585631387-18819-1-git-send-email-anshuman.khandual@arm.com>

Hi

On 2020/3/31 13:09, Anshuman Khandual wrote:
> This series enables vmemmap backing memory allocation from device memory
> ranges on arm64. But before that, it enables vmemmap_populate_basepages()
> and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
> alocation requests.

I verified no obvious regression after this patch series.

Host: ThunderX2(armv8a server), kernel v5.4

qemu:v3.1, -M virt \

-object 
memory-backend-file,id=mem1,share=on,mem-path=/tmp2/nvdimm.img,size=4G,align=2M \

-device nvdimm,id=nvdimm1,memdev=mem1,label-size=2M

Guest: kernel v5.7.0-rc5 with this patch series.

Tested case:

- 4K PAGESIZE, boot, mount w/ -o dax, mount w/o -o dax, basic io

- 64K PAGESIZE,boot, mount w/ -o dax, mount w/o -o dax, basic io

Not tested:

- 16K pagesize due to my hardware limiation(can't run 16K pgsz kernel)

- hot-add/remove nvdimm device from qemu due to no fully support on arm64 qemu yet

- Host nvdimm device hotplug

Hence from above result,

Tested-by: Jia He <justin.he@arm.com>

> This series applies after latest (v14) arm64 memory hot remove series
> (https://lkml.org/lkml/2020/3/3/1746) on Linux 5.6.
>
> Pending Question:
>
> altmap_alloc_block_buf() does not have any other remaining users in the
> tree after this change. Should it be converted into a static function and
> it's declaration be dropped from the header (include/linux/mm.h). Avoided
> doing so because I was not sure if there are any off-tree users or not.
>
> Changes in V3:
>
> - Dropped comment from free_hotplug_page_range() per Robin
> - Modified comment in unmap_hotplug_range() per Robin
> - Enabled altmap support in vmemmap_alloc_block_buf() per Robin
>
> Changes in V2: (https://lkml.org/lkml/2020/3/4/475)
>
> - Rebased on latest hot-remove series (v14) adding P4D page table support
>
> Changes in V1: (https://lkml.org/lkml/2020/1/23/12)
>
> - Added an WARN_ON() in unmap_hotplug_range() when altmap is
>    provided without the page table backing memory being freed
>
> Changes in RFC V2: (https://lkml.org/lkml/2019/10/21/11)
>
> - Changed the commit message on 1/2 patch per Will
> - Changed the commit message on 2/2 patch as well
> - Rebased on arm64 memory hot remove series (v10)
>
> RFC V1: (https://lkml.org/lkml/2019/6/28/32)
>
> Cc: Catalin Marinas<catalin.marinas@arm.com>
> Cc: Will Deacon<will@kernel.org>
> Cc: Mark Rutland<mark.rutland@arm.com>
> Cc: Paul Walmsley<paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt<palmer@dabbelt.com>
> Cc: Tony Luck<tony.luck@intel.com>
> Cc: Fenghua Yu<fenghua.yu@intel.com>
> Cc: Dave Hansen<dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski<luto@kernel.org>
> Cc: Peter Zijlstra<peterz@infradead.org>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Ingo Molnar<mingo@redhat.com>
> Cc: David Hildenbrand<david@redhat.com>
> Cc: Mike Rapoport<rppt@linux.ibm.com>
> Cc: Michal Hocko<mhocko@suse.com>
> Cc: "Matthew Wilcox (Oracle)"<willy@infradead.org>
> Cc: "Kirill A. Shutemov"<kirill.shutemov@linux.intel.com>
> Cc: Andrew Morton<akpm@linux-foundation.org>
> Cc: Dan Williams<dan.j.williams@intel.com>
> Cc: Pavel Tatashin<pasha.tatashin@soleen.com>
> Cc: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> Cc: Paul Mackerras<paulus@samba.org>
> Cc: Michael Ellerman<mpe@ellerman.id.au>
> Cc:linux-arm-kernel@lists.infradead.org
> Cc:linux-ia64@vger.kernel.org
> Cc:linux-riscv@lists.infradead.org
> Cc:x86@kernel.org
> Cc:linuxppc-dev@lists.ozlabs.org
> Cc:linux-mm@kvack.org
> Cc:linux-kernel@vger.kernel.org
>
> Anshuman Khandual (3):
>    mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
>    mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()
>    arm64/mm: Enable vmem_altmap support for vmemmap mappings
>
>   arch/arm64/mm/mmu.c       | 59 ++++++++++++++++++++++++++-------------
>   arch/ia64/mm/discontig.c  |  2 +-
>   arch/powerpc/mm/init_64.c | 10 +++----
>   arch/riscv/mm/init.c      |  2 +-
>   arch/x86/mm/init_64.c     | 12 ++++----
>   include/linux/mm.h        |  8 ++++--
>   mm/sparse-vmemmap.c       | 38 ++++++++++++++++++++-----
>   7 files changed, 87 insertions(+), 44 deletions(-)
>
-- 

---
Cheers,
Justin (Jia He)


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox