LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 02/11] powerpc/kexec_file: mark PPC64 specific code
From: Hari Bathini @ 2020-06-29  6:23 UTC (permalink / raw)
  To: Christophe Leroy, Michael Ellerman, Andrew Morton
  Cc: Pingfan Liu, Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar,
	Sourabh Jain, lkml, linuxppc-dev, Mimi Zohar, Vivek Goyal,
	Dave Young, Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <68d59c00-da02-b362-7bd9-a9631eca0fdd@csgroup.eu>

Hi Christophe

Thanks for the review...

On 27/06/20 12:12 pm, Christophe Leroy wrote:
> 
> 
> Le 26/06/2020 à 21:04, Hari Bathini a écrit :
>> Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
>> specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
>> rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
>> same spirit.
> 
> At the time being, CONFIG_KEXEC_FILE depends on PPC64.

Right.

> Are you planning to make it work on PPC32 as well ?

No.

> Otherwise I don't understand the purpose of this patch.

But I want to make sure the changes I am adding in this series do not
get in the way of adding PPC32 changes whenever they are submitted as there
is common code currently and some more of it in the changes I am adding
in this series...

> Also, what is being done in this patch seems to go far beyond what you describe above.> It is propably worth splitting in several patches with proper explanation.

Hmmm.. I don't see any other reason beyond what I mentioned above.
Will try to split the patch but the changelog would still be the same, afaics.

> Christophe
> 
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/kexec.h       |   11 +++
>>   arch/powerpc/kexec/Makefile            |    2 -
>>   arch/powerpc/kexec/elf_64.c            |    7 +-
>>   arch/powerpc/kexec/file_load.c         |   37 ++--------
>>   arch/powerpc/kexec/file_load_64.c      |  108 ++++++++++++++++++++++++++++++
>>   arch/powerpc/purgatory/Makefile        |    4 +
>>   arch/powerpc/purgatory/trampoline.S    |  117 --------------------------------
>>   arch/powerpc/purgatory/trampoline_64.S |  117 ++++++++++++++++++++++++++++++++
>>   8 files changed, 248 insertions(+), 155 deletions(-)
>>   create mode 100644 arch/powerpc/kexec/file_load_64.c
>>   delete mode 100644 arch/powerpc/purgatory/trampoline.S
>>   create mode 100644 arch/powerpc/purgatory/trampoline_64.S

Thanks
Hari

^ permalink raw reply

* Re: [PATCH v2] powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config
From: Michael Ellerman @ 2020-06-29  6:06 UTC (permalink / raw)
  To: Bin Meng, linuxppc-dev, linux-kernel; +Cc: Bin Meng
In-Reply-To: <CAEUhbmUj4iC1+4Y=93zpj+aCBqU1ySOHXvQgJHmxNx__UWduCQ@mail.gmail.com>

Bin Meng <bmeng.cn@gmail.com> writes:
> On Sat, May 2, 2020 at 12:45 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>
>> From: Bin Meng <bin.meng@windriver.com>
>>
>> Drop CONFIG_MTD_M25P80 that was removed in
>> commit b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")
>>
>> Signed-off-by: Bin Meng <bin.meng@windriver.com>
>>
>> ---
>>
>> Changes in v2:
>> - correct the typo (5xx => 85xx) in the commit title
>>
>>  arch/powerpc/configs/85xx-hw.config | 1 -
>>  1 file changed, 1 deletion(-)
>>
>
> It seems this patch isn't applied anywhere. Ping?

I'll grab it.

cheers

^ permalink raw reply

* Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: Hari Bathini @ 2020-06-29  6:00 UTC (permalink / raw)
  To: piliu, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Vivek Goyal, Dave Young,
	Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <ed94a357-16aa-9f17-aa5f-5aab6617ed68@redhat.com>



On 28/06/20 7:58 am, piliu wrote:
> Hi Hari,
> 
> If in [4/11],  get_exclude_memory_ranges() turns out to be unnecessary
> ,then this patch is abundant either. As my understanding, memblock has
> already helped to achieved the purpose that get_exclude_memory_ranges()
> wants.

As mentioned in the other patch, there is a need for @exclude_ranges as crashkernel
region is likely to have an overlap with regions like opal, rtas..

But yeah.. the weak function should have been kexec_locate_mem_hole() instead
of kexec_add_buffer(). Will take care of that in v2.

> On 06/27/2020 03:04 AM, Hari Bathini wrote:
>> Some archs can have special memory regions, within the given memory
>> range, which can't be used for the buffer in a kexec segment. As
>> kexec_add_buffer() function is being called from generic code as well,
>> add weak arch_kexec_add_buffer definition for archs to override & take
>> care of special regions before trying to locate a memory hole.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Thanks
Hari

^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: Hari Bathini @ 2020-06-29  5:55 UTC (permalink / raw)
  To: piliu, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Thiago Jung Bauermann, Dave Young,
	Vivek Goyal, Eric Biederman
In-Reply-To: <9cfda789-0747-a67a-b825-5ea6f15099b8@redhat.com>



On 28/06/20 7:44 am, piliu wrote:
> Hi Hari,

Hi Pingfan,

> 
> After a quick through for this series, I have a few question/comment on
> this patch for the time being. Pls see comment inline.
> 
> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>> crashkernel region could have an overlap with special memory regions
>> like  opal, rtas, tce-table & such. These regions are referred to as
>> exclude memory ranges. Setup this ranges during image probe in order
>> to avoid them while finding the buffer for different kdump segments.

[...]

>> +	/*
>> +	 * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>> +	 * kexec_file_load syscall
>> +	 */
>> +	if (kbuf->image->type != KEXEC_TYPE_CRASH)
>> +		return 0;
> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
> there is no requirement for @exclude_ranges.

The ranges like rtas, opal are loaded by f/w. They almost always overlap with
crashkernel region. So, @exclude_ranges is required to support kdump.

> I guess you have a design for future. If not true, then it is better to
> fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the
> caller and rename this function to better distinguish use cases between
> kexec and kdump

Yeah, this condition will be folded. I have a follow-up patch for that explaining
why kexec case should also be folded. Will try to add that to this series for v2.

Thanks
Hari

^ permalink raw reply

* Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
From: Michael Ellerman @ 2020-06-29  5:57 UTC (permalink / raw)
  To: Masahiro Yamada, linux-kbuild
  Cc: Michal Marek, Yoshinori Sato, linux-sh, Masahiro Yamada,
	linuxppc-dev, linux-kernel, Steven Rostedt, Russell King,
	Ingo Molnar, Paul Mackerras, Sami Tolvanen, Rich Felker,
	linux-arm-kernel
In-Reply-To: <20200628015041.1000002-1-masahiroy@kernel.org>

Masahiro Yamada <masahiroy@kernel.org> writes:
> CFLAGS_REMOVE_<file>.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
>
> Add ccflags-remove-y and asflags-remove-y to make it easily.
>
> Use ccflags-remove-y to clean up some Makefiles.
>
> Suggested-by: Sami Tolvanen <samitolvanen@google.com>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  arch/arm/boot/compressed/Makefile | 6 +-----
>  arch/powerpc/xmon/Makefile        | 3 +--
>  arch/sh/boot/compressed/Makefile  | 5 +----
>  kernel/trace/Makefile             | 4 ++--
>  lib/Makefile                      | 5 +----
>  scripts/Makefile.lib              | 4 ++--
>  6 files changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index 89c76ca35640..55cbcdd88ac0 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -7,8 +7,7 @@ UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>  
>  # Disable ftrace for the entire directory
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
> +ccflags-remove-y += $(CC_FLAGS_FTRACE)

This could be:

ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)

Similar to kernel/trace/Makefile below.

I don't mind though.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 6575bb0a0434..7492844a8b1b 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -2,9 +2,9 @@
>  
>  # Do not instrument the tracer itself:
>  
> +ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)
> +
>  ifdef CONFIG_FUNCTION_TRACER
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
>  
>  # Avoid recursion due to instrumentation.
>  KCSAN_SANITIZE := n

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
From: Nicholas Piggin @ 2020-06-29  4:50 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
In-Reply-To: <87d064g692.fsf@mpe.ellerman.id.au>

Excerpts from Michael Ellerman's message of June 12, 2020 4:14 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> ISA v3.1 does not support the SAO storage control attribute required to
>> implement PROT_SAO. PROT_SAO was used by specialised system software
>> (Lx86) that has been discontinued for about 7 years, and is not thought
>> to be used elsewhere, so removal should not cause problems.
>>
>> We rather remove it than keep support for older processors, because
>> live migrating guest partitions to newer processors may not be possible
>> if SAO is in use.
> 

Thakns for the review, sorry got distracted...

> They key details being:
>  - you don't remove PROT_SAO from the uapi header, so code using the
>    definition will still build.
>  - you change arch_validate_prot() to reject PROT_SAO, which means code
>    using it will see a failure from mmap() at runtime.

Yes.

> This obviously risks breaking userspace, even if we think it won't in
> practice. I guess we don't really have any option given the hardware
> support is being dropped.
> 
> Can you repost with a wider Cc list, including linux-mm and linux-arch?

Will do.

> I wonder if we should add a comment to the uapi header, eg?
> 
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index c0c737215b00..d4fdbe768997 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -11,7 +11,7 @@
>  #include <asm-generic/mman-common.h>
>  
>  
> -#define PROT_SAO	0x10		/* Strong Access Ordering */
> +#define PROT_SAO	0x10		/* Unsupported since v5.9 */
>  
>  #define MAP_RENAME      MAP_ANONYMOUS   /* In SunOS terminology */
>  #define MAP_NORESERVE   0x40            /* don't reserve swap pages */

Yeah that makes sense.

>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index f17442c3a092..d9e92586f8dc 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -20,9 +20,13 @@
>>  #define _PAGE_RW		(_PAGE_READ | _PAGE_WRITE)
>>  #define _PAGE_RWX		(_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>>  #define _PAGE_PRIVILEGED	0x00008 /* kernel access only */
>> -#define _PAGE_SAO		0x00010 /* Strong access order */
>> +
>> +#define _PAGE_CACHE_CTL		0x00030 /* Bits for the folowing cache modes */
>> +			/*	No bits set is normal cacheable memory */
>> +			/*	0x00010 unused, is SAO bit on radix POWER9 */
>>  #define _PAGE_NON_IDEMPOTENT	0x00020 /* non idempotent memory */
>>  #define _PAGE_TOLERANT		0x00030 /* tolerant memory, cache inhibited */
>> +
> 
> Why'd you do it that way vs just dropping _PAGE_SAO from the or below?

Just didn't like _PAGE_CACHE_CTL depending on values of the variants 
that we use.

>> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
>> index bac2252c839e..c7e923b0000a 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
>>  #define CPU_FTR_SPURR			LONG_ASM_CONST(0x0000000001000000)
>>  #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0000000002000000)
>>  #define CPU_FTR_VSX			LONG_ASM_CONST(0x0000000004000000)
>> -#define CPU_FTR_SAO			LONG_ASM_CONST(0x0000000008000000)
> 
> Can you do:
> 
> +// Free				LONG_ASM_CONST(0x0000000008000000)

Yes.

> 
>> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
>> index 9bb9bb370b53..579c9229124b 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
>> @@ -400,7 +400,8 @@ static inline bool hpte_cache_flags_ok(unsigned long hptel, bool is_ci)
>>  
>>  	/* Handle SAO */
>>  	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
>> -	    cpu_has_feature(CPU_FTR_ARCH_206))
>> +	    cpu_has_feature(CPU_FTR_ARCH_206) &&
>> +	    !cpu_has_feature(CPU_FTR_ARCH_31))
>>  		wimg = HPTE_R_M;
> 
> Shouldn't it reject that combination if the host can't support it?
> 
> Or I guess it does, but yikes that code is not clear.

Yeah, took me a bit to work that out.

>> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
>> index d610c2e07b28..43a62f3e21a0 100644
>> --- a/arch/powerpc/include/asm/mman.h
>> +++ b/arch/powerpc/include/asm/mman.h
>> @@ -13,38 +13,24 @@
>>  #include <linux/pkeys.h>
>>  #include <asm/cpu_has_feature.h>
>>  
>> -/*
>> - * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
>> - * here.  How important is the optimization?
>> - */
> 
> This comment seems confused, but also unrelated to this patch?

Yeah.
 
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 3a409517c031..8d2e4043702f 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -622,7 +622,7 @@ static struct dt_cpu_feature_match __initdata
>>  	{"processor-control-facility-v3", feat_enable_dbell, CPU_FTR_DBELL},
>>  	{"processor-utilization-of-resources-register", feat_enable_purr, 0},
>>  	{"no-execute", feat_enable, 0},
>> -	{"strong-access-ordering", feat_enable, CPU_FTR_SAO},
>> +	{"strong-access-ordering", feat_enable, 0},
> 
> Would it make more sense to drop it entirely? Or leave it commented out.

Probably would, yes.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/mm: Enable radix GTSE only if supported.
From: Bharata B Rao @ 2020-06-29  4:31 UTC (permalink / raw)
  To: Murilo Opsfelder Araújo; +Cc: aneesh.kumar, linuxppc-dev, npiggin
In-Reply-To: <20200626205530.GA23269@kermit.br.ibm.com>

On Fri, Jun 26, 2020 at 05:55:30PM -0300, Murilo Opsfelder Araújo wrote:
> > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> > index bc73abf0bc25..152aa0200cef 100644
> > --- a/arch/powerpc/mm/init_64.c
> > +++ b/arch/powerpc/mm/init_64.c
> > @@ -407,12 +407,15 @@ static void __init early_check_vec5(void)
> >  		if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
> >  						OV5_FEAT(OV5_RADIX_GTSE))) {
> >  			pr_warn("WARNING: Hypervisor doesn't support RADIX with GTSE\n");
> > -		}
> > +			cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> > +		} else
> > +			cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
> >  		/* Do radix anyway - the hypervisor said we had to */
> >  		cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
> >  	} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
> >  		/* Hypervisor only supports hash - disable radix */
> >  		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> > +		cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> >  	}
> >  }
> 
> Is this a part of the code where mmu_clear_feature() cannot be used?

Yes, it appears so. Jump label initialization isn't done yet.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.
From: Bharata B Rao @ 2020-06-29  1:53 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <20200628161149.GA27215@in.ibm.com>

On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > The time taken to switch a VM to Secure-VM, increases by the size of the VM.  A
> > 100GB VM takes about 7minutes. This is unacceptable.  This linear increase is
> > caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> > Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
> > secure-memory. It has to just migrate the necessary and sufficient GFNs.
> > 
> > However when the optimization is incorporated in the Ultravisor, the Hypervisor
> > starts misbehaving. The Hypervisor has a inbuilt assumption that the Ultravisor
> > will explicitly request to migrate, each and every GFN of the VM. If only
> > necessary and sufficient GFNs are requested for migration, the Hypervisor
> > continues to manage the remaining GFNs as normal GFNs. This leads of memory
> > corruption, manifested consistently when the SVM reboots.
> > 
> > The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
> > expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
> > the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> > from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
> > problem manifests as random errors in the SVM, when a memory-slot is
> > hotplugged.
> > 
> > This patch series automatically migrates the non-migrated pages of a SVM,
> >      and thus solves the problem.
> 
> So this is what I understand as the objective of this patchset:
> 
> 1. Getting all the pages into the secure memory right when the guest
>    transitions into secure mode is expensive. Ultravisor wants to just get
>    the necessary and sufficient pages in and put the onus on the Hypervisor
>    to mark the remaining pages (w/o actual page-in) as secure during
>    H_SVM_INIT_DONE.
> 2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
>    are already secure from the pages that are shared and that are paged-out.
>    For this you are introducing all these new states in HV.
> 
> UV knows about the shared GFNs and maintains the state of the same. Hence
> let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
> and if UV finds any shared pages in them, let it fail the uv-page-in call.
> Then HV can fail the migration for it  and the page continues to remain
> shared. With this, you don't need to maintain a state for secured GFN in HV.
> 
> In the unlikely case of sending a paged-out page to UV during
> H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
> if required. With this, you don't need a state in HV to identify a
> paged-out-but-encrypted state.
> 
> Doesn't the above work?

I see that you want to infact skip the uv-page-in calls from H_SVM_INIT_DONE.
So that would need the extra states in HV which you are proposing here.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
From: Steven Rostedt @ 2020-06-28 19:07 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Michal Marek, Yoshinori Sato, linux-kbuild, linux-sh,
	linux-kernel, Russell King, linuxppc-dev, Ingo Molnar,
	Paul Mackerras, Sami Tolvanen, Rich Felker, linux-arm-kernel
In-Reply-To: <20200628015041.1000002-1-masahiroy@kernel.org>

On Sun, 28 Jun 2020 10:50:41 +0900
Masahiro Yamada <masahiroy@kernel.org> wrote:

> CFLAGS_REMOVE_<file>.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
> 
> Add ccflags-remove-y and asflags-remove-y to make it easily.
> 
> Use ccflags-remove-y to clean up some Makefiles.
> 
> Suggested-by: Sami Tolvanen <samitolvanen@google.com>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---

Nice feature!

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply

* Re: [PATCH v3 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE
From: Bharata B Rao @ 2020-06-28 16:20 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <1592606622-29884-4-git-send-email-linuxram@us.ibm.com>

On Fri, Jun 19, 2020 at 03:43:41PM -0700, Ram Pai wrote:
> H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly

As noted in the last iteration, can you reword the above please?
I don't see it as an incorrect assumption, but see it as extension of
scope now :-)

> called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
> normal GFNs associated with normal PFNs; when infact, these GFNs should
> have been secure GFNs, associated with device PFNs.
> 
> Move all the PFNs associated with the SVM's GFNs, to secure-PFNs, in
> H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
> through H_SVM_PAGE_IN, or Paged-in followed by a Paged-out through
> UV_PAGE_OUT.
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  Documentation/powerpc/ultravisor.rst        |   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c          | 154 +++++++++++++++++++++++-----
>  3 files changed, 132 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
> index 363736d..3bc8957 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -933,6 +933,8 @@ Return values
>  	* H_UNSUPPORTED		if called from the wrong context (e.g.
>  				from an SVM or before an H_SVM_INIT_START
>  				hypercall).
> +	* H_STATE		if the hypervisor could not successfully
> +                                transition the VM to Secure VM.
>  
>  Description
>  ~~~~~~~~~~~
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index 5a9834e..b9cd7eb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -22,6 +22,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>  unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
>  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  			     struct kvm *kvm, bool skip_page_out);
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> +			const struct kvm_memory_slot *memslot);
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index c8c0290..449e8a7 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -93,6 +93,7 @@
>  #include <asm/ultravisor.h>
>  #include <asm/mman.h>
>  #include <asm/kvm_ppc.h>
> +#include <asm/kvm_book3s_uvmem.h>
>  
>  static struct dev_pagemap kvmppc_uvmem_pgmap;
>  static unsigned long *kvmppc_uvmem_bitmap;
> @@ -339,6 +340,21 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  	return false;
>  }
>  
> +/* return true, if the GFN is a shared-GFN, or a secure-GFN */
> +bool kvmppc_gfn_has_transitioned(unsigned long gfn, struct kvm *kvm)
> +{
> +	struct kvmppc_uvmem_slot *p;
> +
> +	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
> +		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
> +			unsigned long index = gfn - p->base_pfn;
> +
> +			return (p->pfns[index] & KVMPPC_GFN_FLAG_MASK);
> +		}
> +	}
> +	return false;
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>  	struct kvm_memslots *slots;
> @@ -379,12 +395,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  
>  unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
>  {
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int srcu_idx;
> +	long ret = H_SUCCESS;
> +
>  	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>  		return H_UNSUPPORTED;
>  
> +	/* migrate any unmoved normal pfn to device pfns*/
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +	slots = kvm_memslots(kvm);
> +	kvm_for_each_memslot(memslot, slots) {
> +		ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
> +		if (ret) {
> +			ret = H_STATE;
> +			goto out;
> +		}
> +	}
> +
>  	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
>  	pr_info("LPID %d went secure\n", kvm->arch.lpid);
> -	return H_SUCCESS;
> +
> +out:
> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> +	return ret;
>  }
>  
>  /*
> @@ -505,12 +540,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
>  }
>  
>  /*
> - * Alloc a PFN from private device memory pool and copy page from normal
> - * memory to secure memory using UV_PAGE_IN uvcall.
> + * Alloc a PFN from private device memory pool. If @pagein is true,
> + * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
>   */
> -static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
> -		   unsigned long end, unsigned long gpa, struct kvm *kvm,
> -		   unsigned long page_shift, bool *downgrade)
> +static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
> +		unsigned long start,
> +		unsigned long end, unsigned long gpa, struct kvm *kvm,
> +		unsigned long page_shift,
> +		bool pagein)
>  {
>  	unsigned long src_pfn, dst_pfn = 0;
>  	struct migrate_vma mig;
> @@ -526,18 +563,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  	mig.src = &src_pfn;
>  	mig.dst = &dst_pfn;
>  
> -	/*
> -	 * We come here with mmap_sem write lock held just for
> -	 * ksm_madvise(), otherwise we only need read mmap_sem.
> -	 * Hence downgrade to read lock once ksm_madvise() is done.
> -	 */

Can you please retain this comment? This explains why we take write lock
and then downgrade.

> -	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> -			  MADV_UNMERGEABLE, &vma->vm_flags);
> -	downgrade_write(&kvm->mm->mmap_sem);
> -	*downgrade = true;

When I introduced this variable, there was a suggestion to rename it
to "downgraded", but we were a bit late then. When you are touching
this now, can you rename it appropriately?

> -	if (ret)
> -		return ret;
> -
>  	ret = migrate_vma_setup(&mig);
>  	if (ret)
>  		return ret;
> @@ -553,11 +578,16 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  		goto out_finalize;
>  	}
>  
> -	pfn = *mig.src >> MIGRATE_PFN_SHIFT;
> -	spage = migrate_pfn_to_page(*mig.src);
> -	if (spage)
> -		uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
> -			   page_shift);
> +	if (pagein) {
> +		pfn = *mig.src >> MIGRATE_PFN_SHIFT;
> +		spage = migrate_pfn_to_page(*mig.src);
> +		if (spage) {
> +			ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
> +					gpa, 0, page_shift);
> +			if (ret)
> +				goto out_finalize;
> +		}
> +	}
>  
>  	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>  	migrate_vma_pages(&mig);
> @@ -566,6 +596,66 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  	return ret;
>  }
>  
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> +		const struct kvm_memory_slot *memslot)
> +{
> +	unsigned long gfn = memslot->base_gfn;
> +	unsigned long end;
> +	bool downgrade = false;
> +	struct vm_area_struct *vma;
> +	int i, ret = 0;
> +	unsigned long start = gfn_to_hva(kvm, gfn);
> +
> +	if (kvm_is_error_hva(start))
> +		return H_STATE;
> +
> +	end = start + (memslot->npages << PAGE_SHIFT);
> +
> +	down_write(&kvm->mm->mmap_sem);
> +
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	vma = find_vma_intersection(kvm->mm, start, end);
> +	if (!vma || vma->vm_start > start || vma->vm_end < end) {
> +		ret = H_STATE;
> +		goto out_unlock;
> +	}
> +
> +	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +			  MADV_UNMERGEABLE, &vma->vm_flags);
> +	downgrade_write(&kvm->mm->mmap_sem);
> +	downgrade = true;
> +	if (ret) {
> +		ret = H_STATE;
> +		goto out_unlock;
> +	}
> +
> +	for (i = 0; i < memslot->npages; i++, ++gfn) {
> +		/*
> +		 * skip GFNs that have already tranistioned.
> +		 * paged-in GFNs, shared GFNs, paged-in GFNs
> +		 * that were later paged-out.
> +		 */
> +		if (kvmppc_gfn_has_transitioned(gfn, kvm))
> +			continue;
> +
> +		start = gfn_to_hva(kvm, gfn);
> +		end = start + (1UL << PAGE_SHIFT);
> +		ret = kvmppc_svm_migrate_page(vma, start, end,
> +			(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> +

As I said last time, you are assuming that the vma that you obtained
in the beginning actually spans the entire memslot range. This might
be true as you haven't found any issues during testing, but I feel it
is better if there is no such implicit assumption in the code here.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.
From: Bharata B Rao @ 2020-06-28 16:11 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <1592606622-29884-1-git-send-email-linuxram@us.ibm.com>

On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> The time taken to switch a VM to Secure-VM, increases by the size of the VM.  A
> 100GB VM takes about 7minutes. This is unacceptable.  This linear increase is
> caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
> secure-memory. It has to just migrate the necessary and sufficient GFNs.
> 
> However when the optimization is incorporated in the Ultravisor, the Hypervisor
> starts misbehaving. The Hypervisor has a inbuilt assumption that the Ultravisor
> will explicitly request to migrate, each and every GFN of the VM. If only
> necessary and sufficient GFNs are requested for migration, the Hypervisor
> continues to manage the remaining GFNs as normal GFNs. This leads of memory
> corruption, manifested consistently when the SVM reboots.
> 
> The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
> expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
> the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
> problem manifests as random errors in the SVM, when a memory-slot is
> hotplugged.
> 
> This patch series automatically migrates the non-migrated pages of a SVM,
>      and thus solves the problem.

So this is what I understand as the objective of this patchset:

1. Getting all the pages into the secure memory right when the guest
   transitions into secure mode is expensive. Ultravisor wants to just get
   the necessary and sufficient pages in and put the onus on the Hypervisor
   to mark the remaining pages (w/o actual page-in) as secure during
   H_SVM_INIT_DONE.
2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
   are already secure from the pages that are shared and that are paged-out.
   For this you are introducing all these new states in HV.

UV knows about the shared GFNs and maintains the state of the same. Hence
let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
and if UV finds any shared pages in them, let it fail the uv-page-in call.
Then HV can fail the migration for it  and the page continues to remain
shared. With this, you don't need to maintain a state for secured GFN in HV.

In the unlikely case of sending a paged-out page to UV during
H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
if required. With this, you don't need a state in HV to identify a
paged-out-but-encrypted state.

Doesn't the above work? If so, we can avoid all those extra states
in HV. That way HV can continue to differentiate only between two types
of pages: secure and not-secure. The rest of the states (shared,
paged-out-encrypted) actually belong to SVM/UV and let UV take care of
them.

Or did I miss something?

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH 4/8] asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
From: Mike Rapoport @ 2020-06-28  7:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-ia64, linux-sh, Peter Zijlstra, linux-mips, Max Filippov,
	Satheesh Rajendran, linux-csky, sparclinux, linux-riscv,
	linux-arch, Stephen Rothwell, linux-hexagon, Joerg Roedel,
	Mike Rapoport, Abdul Haleem, linux-snps-arc, linux-xtensa,
	Arnd Bergmann, linux-s390, linux-um, Steven Rostedt, linux-m68k,
	openrisc, Andy Lutomirski, Stafford Horne, linux-arm-kernel,
	linux-parisc, linux-mm, linux-kernel, linux-alpha, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200627190304.GG25039@casper.infradead.org>

On Sat, Jun 27, 2020 at 08:03:04PM +0100, Matthew Wilcox wrote:
> On Sat, Jun 27, 2020 at 05:34:49PM +0300, Mike Rapoport wrote:
> > More elaborate versions on arm64 and x86 account memory for the user page
> > tables and call to pgtable_pmd_page_ctor() as the part of PMD page
> > initialization.
> > 
> > Move the arm64 version to include/asm-generic/pgalloc.h and use the generic
> > version on several architectures.
> > 
> > The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
> > not enabled, so there is no functional change for most architectures except
> > of the addition of __GFP_ACCOUNT for allocation of user page tables.
> 
> Thanks for including this line; it reminded me that we're not setting
> the PageTable flag on the page, nor accounting it to the zone page stats.
> Hope you don't mind me tagging a patch to do that on as 9/8.
> 
> We could also do with a pud_page_[cd]tor and maybe even p4d/pgd versions.
> But that brings me to the next question -- could/should some of this
> be moved over to asm-generic/pgalloc.h?  The ctor/dtor aren't called
> from anywhere else, and there's value to reducing the total amount of
> code in mm.h, but then there's also value to keeping all the ifdef
> ARCH_ENABLE_SPLIT_PMD_PTLOCK code together too.  So I'm a bit torn.
> What do you think?

There are arhcitectures that don't use asm-generic/pgalloc.h but rather
have their own, sometimes completely different, versoins of these
funcitons.

I've tried adding linux/pgalloc.h, but I've ended up with contradicting
need to include asm/pgalloc.h before the generic code for some
architecures or after the generic code for others :)

I think let's leave it in mm.h for now, maybe after several more cleaups
we could do better.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH 9/8] mm: Account PMD tables like PTE tables
From: Mike Rapoport @ 2020-06-28  6:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-ia64, linux-sh, Peter Zijlstra, linux-mips, Max Filippov,
	Satheesh Rajendran, linux-csky, sparclinux, linux-riscv,
	linux-arch, Stephen Rothwell, linux-hexagon, Joerg Roedel,
	Mike Rapoport, Abdul Haleem, linux-snps-arc, linux-xtensa,
	Arnd Bergmann, linux-s390, linux-um, Steven Rostedt, linux-m68k,
	openrisc, Andy Lutomirski, Stafford Horne, linux-arm-kernel,
	linux-parisc, linux-mm, linux-kernel, linux-alpha, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200627184642.GF25039@casper.infradead.org>

On Sat, Jun 27, 2020 at 07:46:42PM +0100, Matthew Wilcox wrote:
> We account the PTE level of the page tables to the process in order to
> make smarter OOM decisions and help diagnose why memory is fragmented.
> For these same reasons, we should account pages allocated for PMDs.
> With larger process address spaces and ASLR, the number of PMDs in use
> is higher than it used to be so the inaccuracy is starting to matter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  include/linux/mm.h | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dc7b87310c10..b283e25fcffa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2271,7 +2271,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return ptlock_ptr(pmd_to_page(pmd));
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pmd_ptlock_init(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	page->pmd_huge_pte = NULL;
> @@ -2279,7 +2279,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
>  	return ptlock_init(page);
>  }
>  
> -static inline void pgtable_pmd_page_dtor(struct page *page)
> +static inline void pmd_ptlock_free(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> @@ -2296,8 +2296,8 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return &mm->page_table_lock;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
> -static inline void pgtable_pmd_page_dtor(struct page *page) {}
> +static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2310,6 +2310,22 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
>  	return ptl;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> +	if (!pmd_ptlock_init(page))
> +		return false;
> +	__SetPageTable(page);
> +	inc_zone_page_state(page, NR_PAGETABLE);
> +	return true;
> +}
> +
> +static inline void pgtable_pmd_page_dtor(struct page *page)
> +{
> +	pmd_ptlock_free(page);
> +	__ClearPageTable(page);
> +	dec_zone_page_state(page, NR_PAGETABLE);
> +}
> +
>  /*
>   * No scalability reason to split PUD locks yet, but follow the same pattern
>   * as the PMD locks to make it easier if we decide to.  The VM should not be
> -- 
> 2.27.0
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* [PATCH 4/9] macintosh/via-macii: Remove read_done state
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The driver state machine may enter the 'read_done' state when leaving the
'idle' or 'reading' state. This transition is pointless, as is the extra
interrupt it requires. The interrupt is produced by the transceiver
(even when it has no data to send) because an extra EVEN/ODD toggle
was signalled by the driver. Drop the extra state to simplify the code.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 70 ++++++++++++++---------------------
 1 file changed, 28 insertions(+), 42 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 6a5cd7de05baf..d29c87943ca46 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -110,7 +110,6 @@ static enum macii_state {
 	idle,
 	sending,
 	reading,
-	read_done,
 } macii_state;
 
 static struct adb_request *current_req; /* first request struct in the queue */
@@ -411,8 +410,8 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			reply_len = 1;
 		} else {
 			/* bus timeout */
-			macii_state = read_done;
 			reply_len = 0;
+			break;
 		}
 
 		/* set ADB state = even for first data byte */
@@ -471,20 +470,6 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 				current_req = req->next;
 				if (req->done)
 					(*req->done)(req);
-
-				if (!current_req)
-					macii_queue_poll();
-
-				if (current_req && macii_state == idle)
-					macii_start();
-
-				if (macii_state == idle) {
-					/* reset to shift in */
-					via[ACR] &= ~SR_OUT;
-					x = via[SR];
-					/* set ADB state idle - might get SRQ */
-					via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
-				}
 				break;
 			}
 		} else {
@@ -511,12 +496,28 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			} else if (status == ST_ODD && reply_len == 2) {
 				srq_asserted = true;
 			} else {
-				macii_state = read_done;
+				macii_state = idle;
+
+				if (bus_timeout)
+					reply_len = 0;
+
+				if (reading_reply) {
+					struct adb_request *req = current_req;
+
+					req->reply_len = reply_len;
+
+					req->complete = 1;
+					current_req = req->next;
+					if (req->done)
+						(*req->done)(req);
+				} else if (reply_len && autopoll_devs) {
+					adb_input(reply_buf, reply_len, 0);
+				}
+				break;
 			}
 		}
 
-		if (macii_state == reading &&
-		    reply_len < ARRAY_SIZE(reply_buf)) {
+		if (reply_len < ARRAY_SIZE(reply_buf)) {
 			reply_ptr++;
 			*reply_ptr = x;
 			reply_len++;
@@ -526,37 +527,22 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 		via[B] ^= ST_MASK;
 		break;
 
-	case read_done:
-		x = via[SR];
-
-		if (bus_timeout)
-			reply_len = 0;
-
-		if (reading_reply) {
-			reading_reply = 0;
-			req = current_req;
-			req->reply_len = reply_len;
-			req->complete = 1;
-			current_req = req->next;
-			if (req->done)
-				(*req->done)(req);
-		} else if (reply_len && autopoll_devs)
-			adb_input(reply_buf, reply_len, 0);
-
-		macii_state = idle;
+	default:
+		break;
+	}
 
+	if (macii_state == idle) {
 		if (!current_req)
 			macii_queue_poll();
 
 		if (current_req)
 			macii_start();
 
-		if (macii_state == idle)
+		if (macii_state == idle) {
+			via[ACR] &= ~SR_OUT;
+			x = via[SR];
 			via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
-		break;
-
-	default:
-		break;
+		}
 	}
 
 	local_irq_restore(flags);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 9/9] macintosh/via-macii: Clarify definition of macii_init()
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The function prototype correctly specifies the 'static' storage class.
Let the function definition match the declaration for better readability.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 2f9be4ec7d345..060e03f2264bc 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -140,7 +140,7 @@ static int macii_probe(void)
 }
 
 /* Initialize the driver */
-int macii_init(void)
+static int macii_init(void)
 {
 	unsigned long flags;
 	int err;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 0/9] Macintosh II ADB driver fixes
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson

Various issues with the via-macii driver have become apparent over the
years. Some examples:

 - A Talk command response can be lost. This can result in phantom devices
being probed or an incorrect device handler ID being retrieved.

 - A reply packet containing a null byte can get truncated. Such packets
are sometimes generated by ADB keyboards.

 - A Talk Register 3 reply from device 15 (that is, command byte 0xFF)
can be mistaken for a bus timeout (empty packet).

This patch series contains fixes for all known bugs in the via-macii
driver, plus a few code style improvements. It has been successfully
tested on an Apple Centris 650 and qemu-system-m68k.

The patched kernel does regress on past QEMU releases, due to ADB
transceiver emulation bugs. Those bugs have been fixed in mainline QEMU.
My thanks go to Mark Cave-Ayland for that effort and for figuring out
the improvements to the signalling between the VIA and the transceiver.

Note to -stable maintainers: these fixes can be cherry-picked without
difficulty, if you have the 5 commits that appeared in v5.0:

b52dce8738938 macintosh/via-macii: Synchronous bus reset
5f93d7081a47e macintosh/via-macii: Remove BUG_ON assertions
5ce6185c2ef4e macintosh/via-macii: Simplify locking
351e5ad327d07 macintosh/via-macii, macintosh/adb-iop: Modernize printk calls
47fd2060660e6 macintosh/via-macii, macintosh/adb-iop: Clean up whitespace

Just for the sake of simplicity, the 'fixes' tags in this series limit
backporting to 'v5.0+'.


Finn Thain (9):
  macintosh/via-macii: Access autopoll_devs when inside lock
  macintosh/via-macii: Poll the device most likely to respond
  macintosh/via-macii: Handle /CTLR_IRQ signal correctly
  macintosh/via-macii: Remove read_done state
  macintosh/via-macii: Handle poll replies correctly
  macintosh/via-macii: Use bool type for reading_reply variable
  macintosh/via-macii: Use unsigned type for autopoll_devs variable
  macintosh/via-macii: Use the stack for reset request storage
  macintosh/via-macii: Clarify definition of macii_init()

 drivers/macintosh/via-macii.c | 324 +++++++++++++++++++---------------
 1 file changed, 179 insertions(+), 145 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [PATCH 8/9] macintosh/via-macii: Use the stack for reset request storage
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The adb_request struct can be stored on the stack because the request
is synchronous and is completed before the function returns.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 447273967e1e8..2f9be4ec7d345 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -313,7 +313,7 @@ static void macii_poll(void)
 /* Reset the bus */
 static int macii_reset_bus(void)
 {
-	static struct adb_request req;
+	struct adb_request req;
 
 	/* Command = 0, Address = ignored */
 	adb_request(&req, NULL, ADBREQ_NOSEND, 1, ADB_BUSRESET);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 7/9] macintosh/via-macii: Use unsigned type for autopoll_devs variable
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index e143ddb81de34..447273967e1e8 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -125,7 +125,7 @@ static bool srq_asserted;    /* have to poll for the device that asserted it */
 static u8 last_cmd;              /* the most recent command byte transmitted */
 static u8 last_talk_cmd;    /* the most recent Talk command byte transmitted */
 static u8 last_poll_cmd; /* the most recent Talk R0 command byte transmitted */
-static int autopoll_devs;      /* bits set are device addresses to be polled */
+static unsigned int autopoll_devs;  /* bits set are device addresses to poll */
 
 /* Check for MacII style ADB */
 static int macii_probe(void)
@@ -291,7 +291,7 @@ static int macii_autopoll(int devs)
 	local_irq_save(flags);
 
 	/* bit 1 == device 1, and so on. */
-	autopoll_devs = devs & 0xFFFE;
+	autopoll_devs = (unsigned int)devs & 0xFFFE;
 
 	if (!current_req) {
 		macii_queue_poll();
-- 
2.26.2


^ permalink raw reply related

* [PATCH 6/9] macintosh/via-macii: Use bool type for reading_reply variable
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 8d5ef77b4a435..e143ddb81de34 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -116,7 +116,7 @@ static struct adb_request *current_req; /* first request struct in the queue */
 static struct adb_request *last_req;     /* last request struct in the queue */
 static unsigned char reply_buf[16];        /* storage for autopolled replies */
 static unsigned char *reply_ptr;     /* next byte in reply_buf or req->reply */
-static int reading_reply;        /* store reply in reply_buf else req->reply */
+static bool reading_reply;       /* store reply in reply_buf else req->reply */
 static int data_index;      /* index of the next byte to send from req->data */
 static int reply_len; /* number of bytes received in reply_buf or req->reply */
 static int status;          /* VIA's ADB status bits captured upon interrupt */
@@ -394,7 +394,7 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 		WARN_ON((status & ST_MASK) != ST_IDLE);
 
 		reply_ptr = reply_buf;
-		reading_reply = 0;
+		reading_reply = false;
 
 		bus_timeout = false;
 		srq_asserted = false;
@@ -442,7 +442,7 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			 */
 			macii_state = reading;
 
-			reading_reply = 0;
+			reading_reply = false;
 			reply_ptr = reply_buf;
 			*reply_ptr = last_talk_cmd;
 			reply_len = 1;
@@ -456,7 +456,7 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			if (req->reply_expected) {
 				macii_state = reading;
 
-				reading_reply = 1;
+				reading_reply = true;
 				reply_ptr = req->reply;
 				*reply_ptr = req->data[1];
 				reply_len = 1;
@@ -466,7 +466,7 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			} else if ((req->data[1] & OP_MASK) == TALK) {
 				macii_state = reading;
 
-				reading_reply = 0;
+				reading_reply = false;
 				reply_ptr = reply_buf;
 				*reply_ptr = req->data[1];
 				reply_len = 1;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 5/9] macintosh/via-macii: Handle poll replies correctly
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

Userspace applications may use /dev/adb to send Talk requests. Such
requests always have req->reply_expected == 1. The same is true of Talk
requests sent by the kernel, except for poll requests queued internally
by the via-macii driver. Those requests have req->reply_expected == 0.

Consequently, poll reply packets get treated like autopoll reply packets.
(It doesn't make sense to try to distinguish them.) Always enter 'reading'
state after a poll request, so that the reply gets collected and passed
to adb_input(), and none go missing.

All Talk replies passed to adb_input() come from polling or autopolling,
so call adb_input() with the autopoll parameter set to 1.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index d29c87943ca46..8d5ef77b4a435 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -463,6 +463,21 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 
 				via[ACR] &= ~SR_OUT;
 				x = via[SR];
+			} else if ((req->data[1] & OP_MASK) == TALK) {
+				macii_state = reading;
+
+				reading_reply = 0;
+				reply_ptr = reply_buf;
+				*reply_ptr = req->data[1];
+				reply_len = 1;
+
+				via[ACR] &= ~SR_OUT;
+				x = via[SR];
+
+				req->complete = 1;
+				current_req = req->next;
+				if (req->done)
+					(*req->done)(req);
 			} else {
 				macii_state = idle;
 
@@ -510,8 +525,9 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 					current_req = req->next;
 					if (req->done)
 						(*req->done)(req);
-				} else if (reply_len && autopoll_devs) {
-					adb_input(reply_buf, reply_len, 0);
+				} else if (reply_len && autopoll_devs &&
+					   reply_buf[0] == last_poll_cmd) {
+					adb_input(reply_buf, reply_len, 1);
 				}
 				break;
 			}
-- 
2.26.2


^ permalink raw reply related

* [PATCH 3/9] macintosh/via-macii: Handle /CTLR_IRQ signal correctly
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

I'm told that the /CTLR_IRQ signal from the ADB transceiver gets
interpreted by MacOS to mean SRQ, bus timeout or end-of-packet depending
on the circumstances, and that Linux's via-macii driver does not
correctly interpret this signal.

Instead, the via-macii driver interprets certain received byte values
(0x00 and 0xFF) as signalling end of packet and bus timeout
(respectively). Problem is, those values can also appear under other
circumstances.

This patch changes the bus timeout, end of packet and SRQ detection logic
to bring it closer to the logic that MacOS reportedly uses.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 166 ++++++++++++++++++++--------------
 1 file changed, 97 insertions(+), 69 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index d4f1a65c5f1fd..6a5cd7de05baf 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -80,6 +80,8 @@ static volatile unsigned char *via;
 /* ADB command byte structure */
 #define ADDR_MASK	0xF0
 #define CMD_MASK	0x0F
+#define OP_MASK		0x0C
+#define TALK		0x0C
 
 static int macii_init_via(void);
 static void macii_start(void);
@@ -119,9 +121,10 @@ static int reading_reply;        /* store reply in reply_buf else req->reply */
 static int data_index;      /* index of the next byte to send from req->data */
 static int reply_len; /* number of bytes received in reply_buf or req->reply */
 static int status;          /* VIA's ADB status bits captured upon interrupt */
-static int last_status;              /* status bits as at previous interrupt */
-static int srq_asserted;     /* have to poll for the device that asserted it */
+static bool bus_timeout;                   /* no data was sent by the device */
+static bool srq_asserted;    /* have to poll for the device that asserted it */
 static u8 last_cmd;              /* the most recent command byte transmitted */
+static u8 last_talk_cmd;    /* the most recent Talk command byte transmitted */
 static u8 last_poll_cmd; /* the most recent Talk R0 command byte transmitted */
 static int autopoll_devs;      /* bits set are device addresses to be polled */
 
@@ -170,7 +173,6 @@ static int macii_init_via(void)
 
 	/* Set up state: idle */
 	via[B] |= ST_IDLE;
-	last_status = via[B] & (ST_MASK | CTLR_IRQ);
 
 	/* Shift register on input */
 	via[ACR] = (via[ACR] & ~SR_CTRL) | SR_EXT;
@@ -336,13 +338,6 @@ static void macii_start(void)
 	 * And req->nbytes is the number of bytes of real data plus one.
 	 */
 
-	/* store command byte */
-	last_cmd = req->data[1];
-
-	/* If this is a Talk Register 0 command, store the command byte */
-	if ((last_cmd & CMD_MASK) == ADB_READREG(0, 0))
-		last_poll_cmd = last_cmd;
-
 	/* Output mode */
 	via[ACR] |= SR_OUT;
 	/* Load data */
@@ -352,6 +347,9 @@ static void macii_start(void)
 
 	macii_state = sending;
 	data_index = 2;
+
+	bus_timeout = false;
+	srq_asserted = false;
 }
 
 /*
@@ -360,15 +358,17 @@ static void macii_start(void)
  * generating shift register interrupts (SR_INT) for us. This means there has
  * to be activity on the ADB bus. The chip will poll to achieve this.
  *
- * The basic ADB state machine was left unchanged from the original MacII code
- * by Alan Cox, which was based on the CUDA driver for PowerMac.
- * The syntax of the ADB status lines is totally different on MacII,
- * though. MacII uses the states Command -> Even -> Odd -> Even ->...-> Idle
- * for sending and Idle -> Even -> Odd -> Even ->...-> Idle for receiving.
- * Start and end of a receive packet are signalled by asserting /IRQ on the
- * interrupt line (/IRQ means the CTLR_IRQ bit in port B; not to be confused
- * with the VIA shift register interrupt. /IRQ never actually interrupts the
- * processor, it's just an ordinary input.)
+ * The VIA Port B output signalling works as follows. After the ADB transceiver
+ * sees a transition on the PB4 and PB5 lines it will crank over the VIA shift
+ * register which eventually raises the SR_INT interrupt. The PB4/PB5 outputs
+ * are toggled with each byte as the ADB transaction progresses.
+ *
+ * Request with no reply expected (and empty transceiver buffer):
+ *     CMD -> IDLE
+ * Request with expected reply packet (or with buffered autopoll packet):
+ *     CMD -> EVEN -> ODD -> EVEN -> ... -> IDLE
+ * Unsolicited packet:
+ *     IDLE -> EVEN -> ODD -> EVEN -> ... -> IDLE
  */
 static irqreturn_t macii_interrupt(int irq, void *arg)
 {
@@ -388,31 +388,31 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 		}
 	}
 
-	last_status = status;
 	status = via[B] & (ST_MASK | CTLR_IRQ);
 
 	switch (macii_state) {
 	case idle:
-		if (reading_reply) {
-			reply_ptr = current_req->reply;
-		} else {
-			WARN_ON(current_req);
-			reply_ptr = reply_buf;
-		}
+		WARN_ON((status & ST_MASK) != ST_IDLE);
+
+		reply_ptr = reply_buf;
+		reading_reply = 0;
+
+		bus_timeout = false;
+		srq_asserted = false;
 
 		x = via[SR];
 
-		if ((status & CTLR_IRQ) && (x == 0xFF)) {
-			/* Bus timeout without SRQ sequence:
-			 *     data is "FF" while CTLR_IRQ is "H"
+		if (!(status & CTLR_IRQ)) {
+			/* /CTLR_IRQ asserted in idle state means we must
+			 * read an autopoll reply from the transceiver buffer.
 			 */
-			reply_len = 0;
-			srq_asserted = 0;
-			macii_state = read_done;
-		} else {
 			macii_state = reading;
 			*reply_ptr = x;
 			reply_len = 1;
+		} else {
+			/* bus timeout */
+			macii_state = read_done;
+			reply_len = 0;
 		}
 
 		/* set ADB state = even for first data byte */
@@ -421,13 +421,52 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 
 	case sending:
 		req = current_req;
-		if (data_index >= req->nbytes) {
+
+		if (status == (ST_CMD | CTLR_IRQ)) {
+			/* /CTLR_IRQ de-asserted after the command byte means
+			 * the host can continue with the transaction.
+			 */
+
+			/* Store command byte */
+			last_cmd = req->data[1];
+			if ((last_cmd & OP_MASK) == TALK) {
+				last_talk_cmd = last_cmd;
+				if ((last_cmd & CMD_MASK) == ADB_READREG(0, 0))
+					last_poll_cmd = last_cmd;
+			}
+		}
+
+		if (status == ST_CMD) {
+			/* /CTLR_IRQ asserted after the command byte means we
+			 * must read an autopoll reply. The first byte was
+			 * lost because the shift register was an output.
+			 */
+			macii_state = reading;
+
+			reading_reply = 0;
+			reply_ptr = reply_buf;
+			*reply_ptr = last_talk_cmd;
+			reply_len = 1;
+
+			/* reset to shift in */
+			via[ACR] &= ~SR_OUT;
+			x = via[SR];
+		} else if (data_index >= req->nbytes) {
 			req->sent = 1;
-			macii_state = idle;
 
 			if (req->reply_expected) {
+				macii_state = reading;
+
 				reading_reply = 1;
+				reply_ptr = req->reply;
+				*reply_ptr = req->data[1];
+				reply_len = 1;
+
+				via[ACR] &= ~SR_OUT;
+				x = via[SR];
 			} else {
+				macii_state = idle;
+
 				req->complete = 1;
 				current_req = req->next;
 				if (req->done)
@@ -438,25 +477,26 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 
 				if (current_req && macii_state == idle)
 					macii_start();
-			}
 
-			if (macii_state == idle) {
-				/* reset to shift in */
-				via[ACR] &= ~SR_OUT;
-				x = via[SR];
-				/* set ADB state idle - might get SRQ */
-				via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
+				if (macii_state == idle) {
+					/* reset to shift in */
+					via[ACR] &= ~SR_OUT;
+					x = via[SR];
+					/* set ADB state idle - might get SRQ */
+					via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
+				}
+				break;
 			}
 		} else {
 			via[SR] = req->data[data_index++];
+		}
 
-			if ((via[B] & ST_MASK) == ST_CMD) {
-				/* just sent the command byte, set to EVEN */
-				via[B] = (via[B] & ~ST_MASK) | ST_EVEN;
-			} else {
-				/* invert state bits, toggle ODD/EVEN */
-				via[B] ^= ST_MASK;
-			}
+		if ((via[B] & ST_MASK) == ST_CMD) {
+			/* just sent the command byte, set to EVEN */
+			via[B] = (via[B] & ~ST_MASK) | ST_EVEN;
+		} else {
+			/* invert state bits, toggle ODD/EVEN */
+			via[B] ^= ST_MASK;
 		}
 		break;
 
@@ -465,28 +505,13 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 		WARN_ON((status & ST_MASK) == ST_CMD ||
 			(status & ST_MASK) == ST_IDLE);
 
-		/* Bus timeout with SRQ sequence:
-		 *     data is "XX FF"      while CTLR_IRQ is "L L"
-		 * End of packet without SRQ sequence:
-		 *     data is "XX...YY 00" while CTLR_IRQ is "L...H L"
-		 * End of packet SRQ sequence:
-		 *     data is "XX...YY 00" while CTLR_IRQ is "L...L L"
-		 * (where XX is the first response byte and
-		 * YY is the last byte of valid response data.)
-		 */
-
-		srq_asserted = 0;
 		if (!(status & CTLR_IRQ)) {
-			if (x == 0xFF) {
-				if (!(last_status & CTLR_IRQ)) {
-					macii_state = read_done;
-					reply_len = 0;
-					srq_asserted = 1;
-				}
-			} else if (x == 0x00) {
+			if (status == ST_EVEN && reply_len == 1) {
+				bus_timeout = true;
+			} else if (status == ST_ODD && reply_len == 2) {
+				srq_asserted = true;
+			} else {
 				macii_state = read_done;
-				if (!(last_status & CTLR_IRQ))
-					srq_asserted = 1;
 			}
 		}
 
@@ -504,6 +529,9 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 	case read_done:
 		x = via[SR];
 
+		if (bus_timeout)
+			reply_len = 0;
+
 		if (reading_reply) {
 			reading_reply = 0;
 			req = current_req;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 2/9] macintosh/via-macii: Poll the device most likely to respond
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

Poll the most recently polled device by default, rather than the lowest
device address that happens to be enabled in autopoll_devs. This improves
input latency. Re-use macii_queue_poll() rather than duplicate that logic.
This eliminates a static struct and function.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 99 +++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 46 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 6aa903529570d..d4f1a65c5f1fd 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -77,6 +77,10 @@ static volatile unsigned char *via;
 #define ST_ODD		0x20		/* ADB state: odd data byte */
 #define ST_IDLE		0x30		/* ADB state: idle, nothing to send */
 
+/* ADB command byte structure */
+#define ADDR_MASK	0xF0
+#define CMD_MASK	0x0F
+
 static int macii_init_via(void);
 static void macii_start(void);
 static irqreturn_t macii_interrupt(int irq, void *arg);
@@ -117,7 +121,8 @@ static int reply_len; /* number of bytes received in reply_buf or req->reply */
 static int status;          /* VIA's ADB status bits captured upon interrupt */
 static int last_status;              /* status bits as at previous interrupt */
 static int srq_asserted;     /* have to poll for the device that asserted it */
-static int command_byte;         /* the most recent command byte transmitted */
+static u8 last_cmd;              /* the most recent command byte transmitted */
+static u8 last_poll_cmd; /* the most recent Talk R0 command byte transmitted */
 static int autopoll_devs;      /* bits set are device addresses to be polled */
 
 /* Check for MacII style ADB */
@@ -179,35 +184,49 @@ static int macii_init_via(void)
 /* Send an ADB poll (Talk Register 0 command prepended to the request queue) */
 static void macii_queue_poll(void)
 {
-	/* No point polling the active device as it will never assert SRQ, so
-	 * poll the next device in the autopoll list. This could leave us
-	 * stuck in a polling loop if an unprobed device is asserting SRQ.
-	 * In theory, that could only happen if a device was plugged in after
-	 * probing started. Unplugging it again will break the cycle.
-	 * (Simply polling the next higher device often ends up polling almost
-	 * every device (after wrapping around), which takes too long.)
-	 */
-	int device_mask;
-	int next_device;
 	static struct adb_request req;
+	unsigned char poll_command;
+	unsigned int poll_addr;
 
+	/* This only polls devices in the autopoll list, which assumes that
+	 * unprobed devices never assert SRQ. That could happen if a device was
+	 * plugged in after the adb bus scan. Unplugging it again will resolve
+	 * the problem. This behaviour is similar to MacOS.
+	 */
 	if (!autopoll_devs)
 		return;
 
-	device_mask = (1 << (((command_byte & 0xF0) >> 4) + 1)) - 1;
-	if (autopoll_devs & ~device_mask)
-		next_device = ffs(autopoll_devs & ~device_mask) - 1;
-	else
-		next_device = ffs(autopoll_devs) - 1;
+	/* The device most recently polled may not be the best device to poll
+	 * right now. Some other device(s) may have signalled SRQ (the active
+	 * device won't do that). Or the autopoll list may have been changed.
+	 * Try polling the next higher address.
+	 */
+	poll_addr = (last_poll_cmd & ADDR_MASK) >> 4;
+	if ((srq_asserted && last_cmd == last_poll_cmd) ||
+	    !(autopoll_devs & (1 << poll_addr))) {
+		unsigned int higher_devs;
+
+		higher_devs = autopoll_devs & -(1 << (poll_addr + 1));
+		poll_addr = ffs(higher_devs ? higher_devs : autopoll_devs) - 1;
+	}
 
-	adb_request(&req, NULL, ADBREQ_NOSEND, 1, ADB_READREG(next_device, 0));
+	/* Send a Talk Register 0 command */
+	poll_command = ADB_READREG(poll_addr, 0);
+
+	/* No need to repeat this Talk command. The transceiver will do that
+	 * as long as it is idle.
+	 */
+	if (poll_command == last_cmd)
+		return;
+
+	adb_request(&req, NULL, ADBREQ_NOSEND, 1, poll_command);
 
 	req.sent = 0;
 	req.complete = 0;
 	req.reply_len = 0;
 	req.next = current_req;
 
-	if (current_req != NULL) {
+	if (WARN_ON(current_req)) {
 		current_req = &req;
 	} else {
 		current_req = &req;
@@ -266,37 +285,22 @@ static int macii_write(struct adb_request *req)
 /* Start auto-polling */
 static int macii_autopoll(int devs)
 {
-	static struct adb_request req;
 	unsigned long flags;
-	int err = 0;
 
 	local_irq_save(flags);
 
 	/* bit 1 == device 1, and so on. */
 	autopoll_devs = devs & 0xFFFE;
 
-	if (autopoll_devs && !current_req) {
-		/* Send a Talk Reg 0. The controller will repeatedly transmit
-		 * this as long as it is idle.
-		 */
-		adb_request(&req, NULL, ADBREQ_NOSEND, 1,
-		            ADB_READREG(ffs(autopoll_devs) - 1, 0));
-		err = macii_write(&req);
+	if (!current_req) {
+		macii_queue_poll();
+		if (current_req && macii_state == idle)
+			macii_start();
 	}
 
 	local_irq_restore(flags);
-	return err;
-}
 
-static inline int need_autopoll(void)
-{
-	/* Was the last command Talk Reg 0
-	 * and is the target on the autopoll list?
-	 */
-	if ((command_byte & 0x0F) == 0x0C &&
-	    ((1 << ((command_byte & 0xF0) >> 4)) & autopoll_devs))
-		return 0;
-	return 1;
+	return 0;
 }
 
 /* Prod the chip without interrupts */
@@ -333,7 +337,12 @@ static void macii_start(void)
 	 */
 
 	/* store command byte */
-	command_byte = req->data[1];
+	last_cmd = req->data[1];
+
+	/* If this is a Talk Register 0 command, store the command byte */
+	if ((last_cmd & CMD_MASK) == ADB_READREG(0, 0))
+		last_poll_cmd = last_cmd;
+
 	/* Output mode */
 	via[ACR] |= SR_OUT;
 	/* Load data */
@@ -424,10 +433,11 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 				if (req->done)
 					(*req->done)(req);
 
-				if (current_req)
+				if (!current_req)
+					macii_queue_poll();
+
+				if (current_req && macii_state == idle)
 					macii_start();
-				else if (need_autopoll())
-					macii_autopoll(autopoll_devs);
 			}
 
 			if (macii_state == idle) {
@@ -507,14 +517,11 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 
 		macii_state = idle;
 
-		/* SRQ seen before, initiate poll now */
-		if (srq_asserted)
+		if (!current_req)
 			macii_queue_poll();
 
 		if (current_req)
 			macii_start();
-		else if (need_autopoll())
-			macii_autopoll(autopoll_devs);
 
 		if (macii_state == idle)
 			via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 1/9] macintosh/via-macii: Access autopoll_devs when inside lock
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The interrupt handler should be excluded when accessing the autopoll_devs
variable.

Fixes: d95fd5fce88f0 ("m68k: Mac II ADB fixes") # v5.0+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index ac824d7b2dcfc..6aa903529570d 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -270,15 +270,12 @@ static int macii_autopoll(int devs)
 	unsigned long flags;
 	int err = 0;
 
+	local_irq_save(flags);
+
 	/* bit 1 == device 1, and so on. */
 	autopoll_devs = devs & 0xFFFE;
 
-	if (!autopoll_devs)
-		return 0;
-
-	local_irq_save(flags);
-
-	if (current_req == NULL) {
+	if (autopoll_devs && !current_req) {
 		/* Send a Talk Reg 0. The controller will repeatedly transmit
 		 * this as long as it is idle.
 		 */
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: piliu @ 2020-06-28  2:28 UTC (permalink / raw)
  To: Hari Bathini, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Thiago Jung Bauermann, Dave Young,
	Vivek Goyal, Eric Biederman
In-Reply-To: <159319828304.16351.6990340111766605842.stgit@hbathini.in.ibm.com>

Hi Hari,

If in [4/11],  get_exclude_memory_ranges() turns out to be unnecessary
,then this patch is abundant either. As my understanding, memblock has
already helped to achieved the purpose that get_exclude_memory_ranges()
wants.

Thanks,
Pingfan

On 06/27/2020 03:04 AM, Hari Bathini wrote:
> Some archs can have special memory regions, within the given memory
> range, which can't be used for the buffer in a kexec segment. As
> kexec_add_buffer() function is being called from generic code as well,
> add weak arch_kexec_add_buffer definition for archs to override & take
> care of special regions before trying to locate a memory hole.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  include/linux/kexec.h |    5 +++++
>  kernel/kexec_file.c   |   37 +++++++++++++++++++++++++++++++++----
>  2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2..1237682 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -195,6 +195,11 @@ int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
>  					const Elf_Shdr *relsec,
>  					const Elf_Shdr *symtab);
>  
> +extern int arch_kexec_add_buffer(struct kexec_buf *kbuf);
> +
> +/* arch_kexec_add_buffer calls this when it is ready */
> +extern int __kexec_add_buffer(struct kexec_buf *kbuf);
> +
>  extern int kexec_add_buffer(struct kexec_buf *kbuf);
>  int kexec_locate_mem_hole(struct kexec_buf *kbuf);
>  
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index bb05fd5..a0b4f7f 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -669,10 +669,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   */
>  int kexec_add_buffer(struct kexec_buf *kbuf)
>  {
> -
> -	struct kexec_segment *ksegment;
> -	int ret;
> -
>  	/* Currently adding segment this way is allowed only in file mode */
>  	if (!kbuf->image->file_mode)
>  		return -EINVAL;
> @@ -696,6 +692,25 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>  	kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE);
>  	kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
>  
> +	return arch_kexec_add_buffer(kbuf);
> +}
> +
> +/**
> + * __kexec_add_buffer - arch_kexec_add_buffer would call this function after
> + *                      updating kbuf, to place a buffer in a kexec segment.
> + * @kbuf:               Buffer contents and memory parameters.
> + *
> + * This function assumes that kexec_mutex is held.
> + * On successful return, @kbuf->mem will have the physical address of
> + * the buffer in memory.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __kexec_add_buffer(struct kexec_buf *kbuf)
> +{
> +	struct kexec_segment *ksegment;
> +	int ret;
> +
>  	/* Walk the RAM ranges and allocate a suitable range for the buffer */
>  	ret = kexec_locate_mem_hole(kbuf);
>  	if (ret)
> @@ -711,6 +726,20 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>  	return 0;
>  }
>  
> +/**
> + * arch_kexec_add_buffer - Some archs have memory regions within the given
> + *                         range that can't be used to place a kexec segment.
> + *                         Such archs can override this function to take care
> + *                         of them before trying to locate the memory hole.
> + * @kbuf:                  Buffer contents and memory parameters.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __weak arch_kexec_add_buffer(struct kexec_buf *kbuf)
> +{
> +	return __kexec_add_buffer(kbuf);
> +}
> +
>  /* Calculate and store the digest of segments */
>  static int kexec_calculate_store_digests(struct kimage *image)
>  {
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: piliu @ 2020-06-28  2:14 UTC (permalink / raw)
  To: Hari Bathini, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Thiago Jung Bauermann, Dave Young,
	Vivek Goyal, Eric Biederman
In-Reply-To: <159319831192.16351.17443438699302756548.stgit@hbathini.in.ibm.com>

Hi Hari,

After a quick through for this series, I have a few question/comment on
this patch for the time being. Pls see comment inline.

On 06/27/2020 03:05 AM, Hari Bathini wrote:
> crashkernel region could have an overlap with special memory regions
> like  opal, rtas, tce-table & such. These regions are referred to as
> exclude memory ranges. Setup this ranges during image probe in order
> to avoid them while finding the buffer for different kdump segments.
> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
> accounting for these ranges. Also, override arch_kexec_add_buffer()
> to locate a memory hole & later call __kexec_add_buffer() function
> with kbuf->mem set to skip the generic locate memory hole lookup.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
>  arch/powerpc/include/asm/kexec.h           |    7 -
>  arch/powerpc/kexec/elf_64.c                |    7 +
>  arch/powerpc/kexec/file_load_64.c          |  292 ++++++++++++++++++++++++++++
>  4 files changed, 312 insertions(+), 4 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
> 
> diff --git a/arch/powerpc/include/asm/crashdump-ppc64.h b/arch/powerpc/include/asm/crashdump-ppc64.h
> new file mode 100644
> index 0000000..3596c25
> --- /dev/null
> +++ b/arch/powerpc/include/asm/crashdump-ppc64.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H
> +#define _ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H
> +
> +/* min & max addresses for kdump load segments */
> +#define KDUMP_BUF_MIN		(crashk_res.start)
> +#define KDUMP_BUF_MAX		((crashk_res.end < ppc64_rma_size) ? \
> +				 crashk_res.end : (ppc64_rma_size - 1))
> +
> +#endif /* __ARCH_POWERPC_KEXEC_CRASHDUMP_PPC64_H */
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 7008ea1..bf47a01 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -100,14 +100,16 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
>  #ifdef CONFIG_KEXEC_FILE
>  extern const struct kexec_file_ops kexec_elf64_ops;
>  
> -#ifdef CONFIG_IMA_KEXEC
>  #define ARCH_HAS_KIMAGE_ARCH
>  
>  struct kimage_arch {
> +	struct crash_mem *exclude_ranges;
> +
> +#ifdef CONFIG_IMA_KEXEC
>  	phys_addr_t ima_buffer_addr;
>  	size_t ima_buffer_size;
> -};
>  #endif
> +};
>  
>  int setup_purgatory(struct kimage *image, const void *slave_code,
>  		    const void *fdt, unsigned long kernel_load_addr,
> @@ -125,6 +127,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>  			unsigned long initrd_load_addr,
>  			unsigned long initrd_len, const char *cmdline);
>  #endif /* CONFIG_PPC64 */
> +
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #else /* !CONFIG_KEXEC_CORE */
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index 23ad04c..c695f94 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -22,6 +22,7 @@
>  #include <linux/of_fdt.h>
>  #include <linux/slab.h>
>  #include <linux/types.h>
> +#include <asm/crashdump-ppc64.h>
>  
>  static void *elf64_load(struct kimage *image, char *kernel_buf,
>  			unsigned long kernel_len, char *initrd,
> @@ -46,6 +47,12 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
>  	if (ret)
>  		goto out;
>  
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		/* min & max buffer values for kdump case */
> +		kbuf.buf_min = pbuf.buf_min = KDUMP_BUF_MIN;
> +		kbuf.buf_max = pbuf.buf_max = KDUMP_BUF_MAX;
> +	}
> +
>  	ret = kexec_elf_load(image, &ehdr, &elf_info, &kbuf, &kernel_load_addr);
>  	if (ret)
>  		goto out;
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index e6bff960..f1d7160 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -17,6 +17,8 @@
>  #include <linux/kexec.h>
>  #include <linux/of_fdt.h>
>  #include <linux/libfdt.h>
> +#include <asm/kexec_ranges.h>
> +#include <asm/crashdump-ppc64.h>
>  
>  const struct kexec_file_ops * const kexec_file_loaders[] = {
>  	&kexec_elf64_ops,
> @@ -24,6 +26,247 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
>  };
>  
>  /**
> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
> + *                             regions like opal/rtas, tce-table, initrd,
> + *                             kernel, htab which should be avoided while
> + *                             setting up kexec load segments.
> + * @mem_ranges:                Range list to add the memory ranges to.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
Is it needed? See the comment below.
> +{
> +	int ret;
> +
> +	ret = add_tce_mem_ranges(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_initrd_mem_range(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_htab_mem_range(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_kernel_mem_range(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_rtas_mem_range(mem_ranges, false);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_opal_mem_range(mem_ranges, false);
> +	if (ret)
> +		goto out;
> +
> +	ret = add_reserved_ranges(mem_ranges);
> +	if (ret)
> +		goto out;
> +
> +	/* exclude memory ranges should be sorted for easy lookup */
> +	sort_memory_ranges(*mem_ranges);
> +out:
> +	if (ret)
> +		pr_err("Failed to setup exclude memory ranges\n");
> +	return ret;
> +}
> +
> +/**
> + * __locate_mem_hole_ppc64 - Tests if the memory hole between buf_min & buf_max
> + *                           is large enough for the buffer. If true, sets
> + *                           kbuf->mem to the buffer.
> + * @kbuf:                    Buffer contents and memory parameters.
> + * @buf_min:                 Minimum address for the buffer.
> + * @buf_max:                 Maximum address for the buffer.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int __locate_mem_hole_ppc64(struct kexec_buf *kbuf,
> +				   u64 buf_min, u64 buf_max)
> +{
> +	int ret = -EADDRNOTAVAIL;
> +
> +	buf_min = ALIGN(buf_min, kbuf->buf_align);
> +
> +	if (buf_min < buf_max &&
> +	    (buf_max - buf_min + 1) >= kbuf->memsz) {
> +		/*
> +		 * Suitable memory range found. Set kbuf->mem here to skip
> +		 * locate memory hole routine in __kexec_add_buffer() call.
> +		 */
> +		ret = 0;
> +		if (kbuf->top_down)
> +			kbuf->mem = ALIGN_DOWN(buf_max - kbuf->memsz + 1,
> +					       kbuf->buf_align);
> +		else
> +			kbuf->mem = buf_min;
> +	}
> +
> +	return ret;
> +}
> +
> +/**
> + * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
> + *                                  suitable buffer with top down approach.
> + * @kbuf:                           Buffer contents and memory parameters.
> + * @buf_min:                        Minimum address for the buffer.
> + * @buf_max:                        Maximum address for the buffer.
> + * @emem:                           Exclude memory ranges.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
> +					  u64 buf_min, u64 buf_max,
> +					  const struct crash_mem *emem)
> +{
> +	int i, ret = 0, err = -EADDRNOTAVAIL;
> +	u64 start, end, tmin, tmax;
> +
> +	tmax = buf_max;
> +	for (i = (emem->nr_ranges - 1); i >= 0; i--) {
> +		start = emem->ranges[i].start;
> +		end = emem->ranges[i].end;
> +
> +		if (start > tmax)
> +			continue;
> +
> +		if (end < tmax) {
> +			tmin = (end < buf_min ? buf_min : end + 1);
> +			ret = __locate_mem_hole_ppc64(kbuf, tmin, tmax);
> +			if (!ret)
> +				return 0;
> +		}
> +
> +		tmax = start - 1;
> +
> +		if (tmax < buf_min) {
> +			ret = err;
> +			break;
> +		}
> +		ret = 0;
> +	}
> +
> +	if (!ret) {
> +		tmin = buf_min;
> +		ret = __locate_mem_hole_ppc64(kbuf, tmin, tmax);
> +	}
> +	return ret;
> +}
> +
> +/**
> + * locate_mem_hole_bottom_up_ppc64 - Skip special memory regions to find a
> + *                                   suitable buffer with bottom up approach.
> + * @kbuf:                            Buffer contents and memory parameters.
> + * @buf_min:                         Minimum address for the buffer.
> + * @buf_max:                         Maximum address for the buffer.
> + * @emem:                            Exclude memory ranges.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int locate_mem_hole_bottom_up_ppc64(struct kexec_buf *kbuf,
> +					   u64 buf_min, u64 buf_max,
> +					   const struct crash_mem *emem)
> +{
> +	int i, ret = 0, err = -EADDRNOTAVAIL;
> +	u64 start, end, tmin, tmax;
> +
> +	tmin = buf_min;
> +	for (i = 0; i < emem->nr_ranges; i++) {
> +		start = emem->ranges[i].start;
> +		end = emem->ranges[i].end;
> +
> +		if (end < tmin)
> +			continue;
> +
> +		if (start > tmin) {
> +			tmax = (start > buf_max ? buf_max : start - 1);
> +			ret = __locate_mem_hole_ppc64(kbuf, tmin, tmax);
> +			if (!ret)
> +				return 0;
> +		}
> +
> +		tmin = end + 1;
> +
> +		if (tmin > buf_max) {
> +			ret = err;
> +			break;
> +		}
> +		ret = 0;
> +	}
> +
> +	if (!ret) {
> +		tmax = buf_max;
> +		ret = __locate_mem_hole_ppc64(kbuf, tmin, tmax);
> +	}
> +	return ret;
> +}
> +
> +/**
> + * kexec_locate_mem_hole_ppc64 - Skip special memory regions like rtas,
> + *                               tce-table, opal, reserved-ranges & such
> + *                               (exclude memory ranges) as they can't be
> + *                               used for kexec segment buffer. Use buf_min
> + *                               & buf_max fields in kexec_buf structure to
> + *                               skip regions. Sets kbuf->mem when a
> + *                               suitable memory hole is found.
> + * @kbuf:                        Buffer contents and memory parameters.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int kexec_locate_mem_hole_ppc64(struct kexec_buf *kbuf)
> +{
> +	struct crash_mem **emem;
> +	u64 buf_min, buf_max;
> +	int ret;
> +
> +	/*
> +	 * Use the locate_mem_hole logic in kexec_add_buffer() for regular
> +	 * kexec_file_load syscall
> +	 */
> +	if (kbuf->image->type != KEXEC_TYPE_CRASH)
> +		return 0;
Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
there is no requirement for @exclude_ranges.


I guess you have a design for future. If not true, then it is better to
fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the
caller and rename this function to better distinguish use cases between
kexec and kdump

Thanks,
Pingfan
> +
> +	/* Look up the exclude ranges list while locating the memory hole */
> +	emem = &(kbuf->image->arch.exclude_ranges);
> +	if (!(*emem) || ((*emem)->nr_ranges == 0)) {
> +		pr_warn("No exclude range list. Using the default locate mem hole method\n");
> +		return 0;
> +	}
> +
> +	/* Ensure minimum alignment needed for segments. */
> +	kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE);
> +	kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
> +
> +	/* Segments for kdump kernel should be within crashkernel region */
> +	buf_min = (kbuf->buf_min < crashk_res.start ?
> +		   crashk_res.start : kbuf->buf_min);
> +	buf_max = (kbuf->buf_max > crashk_res.end ?
> +		   crashk_res.end : kbuf->buf_max);
> +
> +	if (buf_min > buf_max) {
> +		pr_err("Invalid buffer min and/or max values\n");
> +		return -EINVAL;
> +	}
> +
> +	if (kbuf->top_down)
> +		ret = locate_mem_hole_top_down_ppc64(kbuf, buf_min, buf_max,
> +						     *emem);
> +	else
> +		ret = locate_mem_hole_bottom_up_ppc64(kbuf, buf_min, buf_max,
> +						      *emem);
> +
> +	/* Add the buffer allocated to the exclude list for the next lookup */
> +	if (!ret) {
> +		add_mem_range(emem, kbuf->mem, kbuf->memsz);
> +		sort_memory_ranges(*emem);
> +	}
> +	return ret;
> +}
> +
> +/**
>   * setup_purgatory_ppc64 - initialize PPC64 specific purgatory's global
>   *                         variables and call setup_purgatory() to initialize
>   *                         common global variable.
> @@ -89,6 +332,29 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>  }
>  
>  /**
> + * arch_kexec_add_buffer - Locate memory hole before calling kexec_add_buffer().
> + *                         All kexec_add_buffer() callers should use this
> + *                         function instead.
> + * @kbuf:                  Buffer contents and memory parameters.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int arch_kexec_add_buffer(struct kexec_buf *kbuf)
> +{
> +	int ret;
> +
> +	ret = kexec_locate_mem_hole_ppc64(kbuf);
> +	if (ret)
> +		goto out;
> +
> +	ret = __kexec_add_buffer(kbuf);
> +out:
> +	if (ret)
> +		pr_err("Failed to add buffer of size %lu\n", kbuf->memsz);
> +	return ret;
> +}
> +
> +/**
>   * arch_kexec_kernel_image_probe - Does additional handling needed to setup
>   *                                 kexec segments.
>   * @image:                         kexec image being loaded.
> @@ -100,9 +366,31 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>  int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
>  				  unsigned long buf_len)
>  {
> -	/* We don't support crash kernels yet. */
> -	if (image->type == KEXEC_TYPE_CRASH)
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		int ret;
> +
> +		/* Get exclude memory ranges needed for setting up kdump segments */
> +		ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
> +		if (ret)
> +			pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
> +		/* Return this until all changes for panic kernel are in */
>  		return -EOPNOTSUPP;
> +	}
>  
>  	return kexec_image_probe_default(image, buf, buf_len);
>  }
> +
> +/**
> + * arch_kimage_file_post_load_cleanup - Frees up all the allocations done
> + *                                      while loading the image.
> + * @image:                              kexec image being loaded.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int arch_kimage_file_post_load_cleanup(struct kimage *image)
> +{
> +	kfree(image->arch.exclude_ranges);
> +	image->arch.exclude_ranges = NULL;
> +
> +	return kexec_image_post_load_cleanup_default(image);
> +}
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox