Linux-ARM-Kernel Archive on lore.kernel.org


* [PATCH 6/9] ARM: dts: wheat: Drop MTD partitioning from DT
From: Simon Horman @ 2018-05-31 11:40 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <d78bb8a8-a351-409c-54d2-33ee02fd8afe@gmail.com>

On Wed, May 30, 2018 at 12:13:31PM +0200, Marek Vasut wrote:
> On 05/28/2018 11:36 AM, Simon Horman wrote:
> > On Mon, May 28, 2018 at 10:54:57AM +0200, Geert Uytterhoeven wrote:
> >> Hi Simon,
> >>
> >> On Mon, May 28, 2018 at 10:41 AM, Simon Horman <horms@verge.net.au> wrote:
> >>> On Thu, May 24, 2018 at 04:52:59PM +0200, Marek Vasut wrote:
> >>>> On 05/23/2018 08:25 AM, Geert Uytterhoeven wrote:
> >>>>> On Wed, May 23, 2018 at 12:01 AM, Marek Vasut <marek.vasut@gmail.com> wrote:
> >>>>>> On 05/22/2018 04:43 PM, Geert Uytterhoeven wrote:
> >>>>>>> On Tue, May 22, 2018 at 2:02 PM, Marek Vasut <marek.vasut@gmail.com> wrote:
> >>>>>>>> Drop the MTD partitioning from DT, since it does not describe HW
> >>>>>>>> and to give way to a more flexible kernel command line partition
> >>>>>>>> passing.
> >>>>>>>>
> >>>>>>>> To retain the original partitioning, assure you have enabled
> >>>>>>>> CONFIG_MTD_CMDLINE_PARTS in your kernel config and add the
> >>>>>>>> following to your kernel command line:
> >>>>>>>>
> >>>>>>>>   mtdparts=spi0.0:256k@0(loader),4096k(user),-(flash)
> >>>>>>>
> >>>>>>> I think the "@0" can be dropped, as it's optional?
> >>>>>>> 4m?
> >>>>>>
> >>>>>> My take on this is that the loader is actually at offset 0x0 of the MTD
> >>>>>> device and we explicitly state that in the mtdparts to anchor the first
> >>>>>> partition within the MTD device and all the other partitions are at
> >>>>>> offset +(sum of the sizes of all partitions listed before the current
> >>>>>> one) relative to that first partition.
> >>>>>
> >>>>> Where is this explicitly stated for the first partition?
> >>>>>
> >>>>>> Removing the @0 feels fragile at best and it seems to depend on the
> >>>>>> current behavior of the code.
> >>>>>
> >>>>> Better, it also depends on the documented behavior:
> >>>>>
> >>>>> Documentation/admin-guide/kernel-parameters.txt refers to
> >>>>> drivers/mtd/cmdlinepart.c, which states:
> >>>>>
> >>>>>  * <offset>  := standard linux memsize
> >>>>>  *              if omitted the part will immediately follow the previous part
> >>>>>  *              or 0 if the first part
> >>>>>
> >>>>> None of the examples listed there or under the MTD_CMDLINE_PARTS Kconfig
> >>>>> help text, or in a defconfig bundled with the kernel, use @0 for the first
> >>>>> partition.
> >>>>
> >>>> I think this is exceptionally fragile and dangerous to depend on this,
> >>>> but so be it.
> >>>
> >>> Could you respin with this change?
> >>>
> >>> I would also like to ask for another change, in light of recent
> >>> feedback from Olof Johansson ("Re: [GIT PULL] Renesas ARM64 Based SoC DT
> >>> Updates for v4.18").
> >>>
> >>> Please consolidate the dts patches into a single patch?
> >>
> >> I think it's better to keep them split, as each commit description mentions
> >> what needs to be passed on the kernel command line for the corresponding
> >> board.
> >>
> >> Combining it in a single patch makes it much harder to extract this information.
> >> Unless you're fine with a list:
> >>
> >>    koelsch: ...
> >>    wheat: mtdparts=spi0.0:256k@0(loader),4096k(user),-(flash)
> > 
> > Let's try a list.
> 
> Reposted with a list, twice :/

Thanks, got it :)
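
The offset rule quoted above from drivers/mtd/cmdlinepart.c (an omitted
offset means "immediately after the previous part, or 0 for the first
part") can be illustrated with a small stand-alone model. This is only a
sketch, not the kernel's parser: parse_memsize(), layout_parts() and
struct part are hypothetical names, the input is the part list without
the "spi0.0:" prefix, and the device size is whatever the caller passes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct part {
	char name[32];
	uint64_t offset;
	uint64_t size;
};

/* Parse a number with an optional k/m/g suffix, e.g. "256k" or "4m". */
static uint64_t parse_memsize(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);

	if (*e == 'g' || *e == 'G')
		v <<= 30;
	else if (*e == 'm' || *e == 'M')
		v <<= 20;
	else if (*e == 'k' || *e == 'K')
		v <<= 10;
	else {
		*end = e;
		return v;
	}
	*end = e + 1;
	return v;
}

/*
 * Lay out "<size>[@<offset>](<name>),..." on a device of dev_size bytes.
 * A size of "-" takes the rest of the device.
 * Returns the number of parts, or -1 on malformed input.
 */
static int layout_parts(const char *spec, uint64_t dev_size,
			struct part *out, int max)
{
	uint64_t next = 0;	/* start of a part that has no "@<offset>" */
	int n = 0;

	assert(spec && out);
	while (*spec) {
		struct part *p;
		const char *close;
		size_t len;
		int fill = 0;

		if (n == max)
			return -1;
		p = &out[n];

		if (*spec == '-') {
			fill = 1;
			spec++;
		} else {
			p->size = parse_memsize(spec, &spec);
		}

		/* Default: follow the previous part (0 for the first one). */
		p->offset = next;
		if (*spec == '@')
			p->offset = parse_memsize(spec + 1, &spec);

		if (fill) {
			if (p->offset > dev_size)
				return -1;
			p->size = dev_size - p->offset;
		}

		if (*spec != '(')
			return -1;
		close = strchr(spec, ')');
		if (!close)
			return -1;
		len = (size_t)(close - spec - 1);
		if (len >= sizeof(p->name))
			return -1;
		memcpy(p->name, spec + 1, len);
		p->name[len] = '\0';

		next = p->offset + p->size;
		n++;
		spec = close + 1;
		if (*spec == ',')
			spec++;
		else if (*spec)
			return -1;
	}
	return n;
}
```

In this model, "256k@0(loader),4096k(user),-(flash)" and
"256k(loader),4m(user),-(flash)" produce the same three partitions for
any given device size, which is the point made in the review.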


* [PATCH v4 1/2] regulator: dt-bindings: add QCOM RPMh regulator bindings
From: Mark Brown @ 2018-05-31 11:48 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <75088820-f20d-32ac-780a-5e7dacdf20ff@codeaurora.org>

On Wed, May 30, 2018 at 04:39:10PM -0700, David Collins wrote:

> The DRMS modes to use and max allowed current per mode need to be
> specified at the board level in device tree instead of hard-coded per
> regulator type in the driver.  There are at least two use cases driving
> this need: LDOs shared between RPMh client processors and SMPSes requiring
> PWM mode in certain circumstances.

Is there really no way to improve the RPM firmware?

> Consider the case of a regulator with physical 10 mA LPM max current. Say
> that modem and application processors each have a load on the regulator
> that draws 9 mA. If they each respect the 10 mA limit, then they'd each
> vote for LPM. The VRM block in RPMh hardware will aggregate these requests

This is, of course, why the regulator API aggregates this stuff based on
the current not based on having every individual user select a mode.

> together using a max function which will result in the regulator being set
> to LPM, even though the total load is 18 mA (which would require high
> power mode (HPM)). To get around this corner case, a LPM max current of 1
> uA can be used for all LDO supplies that have non-application processor
> consumers. Thus, any non-zero regulator_set_load() current request will
> result in setting the regulator to HPM (which is always safe).

That's obviously just never going to work well, the best you can do here
is just pretend that the other components are always operating at full
power (which I assume all the other components are doing too...) or not
try to use load based mode switching in the first place.

> The second situation that needs board-level DRMS mode and current limit
> specification is SMPS regulator AUTO mode to PWM (HPM) mode switching.
> SMPS regulators should theoretically always be able to operate in AUTO
> mode as it switches automatically between PWM mode (which can provide the
> maximum current) and PFM mode (which supports lower current but has higher
> efficiency). However, there may be board/system issues that require
> switching to PWM mode for certain use cases as it has better load
> regulation (i.e. no PFM ripple for lower loads) and supports more
> aggressive load current steps (i.e. greater A/ns).

> If a Linux consumer requires the ability to force a given SMPS regulator
> from AUTO mode into PWM mode and that SMPS is shared by other Linux
> consumers (which may be the case, but at least must be guaranteed to work
> architecturally), then regulator_set_load() is the only option since it
> provides aggregation, whereas regulator_set_mode() does not.

That's obviously broken though, what you're describing is just clearly
nothing to do with load so trying to configure it using load is just
going to lead to problems later on.  Honestly it sounds like you just
want to force the regulator into forced PWM mode all the time, otherwise
you need to look into implementing support for describing the thing
you're actually trying to do and add a mechanism for per consumer mode
configuration.

This has been a frequent pattern with these RPM drivers, trying to find
some way to shoehorn something that happens to work right now into the
code.  That's going to make things fragile and hard to maintain, we need
code that does the thing it's saying it does so that it's easier to
understand and work with - getting things running isn't enough, they
need to be clear.


* [PATCH v2 1/6] arm64: KVM: Add support for Stage-2 control of memory types and cacheability
From: Mark Rutland @ 2018-05-31 11:49 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-2-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:01PM +0100, Marc Zyngier wrote:
> Up to ARMv8.3, the combination of Stage-1 and Stage-2 attributes
> results in the strongest attribute of the two stages.  This means
> that the hypervisor has to perform quite a lot of cache maintenance
> just in case the guest has some non-cacheable mappings around.
> 
> ARMv8.4 solves this problem by offering a different mode (FWB) where
> Stage-2 has total control over the memory attribute (this is limited
> to systems where both I/O and instruction caches are coherent with

s/caches/fetches/ -- the I-caches themselves aren't coherent with the
D-caches (or we could omit I-cache maintenance).

i.e. this implies IDC, but not DIC.

> the dcache). This is achieved by having a different set of memory
> attributes in the page tables, and a new bit set in HCR_EL2.
> 
> On such a system, we can then safely sidestep any form of dcache
> management.
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---

>  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  {
> @@ -268,7 +269,10 @@ static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
>  {
>  	void *va = page_address(pfn_to_page(pfn));
>  
> -	kvm_flush_dcache_to_poc(va, size);
> +	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		kvm_flush_dcache_to_poc(va, size);
> +	else
> +		kvm_flush_dcache_to_pou(va, size);
>  }

The commit message said instruction fetches were coherent, and that no
D-cache maintenance was necessary, so why do we need maintenance to the
PoU?

> +static void cpu_has_fwb(const struct arm64_cpu_capabilities *__unused)
> +{
> +	u64 val = read_sysreg_s(SYS_CLIDR_EL1);
> +
> +	/* Check that CLIDR_EL1.LOU{U,IS} are both 0 */
> +	WARN_ON(val & (7 << 27 | 7 << 21));
> +}

What about CTR_EL0.IDC?

Thanks,
Mark.
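
For reference, the fields discussed in this review can be decoded from
raw register values as sketched below. The helpers work on plain
integers so they can be tried in user space; the function names are
hypothetical. The bit positions assumed here are CLIDR_EL1.LoUIS in
bits [23:21], CLIDR_EL1.LoUU in bits [29:27], CTR_EL0.IDC in bit 28 and
CTR_EL0.DIC in bit 29.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLIDR_LOUIS_SHIFT	21
#define CLIDR_LOUU_SHIFT	27
#define CLIDR_LOU_MASK		UINT64_C(7)

#define CTR_IDC_SHIFT		28
#define CTR_DIC_SHIFT		29

static unsigned int clidr_louis(uint64_t clidr)
{
	return (unsigned int)((clidr >> CLIDR_LOUIS_SHIFT) & CLIDR_LOU_MASK);
}

static unsigned int clidr_louu(uint64_t clidr)
{
	return (unsigned int)((clidr >> CLIDR_LOUU_SHIFT) & CLIDR_LOU_MASK);
}

/* Same test as the (7 << 27 | 7 << 21) mask in the quoted patch. */
static bool clidr_lou_both_zero(uint64_t clidr)
{
	return clidr_louis(clidr) == 0 && clidr_louu(clidr) == 0;
}

/* IDC: no D-cache clean to the PoU needed for I/D coherence. */
static bool ctr_has_idc(uint64_t ctr)
{
	return (ctr >> CTR_IDC_SHIFT) & 1;
}

/* DIC: no I-cache invalidation to the PoU needed for I/D coherence. */
static bool ctr_has_dic(uint64_t ctr)
{
	return (ctr >> CTR_DIC_SHIFT) & 1;
}
```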


* [PATCH v2 1/5] crypto: ccree: correct host regs offset
From: Gilad Ben-Yossef @ 2018-05-31 11:51 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180529161248.fphjnceozoyxqy7v@verge.net.au>

On Tue, May 29, 2018 at 7:12 PM, Simon Horman <horms@verge.net.au> wrote:
> On Thu, May 24, 2018 at 03:19:06PM +0100, Gilad Ben-Yossef wrote:
>> The product signature and HW revision register have different offset on the
>> older HW revisions.
>> This fixes the problem of the driver failing sanity check on silicon
>> despite working on the FPGA emulation systems.
>>
>> Fixes: 27b3b22dd98c ("crypto: ccree - add support for older HW revs")
>
> Did the above introduce a regression that is fixed by this patch
> or did it add a feature that only works with this patch?
>

Sort of in between - the first patch made more devices work, but
unreliably (sometimes it works, sometimes it doesn't).
This one makes it work reliably.

> In the case of the latter I would drop the Fixes tag,
> but I don't feel strongly about it.
>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
>
> Minor note below notwithstanding,
>
> Reviewed-by: Simon Horman <horms+renesas@verge.net.au>

Thank you for the review and help :-)

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


* [PATCH v2 2/6] arm64: KVM: Handle Set/Way CMOs as NOPs if FWB is present
From: Mark Rutland @ 2018-05-31 11:51 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-3-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:02PM +0100, Marc Zyngier wrote:
> Set/Way handling is one of the ugliest corners of KVM. We shouldn't
> have to handle that, but better safe than sorry.
> 
> Thankfully, FWB fixes this for us by not requiring any maintenance
> whatsoever, which means we don't have to emulate S/W CMOs, and don't
> have to track VM ops either.
> 
> We still have to trap S/W though, if only to prevent the guest from
> doing something bad.

S/W ops *also* do I-cache maintenance, so we'd still need to emulate
that. Though it looks like we're missing that today...

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 6e3b969391fd..9a740f159245 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -195,7 +195,13 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
>  	if (!p->is_write)
>  		return read_from_write_only(vcpu, p, r);
>  
> -	kvm_set_way_flush(vcpu);
> +	/*
> +	 * Only track S/W ops if we don't have FWB. It still indicates
> +	 * that the guest is a bit broken...
> +	 */
> +	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		kvm_set_way_flush(vcpu);
> +

Assuming we implement I-cache maintenance, we can have something like:

	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
		kvm_set_way_flush_dcache(vcpu);

	kvm_set_way_flush_icache(vcpu);

Thanks,
Mark.

>  	return true;
>  }
>  
> -- 
> 2.17.1
> 


* [PATCH v2 3/6] arm64: KVM: Avoid marking pages as XN in Stage-2 if CTR_EL0.DIC is set
From: Mark Rutland @ 2018-05-31 11:52 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-4-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:03PM +0100, Marc Zyngier wrote:
> On systems where CTR_EL0.DIC is set, we don't need to perform
> icache invalidation to guarantee that we'll fetch the right
> instruction stream.
> 
> This also means that taking a permission fault to invalidate the
> icache is an unnecessary overhead.
> 
> On such systems, we can safely leave the page as being executable.
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/pgtable-prot.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index c66c3047400e..78b942c1bea4 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -77,8 +77,18 @@
>  		__val;							\
>  	 })
>  
> -#define PAGE_S2			__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
> -#define PAGE_S2_DEVICE		__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
> +#define PAGE_S2_XN							\
> +	({								\
> +		u64 __val;						\
> +		if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))		\
> +			__val = 0;					\
> +		else							\
> +			__val = PTE_S2_XN;				\
> +		__val;							\
> +	})
> +
> +#define PAGE_S2			__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PAGE_S2_XN)
> +#define PAGE_S2_DEVICE		__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PAGE_S2_XN)
>  
>  #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
>  #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
> -- 
> 2.17.1
> 


* [PATCH v2 5/5] arm64: dts: renesas: r8a7795: add ccree binding
From: Gilad Ben-Yossef @ 2018-05-31 11:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180529161936.prhp4oikzamr6u3n@verge.net.au>

On Tue, May 29, 2018 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote:
> On Thu, May 24, 2018 at 03:19:10PM +0100, Gilad Ben-Yossef wrote:
>> Add bindings for CryptoCell instance in the SoC.
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
>
> In so far as I can review the details of this (which is not much) this
> looks fine to me. I am, however, a little unclear in when it should be
> accepted.

Since Herbert Xu ACKed the driver changes, I would say the only gating
commit is Geert's CR clock patch.
If that one is in, then I would say this one should go in as well.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


* [PATCH v2 4/6] KVM: arm/arm64: Consolidate page-table accessors
From: Mark Rutland @ 2018-05-31 11:55 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-5-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:04PM +0100, Marc Zyngier wrote:
> The arm and arm64 KVM page tables accessors are pointlessly different
> between the two architectures, and likely both wrong one way or another:
> arm64 lacks a dsb(), and arm doesn't use WRITE_ONCE.
> 
> Let's unify them.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm/include/asm/kvm_mmu.h   | 12 -----------
>  arch/arm64/include/asm/kvm_mmu.h |  3 ---
>  virt/kvm/arm/mmu.c               | 35 ++++++++++++++++++++++++++++----
>  3 files changed, 31 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 707a1f06dc5d..468ff945efa0 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -75,18 +75,6 @@ phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
>  
> -static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
> -{
> -	*pmd = new_pmd;
> -	dsb(ishst);
> -}
> -
> -static inline void kvm_set_pte(pte_t *pte, pte_t new_pte)
> -{
> -	*pte = new_pte;
> -	dsb(ishst);
> -}
> -
>  static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>  {
>  	pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 9dbca5355029..26c89b63f604 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -170,9 +170,6 @@ phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
>  
> -#define	kvm_set_pte(ptep, pte)		set_pte(ptep, pte)
> -#define	kvm_set_pmd(pmdp, pmd)		set_pmd(pmdp, pmd)
> -
>  static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>  {
>  	pte_val(pte) |= PTE_S2_RDWR;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index ba66bf7ae299..c9ed239c0840 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -177,6 +177,33 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr
>  	put_page(virt_to_page(pmd));
>  }
>  
> +static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
> +{
> +	WRITE_ONCE(*ptep, new_pte);
> +	dsb(ishst);
> +}
> +
> +static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
> +{
> +	WRITE_ONCE(*pmdp, new_pmd);
> +	dsb(ishst);
> +}
> +
> +static inline void kvm_pmd_populate(pmd_t *pmdp, pte_t *ptep)
> +{
> +	pmd_populate_kernel(NULL, pmdp, ptep);
> +}
> +
> +static inline void kvm_pud_populate(pud_t *pudp, pmd_t *pmdp)
> +{
> +	pud_populate(NULL, pudp, pmdp);
> +}
> +
> +static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
> +{
> +	pgd_populate(NULL, pgdp, pudp);
> +}
> +
>  /*
>   * Unmapping vs dcache management:
>   *
> @@ -603,7 +630,7 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
>  				kvm_err("Cannot allocate Hyp pte\n");
>  				return -ENOMEM;
>  			}
> -			pmd_populate_kernel(NULL, pmd, pte);
> +			kvm_pmd_populate(pmd, pte);
>  			get_page(virt_to_page(pmd));
>  			kvm_flush_dcache_to_poc(pmd, sizeof(*pmd));
>  		}
> @@ -636,7 +663,7 @@ static int create_hyp_pud_mappings(pgd_t *pgd, unsigned long start,
>  				kvm_err("Cannot allocate Hyp pmd\n");
>  				return -ENOMEM;
>  			}
> -			pud_populate(NULL, pud, pmd);
> +			kvm_pud_populate(pud, pmd);
>  			get_page(virt_to_page(pud));
>  			kvm_flush_dcache_to_poc(pud, sizeof(*pud));
>  		}
> @@ -673,7 +700,7 @@ static int __create_hyp_mappings(pgd_t *pgdp, unsigned long ptrs_per_pgd,
>  				err = -ENOMEM;
>  				goto out;
>  			}
> -			pgd_populate(NULL, pgd, pud);
> +			kvm_pgd_populate(pgd, pud);
>  			get_page(virt_to_page(pgd));
>  			kvm_flush_dcache_to_poc(pgd, sizeof(*pgd));
>  		}
> @@ -1092,7 +1119,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  		if (!cache)
>  			return 0; /* ignore calls from kvm_set_spte_hva */
>  		pte = mmu_memory_cache_alloc(cache);
> -		pmd_populate_kernel(NULL, pmd, pte);
> +		kvm_pmd_populate(pmd, pte);
>  		get_page(virt_to_page(pmd));
>  	}
>  
> -- 
> 2.17.1
> 


* [PATCH v2 5/6] KVM: arm/arm64: Stop using {pmd,pud,pgd}_populate
From: Mark Rutland @ 2018-05-31 12:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-6-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:05PM +0100, Marc Zyngier wrote:
> The {pmd,pud,pgd}_populate accessors usage in the kernel have always
> been a bit weird in KVM. We don't have a struct mm to pass (and
> neither does the kernel most of the time, but still...), and
> the 32bit code has all kind of cache maintenance that doesn't make
> sense on ARMv7+ when MP extensions are mandatory (which is the
> case when the VEs are present).
> 
> Let's bite the bullet and provide our own implementations. The
> only bit of architectural code left has to do with building the table
> entry itself (arm64 having up to 52bit PA, arm lacking PUD level).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h   | 4 ++++
>  arch/arm64/include/asm/kvm_mmu.h | 7 +++++++
>  virt/kvm/arm/mmu.c               | 8 +++++---
>  3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 468ff945efa0..a94ef9833bd3 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -75,6 +75,10 @@ phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
>  
> +#define kvm_mk_pmd(ptep)	__pmd(__pa(ptep) | PMD_TYPE_TABLE)
> +#define kvm_mk_pud(pmdp)	__pud(__pa(pmdp) | PMD_TYPE_TABLE)
> +#define kvm_mk_pgd(pudp)	({ BUILD_BUG(); 0; })

I can't remember how the folding logic works for ARM; is a pgd entry the
entire pud table?

Assuming so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

> +
>  static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>  {
>  	pte_val(pte) |= L_PTE_S2_RDWR;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 26c89b63f604..22c9f7cfdf93 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -170,6 +170,13 @@ phys_addr_t kvm_get_idmap_vector(void);
>  int kvm_mmu_init(void);
>  void kvm_clear_hyp_idmap(void);
>  
> +#define kvm_mk_pmd(ptep)					\
> +	__pmd(__phys_to_pmd_val(__pa(ptep) | PMD_TYPE_TABLE))
> +#define kvm_mk_pud(pmdp)					\
> +	__pud(__phys_to_pud_val(__pa(pmdp) | PMD_TYPE_TABLE))
> +#define kvm_mk_pgd(pudp)					\
> +	__pgd(__phys_to_pgd_val(__pa(pudp) | PUD_TYPE_TABLE))
> +
>  static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
>  {
>  	pte_val(pte) |= PTE_S2_RDWR;
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index c9ed239c0840..ad1980d2118a 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -191,17 +191,19 @@ static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
>  
>  static inline void kvm_pmd_populate(pmd_t *pmdp, pte_t *ptep)
>  {
> -	pmd_populate_kernel(NULL, pmdp, ptep);
> +	kvm_set_pmd(pmdp, kvm_mk_pmd(ptep));
>  }
>  
>  static inline void kvm_pud_populate(pud_t *pudp, pmd_t *pmdp)
>  {
> -	pud_populate(NULL, pudp, pmdp);
> +	WRITE_ONCE(*pudp, kvm_mk_pud(pmdp));
> +	dsb(ishst);
>  }
>  
>  static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
>  {
> -	pgd_populate(NULL, pgdp, pudp);
> +	WRITE_ONCE(*pgdp, kvm_mk_pgd(pudp));
> +	dsb(ishst);
>  }
>  
>  /*
> -- 
> 2.17.1
> 


* [PATCH v2 6/6] KVM: arm/arm64: Remove unnecessary CMOs when creating HYP page tables
From: Mark Rutland @ 2018-05-31 12:01 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180530124706.25284-7-marc.zyngier@arm.com>

On Wed, May 30, 2018 at 01:47:06PM +0100, Marc Zyngier wrote:
> There is no need to perform cache maintenance operations when
> creating the HYP page tables if we have the multiprocessing
> extensions. ARMv7 mandates them with the virtualization support,
> and ARMv8 just mandates them unconditionally.
> 
> Let's remove these operations.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  virt/kvm/arm/mmu.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index ad1980d2118a..ccdf544d44c0 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -607,7 +607,6 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
>  		pte = pte_offset_kernel(pmd, addr);
>  		kvm_set_pte(pte, pfn_pte(pfn, prot));
>  		get_page(virt_to_page(pte));
> -		kvm_flush_dcache_to_poc(pte, sizeof(*pte));
>  		pfn++;
>  	} while (addr += PAGE_SIZE, addr != end);
>  }
> @@ -634,7 +633,6 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
>  			}
>  			kvm_pmd_populate(pmd, pte);
>  			get_page(virt_to_page(pmd));
> -			kvm_flush_dcache_to_poc(pmd, sizeof(*pmd));
>  		}
>  
>  		next = pmd_addr_end(addr, end);
> @@ -667,7 +665,6 @@ static int create_hyp_pud_mappings(pgd_t *pgd, unsigned long start,
>  			}
>  			kvm_pud_populate(pud, pmd);
>  			get_page(virt_to_page(pud));
> -			kvm_flush_dcache_to_poc(pud, sizeof(*pud));
>  		}
>  
>  		next = pud_addr_end(addr, end);
> @@ -704,7 +701,6 @@ static int __create_hyp_mappings(pgd_t *pgdp, unsigned long ptrs_per_pgd,
>  			}
>  			kvm_pgd_populate(pgd, pud);
>  			get_page(virt_to_page(pgd));
> -			kvm_flush_dcache_to_poc(pgd, sizeof(*pgd));
>  		}
>  
>  		next = pgd_addr_end(addr, end);
> -- 
> 2.17.1
> 


* [PATCH 0/2] arm64/drivers: avoid alloc memory on offline node
From: Xie XiuQi @ 2018-05-31 12:14 UTC (permalink / raw)
  To: linux-arm-kernel

A NUMA system may return a node which is not online.
For example, a numa node:
1) without memory
2) NR_CPUS is very small, and the cpus on the node are not brought up

In this situation, we use NUMA_NO_NODE to avoid oops.
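
The fallback described here can be sketched outside the kernel as
follows. This is a minimal model under stated assumptions: the online
map is a plain bool array, and MAX_NODES, node_is_online() and
sanitize_node() are hypothetical names. In the kernel the check is
node_online() and NUMA_NO_NODE is -1.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES	8	/* hypothetical limit for this sketch */

/* True only for a valid node id whose entry is marked online. */
static bool node_is_online(const bool *online, int node)
{
	assert(online);
	return node >= 0 && node < MAX_NODES && online[node];
}

/*
 * Return the node to allocate on: the requested node if it is online,
 * otherwise NUMA_NO_NODE so the allocator is free to pick any node.
 */
static int sanitize_node(const bool *online, int node)
{
	if (node != NUMA_NO_NODE && !node_is_online(online, node))
		return NUMA_NO_NODE;
	return node;
}
```

A caller would pass the sanitized value to its node-aware allocation
instead of the raw node id reported by firmware.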

[   25.732905] Unable to handle kernel NULL pointer dereference at virtual address 00001988
[   25.740982] Mem abort info:
[   25.743762]   ESR = 0x96000005
[   25.746803]   Exception class = DABT (current EL), IL = 32 bits
[   25.752711]   SET = 0, FnV = 0
[   25.755751]   EA = 0, S1PTW = 0
[   25.758878] Data abort info:
[   25.761745]   ISV = 0, ISS = 0x00000005
[   25.765568]   CM = 0, WnR = 0
[   25.768521] [0000000000001988] user address but active_mm is swapper
[   25.774861] Internal error: Oops: 96000005 [#1] SMP
[   25.779724] Modules linked in:
[   25.782768] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc6-mpam+ #115
[   25.789714] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B305 05/28/2018
[   25.798831] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   25.803612] pc : __alloc_pages_nodemask+0xf0/0xe70
[   25.808389] lr : __alloc_pages_nodemask+0x184/0xe70
[   25.813252] sp : ffff00000996f660
[   25.816553] x29: ffff00000996f660 x28: 0000000000000000
[   25.821852] x27: 00000000014012c0 x26: 0000000000000000
[   25.827150] x25: 0000000000000003 x24: ffff000008099eac
[   25.832449] x23: 0000000000400000 x22: 0000000000000000
[   25.837747] x21: 0000000000000001 x20: 0000000000000000
[   25.843045] x19: 0000000000400000 x18: 0000000000010e00
[   25.848343] x17: 000000000437f790 x16: 0000000000000020
[   25.853641] x15: 0000000000000000 x14: 6549435020524541
[   25.858939] x13: 20454d502067756c x12: 0000000000000000
[   25.864237] x11: ffff00000996f6f0 x10: 0000000000000006
[   25.869536] x9 : 00000000000012a4 x8 : ffff8023c000ff90
[   25.874834] x7 : 0000000000000000 x6 : ffff000008d73c08
[   25.880132] x5 : 0000000000000000 x4 : 0000000000000081
[   25.885430] x3 : 0000000000000000 x2 : 0000000000000000
[   25.890728] x1 : 0000000000000001 x0 : 0000000000001980
[   25.896027] Process swapper/0 (pid: 1, stack limit = 0x        (ptrval))
[   25.902712] Call trace:
[   25.905146]  __alloc_pages_nodemask+0xf0/0xe70
[   25.909577]  allocate_slab+0x94/0x590
[   25.913225]  new_slab+0x68/0xc8
[   25.916353]  ___slab_alloc+0x444/0x4f8
[   25.920088]  __slab_alloc+0x50/0x68
[   25.923562]  kmem_cache_alloc_node_trace+0xe8/0x230
[   25.928426]  pci_acpi_scan_root+0x94/0x278
[   25.932510]  acpi_pci_root_add+0x228/0x4b0
[   25.936593]  acpi_bus_attach+0x10c/0x218
[   25.940501]  acpi_bus_attach+0xac/0x218
[   25.944323]  acpi_bus_attach+0xac/0x218
[   25.948144]  acpi_bus_scan+0x5c/0xc0
[   25.951708]  acpi_scan_init+0xf8/0x254
[   25.955443]  acpi_init+0x310/0x37c
[   25.958831]  do_one_initcall+0x54/0x208
[   25.962653]  kernel_init_freeable+0x244/0x340
[   25.966999]  kernel_init+0x18/0x118
[   25.970474]  ret_from_fork+0x10/0x1c
[   25.974036] Code: 7100047f 321902a4 1a950095 b5000602 (b9400803)
[   25.980162] ---[ end trace 64f0893eb21ec283 ]---
[   25.984765] Kernel panic - not syncing: Fatal exception

Xie XiuQi (2):
  arm64: avoid alloc memory on offline node
  drivers: check numa node's online status in dev_to_node

 arch/arm64/kernel/pci.c | 3 +++
 include/linux/device.h  | 7 ++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

-- 
1.8.3.1


* [PATCH 1/2] arm64: avoid alloc memory on offline node
From: Xie XiuQi @ 2018-05-31 12:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527768879-88161-1-git-send-email-xiexiuqi@huawei.com>

A NUMA system may return a node which is not online.
For example, a numa node:
1) without memory
2) NR_CPUS is very small, and the cpus on the node are not brought up

In this situation, we use NUMA_NO_NODE to avoid oops.

[   25.732905] Unable to handle kernel NULL pointer dereference at virtual address 00001988
[   25.740982] Mem abort info:
[   25.743762]   ESR = 0x96000005
[   25.746803]   Exception class = DABT (current EL), IL = 32 bits
[   25.752711]   SET = 0, FnV = 0
[   25.755751]   EA = 0, S1PTW = 0
[   25.758878] Data abort info:
[   25.761745]   ISV = 0, ISS = 0x00000005
[   25.765568]   CM = 0, WnR = 0
[   25.768521] [0000000000001988] user address but active_mm is swapper
[   25.774861] Internal error: Oops: 96000005 [#1] SMP
[   25.779724] Modules linked in:
[   25.782768] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc6-mpam+ #115
[   25.789714] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B305 05/28/2018
[   25.798831] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   25.803612] pc : __alloc_pages_nodemask+0xf0/0xe70
[   25.808389] lr : __alloc_pages_nodemask+0x184/0xe70
[   25.813252] sp : ffff00000996f660
[   25.816553] x29: ffff00000996f660 x28: 0000000000000000
[   25.821852] x27: 00000000014012c0 x26: 0000000000000000
[   25.827150] x25: 0000000000000003 x24: ffff000008099eac
[   25.832449] x23: 0000000000400000 x22: 0000000000000000
[   25.837747] x21: 0000000000000001 x20: 0000000000000000
[   25.843045] x19: 0000000000400000 x18: 0000000000010e00
[   25.848343] x17: 000000000437f790 x16: 0000000000000020
[   25.853641] x15: 0000000000000000 x14: 6549435020524541
[   25.858939] x13: 20454d502067756c x12: 0000000000000000
[   25.864237] x11: ffff00000996f6f0 x10: 0000000000000006
[   25.869536] x9 : 00000000000012a4 x8 : ffff8023c000ff90
[   25.874834] x7 : 0000000000000000 x6 : ffff000008d73c08
[   25.880132] x5 : 0000000000000000 x4 : 0000000000000081
[   25.885430] x3 : 0000000000000000 x2 : 0000000000000000
[   25.890728] x1 : 0000000000000001 x0 : 0000000000001980
[   25.896027] Process swapper/0 (pid: 1, stack limit = 0x        (ptrval))
[   25.902712] Call trace:
[   25.905146]  __alloc_pages_nodemask+0xf0/0xe70
[   25.909577]  allocate_slab+0x94/0x590
[   25.913225]  new_slab+0x68/0xc8
[   25.916353]  ___slab_alloc+0x444/0x4f8
[   25.920088]  __slab_alloc+0x50/0x68
[   25.923562]  kmem_cache_alloc_node_trace+0xe8/0x230
[   25.928426]  pci_acpi_scan_root+0x94/0x278
[   25.932510]  acpi_pci_root_add+0x228/0x4b0
[   25.936593]  acpi_bus_attach+0x10c/0x218
[   25.940501]  acpi_bus_attach+0xac/0x218
[   25.944323]  acpi_bus_attach+0xac/0x218
[   25.948144]  acpi_bus_scan+0x5c/0xc0
[   25.951708]  acpi_scan_init+0xf8/0x254
[   25.955443]  acpi_init+0x310/0x37c
[   25.958831]  do_one_initcall+0x54/0x208
[   25.962653]  kernel_init_freeable+0x244/0x340
[   25.966999]  kernel_init+0x18/0x118
[   25.970474]  ret_from_fork+0x10/0x1c
[   25.974036] Code: 7100047f 321902a4 1a950095 b5000602 (b9400803)
[   25.980162] ---[ end trace 64f0893eb21ec283 ]---
[   25.984765] Kernel panic - not syncing: Fatal exception

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Huiqiang Wang <wanghuiqiang@huawei.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
---
 arch/arm64/kernel/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 0e2ea1c..e17cc45 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -170,6 +170,9 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	struct pci_bus *bus, *child;
 	struct acpi_pci_root_ops *root_ops;
 
+	if (node != NUMA_NO_NODE && !node_online(node))
+		node = NUMA_NO_NODE;
+
 	ri = kzalloc_node(sizeof(*ri), GFP_KERNEL, node);
 	if (!ri)
 		return NULL;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 2/2] drivers: check numa node's online status in dev_to_node
From: Xie XiuQi @ 2018-05-31 12:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527768879-88161-1-git-send-email-xiexiuqi@huawei.com>

If dev->numa_node is not available (or is offline), dev_to_node() should
return NUMA_NO_NODE to prevent memory allocation on offline
nodes, which could cause an oops.

For example, a numa node:
1) without memory
2) NR_CPUS is very small, and the cpus on the node are not brought up
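
The fallback described above can be sketched in user space as follows. This is only a model, not the kernel code: the online map, MAX_NODES and every model_* name are assumptions made up for illustration; only NUMA_NO_NODE and the shape of the check come from the patch.

```c
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES	4	/* hypothetical node count for this model */

/* Hypothetical stand-in for the kernel's online-node bitmap. */
static bool model_node_online[MAX_NODES] = { true, true, false, false };

static bool model_is_online(int node)
{
	return node >= 0 && node < MAX_NODES && model_node_online[node];
}

/*
 * Same shape as the proposed dev_to_node() check: if firmware assigned
 * the device to a node that is not online, report "no preference" so
 * that the allocator is free to pick any online node.
 */
static int model_dev_to_node(int numa_node)
{
	if (numa_node != NUMA_NO_NODE && !model_is_online(numa_node))
		return NUMA_NO_NODE;

	return numa_node;
}
```

With this map, a device on node 0 keeps its node, while a device on node 2 (present but offline) is reported as NUMA_NO_NODE.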

[   27.851041] Unable to handle kernel NULL pointer dereference at virtual address 00001988
[   27.859128] Mem abort info:
[   27.861908]   ESR = 0x96000005
[   27.864949]   Exception class = DABT (current EL), IL = 32 bits
[   27.870860]   SET = 0, FnV = 0
[   27.873900]   EA = 0, S1PTW = 0
[   27.877029] Data abort info:
[   27.879895]   ISV = 0, ISS = 0x00000005
[   27.883716]   CM = 0, WnR = 0
[   27.886673] [0000000000001988] user address but active_mm is swapper
[   27.893012] Internal error: Oops: 96000005 [#1] SMP
[   27.897876] Modules linked in:
[   27.900919] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc6-mpam+ #116
[   27.907865] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC UEFI Nemo 2.0 RC0 - B306 05/28/2018
[   27.916983] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   27.921763] pc : __alloc_pages_nodemask+0xf0/0xe70
[   27.926540] lr : __alloc_pages_nodemask+0x184/0xe70
[   27.931403] sp : ffff00000996f7e0
[   27.934704] x29: ffff00000996f7e0 x28: ffff000008cb10a0
[   27.940003] x27: 00000000014012c0 x26: 0000000000000000
[   27.945301] x25: 0000000000000003 x24: ffff0000085bbc14
[   27.950600] x23: 0000000000400000 x22: 0000000000000000
[   27.955898] x21: 0000000000000001 x20: 0000000000000000
[   27.961196] x19: 0000000000400000 x18: 0000000000000f00
[   27.966494] x17: 00000000003bff88 x16: 0000000000000020
[   27.971792] x15: 000000000000003b x14: ffffffffffffffff
[   27.977090] x13: ffffffffffff0000 x12: 0000000000000030
[   27.982388] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   27.987686] x9 : 2e64716e622e7364 x8 : 7f7f7f7f7f7f7f7f
[   27.992984] x7 : 0000000000000000 x6 : ffff000008d73c08
[   27.998282] x5 : 0000000000000000 x4 : 0000000000000081
[   28.003580] x3 : 0000000000000000 x2 : 0000000000000000
[   28.008878] x1 : 0000000000000001 x0 : 0000000000001980
[   28.014177] Process swapper/0 (pid: 1, stack limit = 0x        (ptrval))
[   28.020863] Call trace:
[   28.023296]  __alloc_pages_nodemask+0xf0/0xe70
[   28.027727]  allocate_slab+0x94/0x590
[   28.031374]  new_slab+0x68/0xc8
[   28.034502]  ___slab_alloc+0x444/0x4f8
[   28.038237]  __slab_alloc+0x50/0x68
[   28.041713]  __kmalloc_node_track_caller+0x100/0x320
[   28.046664]  devm_kmalloc+0x3c/0x90
[   28.050139]  pinctrl_bind_pins+0x4c/0x298
[   28.054135]  driver_probe_device+0xb4/0x4a0
[   28.058305]  __driver_attach+0x124/0x128
[   28.062213]  bus_for_each_dev+0x78/0xe0
[   28.066035]  driver_attach+0x30/0x40
[   28.069597]  bus_add_driver+0x248/0x2b8
[   28.073419]  driver_register+0x68/0x100
[   28.077242]  __pci_register_driver+0x64/0x78
[   28.081500]  pcie_portdrv_init+0x44/0x4c
[   28.085410]  do_one_initcall+0x54/0x208
[   28.089232]  kernel_init_freeable+0x244/0x340
[   28.093577]  kernel_init+0x18/0x118
[   28.097052]  ret_from_fork+0x10/0x1c
[   28.100614] Code: 7100047f 321902a4 1a950095 b5000602 (b9400803)
[   28.106740] ---[ end trace e32df44e6e1c3a4b ]---

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Huiqiang Wang <wanghuiqiang@huawei.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Tomasz Nowicki <Tomasz.Nowicki@caviumnetworks.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
---
 include/linux/device.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 4779569..2a4fb08 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1017,7 +1017,12 @@ extern __printf(2, 3)
 #ifdef CONFIG_NUMA
 static inline int dev_to_node(struct device *dev)
 {
-	return dev->numa_node;
+	int node = dev->numa_node;
+
+	if (unlikely(node != NUMA_NO_NODE && !node_online(node)))
+		return NUMA_NO_NODE;
+
+	return node;
 }
 static inline void set_dev_node(struct device *dev, int node)
 {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v2 1/6] arm64: KVM: Add support for Stage-2 control of memory types and cacheability
From: Marc Zyngier @ 2018-05-31 12:38 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180531114938.2w4sbi2nkvqyc3jt@lakrids.cambridge.arm.com>

On 31/05/18 12:49, Mark Rutland wrote:
> On Wed, May 30, 2018 at 01:47:01PM +0100, Marc Zyngier wrote:
>> Up to ARMv8.3, the combination of Stage-1 and Stage-2 attributes
>> results in the strongest attribute of the two stages.  This means
>> that the hypervisor has to perform quite a lot of cache maintenance
>> just in case the guest has some non-cacheable mappings around.
>>
>> ARMv8.4 solves this problem by offering a different mode (FWB) where
>> Stage-2 has total control over the memory attribute (this is limited
>> to systems where both I/O and instruction caches are coherent with
> 
> s/caches/fetches/ -- the I-caches themselves aren't coherent with the
> D-caches (or we could omit I-cache maintenance).
> 
> i.e. this implies IDC, but not DIC.

It may imply IDC behaviour, but not quite IDC itself. I agree, this
looks dodgy. I've asked for clarification on the spec.

> 
>> the dcache). This is achieved by having a different set of memory
>> attributes in the page tables, and a new bit set in HCR_EL2.
>>
>> On such a system, we can then safely sidestep any form of dcache
>> management.
>>
>> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
> 
>>  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>>  {
>> @@ -268,7 +269,10 @@ static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
>>  {
>>  	void *va = page_address(pfn_to_page(pfn));
>>  
>> -	kvm_flush_dcache_to_poc(va, size);
>> +	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>> +		kvm_flush_dcache_to_poc(va, size);
>> +	else
>> +		kvm_flush_dcache_to_pou(va, size);
>>  }
> 
> The commit message said instruction fetches were coherent, and that no
> D-cache maintenance was necessary, so why do we need maintenance to the
> PoU?

That maintenance will be elided if we actually have IDC set. I'm happy
to drop it once I have confirmation that we have an identical behaviour.

> 
>> +static void cpu_has_fwb(const struct arm64_cpu_capabilities *__unused)
>> +{
>> +	u64 val = read_sysreg_s(SYS_CLIDR_EL1);
>> +
>> +	/* Check that CLIDR_EL1.LOU{U,IS} are both 0 */
>> +	WARN_ON(val & (7 << 27 | 7 << 21));
>> +}
> 
> What about CTR_EL0.IDC?

Again, that depends on whether FWB implies IDC or not.
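
For readers unfamiliar with the mask in the quoted WARN_ON(), here is a small standalone sketch of which CLIDR_EL1 fields it covers. The helper names are hypothetical and not part of the patch; the field positions (LoUIS at bits [23:21], LoUU at bits [29:27]) are taken from the architecture manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* CLIDR_EL1 field positions, as given in the Arm ARM. */
#define CLIDR_LOUIS_SHIFT	21	/* Level of Unification, Inner Shareable */
#define CLIDR_LOUU_SHIFT	27	/* Level of Unification, Uniprocessor */
#define CLIDR_FIELD_MASK	UINT64_C(7)

static unsigned int clidr_louis(uint64_t clidr)
{
	return (clidr >> CLIDR_LOUIS_SHIFT) & CLIDR_FIELD_MASK;
}

static unsigned int clidr_louu(uint64_t clidr)
{
	return (clidr >> CLIDR_LOUU_SHIFT) & CLIDR_FIELD_MASK;
}

/*
 * True when both fields are zero, i.e. the condition the quoted
 * WARN_ON(val & (7 << 27 | 7 << 21)) expects to hold.
 */
static bool clidr_lou_fields_zero(uint64_t clidr)
{
	return clidr_louis(clidr) == 0 && clidr_louu(clidr) == 0;
}
```

Note that this only looks at CLIDR_EL1; it says nothing about CTR_EL0.IDC, which is the open question in the exchange above.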

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* [PATCH v1 0/2] Add NOTIFY_SEI notification type support
From: Dongjiu Geng @ 2018-05-31 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series is split out from https://www.spinics.net/lists/kvm/msg168917.html

1. ACPI 6.1 adds support for NOTIFY_SEI as a GHES notification mechanism, so this series adds
   software support for that notification

Dongjiu Geng (2):
  ACPI / APEI: Add SEI notification type support for ARMv8
  arm64: handle NOTIFY_SEI notification by the APEI driver

 arch/arm64/kernel/traps.c | 15 ++++++++++++++
 drivers/acpi/apei/Kconfig | 15 ++++++++++++++
 drivers/acpi/apei/ghes.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ghes.h       |  1 +
 4 files changed, 84 insertions(+)

-- 
1.9.1

^ permalink raw reply

* [PATCH v1 1/2] ACPI / APEI: Add SEI notification type support for ARMv8
From: Dongjiu Geng @ 2018-05-31 12:41 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527770506-8076-1-git-send-email-gengdongjiu@huawei.com>

ACPI 6.x adds support for NOTIFY_SEI as a GHES notification
mechanism, so add new GHES notification handling functions.
Expose ghes_notify_sei() to arch code, which calls it
when it receives a NOTIFY_SEI notification.
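
The return convention of ghes_notify_sei() can be modelled outside the kernel roughly as below. This is a sketch under assumptions: struct model_source and the model_* functions are invented stand-ins for struct ghes and ghes_proc(), and the RCU list walk is replaced by a plain array.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct ghes: it only records whether the
 * source currently holds an error record from firmware.
 */
struct model_source {
	int has_record;
};

/* Stand-in for ghes_proc(): returns 0 when a record was consumed. */
static int model_proc(struct model_source *src)
{
	if (!src->has_record)
		return -ENOENT;

	src->has_record = 0;
	return 0;
}

/* Poll every source; return 0 if at least one yielded a record. */
static int model_notify_sei(struct model_source *srcs, size_t n)
{
	int ret = -ENOENT;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!model_proc(&srcs[i]))
			ret = 0;
	}

	return ret;
}
```

A caller can therefore treat 0 as "firmware described this error" and -ENOENT as "no record found, fall back to the default handling".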

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
Note:
Firmware follows the SError mask rule: if SError is masked,
the firmware will not deliver the NOTIFY_SEI notification.
---
 drivers/acpi/apei/Kconfig | 15 ++++++++++++++
 drivers/acpi/apei/ghes.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ghes.h       |  1 +
 3 files changed, 69 insertions(+)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 52ae543..ff4afc3 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,21 @@ config ACPI_APEI_SEA
 	  option allows the OS to look for such hardware error record, and
 	  take appropriate action.
 
+config ACPI_APEI_SEI
+	bool "APEI SError(System Error) Interrupt logging/recovering support"
+	depends on ARM64 && ACPI_APEI_GHES
+	default y
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEI (SError interrupt).
+
+	  SEI happens with asynchronous external abort for errors on device
+	  memory reads on ARMv8 systems. If a system supports firmware first
+	  handling of SEI, the platform analyzes and handles hardware error
+	  notifications from SEI, and it may then form a hardware error record for
+	  the OS to parse and handle. This option allows the OS to look for
+	  such hardware error record, and take appropriate action.
+
 config ACPI_APEI_MEMORY_FAILURE
 	bool "APEI memory error recovering support"
 	depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe9..33f77ae 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -827,6 +827,46 @@ static inline void ghes_sea_add(struct ghes *ghes) { }
 static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+/*
+ * Return 0 only if one of the SEI error sources successfully reported an error
+ * record sent from the firmware.
+ */
+int ghes_notify_sei(void)
+{
+	struct ghes *ghes;
+	int ret = -ENOENT;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+		if (!ghes_proc(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sei);
+	mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes) { }
+static inline void ghes_sei_remove(struct ghes *ghes) { }
+#endif /* CONFIG_ACPI_APEI_SEI */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1055,6 +1095,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1126,6 +1173,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_add(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1179,6 +1229,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c8..9ba59e2 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
 	     section = acpi_hest_get_next(section))
 
 int ghes_notify_sea(void);
+int ghes_notify_sei(void);
 
 #endif /* GHES_H */
-- 
1.9.1

^ permalink raw reply related

* [PATCH v1 2/2] arm64: handle NOTIFY_SEI notification by the APEI driver
From: Dongjiu Geng @ 2018-05-31 12:41 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527770506-8076-1-git-send-email-gengdongjiu@huawei.com>

When the kernel or KVM gets a NOTIFY_SEI notification, it first
calls the APEI driver to handle it.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 arch/arm64/kernel/traps.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
---
Changes since https://www.spinics.net/lists/kvm/msg168919.html

1. Remove the handle_guest_sei() helper


diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8bbdc17..676e40c 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -50,6 +50,7 @@
 #include <asm/exception.h>
 #include <asm/system_misc.h>
 #include <asm/sysreg.h>
+#include <acpi/ghes.h>
 
 static const char *handler[]= {
 	"Synchronous Abort",
@@ -701,6 +702,20 @@ void __noreturn arm64_serror_panic(struct pt_regs *regs, u32 esr)
 bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
 {
 	u32 aet = arm64_ras_serror_get_severity(esr);
+	int ret = -ENOENT;
+
+	if (IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+		if (interrupts_enabled(regs))
+			nmi_enter();
+
+		ret = ghes_notify_sei();
+
+		if (interrupts_enabled(regs))
+			nmi_exit();
+
+		if (!ret)
+			return false;
+	}
 
 	switch (aet) {
 	case ESR_ELx_AET_CE:	/* corrected error */
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 4/5] PM / Domains: Add support for multi PM domains per device to genpd
From: Jon Hunter @ 2018-05-31 12:47 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <5fc1d3ee51c8dbc264fb21edf33a879c7db4056b.camel@lynxeye.de>


On 31/05/18 12:40, Lucas Stach wrote:
> Hi Ulf,
> 
> Am Donnerstag, den 31.05.2018, 12:59 +0200 schrieb Ulf Hansson:
>> To support devices being partitioned across multiple PM domains, let's
>> begin with extending genpd to cope with this kind of configuration.
>>
>> Therefore, add a new exported function genpd_dev_pm_attach_by_id(), which
>> is similar to the existing genpd_dev_pm_attach(), but with the difference
>> that it allows its callers to provide an index to the PM domain that it
>> wants to attach.
>>
>> Note that, genpd_dev_pm_attach_by_id() shall only be called by the driver
>> core / PM core, similar to how the existing dev_pm_domain_attach() makes
>> use of genpd_dev_pm_attach(). However, this is implemented by following
>> changes on top.
> 
> by_id() APIs are not really intuitive to use for driver writers. Other
> subsystems have solved this by providing a "-names" property to give
> the phandles a bit more meaning and then providing a by_name API. I
> would really appreciate if PM domains could move in the same direction.

As discussed here [0], there are plans to add that.

Cheers
Jon

[0] https://patchwork.ozlabs.org/patch/921938/

-- 
nvpublic

^ permalink raw reply

* [PATCH 0/7] sunxi: Add DT representation for the MBUS controller
From: Maxime Ripard @ 2018-05-31 12:52 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180409092229.ljcnsqgv7wh2s4op@flea>

On Mon, Apr 09, 2018 at 11:22:29AM +0200, Maxime Ripard wrote:
> Hi Rob,
> 
> On Tue, Apr 03, 2018 at 11:03:30AM -0500, Rob Herring wrote:
> > On Tue, Apr 3, 2018 at 8:29 AM, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
> > > Hi,
> > >
> > > For quite some time we've had to hack around in our drivers to take into
> > > account the fact that our DMA accesses are not done through the parent
> > > node, but through another bus with a different mapping than the CPU for the
> > > RAM (0 instead of 0x40000000 for most SoCs).
> > >
> > > After some discussion following the submission of a camera device suffering from
> > > the same hacks, I've decided to put together a series that introduces a
> > > property called dma-parent that allows expressing the DMA relationship
> > > between a master and its bus, even if they are not direct parents in the DT.
> > 
> > Reading thru v6 of the camera driver, it seems like having
> > intermediate buses would solve the problem in your case?
> 
> I guess it would yes, but I guess it wouldn't model the hardware
> properly since this seems to be really a bus only meant to do DMA, and
> you're not accessing the registers of the device through that bus.
> 
> And as far as I know, the DT implies that the topology is the one of
> the "control" side of the devices.
> 
> We'll also eventually need to retrieve the MBUS endpoint IDs to
> be able to support perf and PM QoS properly.
> 
> > As Arnd mentioned in that thread, something new needs to address all
> > the deficiencies with dma-ranges and describing DMA bus topologies.
> > This doesn't address the needs of describing bus interconnects.
> > There's been some efforts by the QCom folks with an interconnect
> > binding. They've mostly punted (for now at least) to not describing
> > the whole interconnect in DT and keeping the details in a driver.
> 
> Is it this patch series? https://lkml.org/lkml/2018/3/9/856
> 
> > On the flip side, this does mirror the established pattern used by
> > interrupts, so maybe it's okay on its own. I'll wait for others to
> > comment.
> 
> We'll see how it turns out then :)

Ping?

How should we move forward on this?

Maxime


-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* [PATCH v2 2/6] arm64: KVM: Handle Set/Way CMOs as NOPs if FWB is present
From: Marc Zyngier @ 2018-05-31 13:00 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20180531115127.2ymmtlwemz6g5qzj@lakrids.cambridge.arm.com>

On 31/05/18 12:51, Mark Rutland wrote:
> On Wed, May 30, 2018 at 01:47:02PM +0100, Marc Zyngier wrote:
>> Set/Way handling is one of the ugliest corners of KVM. We shouldn't
>> have to handle that, but better safe than sorry.
>>
>> Thankfully, FWB fixes this for us by not requiring any maintenance
>> whatsoever, which means we don't have to emulate S/W CMOs, and don't
>> have to track VM ops either.
>>
>> We still have to trap S/W though, if only to prevent the guest from
>> doing something bad.
> 
> S/W ops *also* do I-cache maintenance, so we'd still need to emulate
> that. Though it looks like we're missing that today...

This doesn't look right: CSSELR_EL1 does indeed have an InD bit, but
that's only for the purpose of reading CCSIDR_EL1. DC CSW and co
directly take a level *without* the InD bit, and seem to be limited to
"data and unified cache".

Am I missing something?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* [PATCH 1/7] iommu/dma: fix trival coding style mistake
From: Robin Murphy @ 2018-05-31 13:03 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527752569-18020-2-git-send-email-thunder.leizhen@huawei.com>

On 31/05/18 08:42, Zhen Lei wrote:
> The static function iova_reserve_iommu_regions is only called by function
> iommu_dma_init_domain, and the 'if (!dev)' check in iommu_dma_init_domain
> affects only that call, so we can safely move the check into it. I think it
> looks more natural.

As before, I disagree - the logic of iommu_dma_init_domain() is "we 
expect to have a valid device, but stop here if we don't", and moving 
the check just needlessly obfuscates that. It is not a coincidence that 
the arguments of both functions are in effective order of importance.

> In addition, the local variable 'ret' is only assigned in the branch of
> 'if (region->type == IOMMU_RESV_MSI)', so the 'if (ret)' should also only
> take effect in the branch, add a brace to enclose it.

'ret' is clearly also assigned at its declaration, to cover the (very 
likely) case where we don't enter the loop at all. Thus testing it in 
the loop is harmless, and cluttering that up with extra tabs and braces 
is just noise.

Robin.

> No functional changes.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/dma-iommu.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ddcbbdb..4e885f7 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -231,6 +231,9 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   	LIST_HEAD(resv_regions);
>   	int ret = 0;
>   
> +	if (!dev)
> +		return 0;
> +
>   	if (dev_is_pci(dev))
>   		iova_reserve_pci_windows(to_pci_dev(dev), iovad);
>   
> @@ -246,11 +249,12 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   		hi = iova_pfn(iovad, region->start + region->length - 1);
>   		reserve_iova(iovad, lo, hi);
>   
> -		if (region->type == IOMMU_RESV_MSI)
> +		if (region->type == IOMMU_RESV_MSI) {
>   			ret = cookie_init_hw_msi_region(cookie, region->start,
>   					region->start + region->length);
> -		if (ret)
> -			break;
> +			if (ret)
> +				break;
> +		}
>   	}
>   	iommu_put_resv_regions(dev, &resv_regions);
>   
> @@ -308,8 +312,6 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   	}
>   
>   	init_iova_domain(iovad, 1UL << order, base_pfn);
> -	if (!dev)
> -		return 0;
>   
>   	return iova_reserve_iommu_regions(dev, domain);
>   }
> 

^ permalink raw reply

* [PATCH 3/7] iommu: prepare for the non-strict mode support
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527752569-18020-4-git-send-email-thunder.leizhen@huawei.com>

On 31/05/18 08:42, Zhen Lei wrote:
> In general, an IOMMU unmap operation follows the steps below:
> 1. remove the mapping for the specified IOVA range from the page table
> 2. execute a tlbi command to invalidate the mapping cached in the TLB
> 3. wait for the above tlbi operation to finish
> 4. free the IOVA resource
> 5. free the physical memory resource
> 
> This may be a problem when unmaps are very frequent: the combination of tlbi
> and wait operations consumes a lot of time. A feasible method is to defer the
> tlbi and iova-free operations and, once a certain number have accumulated or
> a specified time has elapsed, execute a single tlbi_all command to clean up
> the TLB, then free the deferred IOVAs. Call this non-strict mode.
> 
> But it must be noted that, although the mapping has already been removed from
> the page table, it may still exist in the TLB, and the freed physical memory
> may also be reused by others. So an attacker can keep accessing memory
> through the just-freed IOVA to obtain sensitive data or corrupt memory.
> VFIO should therefore always choose strict mode.
> 
> This patch just adds a new parameter to the unmap operation, so that the
> upper-layer functions can choose which mode to apply.

This seems like it might be better handled by a flag in 
io_pgtable_cfg->quirks. This interface change on its own looks rather 
invasive, and the fact that it ends up only being used to pass through a 
constant property of the domain (which is already known by the point 
io_pgtable_alloc() is called) implies that it is indeed the wrong level 
of abstraction.
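
To make the suggestion concrete, here is a rough standalone sketch of what a quirk-based approach might look like. Everything in it is hypothetical: MODEL_QUIRK_NON_STRICT, struct model_pgtable_cfg and model_unmap() are illustrative names rather than existing io-pgtable API, and the TLB work is reduced to a counter.

```c
#include <stddef.h>

/* Hypothetical quirk bit; the name and value are illustrative only. */
#define MODEL_QUIRK_NON_STRICT	(1UL << 0)

struct model_pgtable_cfg {
	unsigned long quirks;	/* fixed when the page table is allocated */
};

struct model_pgtable {
	struct model_pgtable_cfg cfg;
	unsigned int tlb_syncs;	/* counts synchronous invalidations */
};

/*
 * unmap keeps its three-argument shape; the mode is read from the
 * configuration instead of being passed in by every caller.
 */
static size_t model_unmap(struct model_pgtable *pt, unsigned long iova,
			  size_t size)
{
	(void)iova;	/* a real implementation would clear the PTEs here */

	if (!(pt->cfg.quirks & MODEL_QUIRK_NON_STRICT))
		pt->tlb_syncs++;	/* strict: invalidate and wait now */

	/* non-strict: the caller batches a later flush-all instead */
	return size;
}
```

The point of the design is that none of the driver call sites in the diffstat would need to change, since the property is constant per domain.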

> No functional changes.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/arm-smmu-v3.c        | 2 +-
>   drivers/iommu/arm-smmu.c           | 2 +-
>   drivers/iommu/io-pgtable-arm-v7s.c | 6 +++---
>   drivers/iommu/io-pgtable-arm.c     | 6 +++---
>   drivers/iommu/io-pgtable.h         | 2 +-
>   drivers/iommu/ipmmu-vmsa.c         | 2 +-
>   drivers/iommu/msm_iommu.c          | 2 +-
>   drivers/iommu/mtk_iommu.c          | 2 +-
>   drivers/iommu/qcom_iommu.c         | 2 +-
>   include/linux/iommu.h              | 2 ++

Plus things specific to io-pgtable shouldn't really be spilling into the 
core API header either.

Robin.

>   10 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4402187..59b3387 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1767,7 +1767,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>   	if (!ops)
>   		return 0;
>   
> -	return ops->unmap(ops, iova, size);
> +	return ops->unmap(ops, iova, size, IOMMU_STRICT);
>   }
>   
>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60..253e807 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1249,7 +1249,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	if (!ops)
>   		return 0;
>   
> -	return ops->unmap(ops, iova, size);
> +	return ops->unmap(ops, iova, size, IOMMU_STRICT);
>   }
>   
>   static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 10e4a3d..799eced 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -658,7 +658,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
>   }
>   
>   static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> -			    size_t size)
> +			    size_t size, int strict)
>   {
>   	struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   
> @@ -883,7 +883,7 @@ static int __init arm_v7s_do_selftests(void)
>   	size = 1UL << __ffs(cfg.pgsize_bitmap);
>   	while (i < loopnr) {
>   		iova_start = i * SZ_16M;
> -		if (ops->unmap(ops, iova_start + size, size) != size)
> +		if (ops->unmap(ops, iova_start + size, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops);
>   
>   		/* Remap of partial unmap */
> @@ -902,7 +902,7 @@ static int __init arm_v7s_do_selftests(void)
>   	while (i != BITS_PER_LONG) {
>   		size = 1UL << i;
>   
> -		if (ops->unmap(ops, iova, size) != size)
> +		if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops);
>   
>   		if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 39c2a05..e0f52db 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -624,7 +624,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>   }
>   
>   static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> -			     size_t size)
> +			     size_t size, int strict)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   	arm_lpae_iopte *ptep = data->pgd;
> @@ -1108,7 +1108,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>   
>   		/* Partial unmap */
>   		size = 1UL << __ffs(cfg->pgsize_bitmap);
> -		if (ops->unmap(ops, SZ_1G + size, size) != size)
> +		if (ops->unmap(ops, SZ_1G + size, size, IOMMU_STRICT) != size)
>   			return __FAIL(ops, i);
>   
>   		/* Remap of partial unmap */
> @@ -1124,7 +1124,7 @@ static int __init arm_lpae_run_tests(struct io_pgtable_cfg *cfg)
>   		while (j != BITS_PER_LONG) {
>   			size = 1UL << j;
>   
> -			if (ops->unmap(ops, iova, size) != size)
> +			if (ops->unmap(ops, iova, size, IOMMU_STRICT) != size)
>   				return __FAIL(ops, i);
>   
>   			if (ops->iova_to_phys(ops, iova + 42))
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 2df7909..2908806 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -120,7 +120,7 @@ struct io_pgtable_ops {
>   	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
>   		   phys_addr_t paddr, size_t size, int prot);
>   	size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
> -			size_t size);
> +			size_t size, int strict);
>   	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
>   				    unsigned long iova);
>   };
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index 40ae6e8..e6d9e11 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -716,7 +716,7 @@ static size_t ipmmu_unmap(struct iommu_domain *io_domain, unsigned long iova,
>   {
>   	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
>   
> -	return domain->iop->unmap(domain->iop, iova, size);
> +	return domain->iop->unmap(domain->iop, iova, size, IOMMU_STRICT);
>   }
>   
>   static void ipmmu_iotlb_sync(struct iommu_domain *io_domain)
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 0d33504..180fa3d 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -532,7 +532,7 @@ static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&priv->pgtlock, flags);
> -	len = priv->iop->unmap(priv->iop, iova, len);
> +	len = priv->iop->unmap(priv->iop, iova, len, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&priv->pgtlock, flags);
>   
>   	return len;
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f2832a1..54661ed 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -386,7 +386,7 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
>   	size_t unmapsz;
>   
>   	spin_lock_irqsave(&dom->pgtlock, flags);
> -	unmapsz = dom->iop->unmap(dom->iop, iova, size);
> +	unmapsz = dom->iop->unmap(dom->iop, iova, size, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&dom->pgtlock, flags);
>   
>   	return unmapsz;
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 65b9c99..90abde1 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -444,7 +444,7 @@ static size_t qcom_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
>   	 */
>   	pm_runtime_get_sync(qcom_domain->iommu->dev);
>   	spin_lock_irqsave(&qcom_domain->pgtbl_lock, flags);
> -	ret = ops->unmap(ops, iova, size);
> +	ret = ops->unmap(ops, iova, size, IOMMU_STRICT);
>   	spin_unlock_irqrestore(&qcom_domain->pgtbl_lock, flags);
>   	pm_runtime_put_sync(qcom_domain->iommu->dev);
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 19938ee..39b3150 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -86,6 +86,8 @@ struct iommu_domain_geometry {
>   #define IOMMU_DOMAIN_DMA	(__IOMMU_DOMAIN_PAGING |	\
>   				 __IOMMU_DOMAIN_DMA_API)
>   
> +#define IOMMU_STRICT		1
> +
>   struct iommu_domain {
>   	unsigned type;
>   	const struct iommu_ops *ops;
> 


* [PATCH 4/7] iommu/amd: make sure TLB to be flushed before IOVA freed
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527752569-18020-5-git-send-email-thunder.leizhen@huawei.com>

On 31/05/18 08:42, Zhen Lei wrote:
> Although the mapping has already been removed from the page table, it may
> still exist in the TLB. If the freed IOVA is reused by another user before
> the flush operation completes, the new user cannot correctly access its
> memory.

This change seems reasonable in isolation, but why is it right in the 
middle of a series which has nothing to do with x86?

Robin.

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/amd_iommu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 8fb8c73..93aa389 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2402,9 +2402,9 @@ static void __unmap_single(struct dma_ops_domain *dma_dom,
>   	}
>   
>   	if (amd_iommu_unmap_flush) {
> -		dma_ops_free_iova(dma_dom, dma_addr, pages);
>   		domain_flush_tlb(&dma_dom->domain);
>   		domain_flush_complete(&dma_dom->domain);
> +		dma_ops_free_iova(dma_dom, dma_addr, pages);
>   	} else {
>   		pages = __roundup_pow_of_two(pages);
>   		queue_iova(&dma_dom->iovad, dma_addr >> PAGE_SHIFT, pages, 0);
> 
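
The ordering issue described in the commit message can be modelled in a few lines of userspace C. This is only a toy sketch, not kernel code: `struct toy_iommu` and every `toy_*` function are hypothetical names invented for illustration, and the model assumes a single IOVA slot with a single cached TLB entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (all names hypothetical): one IOVA, one PTE, one TLB entry. */
struct toy_iommu {
	unsigned long pte;	/* 0 means unmapped */
	unsigned long tlb;	/* 0 means no cached translation */
	bool iova_free;		/* true when the allocator may hand the IOVA out */
};

static void toy_map(struct toy_iommu *m, unsigned long phys)
{
	m->iova_free = false;
	m->pte = phys;
}

static unsigned long toy_translate(struct toy_iommu *m)
{
	if (!m->tlb)
		m->tlb = m->pte;	/* TLB miss: walk the page table and cache it */
	return m->tlb;
}

static void toy_flush(struct toy_iommu *m)
{
	m->tlb = 0;
}

/*
 * Unmap the IOVA, let a new user map new_phys at the same IOVA, and return
 * the physical address the device would actually see. flush_first selects
 * whether the TLB flush happens before or after the IOVA is released.
 */
static unsigned long toy_unmap_then_reuse(bool flush_first,
					  unsigned long new_phys)
{
	struct toy_iommu m = { .iova_free = true };
	unsigned long seen;

	toy_map(&m, 0x1000);
	(void)toy_translate(&m);	/* device access populates the TLB */
	m.pte = 0;			/* mapping removed from the page table */

	if (flush_first)
		toy_flush(&m);		/* flush completes before the free */
	m.iova_free = true;		/* IOVA handed back to the allocator */

	toy_map(&m, new_phys);		/* new user grabs the same IOVA */
	seen = toy_translate(&m);

	if (!flush_first)
		toy_flush(&m);		/* too late: stale entry already used */
	return seen;
}
```

In this model, flushing before the free makes the new user see its own mapping, while freeing first lets the stale translation from the old mapping leak through.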


* [PATCH 5/7] iommu/dma: add support for non-strict mode
From: Robin Murphy @ 2018-05-31 13:04 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1527752569-18020-6-git-send-email-thunder.leizhen@huawei.com>

On 31/05/18 08:42, Zhen Lei wrote:
> 1. Save the related domain pointer in struct iommu_dma_cookie, so that iovad
>     can call domain->ops->flush_iotlb_all to flush the TLB.
> 2. Define a new iommu capability, IOMMU_CAP_NON_STRICT, which indicates
>     that the iommu domain supports non-strict mode.
> 3. During the iommu domain initialization phase, call capable() to check
>     whether it supports non-strict mode. If so, call init_iova_flush_queue
>     to register the iovad->flush_cb callback.
> 4. All unmap (including iova-free) APIs finally invoke __iommu_dma_unmap
>     -->iommu_dma_free_iova. Use iovad->flush_cb to check whether the related
>     iommu supports non-strict mode, and use IOMMU_DOMAIN_IS_STRICT to
>     make sure an IOMMU_DOMAIN_UNMANAGED domain always follows strict mode.

Once again, this is a whole load of complexity for a property which 
could just be statically encoded at allocation, e.g. in the cookie type.
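
One possible reading of "statically encoded at allocation" is sketched below as a userspace toy, not as a proposed kernel change. The `TOY_DMA_IOVA_LAZY_COOKIE` type, `struct toy_cookie` and the `toy_*` helpers are all hypothetical names made up for this illustration; nothing like them is defined by the patch or by dma-iommu.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cookie types. The LAZY variant is an assumption made for
 * illustration only; it does not exist in the kernel or in this series.
 */
enum toy_cookie_type {
	TOY_DMA_IOVA_COOKIE,		/* strict: IOVA freed immediately */
	TOY_DMA_MSI_COOKIE,
	TOY_DMA_IOVA_LAZY_COOKIE,	/* non-strict: free deferred to a queue */
};

struct toy_cookie {
	enum toy_cookie_type type;
	unsigned int queued;	/* IOVAs parked on the flush queue */
	unsigned int freed;	/* IOVAs returned to the allocator at once */
};

/* The strict/lazy decision is made once, when the cookie is allocated. */
static struct toy_cookie toy_cookie_alloc(bool want_lazy)
{
	struct toy_cookie c = {
		.type = want_lazy ? TOY_DMA_IOVA_LAZY_COOKIE
				  : TOY_DMA_IOVA_COOKIE,
	};
	return c;
}

/* The free path only looks at the type: no capability or callback checks. */
static void toy_free_iova(struct toy_cookie *c)
{
	if (c->type == TOY_DMA_IOVA_LAZY_COOKIE)
		c->queued++;
	else
		c->freed++;
}
```

With the property fixed at allocation time, the free path needs neither a back-pointer to the domain nor a runtime capability query to pick between the deferred and immediate paths.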

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>   drivers/iommu/dma-iommu.c | 29 ++++++++++++++++++++++++++---
>   include/linux/iommu.h     |  3 +++
>   2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 4e885f7..2e116d9 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -55,6 +55,7 @@ struct iommu_dma_cookie {
>   	};
>   	struct list_head		msi_page_list;
>   	spinlock_t			msi_lock;
> +	struct iommu_domain		*domain;
>   };
>   
>   static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
> @@ -64,7 +65,8 @@ static inline size_t cookie_msi_granule(struct iommu_dma_cookie *cookie)
>   	return PAGE_SIZE;
>   }
>   
> -static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
> +static struct iommu_dma_cookie *cookie_alloc(struct iommu_domain *domain,
> +					     enum iommu_dma_cookie_type type)
>   {
>   	struct iommu_dma_cookie *cookie;
>   
> @@ -73,6 +75,7 @@ static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)
>   		spin_lock_init(&cookie->msi_lock);
>   		INIT_LIST_HEAD(&cookie->msi_page_list);
>   		cookie->type = type;
> +		cookie->domain = domain;
>   	}
>   	return cookie;
>   }
> @@ -94,7 +97,7 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
>   	if (domain->iova_cookie)
>   		return -EEXIST;
>   
> -	domain->iova_cookie = cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
> +	domain->iova_cookie = cookie_alloc(domain, IOMMU_DMA_IOVA_COOKIE);
>   	if (!domain->iova_cookie)
>   		return -ENOMEM;
>   
> @@ -124,7 +127,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
>   	if (domain->iova_cookie)
>   		return -EEXIST;
>   
> -	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
> +	cookie = cookie_alloc(domain, IOMMU_DMA_MSI_COOKIE);
>   	if (!cookie)
>   		return -ENOMEM;
>   
> @@ -261,6 +264,17 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   	return ret;
>   }
>   
> +static void iova_flush_iotlb_all(struct iova_domain *iovad)

iommu_dma_flush...

> +{
> +	struct iommu_dma_cookie *cookie;
> +	struct iommu_domain *domain;
> +
> +	cookie = container_of(iovad, struct iommu_dma_cookie, iovad);
> +	domain = cookie->domain;
> +
> +	domain->ops->flush_iotlb_all(domain);
> +}
> +
>   /**
>    * iommu_dma_init_domain - Initialise a DMA mapping domain
>    * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> @@ -276,6 +290,7 @@ static int iova_reserve_iommu_regions(struct device *dev,
>   int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   		u64 size, struct device *dev)
>   {
> +	const struct iommu_ops *ops = domain->ops;
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   	struct iova_domain *iovad = &cookie->iovad;
>   	unsigned long order, base_pfn, end_pfn;
> @@ -313,6 +328,11 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   
>   	init_iova_domain(iovad, 1UL << order, base_pfn);
>   
> +	if (ops->capable && ops->capable(IOMMU_CAP_NON_STRICT)) {
> +		BUG_ON(!ops->flush_iotlb_all);
> +		init_iova_flush_queue(iovad, iova_flush_iotlb_all, NULL);
> +	}
> +
>   	return iova_reserve_iommu_regions(dev, domain);
>   }
>   EXPORT_SYMBOL(iommu_dma_init_domain);
> @@ -392,6 +412,9 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
>   	/* The MSI case is only ever cleaning up its most recent allocation */
>   	if (cookie->type == IOMMU_DMA_MSI_COOKIE)
>   		cookie->msi_iova -= size;
> +	else if (!IOMMU_DOMAIN_IS_STRICT(cookie->domain) && iovad->flush_cb)
> +		queue_iova(iovad, iova_pfn(iovad, iova),
> +				size >> iova_shift(iovad), 0);
>   	else
>   		free_iova_fast(iovad, iova_pfn(iovad, iova),
>   				size >> iova_shift(iovad));
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 39b3150..01ff569 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -87,6 +87,8 @@ struct iommu_domain_geometry {
>   				 __IOMMU_DOMAIN_DMA_API)
>   
>   #define IOMMU_STRICT		1
> +#define IOMMU_DOMAIN_IS_STRICT(domain)	\
> +		(domain->type == IOMMU_DOMAIN_UNMANAGED)
>   
>   struct iommu_domain {
>   	unsigned type;
> @@ -103,6 +105,7 @@ enum iommu_cap {
>   					   transactions */
>   	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
>   	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> +	IOMMU_CAP_NON_STRICT,		/* IOMMU supports non-strict mode */

This isn't a property of the IOMMU, it depends purely on the driver 
implementation. I think it also doesn't matter anyway - if a caller asks 
for lazy unmapping on their domain but the IOMMU driver just does strict 
unmaps anyway because that's all it supports, there's no actual harm done.

Robin.

>   };
>   
>   /*
> 
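
The point about IOMMU_CAP_NON_STRICT, that a strict-only driver can simply ignore a request for lazy unmapping without harm, can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct toy_domain`, the `caller_wants_lazy` hint and all `toy_*` functions are hypothetical and do not correspond to any real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (all names hypothetical). */
struct toy_domain {
	bool driver_supports_lazy;	/* property of the driver implementation */
	bool tlb_stale;			/* TLB may still hold a removed mapping */
	unsigned int flushes;		/* number of TLB flushes issued */
};

static void toy_flush_all(struct toy_domain *d)
{
	d->tlb_stale = false;
	d->flushes++;
}

/* The caller passes a hint; a strict-only driver just ignores it. */
static void toy_unmap(struct toy_domain *d, bool caller_wants_lazy)
{
	d->tlb_stale = true;	/* PTE cleared, TLB not yet invalidated */
	if (!(caller_wants_lazy && d->driver_supports_lazy))
		toy_flush_all(d);	/* strict behaviour: flush synchronously */
}

/* Deferred flush run by the core before queued IOVAs are recycled. */
static void toy_flush_queue_drain(struct toy_domain *d)
{
	toy_flush_all(d);
}
```

In this model the strict-only driver ends up flushing twice, once in the unmap and once when the queue drains. The extra flush is redundant, but the TLB is never stale at the point where an IOVA is reused, so correctness does not depend on the driver advertising anything.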


* [PATCH v1 0/2] support to set VSESR_EL2 by user space
From: Dongjiu Geng @ 2018-05-31 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

This patch series is split out from https://www.spinics.net/lists/kvm/msg168917.html

1. Detect whether KVM can set the guest SError syndrome.
2. Support setting VSESR_EL2 and injecting an SError from user space.
3. Support live migration by preserving the SError pending state and the VSESR_EL2 value.

The user space patch is here: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg06965.html

Dongjiu Geng (2):
  arm64: KVM: export the capability to set guest SError syndrome
  arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS

 Documentation/virtual/kvm/api.txt    | 42 +++++++++++++++++++++++++++++++++---
 arch/arm/include/asm/kvm_host.h      |  6 ++++++
 arch/arm/kvm/guest.c                 | 12 +++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  5 +++++
 arch/arm64/include/asm/kvm_host.h    |  7 ++++++
 arch/arm64/include/uapi/asm/kvm.h    | 13 +++++++++++
 arch/arm64/kvm/guest.c               | 36 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  7 +++++-
 arch/arm64/kvm/reset.c               |  4 ++++
 include/uapi/linux/kvm.h             |  1 +
 virt/kvm/arm/arm.c                   | 21 ++++++++++++++++++
 11 files changed, 150 insertions(+), 4 deletions(-)

-- 
2.7.4

