LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
From: zhong jiang @ 2018-07-24 14:26 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: akpm, mhocko, peterz, kirill, ak, dave, jack, Matthew Wilcox,
	khandual, aneesh.kumar, benh, mpe, paulus, Thomas Gleixner,
	Ingo Molnar, hpa, Will Deacon, Sergey Senozhatsky,
	sergey.senozhatsky.work, Andrea Arcangeli, Alexei Starovoitov,
	kemi.wang, Daniel Jordan, David Rientjes, Jerome Glisse,
	Ganesh Mahendran, Minchan Kim, Punit Agrawal, vinayak menon,
	Yang Shi, linux-kernel, linux-mm, haren, npiggin, bsingharora,
	paulmck, Tim Chen, linuxppc-dev, x86
In-Reply-To: <1526555193-7242-20-git-send-email-ldufour@linux.vnet.ibm.com>

On 2018/5/17 19:06, Laurent Dufour wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Provide infrastructure to do a speculative fault (not holding
> mmap_sem).
>
> The not holding of mmap_sem means we can race against VMA
> change/removal and page-table destruction. We use the SRCU VMA freeing
> to keep the VMA around. We use the VMA seqcount to detect change
> (including umapping / page-table deletion) and we use gup_fast() style
> page-table walking to deal with page-table races.
>
> Once we've obtained the page and are ready to update the PTE, we
> validate if the state we started the fault with is still valid, if
> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
> PTE and we're done.
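
For readers following along, the validate-before-commit scheme described
above can be sketched in userspace C11 roughly as follows. This is an
illustration under stated assumptions, not the code from the patch:
struct fake_vma, spec_begin(), spec_validate(), spec_fault() and the two
writer helpers are hypothetical names, C11 atomics stand in for the
kernel's seqcount_t and barriers, and writers are assumed to be
serialised by some external lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA guarded by a sequence counter. */
struct fake_vma {
	atomic_uint sequence;	/* odd while a writer is modifying the VMA */
	atomic_ulong start;	/* first address covered */
	atomic_ulong end;	/* first address past the range */
};

/* Writer side (assumed serialised externally): make the count odd. */
static void vma_write_begin(struct fake_vma *vma)
{
	unsigned int s = atomic_load_explicit(&vma->sequence,
					      memory_order_relaxed);

	atomic_store_explicit(&vma->sequence, s + 1, memory_order_relaxed);
	/* Order the odd count before the data stores that follow. */
	atomic_thread_fence(memory_order_release);
}

/* Writer side: publish the changes and make the count even again. */
static void vma_write_end(struct fake_vma *vma)
{
	unsigned int s = atomic_load_explicit(&vma->sequence,
					      memory_order_relaxed);

	assert(s & 1);	/* a write section must be open */
	atomic_store_explicit(&vma->sequence, s + 1, memory_order_release);
}

/* Reader side: take a snapshot; refuse to speculate during a write. */
static bool spec_begin(struct fake_vma *vma, unsigned int *seq)
{
	*seq = atomic_load_explicit(&vma->sequence, memory_order_acquire);
	return !(*seq & 1);
}

/* Reader side: true only if nothing changed since the snapshot. */
static bool spec_validate(struct fake_vma *vma, unsigned int seq)
{
	/* Order the earlier data loads before re-reading the count. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&vma->sequence,
				    memory_order_relaxed) == seq;
}

/* Returns 0 on success, -1 meaning "retry on the locked path". */
static int spec_fault(struct fake_vma *vma, unsigned long addr)
{
	unsigned int seq;

	if (!spec_begin(vma, &seq))
		return -1;
	if (addr < atomic_load_explicit(&vma->start, memory_order_relaxed) ||
	    addr >= atomic_load_explicit(&vma->end, memory_order_relaxed))
		return -1;
	/* ... the page would be prepared here, without any lock ... */
	if (!spec_validate(vma, seq))
		return -1;	/* state changed under us, give up */
	return 0;		/* safe to commit the update */
}
```

The point of the shape is that the reader never waits: an odd count or a
changed count simply sends it back to the slow path, which corresponds
to returning VM_FAULT_RETRY in the description above.
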
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> [Manage the newly introduced pte_spinlock() for speculative page
>  fault to fail if the VMA is touched in our back]
> [Rename vma_is_dead() to vma_has_changed() and declare it here]
> [Fetch p4d and pud]
> [Set vmd.sequence in __handle_mm_fault()]
> [Abort speculative path when handle_userfault() has to be called]
> [Add additional VMA's flags checks in handle_speculative_fault()]
> [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
> [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
> [Remove warning comment about waiting for !seq&1 since we don't want
>  to wait]
> [Remove warning about no huge page support, mention it explictly]
> [Don't call do_fault() in the speculative path as __do_fault() calls
>  vma->vm_ops->fault() which may want to release mmap_sem]
> [Only vm_fault pointer argument for vma_has_changed()]
> [Fix check against huge page, calling pmd_trans_huge()]
> [Use READ_ONCE() when reading VMA's fields in the speculative path]
> [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
>  processing done in vm_normal_page()]
> [Check that vma->anon_vma is already set when starting the speculative
>  path]
> [Check for memory policy as we can't support MPOL_INTERLEAVE case due to
>  the processing done in mpol_misplaced()]
> [Don't support VMA growing up or down]
> [Move check on vm_sequence just before calling handle_pte_fault()]
> [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
> [Add mem cgroup oom check]
> [Use READ_ONCE to access p*d entries]
> [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
> [Don't fetch pte again in handle_pte_fault() when running the speculative
>  path]
> [Check PMD against concurrent collapsing operation]
> [Try spin lock the pte during the speculative path to avoid deadlock with
>  other CPU's invalidating the TLB and requiring this CPU to catch the
>  inter processor's interrupt]
> [Move define of FAULT_FLAG_SPECULATIVE here]
> [Introduce __handle_speculative_fault() and add a check against
>  mm->mm_users in handle_speculative_fault() defined in mm.h]
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb_inline.h |   2 +-
>  include/linux/mm.h             |  30 ++++
>  include/linux/pagemap.h        |   4 +-
>  mm/internal.h                  |  16 +-
>  mm/memory.c                    | 340 ++++++++++++++++++++++++++++++++++++++++-
>  5 files changed, 385 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
> index 0660a03d37d9..9e25283d6fc9 100644
> --- a/include/linux/hugetlb_inline.h
> +++ b/include/linux/hugetlb_inline.h
> @@ -8,7 +8,7 @@
>  
>  static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
>  {
> -	return !!(vma->vm_flags & VM_HUGETLB);
> +	return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
>  }
>  
>  #else
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 05cbba70104b..31acf98a7d92 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -315,6 +315,7 @@ extern pgprot_t protection_map[16];
>  #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
>  #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
>  #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
> +#define FAULT_FLAG_SPECULATIVE	0x200	/* Speculative fault, not holding mmap_sem */
>  
>  #define FAULT_FLAG_TRACE \
>  	{ FAULT_FLAG_WRITE,		"WRITE" }, \
> @@ -343,6 +344,10 @@ struct vm_fault {
>  	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
>  	pgoff_t pgoff;			/* Logical page offset based on vma */
>  	unsigned long address;		/* Faulting virtual address */
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	unsigned int sequence;
> +	pmd_t orig_pmd;			/* value of PMD at the time of fault */
> +#endif
>  	pmd_t *pmd;			/* Pointer to pmd entry matching
>  					 * the 'address' */
>  	pud_t *pud;			/* Pointer to pud entry matching
> @@ -1415,6 +1420,31 @@ int invalidate_inode_page(struct page *page);
>  #ifdef CONFIG_MMU
>  extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags);
> +
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +extern int __handle_speculative_fault(struct mm_struct *mm,
> +				      unsigned long address,
> +				      unsigned int flags);
> +static inline int handle_speculative_fault(struct mm_struct *mm,
> +					   unsigned long address,
> +					   unsigned int flags)
> +{
> +	/*
> +	 * Try speculative page fault for multithreaded user space task only.
> +	 */
> +	if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
> +		return VM_FAULT_RETRY;
> +	return __handle_speculative_fault(mm, address, flags);
> +}
> +#else
> +static inline int handle_speculative_fault(struct mm_struct *mm,
> +					   unsigned long address,
> +					   unsigned int flags)
> +{
> +	return VM_FAULT_RETRY;
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
>  extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>  			    unsigned long address, unsigned int fault_flags,
>  			    bool *unlocked);
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index b1bd2186e6d2..6e2aa4e79af7 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -456,8 +456,8 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>  	pgoff_t pgoff;
>  	if (unlikely(is_vm_hugetlb_page(vma)))
>  		return linear_hugepage_index(vma, address);
> -	pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
> -	pgoff += vma->vm_pgoff;
> +	pgoff = (address - READ_ONCE(vma->vm_start)) >> PAGE_SHIFT;
> +	pgoff += READ_ONCE(vma->vm_pgoff);
>  	return pgoff;
>  }
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index fb2667b20f0a..10b188c87fa4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -44,7 +44,21 @@ int do_swap_page(struct vm_fault *vmf);
>  extern struct vm_area_struct *get_vma(struct mm_struct *mm,
>  				      unsigned long addr);
>  extern void put_vma(struct vm_area_struct *vma);
> -#endif
> +
> +static inline bool vma_has_changed(struct vm_fault *vmf)
> +{
> +	int ret = RB_EMPTY_NODE(&vmf->vma->vm_rb);
> +	unsigned int seq = READ_ONCE(vmf->vma->vm_sequence.sequence);
> +
> +	/*
> +	 * Matches both the wmb in write_seqlock_{begin,end}() and
> +	 * the wmb in vma_rb_erase().
> +	 */
> +	smp_rmb();
> +
> +	return ret || seq != vmf->sequence;
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>  
>  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>  		unsigned long floor, unsigned long ceiling);
> diff --git a/mm/memory.c b/mm/memory.c
> index ab32b0b4bd69..7bbbb8c7b9cd 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -769,7 +769,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>  	if (page)
>  		dump_page(page, "bad pte");
>  	pr_alert("addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
> -		 (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
> +		 (void *)addr, READ_ONCE(vma->vm_flags), vma->anon_vma,
> +		 mapping, index);
>  	pr_alert("file:%pD fault:%pf mmap:%pf readpage:%pf\n",
>  		 vma->vm_file,
>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
> @@ -2306,6 +2307,118 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>  
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +static bool pte_spinlock(struct vm_fault *vmf)
> +{
> +	bool ret = false;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	pmd_t pmdval;
> +#endif
> +
> +	/* Check if vma is still valid */
> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
> +		vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> +		spin_lock(vmf->ptl);
> +		return true;
> +	}
> +
> +again:
> +	local_irq_disable();
> +	if (vma_has_changed(vmf))
> +		goto out;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	/*
> +	 * We check if the pmd value is still the same to ensure that there
> +	 * is not a huge collapse operation in progress in our back.
> +	 */
> +	pmdval = READ_ONCE(*vmf->pmd);
> +	if (!pmd_same(pmdval, vmf->orig_pmd))
> +		goto out;
> +#endif
> +
> +	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> +	if (unlikely(!spin_trylock(vmf->ptl))) {
> +		local_irq_enable();
> +		goto again;
> +	}
> +
> +	if (vma_has_changed(vmf)) {
> +		spin_unlock(vmf->ptl);
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	local_irq_enable();
> +	return ret;
> +}
> +
> +static bool pte_map_lock(struct vm_fault *vmf)
> +{
> +	bool ret = false;
> +	pte_t *pte;
> +	spinlock_t *ptl;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	pmd_t pmdval;
> +#endif
> +
> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
> +		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> +					       vmf->address, &vmf->ptl);
> +		return true;
> +	}
> +
> +	/*
> +	 * The first vma_has_changed() guarantees the page-tables are still
> +	 * valid, having IRQs disabled ensures they stay around, hence the
> +	 * second vma_has_changed() to make sure they are still valid once
> +	 * we've got the lock. After that a concurrent zap_pte_range() will
> +	 * block on the PTL and thus we're safe.
> +	 */
> +again:
> +	local_irq_disable();
> +	if (vma_has_changed(vmf))
> +		goto out;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	/*
> +	 * We check if the pmd value is still the same to ensure that there
> +	 * is not a huge collapse operation in progress in our back.
> +	 */
> +	pmdval = READ_ONCE(*vmf->pmd);
> +	if (!pmd_same(pmdval, vmf->orig_pmd))
> +		goto out;
> +#endif
> +
> +	/*
> +	 * Same as pte_offset_map_lock() except that we call
> +	 * spin_trylock() in place of spin_lock() to avoid race with
> +	 * unmap path which may have the lock and wait for this CPU
> +	 * to invalidate TLB but this CPU has irq disabled.
> +	 * Since we are in a speculative patch, accept it could fail
> +	 */
> +	ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> +	pte = pte_offset_map(vmf->pmd, vmf->address);
> +	if (unlikely(!spin_trylock(ptl))) {
> +		pte_unmap(pte);
> +		local_irq_enable();
> +		goto again;
> +	}
> +
> +	if (vma_has_changed(vmf)) {
> +		pte_unmap_unlock(pte, ptl);
> +		goto out;
> +	}
> +
> +	vmf->pte = pte;
> +	vmf->ptl = ptl;
> +	ret = true;
> +out:
> +	local_irq_enable();
> +	return ret;
> +}
> +#else
>  static inline bool pte_spinlock(struct vm_fault *vmf)
>  {
>  	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> @@ -2319,6 +2432,7 @@ static inline bool pte_map_lock(struct vm_fault *vmf)
>  				       vmf->address, &vmf->ptl);
>  	return true;
>  }
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>  
>  /*
>   * handle_pte_fault chooses page fault handler according to an entry which was
> @@ -3208,6 +3322,14 @@ static int do_anonymous_page(struct vm_fault *vmf)
>  		ret = check_stable_address_space(vma->vm_mm);
>  		if (ret)
>  			goto unlock;
> +		/*
> +		 * Don't call the userfaultfd during the speculative path.
> +		 * We already checked for the VMA to not be managed through
> +		 * userfaultfd, but it may be set in our back once we have lock
> +		 * the pte. In such a case we can ignore it this time.
> +		 */
> +		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> +			goto setpte;
>  		/* Deliver the page fault to userland, check inside PT lock */
>  		if (userfaultfd_missing(vma)) {
>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
> @@ -3249,7 +3371,7 @@ static int do_anonymous_page(struct vm_fault *vmf)
>  		goto unlock_and_release;
>  
>  	/* Deliver the page fault to userland, check inside PT lock */
> -	if (userfaultfd_missing(vma)) {
> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE) && userfaultfd_missing(vma)) {
>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
>  		mem_cgroup_cancel_charge(page, memcg, false);
>  		put_page(page);
> @@ -3994,13 +4116,22 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  
>  	if (unlikely(pmd_none(*vmf->pmd))) {
>  		/*
> +		 * In the case of the speculative page fault handler we abort
> +		 * the speculative path immediately as the pmd is probably
> +		 * in the way to be converted in a huge one. We will try
> +		 * again holding the mmap_sem (which implies that the collapse
> +		 * operation is done).
> +		 */
> +		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> +			return VM_FAULT_RETRY;
> +		/*
>  		 * Leave __pte_alloc() until later: because vm_ops->fault may
>  		 * want to allocate huge page, and if we expose page table
>  		 * for an instant, it will be difficult to retract from
>  		 * concurrent faults and from rmap lookups.
>  		 */
>  		vmf->pte = NULL;
> -	} else {
> +	} else if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
>  		/* See comment in pte_alloc_one_map() */
>  		if (pmd_devmap_trans_unstable(vmf->pmd))
>  			return 0;
> @@ -4009,6 +4140,9 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  		 * pmd from under us anymore at this point because we hold the
>  		 * mmap_sem read mode and khugepaged takes it in write mode.
>  		 * So now it's safe to run pte_offset_map().
> +		 * This is not applicable to the speculative page fault handler
> +		 * but in that case, the pte is fetched earlier in
> +		 * handle_speculative_fault().
>  		 */
>  		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
>  		vmf->orig_pte = *vmf->pte;
> @@ -4031,6 +4165,8 @@ static int handle_pte_fault(struct vm_fault *vmf)
>  	if (!vmf->pte) {
>  		if (vma_is_anonymous(vmf->vma))
>  			return do_anonymous_page(vmf);
> +		else if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> +			return VM_FAULT_RETRY;
>  		else
>  			return do_fault(vmf);
>  	}
> @@ -4128,6 +4264,9 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
>  	if (!vmf.pmd)
>  		return VM_FAULT_OOM;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	vmf.sequence = raw_read_seqcount(&vma->vm_sequence);
> +#endif
>  	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
>  		ret = create_huge_pmd(&vmf);
>  		if (!(ret & VM_FAULT_FALLBACK))
> @@ -4161,6 +4300,201 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>  	return handle_pte_fault(&vmf);
>  }
>  
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +/*
> + * Tries to handle the page fault in a speculative way, without grabbing the
> + * mmap_sem.
> + */
> +int __handle_speculative_fault(struct mm_struct *mm, unsigned long address,
> +			       unsigned int flags)
> +{
> +	struct vm_fault vmf = {
> +		.address = address,
> +	};
> +	pgd_t *pgd, pgdval;
> +	p4d_t *p4d, p4dval;
> +	pud_t pudval;
> +	int seq, ret = VM_FAULT_RETRY;
> +	struct vm_area_struct *vma;
> +#ifdef CONFIG_NUMA
> +	struct mempolicy *pol;
> +#endif
> +
> +	/* Clear flags that may lead to release the mmap_sem to retry */
> +	flags &= ~(FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_KILLABLE);
> +	flags |= FAULT_FLAG_SPECULATIVE;
> +
> +	vma = get_vma(mm, address);
> +	if (!vma)
> +		return ret;
> +
> +	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
> +	if (seq & 1)
> +		goto out_put;
> +
> +	/*
> +	 * Can't call vm_ops service has we don't know what they would do
> +	 * with the VMA.
> +	 * This include huge page from hugetlbfs.
> +	 */
> +	if (vma->vm_ops)
> +		goto out_put;
> +
Hi Laurent,

I think most page faults will bail out here. Is there any case that really needs to be skipped?
I have tested the following patch, and it works well.

diff --git a/mm/memory.c b/mm/memory.c
index 936128b..9bc1545 100644
@@ -3893,8 +3898,6 @@ static int handle_pte_fault(struct fault_env *fe)
        if (!fe->pte) {
                if (vma_is_anonymous(fe->vma))
                        return do_anonymous_page(fe);
-               else if (fe->flags & FAULT_FLAG_SPECULATIVE)
-                       return VM_FAULT_RETRY;
                else
                        return do_fault(fe);
        }
@@ -4026,20 +4029,11 @@ int __handle_speculative_fault(struct mm_struct *mm, unsigned long address,
                goto out_put;
        }
        /*
-        * Can't call vm_ops service has we don't know what they would do
-        * with the VMA.
-        * This include huge page from hugetlbfs.
-        */
-       if (vma->vm_ops) {
-               trace_spf_vma_notsup(_RET_IP_, vma, address);
-               goto out_put;
-       }


Thanks
zhong jiang
> +	/*
> +	 * __anon_vma_prepare() requires the mmap_sem to be held
> +	 * because vm_next and vm_prev must be safe. This can't be guaranteed
> +	 * in the speculative path.
> +	 */
> +	if (unlikely(!vma->anon_vma))
> +		goto out_put;
> +
> +	vmf.vma_flags = READ_ONCE(vma->vm_flags);
> +	vmf.vma_page_prot = READ_ONCE(vma->vm_page_prot);
> +
> +	/* Can't call userland page fault handler in the speculative path */
> +	if (unlikely(vmf.vma_flags & VM_UFFD_MISSING))
> +		goto out_put;
> +
> +	if (vmf.vma_flags & VM_GROWSDOWN || vmf.vma_flags & VM_GROWSUP)
> +		/*
> +		 * This could be detected by the check address against VMA's
> +		 * boundaries but we want to trace it as not supported instead
> +		 * of changed.
> +		 */
> +		goto out_put;
> +
> +	if (address < READ_ONCE(vma->vm_start)
> +	    || READ_ONCE(vma->vm_end) <= address)
> +		goto out_put;
> +
> +	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> +				       flags & FAULT_FLAG_INSTRUCTION,
> +				       flags & FAULT_FLAG_REMOTE)) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out_put;
> +	}
> +
> +	/* This is one is required to check that the VMA has write access set */
> +	if (flags & FAULT_FLAG_WRITE) {
> +		if (unlikely(!(vmf.vma_flags & VM_WRITE))) {
> +			ret = VM_FAULT_SIGSEGV;
> +			goto out_put;
> +		}
> +	} else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) {
> +		ret = VM_FAULT_SIGSEGV;
> +		goto out_put;
> +	}
> +
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * MPOL_INTERLEAVE implies additional checks in
> +	 * mpol_misplaced() which are not compatible with the
> +	 *speculative page fault processing.
> +	 */
> +	pol = __get_vma_policy(vma, address);
> +	if (!pol)
> +		pol = get_task_policy(current);
> +	if (pol && pol->mode == MPOL_INTERLEAVE)
> +		goto out_put;
> +#endif
> +
> +	/*
> +	 * Do a speculative lookup of the PTE entry.
> +	 */
> +	local_irq_disable();
> +	pgd = pgd_offset(mm, address);
> +	pgdval = READ_ONCE(*pgd);
> +	if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval)))
> +		goto out_walk;
> +
> +	p4d = p4d_offset(pgd, address);
> +	p4dval = READ_ONCE(*p4d);
> +	if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
> +		goto out_walk;
> +
> +	vmf.pud = pud_offset(p4d, address);
> +	pudval = READ_ONCE(*vmf.pud);
> +	if (pud_none(pudval) || unlikely(pud_bad(pudval)))
> +		goto out_walk;
> +
> +	/* Huge pages at PUD level are not supported. */
> +	if (unlikely(pud_trans_huge(pudval)))
> +		goto out_walk;
> +
> +	vmf.pmd = pmd_offset(vmf.pud, address);
> +	vmf.orig_pmd = READ_ONCE(*vmf.pmd);
> +	/*
> +	 * pmd_none could mean that a hugepage collapse is in progress
> +	 * in our back as collapse_huge_page() mark it before
> +	 * invalidating the pte (which is done once the IPI is catched
> +	 * by all CPU and we have interrupt disabled).
> +	 * For this reason we cannot handle THP in a speculative way since we
> +	 * can't safely indentify an in progress collapse operation done in our
> +	 * back on that PMD.
> +	 * Regarding the order of the following checks, see comment in
> +	 * pmd_devmap_trans_unstable()
> +	 */
> +	if (unlikely(pmd_devmap(vmf.orig_pmd) ||
> +		     pmd_none(vmf.orig_pmd) || pmd_trans_huge(vmf.orig_pmd) ||
> +		     is_swap_pmd(vmf.orig_pmd)))
> +		goto out_walk;
> +
> +	/*
> +	 * The above does not allocate/instantiate page-tables because doing so
> +	 * would lead to the possibility of instantiating page-tables after
> +	 * free_pgtables() -- and consequently leaking them.
> +	 *
> +	 * The result is that we take at least one !speculative fault per PMD
> +	 * in order to instantiate it.
> +	 */
> +
> +	vmf.pte = pte_offset_map(vmf.pmd, address);
> +	vmf.orig_pte = READ_ONCE(*vmf.pte);
> +	barrier(); /* See comment in handle_pte_fault() */
> +	if (pte_none(vmf.orig_pte)) {
> +		pte_unmap(vmf.pte);
> +		vmf.pte = NULL;
> +	}
> +
> +	vmf.vma = vma;
> +	vmf.pgoff = linear_page_index(vma, address);
> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
> +	vmf.sequence = seq;
> +	vmf.flags = flags;
> +
> +	local_irq_enable();
> +
> +	/*
> +	 * We need to re-validate the VMA after checking the bounds, otherwise
> +	 * we might have a false positive on the bounds.
> +	 */
> +	if (read_seqcount_retry(&vma->vm_sequence, seq))
> +		goto out_put;
> +
> +	mem_cgroup_oom_enable();
> +	ret = handle_pte_fault(&vmf);
> +	mem_cgroup_oom_disable();
> +
> +	put_vma(vma);
> +
> +	/*
> +	 * The task may have entered a memcg OOM situation but
> +	 * if the allocation error was handled gracefully (no
> +	 * VM_FAULT_OOM), there is no need to kill anything.
> +	 * Just clean up the OOM state peacefully.
> +	 */
> +	if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
> +		mem_cgroup_oom_synchronize(false);
> +	return ret;
> +
> +out_walk:
> +	local_irq_enable();
> +out_put:
> +	put_vma(vma);
> +	return ret;
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
>  /*
>   * By the time we get here, we already hold the mm semaphore
>   *


* RE: [PATCH] net: ethernet: fs-enet: Use generic CRC32 implementation
From: David Laight @ 2018-07-24 11:05 UTC (permalink / raw)
  To: 'Krzysztof Kozlowski', Pantelis Antoniou, David S. Miller,
	linuxppc-dev@lists.ozlabs.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: Eric Biggers
In-Reply-To: <20180723162020.6221-1-krzk@kernel.org>

From: Krzysztof Kozlowski
> Sent: 23 July 2018 17:20
> Use generic kernel CRC32 implementation because it:
> 1. Should be faster (uses lookup tables),

Are you sure?
The lookup tables are unlikely to be in the data cache and
the 6 cache misses kill performance.
(Not that it particularly matters when setting up multicast hash tables).

> 2. Removes duplicated CRC generation code,
> 3. Uses well-proven algorithm instead of coding it one more time.
...
>
> Not tested on hardware.

Have you verified that the old and new functions give the
same result for a few mac addresses?
It is very easy to use the wrong bits in crc calculations
or generate the output in the wrong bit order.
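
A check of that kind could be as small as the userspace sketch below.
It is an illustration only: crc32_bitwise() and crc32_table() are
hypothetical stand-ins for an open-coded loop and a table-driven
routine, not the fs-enet or lib/crc32 code. Both assume the reflected
CRC-32 polynomial 0xEDB88320 with an initial value and a final XOR of
0xFFFFFFFF, and the sample MAC addresses are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Hypothetical "old" routine: one bit at a time, no tables. */
static uint32_t crc32_bitwise(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC32_POLY_REFLECTED : 0u);
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical "new" routine: one 256-entry table lookup per byte. */
static uint32_t crc32_table(const uint8_t *p, size_t len)
{
	static uint32_t table[256];
	static int ready;
	uint32_t crc = 0xFFFFFFFFu;

	if (!ready) {
		/* Build the table once from the same polynomial. */
		for (uint32_t i = 0; i < 256; i++) {
			uint32_t c = i;

			for (int bit = 0; bit < 8; bit++)
				c = (c >> 1) ^
				    ((c & 1) ? CRC32_POLY_REFLECTED : 0u);
			table[i] = c;
		}
		ready = 1;
	}
	while (len--)
		crc = table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
	return crc ^ 0xFFFFFFFFu;
}

/* Aborts if the two routines disagree on any sample MAC address. */
static void check_crc32_routines_agree(void)
{
	static const uint8_t macs[][6] = {
		{ 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 },
		{ 0x33, 0x33, 0x00, 0x00, 0x00, 0x01 },
		{ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
	};

	for (size_t i = 0; i < sizeof(macs) / sizeof(macs[0]); i++)
		assert(crc32_bitwise(macs[i], 6) == crc32_table(macs[i], 6));
}
```

A real comparison would also have to match the seed, the final
inversion and the bit order of the existing loop, since the kernel's
crc32_le() leaves the seed and any final inversion to the caller.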

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH v11 19/26] mm: provide speculative fault infrastructure
From: Laurent Dufour @ 2018-07-24 16:10 UTC (permalink / raw)
  To: zhong jiang
  Cc: akpm, mhocko, peterz, kirill, ak, dave, jack, Matthew Wilcox,
	khandual, aneesh.kumar, benh, mpe, paulus, Thomas Gleixner,
	Ingo Molnar, hpa, Will Deacon, Sergey Senozhatsky,
	sergey.senozhatsky.work, Andrea Arcangeli, Alexei Starovoitov,
	kemi.wang, Daniel Jordan, David Rientjes, Jerome Glisse,
	Ganesh Mahendran, Minchan Kim, Punit Agrawal, vinayak menon,
	Yang Shi, linux-kernel, linux-mm, haren, npiggin, bsingharora,
	paulmck, Tim Chen, linuxppc-dev, x86
In-Reply-To: <5B573715.5070201@huawei.com>



On 24/07/2018 16:26, zhong jiang wrote:
> On 2018/5/17 19:06, Laurent Dufour wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> Provide infrastructure to do a speculative fault (not holding
>> mmap_sem).
>>
>> The not holding of mmap_sem means we can race against VMA
>> change/removal and page-table destruction. We use the SRCU VMA freeing
>> to keep the VMA around. We use the VMA seqcount to detect change
>> (including umapping / page-table deletion) and we use gup_fast() style
>> page-table walking to deal with page-table races.
>>
>> Once we've obtained the page and are ready to update the PTE, we
>> validate if the state we started the fault with is still valid, if
>> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
>> PTE and we're done.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>
>> [Manage the newly introduced pte_spinlock() for speculative page
>>  fault to fail if the VMA is touched in our back]
>> [Rename vma_is_dead() to vma_has_changed() and declare it here]
>> [Fetch p4d and pud]
>> [Set vmd.sequence in __handle_mm_fault()]
>> [Abort speculative path when handle_userfault() has to be called]
>> [Add additional VMA's flags checks in handle_speculative_fault()]
>> [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
>> [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
>> [Remove warning comment about waiting for !seq&1 since we don't want
>>  to wait]
>> [Remove warning about no huge page support, mention it explictly]
>> [Don't call do_fault() in the speculative path as __do_fault() calls
>>  vma->vm_ops->fault() which may want to release mmap_sem]
>> [Only vm_fault pointer argument for vma_has_changed()]
>> [Fix check against huge page, calling pmd_trans_huge()]
>> [Use READ_ONCE() when reading VMA's fields in the speculative path]
>> [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
>>  processing done in vm_normal_page()]
>> [Check that vma->anon_vma is already set when starting the speculative
>>  path]
>> [Check for memory policy as we can't support MPOL_INTERLEAVE case due to
>>  the processing done in mpol_misplaced()]
>> [Don't support VMA growing up or down]
>> [Move check on vm_sequence just before calling handle_pte_fault()]
>> [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
>> [Add mem cgroup oom check]
>> [Use READ_ONCE to access p*d entries]
>> [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
>> [Don't fetch pte again in handle_pte_fault() when running the speculative
>>  path]
>> [Check PMD against concurrent collapsing operation]
>> [Try spin lock the pte during the speculative path to avoid deadlock with
>>  other CPU's invalidating the TLB and requiring this CPU to catch the
>>  inter processor's interrupt]
>> [Move define of FAULT_FLAG_SPECULATIVE here]
>> [Introduce __handle_speculative_fault() and add a check against
>>  mm->mm_users in handle_speculative_fault() defined in mm.h]
>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
>> ---
>>  include/linux/hugetlb_inline.h |   2 +-
>>  include/linux/mm.h             |  30 ++++
>>  include/linux/pagemap.h        |   4 +-
>>  mm/internal.h                  |  16 +-
>>  mm/memory.c                    | 340 ++++++++++++++++++++++++++++++++++++++++-
>>  5 files changed, 385 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
>> index 0660a03d37d9..9e25283d6fc9 100644
>> --- a/include/linux/hugetlb_inline.h
>> +++ b/include/linux/hugetlb_inline.h
>> @@ -8,7 +8,7 @@
>>  
>>  static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
>>  {
>> -	return !!(vma->vm_flags & VM_HUGETLB);
>> +	return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
>>  }
>>  
>>  #else
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 05cbba70104b..31acf98a7d92 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -315,6 +315,7 @@ extern pgprot_t protection_map[16];
>>  #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
>>  #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
>>  #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
>> +#define FAULT_FLAG_SPECULATIVE	0x200	/* Speculative fault, not holding mmap_sem */
>>  
>>  #define FAULT_FLAG_TRACE \
>>  	{ FAULT_FLAG_WRITE,		"WRITE" }, \
>> @@ -343,6 +344,10 @@ struct vm_fault {
>>  	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
>>  	pgoff_t pgoff;			/* Logical page offset based on vma */
>>  	unsigned long address;		/* Faulting virtual address */
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +	unsigned int sequence;
>> +	pmd_t orig_pmd;			/* value of PMD at the time of fault */
>> +#endif
>>  	pmd_t *pmd;			/* Pointer to pmd entry matching
>>  					 * the 'address' */
>>  	pud_t *pud;			/* Pointer to pud entry matching
>> @@ -1415,6 +1420,31 @@ int invalidate_inode_page(struct page *page);
>>  #ifdef CONFIG_MMU
>>  extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>>  		unsigned int flags);
>> +
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +extern int __handle_speculative_fault(struct mm_struct *mm,
>> +				      unsigned long address,
>> +				      unsigned int flags);
>> +static inline int handle_speculative_fault(struct mm_struct *mm,
>> +					   unsigned long address,
>> +					   unsigned int flags)
>> +{
>> +	/*
>> +	 * Try speculative page fault for multithreaded user space task only.
>> +	 */
>> +	if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
>> +		return VM_FAULT_RETRY;
>> +	return __handle_speculative_fault(mm, address, flags);
>> +}
>> +#else
>> +static inline int handle_speculative_fault(struct mm_struct *mm,
>> +					   unsigned long address,
>> +					   unsigned int flags)
>> +{
>> +	return VM_FAULT_RETRY;
>> +}
>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>> +
>>  extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
>>  			    unsigned long address, unsigned int fault_flags,
>>  			    bool *unlocked);
>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
>> index b1bd2186e6d2..6e2aa4e79af7 100644
>> --- a/include/linux/pagemap.h
>> +++ b/include/linux/pagemap.h
>> @@ -456,8 +456,8 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
>>  	pgoff_t pgoff;
>>  	if (unlikely(is_vm_hugetlb_page(vma)))
>>  		return linear_hugepage_index(vma, address);
>> -	pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
>> -	pgoff += vma->vm_pgoff;
>> +	pgoff = (address - READ_ONCE(vma->vm_start)) >> PAGE_SHIFT;
>> +	pgoff += READ_ONCE(vma->vm_pgoff);
>>  	return pgoff;
>>  }
>>  
>> diff --git a/mm/internal.h b/mm/internal.h
>> index fb2667b20f0a..10b188c87fa4 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -44,7 +44,21 @@ int do_swap_page(struct vm_fault *vmf);
>>  extern struct vm_area_struct *get_vma(struct mm_struct *mm,
>>  				      unsigned long addr);
>>  extern void put_vma(struct vm_area_struct *vma);
>> -#endif
>> +
>> +static inline bool vma_has_changed(struct vm_fault *vmf)
>> +{
>> +	int ret = RB_EMPTY_NODE(&vmf->vma->vm_rb);
>> +	unsigned int seq = READ_ONCE(vmf->vma->vm_sequence.sequence);
>> +
>> +	/*
>> +	 * Matches both the wmb in write_seqlock_{begin,end}() and
>> +	 * the wmb in vma_rb_erase().
>> +	 */
>> +	smp_rmb();
>> +
>> +	return ret || seq != vmf->sequence;
>> +}
>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>>  
>>  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>>  		unsigned long floor, unsigned long ceiling);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ab32b0b4bd69..7bbbb8c7b9cd 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -769,7 +769,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>  	if (page)
>>  		dump_page(page, "bad pte");
>>  	pr_alert("addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
>> -		 (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
>> +		 (void *)addr, READ_ONCE(vma->vm_flags), vma->anon_vma,
>> +		 mapping, index);
>>  	pr_alert("file:%pD fault:%pf mmap:%pf readpage:%pf\n",
>>  		 vma->vm_file,
>>  		 vma->vm_ops ? vma->vm_ops->fault : NULL,
>> @@ -2306,6 +2307,118 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>>  }
>>  EXPORT_SYMBOL_GPL(apply_to_page_range);
>>  
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +static bool pte_spinlock(struct vm_fault *vmf)
>> +{
>> +	bool ret = false;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	pmd_t pmdval;
>> +#endif
>> +
>> +	/* Check if vma is still valid */
>> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
>> +		vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +		spin_lock(vmf->ptl);
>> +		return true;
>> +	}
>> +
>> +again:
>> +	local_irq_disable();
>> +	if (vma_has_changed(vmf))
>> +		goto out;
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	/*
>> +	 * We check if the pmd value is still the same to ensure that there
>> +	 * is not a huge collapse operation in progress in our back.
>> +	 */
>> +	pmdval = READ_ONCE(*vmf->pmd);
>> +	if (!pmd_same(pmdval, vmf->orig_pmd))
>> +		goto out;
>> +#endif
>> +
>> +	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +	if (unlikely(!spin_trylock(vmf->ptl))) {
>> +		local_irq_enable();
>> +		goto again;
>> +	}
>> +
>> +	if (vma_has_changed(vmf)) {
>> +		spin_unlock(vmf->ptl);
>> +		goto out;
>> +	}
>> +
>> +	ret = true;
>> +out:
>> +	local_irq_enable();
>> +	return ret;
>> +}
>> +
>> +static bool pte_map_lock(struct vm_fault *vmf)
>> +{
>> +	bool ret = false;
>> +	pte_t *pte;
>> +	spinlock_t *ptl;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	pmd_t pmdval;
>> +#endif
>> +
>> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
>> +		vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>> +					       vmf->address, &vmf->ptl);
>> +		return true;
>> +	}
>> +
>> +	/*
>> +	 * The first vma_has_changed() guarantees the page-tables are still
>> +	 * valid, having IRQs disabled ensures they stay around, hence the
>> +	 * second vma_has_changed() to make sure they are still valid once
>> +	 * we've got the lock. After that a concurrent zap_pte_range() will
>> +	 * block on the PTL and thus we're safe.
>> +	 */
>> +again:
>> +	local_irq_disable();
>> +	if (vma_has_changed(vmf))
>> +		goto out;
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	/*
>> +	 * We check if the pmd value is still the same to ensure that there
>> +	 * is not a huge collapse operation in progress in our back.
>> +	 */
>> +	pmdval = READ_ONCE(*vmf->pmd);
>> +	if (!pmd_same(pmdval, vmf->orig_pmd))
>> +		goto out;
>> +#endif
>> +
>> +	/*
>> +	 * Same as pte_offset_map_lock() except that we call
>> +	 * spin_trylock() in place of spin_lock() to avoid race with
>> +	 * unmap path which may have the lock and wait for this CPU
>> +	 * to invalidate TLB but this CPU has irq disabled.
>> +	 * Since we are in a speculative path, accept it could fail
>> +	 */
>> +	ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> +	pte = pte_offset_map(vmf->pmd, vmf->address);
>> +	if (unlikely(!spin_trylock(ptl))) {
>> +		pte_unmap(pte);
>> +		local_irq_enable();
>> +		goto again;
>> +	}
>> +
>> +	if (vma_has_changed(vmf)) {
>> +		pte_unmap_unlock(pte, ptl);
>> +		goto out;
>> +	}
>> +
>> +	vmf->pte = pte;
>> +	vmf->ptl = ptl;
>> +	ret = true;
>> +out:
>> +	local_irq_enable();
>> +	return ret;
>> +}
>> +#else
>>  static inline bool pte_spinlock(struct vm_fault *vmf)
>>  {
>>  	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
>> @@ -2319,6 +2432,7 @@ static inline bool pte_map_lock(struct vm_fault *vmf)
>>  				       vmf->address, &vmf->ptl);
>>  	return true;
>>  }
>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>>  
>>  /*
>>   * handle_pte_fault chooses page fault handler according to an entry which was
>> @@ -3208,6 +3322,14 @@ static int do_anonymous_page(struct vm_fault *vmf)
>>  		ret = check_stable_address_space(vma->vm_mm);
>>  		if (ret)
>>  			goto unlock;
>> +		/*
>> +		 * Don't call the userfaultfd during the speculative path.
>> +		 * We already checked for the VMA to not be managed through
>> +		 * userfaultfd, but it may be set in our back once we have lock
>> +		 * the pte. In such a case we can ignore it this time.
>> +		 */
>> +		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
>> +			goto setpte;
>>  		/* Deliver the page fault to userland, check inside PT lock */
>>  		if (userfaultfd_missing(vma)) {
>>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>> @@ -3249,7 +3371,7 @@ static int do_anonymous_page(struct vm_fault *vmf)
>>  		goto unlock_and_release;
>>  
>>  	/* Deliver the page fault to userland, check inside PT lock */
>> -	if (userfaultfd_missing(vma)) {
>> +	if (!(vmf->flags & FAULT_FLAG_SPECULATIVE) && userfaultfd_missing(vma)) {
>>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
>>  		mem_cgroup_cancel_charge(page, memcg, false);
>>  		put_page(page);
>> @@ -3994,13 +4116,22 @@ static int handle_pte_fault(struct vm_fault *vmf)
>>  
>>  	if (unlikely(pmd_none(*vmf->pmd))) {
>>  		/*
>> +		 * In the case of the speculative page fault handler we abort
>> +		 * the speculative path immediately as the pmd is probably
>> +		 * in the way to be converted in a huge one. We will try
>> +		 * again holding the mmap_sem (which implies that the collapse
>> +		 * operation is done).
>> +		 */
>> +		if (vmf->flags & FAULT_FLAG_SPECULATIVE)
>> +			return VM_FAULT_RETRY;
>> +		/*
>>  		 * Leave __pte_alloc() until later: because vm_ops->fault may
>>  		 * want to allocate huge page, and if we expose page table
>>  		 * for an instant, it will be difficult to retract from
>>  		 * concurrent faults and from rmap lookups.
>>  		 */
>>  		vmf->pte = NULL;
>> -	} else {
>> +	} else if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
>>  		/* See comment in pte_alloc_one_map() */
>>  		if (pmd_devmap_trans_unstable(vmf->pmd))
>>  			return 0;
>> @@ -4009,6 +4140,9 @@ static int handle_pte_fault(struct vm_fault *vmf)
>>  		 * pmd from under us anymore at this point because we hold the
>>  		 * mmap_sem read mode and khugepaged takes it in write mode.
>>  		 * So now it's safe to run pte_offset_map().
>> +		 * This is not applicable to the speculative page fault handler
>> +		 * but in that case, the pte is fetched earlier in
>> +		 * handle_speculative_fault().
>>  		 */
>>  		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
>>  		vmf->orig_pte = *vmf->pte;
>> @@ -4031,6 +4165,8 @@ static int handle_pte_fault(struct vm_fault *vmf)
>>  	if (!vmf->pte) {
>>  		if (vma_is_anonymous(vmf->vma))
>>  			return do_anonymous_page(vmf);
>> +		else if (vmf->flags & FAULT_FLAG_SPECULATIVE)
>> +			return VM_FAULT_RETRY;
>>  		else
>>  			return do_fault(vmf);
>>  	}
>> @@ -4128,6 +4264,9 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>>  	vmf.pmd = pmd_alloc(mm, vmf.pud, address);
>>  	if (!vmf.pmd)
>>  		return VM_FAULT_OOM;
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +	vmf.sequence = raw_read_seqcount(&vma->vm_sequence);
>> +#endif
>>  	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
>>  		ret = create_huge_pmd(&vmf);
>>  		if (!(ret & VM_FAULT_FALLBACK))
>> @@ -4161,6 +4300,201 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
>>  	return handle_pte_fault(&vmf);
>>  }
>>  
>> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
>> +/*
>> + * Tries to handle the page fault in a speculative way, without grabbing the
>> + * mmap_sem.
>> + */
>> +int __handle_speculative_fault(struct mm_struct *mm, unsigned long address,
>> +			       unsigned int flags)
>> +{
>> +	struct vm_fault vmf = {
>> +		.address = address,
>> +	};
>> +	pgd_t *pgd, pgdval;
>> +	p4d_t *p4d, p4dval;
>> +	pud_t pudval;
>> +	int seq, ret = VM_FAULT_RETRY;
>> +	struct vm_area_struct *vma;
>> +#ifdef CONFIG_NUMA
>> +	struct mempolicy *pol;
>> +#endif
>> +
>> +	/* Clear flags that may lead to release the mmap_sem to retry */
>> +	flags &= ~(FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_KILLABLE);
>> +	flags |= FAULT_FLAG_SPECULATIVE;
>> +
>> +	vma = get_vma(mm, address);
>> +	if (!vma)
>> +		return ret;
>> +
>> +	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
>> +	if (seq & 1)
>> +		goto out_put;
>> +
>> +	/*
>> +	 * Can't call vm_ops service has we don't know what they would do
>> +	 * with the VMA.
>> +	 * This include huge page from hugetlbfs.
>> +	 */
>> +	if (vma->vm_ops)
>> +		goto out_put;
>> +
>   Hi Laurent,
>
>   I think most page faults will bail out here. Is there any case that really needs to be skipped?
>   I have tested the following patch and it works well.

Hi Zhong,

Well, this would allow file mappings to be handled in a speculative way, but
that's a bit dangerous today as there is no guarantee that the vm_ops->fault()
operation will play fair.

For an anonymous file mapping that is often not a problem, depending on the
underlying file system, but there are so many cases to check that it is hard
to say this can be done in a speculative way as is.

The bulk of the work is to double check that none of the code called by
vm_ops->fault() deals with the mmap_sem, which could be handled using
FAULT_FLAG_RETRY_NOWAIT. Care is also needed with the resources that code
manages, as it may assume, possibly implicitly, that it runs under the
protection of the mmap_sem held in read mode.
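
To illustrate which faults stay on the speculative path, here is a small
userspace model of the early checks in __handle_speculative_fault(). It is
only a sketch: the toy_ structure, the flag values and the helper name are
invented for the example, and the real code also checks access permissions,
the NUMA policy and the sequence count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel definitions; names and values are hypothetical. */
#define TOY_VM_UFFD_MISSING 0x1u
#define TOY_VM_GROWSDOWN    0x2u
#define TOY_VM_GROWSUP      0x4u

struct toy_vma {
	const void *vm_ops;	/* non-NULL for file-backed or driver mappings */
	const void *anon_vma;	/* NULL until the anon_vma has been set up */
	unsigned int vm_flags;
	unsigned long vm_start, vm_end;
};

/* Return true only when the fault may be attempted without the mmap_sem. */
static bool toy_spf_eligible(const struct toy_vma *vma, unsigned long addr)
{
	if (vma->vm_ops)	/* unknown ->fault() behaviour: fall back */
		return false;
	if (!vma->anon_vma)	/* __anon_vma_prepare() needs the mmap_sem */
		return false;
	if (vma->vm_flags & (TOY_VM_UFFD_MISSING | TOY_VM_GROWSDOWN |
			     TOY_VM_GROWSUP))
		return false;
	/* Same bounds test as the patch: [vm_start, vm_end) */
	return addr >= vma->vm_start && addr < vma->vm_end;
}
```

Dropping the vm_ops test, as in the patch quoted below, would make every
file-backed VMA pass this filter, which is why each ->fault() implementation
would have to be audited first.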

Cheers,
Laurent.

> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 936128b..9bc1545 100644
>  @@ -3893,8 +3898,6 @@ static int handle_pte_fault(struct fault_env *fe)
>         if (!fe->pte) {
>                 if (vma_is_anonymous(fe->vma))
>                         return do_anonymous_page(fe);
> -               else if (fe->flags & FAULT_FLAG_SPECULATIVE)
> -                       return VM_FAULT_RETRY;
>                 else
>                         return do_fault(fe);
>         }
> @@ -4026,20 +4029,11 @@ int __handle_speculative_fault(struct mm_struct *mm, unsigned long address,
>                 goto out_put;
>         }
>         /*
> -        * Can't call vm_ops service has we don't know what they would do
> -        * with the VMA.
> -        * This include huge page from hugetlbfs.
> -        */
> -       if (vma->vm_ops) {
> -               trace_spf_vma_notsup(_RET_IP_, vma, address);
> -               goto out_put;
> -       }
> 
> 
> Thanks
> zhong jiang
>> +	/*
>> +	 * __anon_vma_prepare() requires the mmap_sem to be held
>> +	 * because vm_next and vm_prev must be safe. This can't be guaranteed
>> +	 * in the speculative path.
>> +	 */
>> +	if (unlikely(!vma->anon_vma))
>> +		goto out_put;
>> +
>> +	vmf.vma_flags = READ_ONCE(vma->vm_flags);
>> +	vmf.vma_page_prot = READ_ONCE(vma->vm_page_prot);
>> +
>> +	/* Can't call userland page fault handler in the speculative path */
>> +	if (unlikely(vmf.vma_flags & VM_UFFD_MISSING))
>> +		goto out_put;
>> +
>> +	if (vmf.vma_flags & VM_GROWSDOWN || vmf.vma_flags & VM_GROWSUP)
>> +		/*
>> +		 * This could be detected by the check address against VMA's
>> +		 * boundaries but we want to trace it as not supported instead
>> +		 * of changed.
>> +		 */
>> +		goto out_put;
>> +
>> +	if (address < READ_ONCE(vma->vm_start)
>> +	    || READ_ONCE(vma->vm_end) <= address)
>> +		goto out_put;
>> +
>> +	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
>> +				       flags & FAULT_FLAG_INSTRUCTION,
>> +				       flags & FAULT_FLAG_REMOTE)) {
>> +		ret = VM_FAULT_SIGSEGV;
>> +		goto out_put;
>> +	}
>> +
>> +	/* This one is required to check that the VMA has write access set */
>> +	if (flags & FAULT_FLAG_WRITE) {
>> +		if (unlikely(!(vmf.vma_flags & VM_WRITE))) {
>> +			ret = VM_FAULT_SIGSEGV;
>> +			goto out_put;
>> +		}
>> +	} else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) {
>> +		ret = VM_FAULT_SIGSEGV;
>> +		goto out_put;
>> +	}
>> +
>> +#ifdef CONFIG_NUMA
>> +	/*
>> +	 * MPOL_INTERLEAVE implies additional checks in
>> +	 * mpol_misplaced() which are not compatible with the
>> +	 * speculative page fault processing.
>> +	 */
>> +	pol = __get_vma_policy(vma, address);
>> +	if (!pol)
>> +		pol = get_task_policy(current);
>> +	if (pol && pol->mode == MPOL_INTERLEAVE)
>> +		goto out_put;
>> +#endif
>> +
>> +	/*
>> +	 * Do a speculative lookup of the PTE entry.
>> +	 */
>> +	local_irq_disable();
>> +	pgd = pgd_offset(mm, address);
>> +	pgdval = READ_ONCE(*pgd);
>> +	if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval)))
>> +		goto out_walk;
>> +
>> +	p4d = p4d_offset(pgd, address);
>> +	p4dval = READ_ONCE(*p4d);
>> +	if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
>> +		goto out_walk;
>> +
>> +	vmf.pud = pud_offset(p4d, address);
>> +	pudval = READ_ONCE(*vmf.pud);
>> +	if (pud_none(pudval) || unlikely(pud_bad(pudval)))
>> +		goto out_walk;
>> +
>> +	/* Huge pages at PUD level are not supported. */
>> +	if (unlikely(pud_trans_huge(pudval)))
>> +		goto out_walk;
>> +
>> +	vmf.pmd = pmd_offset(vmf.pud, address);
>> +	vmf.orig_pmd = READ_ONCE(*vmf.pmd);
>> +	/*
>> +	 * pmd_none could mean that a hugepage collapse is in progress
>> +	 * behind our back, as collapse_huge_page() marks it before
>> +	 * invalidating the pte (which is done once the IPI is caught
>> +	 * by all CPUs, and we have interrupts disabled).
>> +	 * For this reason we cannot handle THP in a speculative way since we
>> +	 * can't safely identify an in-progress collapse operation done behind
>> +	 * our back on that PMD.
>> +	 * Regarding the order of the following checks, see comment in
>> +	 * pmd_devmap_trans_unstable()
>> +	 */
>> +	if (unlikely(pmd_devmap(vmf.orig_pmd) ||
>> +		     pmd_none(vmf.orig_pmd) || pmd_trans_huge(vmf.orig_pmd) ||
>> +		     is_swap_pmd(vmf.orig_pmd)))
>> +		goto out_walk;
>> +
>> +	/*
>> +	 * The above does not allocate/instantiate page-tables because doing so
>> +	 * would lead to the possibility of instantiating page-tables after
>> +	 * free_pgtables() -- and consequently leaking them.
>> +	 *
>> +	 * The result is that we take at least one !speculative fault per PMD
>> +	 * in order to instantiate it.
>> +	 */
>> +
>> +	vmf.pte = pte_offset_map(vmf.pmd, address);
>> +	vmf.orig_pte = READ_ONCE(*vmf.pte);
>> +	barrier(); /* See comment in handle_pte_fault() */
>> +	if (pte_none(vmf.orig_pte)) {
>> +		pte_unmap(vmf.pte);
>> +		vmf.pte = NULL;
>> +	}
>> +
>> +	vmf.vma = vma;
>> +	vmf.pgoff = linear_page_index(vma, address);
>> +	vmf.gfp_mask = __get_fault_gfp_mask(vma);
>> +	vmf.sequence = seq;
>> +	vmf.flags = flags;
>> +
>> +	local_irq_enable();
>> +
>> +	/*
>> +	 * We need to re-validate the VMA after checking the bounds, otherwise
>> +	 * we might have a false positive on the bounds.
>> +	 */
>> +	if (read_seqcount_retry(&vma->vm_sequence, seq))
>> +		goto out_put;
>> +
>> +	mem_cgroup_oom_enable();
>> +	ret = handle_pte_fault(&vmf);
>> +	mem_cgroup_oom_disable();
>> +
>> +	put_vma(vma);
>> +
>> +	/*
>> +	 * The task may have entered a memcg OOM situation but
>> +	 * if the allocation error was handled gracefully (no
>> +	 * VM_FAULT_OOM), there is no need to kill anything.
>> +	 * Just clean up the OOM state peacefully.
>> +	 */
>> +	if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
>> +		mem_cgroup_oom_synchronize(false);
>> +	return ret;
>> +
>> +out_walk:
>> +	local_irq_enable();
>> +out_put:
>> +	put_vma(vma);
>> +	return ret;
>> +}
>> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>> +
>>  /*
>>   * By the time we get here, we already hold the mm semaphore
>>   *
> 
> 


* Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
From: Will Deacon @ 2018-07-24 16:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rik van Riel, Peter Zijlstra, Vitaly Kuznetsov, Juergen Gross,
	Boris Ostrovsky, linux-arch, Catalin Marinas, linux-s390,
	Benjamin Herrenschmidt, linuxppc-dev, LKML, X86 ML,
	Mike Galbraith, kernel-team, Ingo Molnar, Dave Hansen
In-Reply-To: <CALCETrXLMsSBChDvrms-omwYV4LHT30GenDjbnD-+LTg55yPow@mail.gmail.com>

Hi Andy,

Sorry, I missed the arm64 question at the end of this...

On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 19, 2018 at 9:45 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > [I added PeterZ and Vitaly -- can you see any way in which this would
> > break something obscure?  I don't.]
> >
> > On Thu, Jul 19, 2018 at 7:14 AM, Rik van Riel <riel@surriel.com> wrote:
> >> I guess we can skip both switch_ldt and load_mm_cr4 if real_prev equals
> >> next?
> >
> > Yes, AFAICS.
> >
> >>
> >> On to the lazy TLB mm_struct refcounting stuff :)
> >>
> >>>
> >>> Which refcount?  mm_users shouldn’t be hot, so I assume you’re talking about
> >>> mm_count. My suggestion is to get rid of mm_count instead of trying to
> >>> optimize it.
> >>
> >>
> >> Do you have any suggestions on how? :)
> >>
> >> The TLB shootdown sent at __exit_mm time does not get rid of the
> >> kernelthread->active_mm
> >> pointer pointing at the mm that is exiting.
> >>
> >
> > Ah, but that's conceptually very easy to fix.  Add a #define like
> > ARCH_NO_TASK_ACTIVE_MM.  Then just get rid of active_mm if that
> > #define is set.  After some grepping, there are very few users.  The
> > only nontrivial ones are the ones in kernel/ and mm/mmu_context.c that
> > are involved in the rather complicated dance of refcounting active_mm.
> > If that field goes away, it doesn't need to be refcounted.  Instead, I
> > think the refcounting can get replaced with something like:
> >
> > /*
> >  * Release any arch-internal references to mm.  Only called when
> > mm_users is zero
> >  * and all tasks using mm have either been switch_mm()'d away or have had
> >  * enter_lazy_tlb() called.
> >  */
> > extern void arch_shoot_down_dead_mm(struct mm_struct *mm);
> >
> > which the kernel calls in __mmput() after tearing down all the page
> > tables.  The body can be something like:
> >
> > if (WARN_ON(cpumask_any_but(mm_cpumask(...), ...)) {
> >   /* send an IPI.  Maybe just call tlb_flush_remove_tables() */
> > }
> >
> > (You'll also have to fix up the highly questionable users in
> > arch/x86/platform/efi/efi_64.c, but that's easy.)
> >
> > Does all that make sense?  Basically, as I understand it, the
> > expensive atomic ops you're seeing are all pointless because they're
> > enabling an optimization that hasn't actually worked for a long time,
> > if ever.
> 
> Hmm.  Xen PV has a big hack in xen_exit_mmap(), which is called from
> arch_exit_mmap(), I think.  It's a heavier weight version of more or
> less the same thing that arch_shoot_down_dead_mm() would be, except
> that it happens before exit_mmap().  But maybe Xen actually has the
> right idea.  In other words, rather than doing the big pagetable free in
> exit_mmap() while there may still be other CPUs pointing at the page
> tables, the other order might make more sense.  So maybe, if
> ARCH_NO_TASK_ACTIVE_MM is set, arch_exit_mmap() should be responsible
> for getting rid of all secret arch references to the mm.
> 
> Hmm.  ARCH_FREE_UNUSED_MM_IMMEDIATELY might be a better name.
> 
> I added some more arch maintainers.  The idea here is that, on x86 at
> least, task->active_mm and all its refcounting is pure overhead.  When
> a process exits, __mmput() gets called, but the core kernel has a
> longstanding "optimization" in which other tasks (kernel threads and
> idle tasks) may have ->active_mm pointing at this mm.  This is nasty,
> complicated, and hurts performance on large systems, since it requires
> extra atomic operations whenever a CPU switches between real user
> threads and idle/kernel threads.
> 
> It's also almost completely worthless on x86 at least, since __mmput()
> frees pagetables, and that operation *already* forces a remote TLB
> flush, so we might as well zap all the active_mm references at the
> same time.
> 
> But arm64 has real HW remote flushes.  Does arm64 actually benefit
> from the active_mm optimization?  What happens on arm64 when a process
> exits?  How about s390?  I suspect that s390 has rather larger systems
> than arm64, where the cost of the reference counting can be much
> higher.

IIRC, the TLB invalidation on task exit has the fullmm field set in the
mmu_gather structure, so we don't actually do any TLB invalidation at all.
Instead, we just don't re-allocate the ASID and invalidate the whole TLB
when we run out of ASIDs (they're 16-bit on most Armv8 CPUs).
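
As a rough illustration of that scheme, the toy allocator below does nothing
on exit and never hands an ASID back; it only invalidates everything once the
ASID space is exhausted. It is a simplified userspace model with made-up
names and a deliberately tiny 2-bit ASID space, not the arm64 allocator.

```c
#include <stdint.h>

#define TOY_ASID_BITS	2	/* tiny on purpose; the mail mentions 16 bits */
#define TOY_NUM_ASIDS	(1u << TOY_ASID_BITS)

struct toy_asid_state {
	uint32_t next;		/* next unused ASID; 0 is reserved */
	uint32_t generation;	/* bumped on every rollover */
	uint32_t full_flushes;	/* number of whole-TLB invalidations */
};

/* Hand out a fresh ASID, rolling over when the space is exhausted. */
static uint32_t toy_asid_alloc(struct toy_asid_state *s)
{
	if (s->next >= TOY_NUM_ASIDS) {
		/* Out of ASIDs: new generation, one full invalidation. */
		s->generation++;
		s->full_flushes++;
		s->next = 1;
	}
	return s->next++;
}

/* Task exit: no TLB invalidation and the ASID is simply not reused. */
static void toy_mm_exit(struct toy_asid_state *s, uint32_t asid)
{
	(void)s;
	(void)asid;
}
```

Starting from `next = 1`, three address spaces can come and go without any
invalidation; the fourth allocation triggers the single full flush.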

Does that answer your question?

Will


* [PATCH 0/7] powerpc: Modernize unhandled signals message
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev

Hi, everyone.

This series was inspired by the need to modernize and display more
informative messages about unhandled signals.

The "unhandled signal NN" message is not very informative.  We thought it
would be helpful to add a human-readable description of what the signal
number means, print the VMA address, and dump the instructions.

We can add more informative messages, such as explaining what each code of
a SIGSEGV signal means.  We are open to suggestions.

I have collected some early feedback from Michael Ellerman about this
series and would love to hear more feedback from you all.

Before this series:

    Jul 24 13:01:07 localhost kernel: pandafault[5989]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00003fff85a75100 code 2

After this series:

    Jul 24 13:08:01 localhost kernel: pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
    Jul 24 13:08:01 localhost kernel: Instruction dump:
    Jul 24 13:08:01 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
    Jul 24 13:08:01 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
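
For readers who want to see how the first "after" line is put together, here
is a userspace sketch that rebuilds it with snprintf(). The helper names and
the signal-name table are hypothetical and only stand in for what the series
does in arch/powerpc/kernel/traps.c; the table in the actual patches may
differ.

```c
#include <signal.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical lookup; the names used by the real series may differ. */
static const char *toy_signal_name(int signr)
{
	switch (signr) {
	case SIGILL:  return "illegal instruction";
	case SIGBUS:  return "bus error";
	case SIGFPE:  return "floating point exception";
	case SIGSEGV: return "segfault";
	default:      return "unknown signal";
	}
}

/* Build "comm[pid]: name (nr) at ADDR nip NIP lr LR code C in vma[start+size]". */
static int toy_format_signal_msg(char *buf, size_t len, const char *comm,
				 int pid, int signr, unsigned long long addr,
				 unsigned long long nip, unsigned long long lr,
				 int code, const char *vma_name,
				 unsigned long long vma_start,
				 unsigned long long vma_size)
{
	return snprintf(buf, len,
			"%s[%d]: %s (%d) at %016llx nip %016llx lr %016llx "
			"code %x in %s[%llx+%llx]",
			comm, pid, toy_signal_name(signr), signr, addr, nip, lr,
			(unsigned int)code, vma_name, vma_start, vma_size);
}
```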

Cheers
Murilo

Murilo Opsfelder Araujo (7):
  powerpc/traps: Print unhandled signals in a separate function
  powerpc/traps: Return early in show_signal_msg()
  powerpc/reg: Add REG_FMT definition
  powerpc/traps: Use REG_FMT in show_signal_msg()
  powerpc/traps: Print VMA for unhandled signals
  powerpc/traps: Print signal name for unhandled signals
  powerpc/traps: Show instructions on exceptions

 arch/powerpc/include/asm/reg.h        |  6 +++
 arch/powerpc/include/asm/stacktrace.h |  7 +++
 arch/powerpc/kernel/process.c         | 28 +++++-----
 arch/powerpc/kernel/traps.c           | 73 +++++++++++++++++++++++----
 4 files changed, 89 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

-- 
2.17.1


* [PATCH 1/7] powerpc/traps: Print unhandled signals in a separate function
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

Isolate the logic of printing unhandled signals out of _exception_pkey().  No
functional change, only code rearrangement.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/kernel/traps.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0e17dcb48720..cbd3dc365193 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,26 +301,32 @@ void user_single_step_siginfo(struct task_struct *tsk,
 	info->si_addr = (void __user *)regs->nip;
 }
 
+static void show_signal_msg(int signr, struct pt_regs *regs, int code,
+			    unsigned long addr)
+{
+	const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+		"at %08lx nip %08lx lr %08lx code %x\n";
+	const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+		"at %016lx nip %016lx lr %016lx code %x\n";
+
+	if (show_unhandled_signals && unhandled_signal(current, signr)) {
+		printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+				   current->comm, current->pid, signr,
+				   addr, regs->nip, regs->link, code);
+	}
+}
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-		unsigned long addr, int key)
+		     unsigned long addr, int key)
 {
 	siginfo_t info;
-	const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-			"at %08lx nip %08lx lr %08lx code %x\n";
-	const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-			"at %016lx nip %016lx lr %016lx code %x\n";
 
 	if (!user_mode(regs)) {
 		die("Exception in kernel mode", regs, signr);
 		return;
 	}
 
-	if (show_unhandled_signals && unhandled_signal(current, signr)) {
-		printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-				   current->comm, current->pid, signr,
-				   addr, regs->nip, regs->link, code);
-	}
+	show_signal_msg(signr, regs, code, addr);
 
 	if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
 		local_irq_enable();
-- 
2.17.1


* [PATCH 2/7] powerpc/traps: Return early in show_signal_msg()
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

Modify the logic of show_signal_msg() to return early, if possible.  Replace
printk_ratelimited() with printk() guarded by a dedicated rate limit state
(default interval and burst) to limit how often unhandled signal messages
are displayed.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
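The interval/burst behaviour that show_unhandled_signals_ratelimited() relies
on can be modelled in userspace roughly as follows. This is a simplified
sketch with hypothetical names and an explicit clock argument; it is not the
kernel's __ratelimit() implementation.

```c
#include <stdbool.h>

struct toy_ratelimit {
	unsigned long interval;	/* window length, in arbitrary ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* messages emitted in this window */
	bool started;		/* false until the first call */
};

/* Return true if a message may be emitted at time "now". */
static bool toy_ratelimit_ok(struct toy_ratelimit *rs, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}
```

Note that in this patch the limiter is consulted before unhandled_signal(),
so a slot in the burst appears to be used even when nothing ends up being
printed.
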
 arch/powerpc/kernel/traps.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cbd3dc365193..4faab4705774 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,6 +301,13 @@ void user_single_step_siginfo(struct task_struct *tsk,
 	info->si_addr = (void __user *)regs->nip;
 }
 
+static bool show_unhandled_signals_ratelimited(void)
+{
+	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	return show_unhandled_signals && __ratelimit(&rs);
+}
+
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 			    unsigned long addr)
 {
@@ -309,11 +316,12 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 	const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
 		"at %016lx nip %016lx lr %016lx code %x\n";
 
-	if (show_unhandled_signals && unhandled_signal(current, signr)) {
-		printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-				   current->comm, current->pid, signr,
-				   addr, regs->nip, regs->link, code);
-	}
+	if (!unhandled_signal(current, signr))
+		return;
+
+	printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+	       current->comm, current->pid, signr,
+	       addr, regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
@@ -326,7 +334,8 @@ void _exception_pkey(int signr, struct pt_regs *regs, int code,
 		return;
 	}
 
-	show_signal_msg(signr, regs, code, addr);
+	if (show_unhandled_signals_ratelimited())
+		show_signal_msg(signr, regs, code, addr);
 
 	if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
 		local_irq_enable();
-- 
2.17.1


* [PATCH 3/7] powerpc/reg: Add REG_FMT definition
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

Make the REG definition in arch/powerpc/kernel/process.c generic by renaming
it to REG_FMT and moving it to arch/powerpc/include/asm/reg.h so that it can
be used elsewhere.

Replace occurrences of REG by REG_FMT in arch/powerpc/kernel/process.c.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
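For reference, REG_FMT works through compile-time concatenation of adjacent
string literals. The userspace sketch below assumes the 64-bit definition and
uses a hypothetical helper name; it only mirrors the "NIP: ... LR: ..." line
of show_regs().

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: mirrors the CONFIG_PPC64 branch of the new definition. */
#define REG_FMT		"%016lx"

/* Format two register values the way show_regs() prints NIP and LR. */
static int toy_format_nip_lr(char *buf, size_t len, unsigned long nip,
			     unsigned long lr)
{
	/* "NIP:  " REG_FMT " LR: " REG_FMT becomes one format string. */
	return snprintf(buf, len, "NIP:  " REG_FMT " LR: " REG_FMT, nip, lr);
}
```
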
 arch/powerpc/include/asm/reg.h |  6 ++++++
 arch/powerpc/kernel/process.c  | 22 ++++++++++------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 858aa7984ab0..d6c5c77383de 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1319,6 +1319,12 @@
 #define PVR_ARCH_207	0x0f000004
 #define PVR_ARCH_300	0x0f000005
 
+#ifdef CONFIG_PPC64
+#define REG_FMT		"%016lx"
+#else
+#define REG_FMT		"%08lx"
+#endif /* CONFIG_PPC64 */
+
 /* Macros for setting and retrieving special purpose registers */
 #ifndef __ASSEMBLY__
 #define mfmsr()		({unsigned long rval; \
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 27f0caee55ea..b1af3390249c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1381,11 +1381,9 @@ static void print_msr_bits(unsigned long val)
 }
 
 #ifdef CONFIG_PPC64
-#define REG		"%016lx"
 #define REGS_PER_LINE	4
 #define LAST_VOLATILE	13
 #else
-#define REG		"%08lx"
 #define REGS_PER_LINE	8
 #define LAST_VOLATILE	12
 #endif
@@ -1396,21 +1394,21 @@ void show_regs(struct pt_regs * regs)
 
 	show_regs_print_info(KERN_DEFAULT);
 
-	printk("NIP:  "REG" LR: "REG" CTR: "REG"\n",
+	printk("NIP:  "REG_FMT" LR: "REG_FMT" CTR: "REG_FMT"\n",
 	       regs->nip, regs->link, regs->ctr);
 	printk("REGS: %px TRAP: %04lx   %s  (%s)\n",
 	       regs, regs->trap, print_tainted(), init_utsname()->release);
-	printk("MSR:  "REG" ", regs->msr);
+	printk("MSR:  "REG_FMT" ", regs->msr);
 	print_msr_bits(regs->msr);
-	pr_cont("  CR: %08lx  XER: %08lx\n", regs->ccr, regs->xer);
+	pr_cont("  CR: "REG_FMT"  XER: "REG_FMT"\n", regs->ccr, regs->xer);
 	trap = TRAP(regs);
 	if ((TRAP(regs) != 0xc00) && cpu_has_feature(CPU_FTR_CFAR))
-		pr_cont("CFAR: "REG" ", regs->orig_gpr3);
+		pr_cont("CFAR: "REG_FMT" ", regs->orig_gpr3);
 	if (trap == 0x200 || trap == 0x300 || trap == 0x600)
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-		pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, regs->dsisr);
+		pr_cont("DEAR: "REG_FMT" ESR: "REG_FMT" ", regs->dar, regs->dsisr);
 #else
-		pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, regs->dsisr);
+		pr_cont("DAR: "REG_FMT" DSISR: "REG_FMT" ", regs->dar, regs->dsisr);
 #endif
 #ifdef CONFIG_PPC64
 	pr_cont("IRQMASK: %lx ", regs->softe);
@@ -1423,7 +1421,7 @@ void show_regs(struct pt_regs * regs)
 	for (i = 0;  i < 32;  i++) {
 		if ((i % REGS_PER_LINE) == 0)
 			pr_cont("\nGPR%02d: ", i);
-		pr_cont(REG " ", regs->gpr[i]);
+		pr_cont(REG_FMT " ", regs->gpr[i]);
 		if (i == LAST_VOLATILE && !FULL_REGS(regs))
 			break;
 	}
@@ -1433,8 +1431,8 @@ void show_regs(struct pt_regs * regs)
 	 * Lookup NIP late so we have the best change of getting the
 	 * above info out without failing
 	 */
-	printk("NIP ["REG"] %pS\n", regs->nip, (void *)regs->nip);
-	printk("LR ["REG"] %pS\n", regs->link, (void *)regs->link);
+	printk("NIP ["REG_FMT"] %pS\n", regs->nip, (void *)regs->nip);
+	printk("LR ["REG_FMT"] %pS\n", regs->link, (void *)regs->link);
 #endif
 	show_stack(current, (unsigned long *) regs->gpr[1]);
 	if (!user_mode(regs))
@@ -2038,7 +2036,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		newsp = stack[0];
 		ip = stack[STACK_FRAME_LR_SAVE];
 		if (!firstframe || ip != lr) {
-			printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+			printk("["REG_FMT"] ["REG_FMT"] %pS", sp, ip, (void *)ip);
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 			if ((ip == rth) && curr_frame >= 0) {
 				pr_cont(" (%pS)",
-- 
2.17.1


* [PATCH 4/7] powerpc/traps: Use REG_FMT in show_signal_msg()
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

Simplify the message format by using REG_FMT as the register format.  This
avoids having two different formats and avoids checking for MSR_64BIT.
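
The idea can be sketched in user space as follows.  REG_FMT is not defined
in this message, so the macro below is only a hypothetical stand-in that
derives the width from the size of unsigned long, and format_signal_msg()
is an illustrative helper, not a kernel function:

```c
#include <stdio.h>
#include <limits.h>

/*
 * Hypothetical stand-in for REG_FMT (assumption): one zero-padded hex
 * format whose width follows the size of unsigned long.
 */
#if ULONG_MAX > 0xffffffffUL
#define REG_FMT "%016lx"
#else
#define REG_FMT "%08lx"
#endif

/* Illustrative helper: build the unhandled-signal line with one format. */
static int format_signal_msg(char *buf, size_t len, const char *comm, int pid,
			     int signr, unsigned long addr, unsigned long nip,
			     unsigned long lr, int code)
{
	return snprintf(buf, len,
			"%s[%d]: unhandled signal %d at " REG_FMT
			" nip " REG_FMT " lr " REG_FMT " code %x\n",
			comm, pid, signr, addr, nip, lr, (unsigned int)code);
}
```

With a single compile-time format there is no run-time choice between two
strings, which is what removes the MSR_64BIT test.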

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/kernel/traps.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4faab4705774..047d980ac776 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -311,17 +311,13 @@ static bool show_unhandled_signals_ratelimited(void)
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 			    unsigned long addr)
 {
-	const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-		"at %08lx nip %08lx lr %08lx code %x\n";
-	const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-		"at %016lx nip %016lx lr %016lx code %x\n";
-
 	if (!unhandled_signal(current, signr))
 		return;
 
-	printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-	       current->comm, current->pid, signr,
-	       addr, regs->nip, regs->link, code);
+	pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
+		" nip "REG_FMT" lr "REG_FMT" code %x\n",
+		current->comm, current->pid, signr, addr,
+		regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1


* [PATCH 5/7] powerpc/traps: Print VMA for unhandled signals
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

This adds the VMA address to the message printed for unhandled signals, similar
to what other architectures, like x86, print.

Before this patch, a page fault looked like:

    Jul 11 15:56:25 localhost kernel: pandafault[61470]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00007fff8d185100 code 2

After this patch, a page fault looks like:

    Jul 11 16:04:11 localhost kernel: pandafault[6303]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00007fff93c55100 code 2 in pandafault[10000000+10000]
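
The trailing "in pandafault[10000000+10000]" is the name of the mapping that
contains nip, followed by its start address and length in hex.  A minimal
user-space sketch of that formatting, assuming a hypothetical struct mapping
in place of the kernel's VMA (this is not print_vma_addr() itself):

```c
#include <stdio.h>

/* Hypothetical user-space model of a mapping; not vm_area_struct. */
struct mapping {
	const char *name;
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/*
 * Write " in name[start+length]" if ip falls inside the mapping,
 * otherwise write an empty string.  Returns the snprintf() result.
 */
static int format_vma_suffix(char *buf, size_t len, const struct mapping *m,
			     unsigned long ip)
{
	if (!m || ip < m->start || ip >= m->end) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, " in %s[%lx+%lx]",
			m->name, m->start, m->end - m->start);
}
```

For a mapping from 0x10000000 to 0x10010000 and nip 0x1000061c this produces
the suffix seen in the log line above.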

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/kernel/traps.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 047d980ac776..e6c43ef9fb50 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -315,9 +315,13 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 		return;
 
 	pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
-		" nip "REG_FMT" lr "REG_FMT" code %x\n",
+		" nip "REG_FMT" lr "REG_FMT" code %x",
 		current->comm, current->pid, signr, addr,
 		regs->nip, regs->link, code);
+
+	print_vma_addr(KERN_CONT " in ", regs->nip);
+
+	pr_cont("\n");
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1


* [PATCH 6/7] powerpc/traps: Print signal name for unhandled signals
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

This adds a human-readable name to the unhandled signal message.

Before this patch, a page fault looked like:

    Jul 11 16:04:11 localhost kernel: pandafault[6303]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00007fff93c55100 code 2 in pandafault[10000000+10000]

After this patch, a page fault looks like:

    Jul 11 18:14:48 localhost kernel: pandafault[6352]: segfault (11) at 000000013a2a09f8 nip 000000013a2a086c lr 00007fffb63e5100 code 2 in pandafault[13a2a0000+10000]
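
A table lookup like this is easy to get wrong at the edges, so here is a
minimal user-space sketch of a bounds-checked variant.  The signame() helper
and the "unknown signal" fallback are assumptions for illustration and are
not part of this patch; the numbers follow the usual Linux numbering used
on powerpc:

```c
#include <stddef.h>

/* Sparse table; entries that are not listed stay NULL. */
static const char *const signames[] = {
	[4]  = "illegal instruction",		/* SIGILL */
	[5]  = "unhandled trap",		/* SIGTRAP */
	[7]  = "bus error",			/* SIGBUS */
	[8]  = "floating point exception",	/* SIGFPE */
	[11] = "segfault",			/* SIGSEGV */
};

/* Hypothetical helper: never index past the table or return NULL. */
static const char *signame(int signr)
{
	size_t n = sizeof(signames) / sizeof(signames[0]);

	if (signr > 0 && (size_t)signr < n && signames[signr])
		return signames[signr];
	return "unknown signal";
}
```

Designated initializers tie each string to its signal number, so a name
cannot silently shift to the wrong index when an entry is added or removed.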

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/kernel/traps.c | 43 +++++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e6c43ef9fb50..e55ee639d010 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -96,6 +96,41 @@ EXPORT_SYMBOL(__debugger_fault_handler);
 #define TM_DEBUG(x...) do { } while(0)
 #endif
 
+static const char *signames[SIGRTMIN + 1] = {
+	"UNKNOWN",
+	"SIGHUP",			// 1
+	"SIGINT",			// 2
+	"SIGQUIT",			// 3
+	"illegal instruction",		// 4 = SIGILL
+	"unhandled trap",		// 5 = SIGTRAP
+	"SIGABRT",			// 6 = SIGIOT
+	"bus error",			// 7 = SIGBUS
+	"floating point exception",	// 8 = SIGFPE
+	"SIGKILL",			// 9
+	"SIGUSR1",			// 10
+	"segfault",			// 11 = SIGSEGV
+	"SIGUSR2",			// 12
+	"SIGPIPE",			// 13
+	"SIGALRM",			// 14
+	"SIGTERM",			// 15
+	"SIGSTKFLT",			// 16
+	"SIGCHLD",			// 17
+	"SIGCONT",			// 18
+	"SIGSTOP",			// 19
+	"SIGTSTP",			// 20
+	"SIGTTIN",			// 21
+	"SIGTTOU",			// 22
+	"SIGURG",			// 23
+	"SIGXCPU",			// 24
+	"SIGXFSZ",			// 25
+	"SIGVTALRM",			// 26
+	"SIGPROF",			// 27
+	"SIGWINCH",			// 28
+	"SIGIO",			// 29 = SIGPOLL = SIGLOST
+	"SIGPWR",			// 30
+	"SIGSYS",			// 31 = SIGUNUSED
+};
+
 /*
  * Trap & Exception support
  */
@@ -314,10 +349,10 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 	if (!unhandled_signal(current, signr))
 		return;
 
-	pr_info("%s[%d]: unhandled signal %d at "REG_FMT \
-		" nip "REG_FMT" lr "REG_FMT" code %x",
-		current->comm, current->pid, signr, addr,
-		regs->nip, regs->link, code);
+	pr_info("%s[%d]: %s (%d) at "REG_FMT" nip "REG_FMT \
+		" lr "REG_FMT" code %x",
+		current->comm, current->pid, signames[signr],
+		signr, addr, regs->nip, regs->link, code);
 
 	print_vma_addr(KERN_CONT " in ", regs->nip);
 
-- 
2.17.1


* [PATCH 7/7] powerpc/traps: Show instructions on exceptions
From: Murilo Opsfelder Araujo @ 2018-07-24 19:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alastair D'Silva, Andrew Donnellan, Balbir Singh,
	Benjamin Herrenschmidt, Christophe Leroy, Cyril Bur,
	Eric W . Biederman, Michael Ellerman, Michael Neuling,
	Murilo Opsfelder Araujo, Nicholas Piggin, Paul Mackerras,
	Simon Guo, Sukadev Bhattiprolu, Tobin C . Harding, linuxppc-dev
In-Reply-To: <20180724192720.32417-1-muriloo@linux.ibm.com>

Move the show_instructions() declaration to
arch/powerpc/include/asm/stacktrace.h and include asm/stacktrace.h in
arch/powerpc/kernel/process.c, which contains the implementation.

Modify show_instructions() not to call __kernel_text_address(), which allows
dumping userspace instructions.  probe_kernel_address(), which returns -EFAULT
if something goes wrong, is still being called.

Call show_instructions() in arch/powerpc/kernel/traps.c to dump the instructions
at the faulting location, which is useful for debugging.

Before this patch, an unhandled signal message looked like:

    Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffbd295100 code 2 in pandafault[10000000+10000]

After this patch, it looks like:

    Jul 24 09:57:00 localhost kernel: pandafault[10524]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffbd295100 code 2 in pandafault[10000000+10000]
    Jul 24 09:57:00 localhost kernel: Instruction dump:
    Jul 24 09:57:00 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
    Jul 24 09:57:00 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
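
The layout of the dump can be reproduced in user space.  This is only a
sketch: format_dump() and dump_start() are hypothetical helpers that read
from a plain array instead of probing memory, and they follow the window
arithmetic visible in show_instructions() (16 words, three quarters of them
before nip, the word at nip in angle brackets):

```c
#include <stdio.h>
#include <stdint.h>

#define INSNS_TO_PRINT 16

/* Address of the first word in the window: 12 words before nip. */
static unsigned long dump_start(unsigned long nip)
{
	return nip - (INSNS_TO_PRINT * 3 / 4 * sizeof(uint32_t));
}

/*
 * Format INSNS_TO_PRINT words, eight per line, marking the word at nip.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t format_dump(char *buf, size_t len, const uint32_t *insns,
			  unsigned long nip)
{
	unsigned long pc = dump_start(nip);
	size_t off = 0;
	int i;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < INSNS_TO_PRINT; i++, pc += sizeof(uint32_t)) {
		int n;

		if (pc == nip)
			n = snprintf(buf + off, len - off, "<%08x> ",
				     (unsigned int)insns[i]);
		else
			n = snprintf(buf + off, len - off, "%08x ",
				     (unsigned int)insns[i]);
		if (n < 0 || (size_t)n >= len - off)
			break;		/* buffer too small; stop here */
		off += (size_t)n;

		if (i % 8 == 7 && off + 1 < len) {
			buf[off++] = '\n';
			buf[off] = '\0';
		}
	}
	return off;
}
```

With nip 0x1000061c the window starts at 0x100005ec, so the faulting word is
the thirteenth one printed, which matches the position of <99490000> in the
log above.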

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/include/asm/stacktrace.h | 7 +++++++
 arch/powerpc/kernel/process.c         | 6 +++---
 arch/powerpc/kernel/traps.c           | 3 +++
 3 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

diff --git a/arch/powerpc/include/asm/stacktrace.h b/arch/powerpc/include/asm/stacktrace.h
new file mode 100644
index 000000000000..46e5ef451578
--- /dev/null
+++ b/arch/powerpc/include/asm/stacktrace.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_STACKTRACE_H
+#define _ASM_POWERPC_STACKTRACE_H
+
+void show_instructions(struct pt_regs *regs);
+
+#endif /* _ASM_POWERPC_STACKTRACE_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b1af3390249c..ee1d63e03c52 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -52,6 +52,7 @@
 #include <asm/machdep.h>
 #include <asm/time.h>
 #include <asm/runlatch.h>
+#include <asm/stacktrace.h>
 #include <asm/syscalls.h>
 #include <asm/switch_to.h>
 #include <asm/tm.h>
@@ -1261,7 +1262,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
 static int instructions_to_print = 16;
 
-static void show_instructions(struct pt_regs *regs)
+void show_instructions(struct pt_regs *regs)
 {
 	int i;
 	unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 *
@@ -1283,8 +1284,7 @@ static void show_instructions(struct pt_regs *regs)
 			pc = (unsigned long)phys_to_virt(pc);
 #endif
 
-		if (!__kernel_text_address(pc) ||
-		     probe_kernel_address((unsigned int __user *)pc, instr)) {
+		if (probe_kernel_address((unsigned int __user *)pc, instr)) {
 			pr_cont("XXXXXXXX ");
 		} else {
 			if (regs->nip == pc)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e55ee639d010..3beca17ac1b1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -70,6 +70,7 @@
 #include <asm/hmi.h>
 #include <sysdev/fsl_pci.h>
 #include <asm/kprobes.h>
+#include <asm/stacktrace.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -357,6 +358,8 @@ static void show_signal_msg(int signr, struct pt_regs *regs, int code,
 	print_vma_addr(KERN_CONT " in ", regs->nip);
 
 	pr_cont("\n");
+
+	show_instructions(regs);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1


* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Simon Horman @ 2018-07-24 15:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, Joe Perches, Samuel Ortiz, David S. Miller,
	Rob Herring, Michael Ellerman, Jonathan Cameron, linux-wireless,
	netdev, devicetree, linux-kernel, linux-arm-kernel, linux-crypto,
	linuxppc-dev, linux-iio, linux-pm, lvs-devel, netfilter-devel,
	coreteam
In-Reply-To: <20180724111600.4158975-1-arnd@arndb.de>

On Tue, Jul 24, 2018 at 01:13:25PM +0200, Arnd Bergmann wrote:
> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  .../devicetree/bindings/net/nfc/pn544.txt     |   2 +-
>  arch/arm/boot/dts/sun4i-a10-inet97fv2.dts     |   2 +-
>  arch/arm/crypto/sha256_glue.c                 |   2 +-
>  arch/arm/crypto/sha256_neon_glue.c            |   4 +-
>  drivers/crypto/vmx/ghashp8-ppc.pl             |  12 +-
>  drivers/iio/dac/ltc2632.c                     |   2 +-
>  drivers/power/reset/ltc2952-poweroff.c        |   4 +-
>  kernel/events/callchain.c                     |   2 +-
>  net/netfilter/ipvs/Kconfig                    |   8 +-
>  net/netfilter/ipvs/ip_vs_mh.c                 |   4 +-

IPVS portion:

Acked-by: Simon Horman <horms@verge.net.au>
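
For reference, the byte-level change such a conversion makes can be sketched
as below.  latin1_to_utf8() is a hypothetical helper written for this note,
not the tool used for the patch.  Every ISO_8859-1 byte at or above 0x80
becomes a two-byte UTF-8 sequence, for example 0xA9 (the copyright sign)
becomes 0xC2 0xA9, and reading those two bytes back as ISO_8859-1 is what
produces text like "Â©":

```c
#include <stddef.h>

/*
 * Convert n ISO_8859-1 bytes from src into UTF-8 in dst.
 * Returns the number of bytes written, or (size_t)-1 if dst
 * (capacity cap) is too small.  No NUL terminator is added.
 */
static size_t latin1_to_utf8(const unsigned char *src, size_t n,
			     unsigned char *dst, size_t cap)
{
	size_t o = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned char c = src[i];

		if (c < 0x80) {
			/* ASCII is unchanged. */
			if (o + 1 > cap)
				return (size_t)-1;
			dst[o++] = c;
		} else {
			/* U+0080..U+00FF: 110000xx 10xxxxxx */
			if (o + 2 > cap)
				return (size_t)-1;
			dst[o++] = (unsigned char)(0xC0 | (c >> 6));
			dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return o;
}
```

Running the conversion on a file that is already UTF-8 would encode it a
second time, so a real conversion has to pick out only the files that are
still ISO_8859-1.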


>  tools/power/cpupower/po/de.po                 |  44 +++----
>  tools/power/cpupower/po/fr.po                 | 120 +++++++++---------
>  12 files changed, 103 insertions(+), 103 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> index 538a86f7b2b0..72593f056b75 100644
> --- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> @@ -2,7 +2,7 @@
>  
>  Required properties:
>  - compatible: Should be "nxp,pn544-i2c".
> -- clock-frequency: I²C work frequency.
> +- clock-frequency: I²C work frequency.
>  - reg: address on the bus
>  - interrupt-parent: phandle for the interrupt gpio controller
>  - interrupts: GPIO interrupt to which the chip is connected
> diff --git a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> index 5d096528e75a..71c27ea0b53e 100644
> --- a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> +++ b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright 2014 Open Source Support GmbH
>   *
> - * David Lanzendörfer <david.lanzendoerfer@o2s.ch>
> + * David Lanzendörfer <david.lanzendoerfer@o2s.ch>
>   *
>   * This file is dual-licensed: you can use it either under the terms
>   * of the GPL or the X11 license, at your option. Note that this dual
> diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
> index bf8ccff2c9d0..0ae900e778f3 100644
> --- a/arch/arm/crypto/sha256_glue.c
> +++ b/arch/arm/crypto/sha256_glue.c
> @@ -2,7 +2,7 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using optimized ARM assembler and NEON instructions.
>   *
> - * Copyright © 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha256_ssse3_glue.c:
>   *   Copyright (C) 2013 Intel Corporation
> diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
> index 9bbee56fbdc8..1d82c6cd31a4 100644
> --- a/arch/arm/crypto/sha256_neon_glue.c
> +++ b/arch/arm/crypto/sha256_neon_glue.c
> @@ -2,10 +2,10 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using NEON instructions.
>   *
> - * Copyright © 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha512_neon_glue.c:
> - *   Copyright © 2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> + *   Copyright © 2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>  	 le?vperm	$IN,$IN,$IN,$lemask
>  	vxor		$zero,$zero,$zero
>  
> -	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
> -	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
> -	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align	5
>  Loop:
>  	 subic		$len,$len,16
> -	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
>  	 subfe.		r0,r0,r0		# borrow?-1:0
> -	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
>  	 and		r0,r0,$len
> -	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  	 add		$inp,$inp,r0
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase
> diff --git a/drivers/iio/dac/ltc2632.c b/drivers/iio/dac/ltc2632.c
> index cca278eaa138..885105135580 100644
> --- a/drivers/iio/dac/ltc2632.c
> +++ b/drivers/iio/dac/ltc2632.c
> @@ -1,7 +1,7 @@
>  /*
>   * LTC2632 Digital to analog convertors spi driver
>   *
> - * Copyright 2017 Maxime Roussin-Bélanger
> + * Copyright 2017 Maxime Roussin-Bélanger
>   * expanded by Silvan Murer <silvan.murer@gmail.com>
>   *
>   * Licensed under the GPL-2.
> diff --git a/drivers/power/reset/ltc2952-poweroff.c b/drivers/power/reset/ltc2952-poweroff.c
> index 6b911b6b10a6..c484584745bc 100644
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -2,7 +2,7 @@
>   * LTC2952 (PowerPath) driver
>   *
>   * Copyright (C) 2014, Xsens Technologies BV <info@xsens.com>
> - * Maintainer: René Moll <linux@r-moll.nl>
> + * Maintainer: René Moll <linux@r-moll.nl>
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -319,6 +319,6 @@ static struct platform_driver ltc2952_poweroff_driver = {
>  
>  module_platform_driver(ltc2952_poweroff_driver);
>  
> -MODULE_AUTHOR("René Moll <rene.moll@xsens.com>");
> +MODULE_AUTHOR("René Moll <rene.moll@xsens.com>");
>  MODULE_DESCRIPTION("LTC PowerPath power-off driver");
>  MODULE_LICENSE("GPL v2");
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index c187aa3df3c8..24a77c34e9ad 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -4,7 +4,7 @@
>   *  Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
>   *  Copyright (C) 2008-2011 Red Hat, Inc., Ingo Molnar
>   *  Copyright (C) 2008-2011 Red Hat, Inc., Peter Zijlstra
> - *  Copyright  ©  2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> + *  Copyright  ©  2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   *
>   * For licensing details see kernel-base/COPYING
>   */
> diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
> index 05dc1b77e466..cad48d07c818 100644
> --- a/net/netfilter/ipvs/Kconfig
> +++ b/net/netfilter/ipvs/Kconfig
> @@ -296,10 +296,10 @@ config IP_VS_MH_TAB_INDEX
>  	  stored in a hash table. This table is assigned by a preference
>  	  list of the positions to each destination until all slots in
>  	  the table are filled. The index determines the prime for size of
> -	  the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
> -	  65521 or 131071. When using weights to allow destinations to
> -	  receive more connections, the table is assigned an amount
> -	  proportional to the weights specified. The table needs to be large
> +	  the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
> +	  65521 or 131071. When using weights to allow destinations to
> +	  receive more connections, the table is assigned an amount
> +	  proportional to the weights specified. The table needs to be large
>  	  enough to effectively fit all the destinations multiplied by their
>  	  respective weights.
>  
> diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> index 0f795b186eb3..94d9d349ebb0 100644
> --- a/net/netfilter/ipvs/ip_vs_mh.c
> +++ b/net/netfilter/ipvs/ip_vs_mh.c
> @@ -5,10 +5,10 @@
>   *
>   */
>  
> -/* The mh algorithm is to assign a preference list of all the lookup
> +/* The mh algorithm is to assign a preference list of all the lookup
>   * table positions to each destination and populate the table with
>   * the most-preferred position of destinations. Then it is to select
> - * destination with the hash key of source IP address through looking
> + * destination with the hash key of source IP address through looking
>   * up a the lookup table.
>   *
>   * The algorithm is detailed in:
> diff --git a/tools/power/cpupower/po/de.po b/tools/power/cpupower/po/de.po
> index 78c09e51663a..840c17cc450a 100644
> --- a/tools/power/cpupower/po/de.po
> +++ b/tools/power/cpupower/po/de.po
> @@ -323,12 +323,12 @@ msgstr "  Hardwarebedingte Grenzen der Taktfrequenz: "
>  #: utils/cpufreq-info.c:256
>  #, c-format
>  msgid "  available frequency steps: "
> -msgstr "  mögliche Taktfrequenzen: "
> +msgstr "  mögliche Taktfrequenzen: "
>  
>  #: utils/cpufreq-info.c:269
>  #, c-format
>  msgid "  available cpufreq governors: "
> -msgstr "  mögliche Regler: "
> +msgstr "  mögliche Regler: "
>  
>  #: utils/cpufreq-info.c:280
>  #, c-format
> @@ -381,7 +381,7 @@ msgstr "Optionen:\n"
>  msgid "  -e, --debug          Prints out debug information [default]\n"
>  msgstr ""
>  "  -e, --debug          Erzeugt detaillierte Informationen, hilfreich\n"
> -"                       zum Aufspüren von Fehlern\n"
> +"                       zum Aufspüren von Fehlern\n"
>  
>  #: utils/cpufreq-info.c:475
>  #, c-format
> @@ -424,7 +424,7 @@ msgstr "  -p, --policy         Findet die momentane Taktik heraus *\n"
>  #: utils/cpufreq-info.c:482
>  #, c-format
>  msgid "  -g, --governors      Determines available cpufreq governors *\n"
> -msgstr "  -g, --governors      Erzeugt eine Liste mit verfügbaren Reglern *\n"
> +msgstr "  -g, --governors      Erzeugt eine Liste mit verfügbaren Reglern *\n"
>  
>  #: utils/cpufreq-info.c:483
>  #, c-format
> @@ -450,7 +450,7 @@ msgstr ""
>  #, c-format
>  msgid "  -s, --stats          Shows cpufreq statistics if available\n"
>  msgstr ""
> -"  -s, --stats          Zeigt, sofern möglich, Statistiken über cpufreq an.\n"
> +"  -s, --stats          Zeigt, sofern möglich, Statistiken über cpufreq an.\n"
>  
>  #: utils/cpufreq-info.c:487
>  #, c-format
> @@ -473,9 +473,9 @@ msgid ""
>  "cpufreq\n"
>  "                       interface in 2.4. and early 2.6. kernels\n"
>  msgstr ""
> -"  -o, --proc           Erzeugt Informationen in einem ähnlichem Format zu "
> +"  -o, --proc           Erzeugt Informationen in einem ähnlichem Format zu "
>  "dem\n"
> -"                       der /proc/cpufreq-Datei in 2.4. und frühen 2.6.\n"
> +"                       der /proc/cpufreq-Datei in 2.4. und frühen 2.6.\n"
>  "                       Kernel-Versionen\n"
>  
>  #: utils/cpufreq-info.c:491
> @@ -491,7 +491,7 @@ msgstr ""
>  #: utils/cpufreq-info.c:492 utils/cpuidle-info.c:152
>  #, c-format
>  msgid "  -h, --help           Prints out this screen\n"
> -msgstr "  -h, --help           Gibt diese Kurzübersicht aus\n"
> +msgstr "  -h, --help           Gibt diese Kurzübersicht aus\n"
>  
>  #: utils/cpufreq-info.c:495
>  #, c-format
> @@ -501,7 +501,7 @@ msgid ""
>  msgstr ""
>  "Sofern kein anderer Parameter als '-c, --cpu' angegeben wird, liefert "
>  "dieses\n"
> -"Programm Informationen, die z.B. zum Berichten von Fehlern nützlich sind.\n"
> +"Programm Informationen, die z.B. zum Berichten von Fehlern nützlich sind.\n"
>  
>  #: utils/cpufreq-info.c:497
>  #, c-format
> @@ -557,7 +557,7 @@ msgid ""
>  "select\n"
>  msgstr ""
>  "  -d FREQ, --min FREQ      neue minimale Taktfrequenz, die der Regler\n"
> -"                           auswählen darf\n"
> +"                           auswählen darf\n"
>  
>  #: utils/cpufreq-set.c:28
>  #, c-format
> @@ -566,7 +566,7 @@ msgid ""
>  "select\n"
>  msgstr ""
>  "  -u FREQ, --max FREQ      neue maximale Taktfrequenz, die der Regler\n"
> -"                           auswählen darf\n"
> +"                           auswählen darf\n"
>  
>  #: utils/cpufreq-set.c:29
>  #, c-format
> @@ -579,20 +579,20 @@ msgid ""
>  "  -f FREQ, --freq FREQ     specific frequency to be set. Requires userspace\n"
>  "                           governor to be available and loaded\n"
>  msgstr ""
> -"  -f FREQ, --freq FREQ     setze exakte Taktfrequenz. Benötigt den Regler\n"
> +"  -f FREQ, --freq FREQ     setze exakte Taktfrequenz. Benötigt den Regler\n"
>  "                           'userspace'.\n"
>  
>  #: utils/cpufreq-set.c:32
>  #, c-format
>  msgid "  -r, --related            Switches all hardware-related CPUs\n"
>  msgstr ""
> -"  -r, --related            Setze Werte für alle CPUs, deren Taktfrequenz\n"
> +"  -r, --related            Setze Werte für alle CPUs, deren Taktfrequenz\n"
>  "                           hardwarebedingt identisch ist.\n"
>  
>  #: utils/cpufreq-set.c:33 utils/cpupower-set.c:28 utils/cpupower-info.c:27
>  #, c-format
>  msgid "  -h, --help               Prints out this screen\n"
> -msgstr "  -h, --help               Gibt diese Kurzübersicht aus\n"
> +msgstr "  -h, --help               Gibt diese Kurzübersicht aus\n"
>  
>  #: utils/cpufreq-set.c:35
>  #, fuzzy, c-format
> @@ -618,8 +618,8 @@ msgstr ""
>  "   angenommen\n"
>  "2. Der Parameter -f bzw. --freq kann mit keinem anderen als dem Parameter\n"
>  "   -c bzw. --cpu kombiniert werden\n"
> -"3. FREQuenzen können in Hz, kHz (Standard), MHz, GHz oder THz eingegeben\n"
> -"   werden, indem der Wert und unmittelbar anschließend (ohne Leerzeichen!)\n"
> +"3. FREQuenzen kÃ¶nnen in Hz, kHz (Standard), MHz, GHz oder THz eingegeben\n"
> +"   werden, indem der Wert und unmittelbar anschlieÃŸend (ohne Leerzeichen!)\n"
>  "   die Einheit angegeben werden. (Bsp: 1GHz )\n"
>  "   (FREQuenz in kHz =^ MHz * 1000 =^ GHz * 1000000).\n"
>  
> @@ -638,7 +638,7 @@ msgid ""
>  msgstr ""
>  "Beim Einstellen ist ein Fehler aufgetreten. Typische Fehlerquellen sind:\n"
>  "- nicht ausreichende Rechte (Administrator)\n"
> -"- der Regler ist nicht verfügbar bzw. nicht geladen\n"
> +"- der Regler ist nicht verfügbar bzw. nicht geladen\n"
>  "- die angegebene Taktik ist inkorrekt\n"
>  "- eine spezifische Frequenz wurde angegeben, aber der Regler 'userspace'\n"
>  "  kann entweder hardwarebedingt nicht genutzt werden oder ist nicht geladen\n"
> @@ -821,7 +821,7 @@ msgstr ""
>  #: utils/cpuidle-info.c:48
>  #, fuzzy, c-format
>  msgid "Available idle states:"
> -msgstr "  mögliche Taktfrequenzen: "
> +msgstr "  mögliche Taktfrequenzen: "
>  
>  #: utils/cpuidle-info.c:71
>  #, c-format
> @@ -924,7 +924,7 @@ msgstr "Aufruf: cpufreq-info [Optionen]\n"
>  msgid "  -s, --silent         Only show general C-state information\n"
>  msgstr ""
>  "  -e, --debug          Erzeugt detaillierte Informationen, hilfreich\n"
> -"                       zum Aufspüren von Fehlern\n"
> +"                       zum Aufspüren von Fehlern\n"
>  
>  #: utils/cpuidle-info.c:150
>  #, fuzzy, c-format
> @@ -933,9 +933,9 @@ msgid ""
>  "acpi/processor/*/power\n"
>  "                       interface in older kernels\n"
>  msgstr ""
> -"  -o, --proc           Erzeugt Informationen in einem ähnlichem Format zu "
> +"  -o, --proc           Erzeugt Informationen in einem ähnlichem Format zu "
>  "dem\n"
> -"                       der /proc/cpufreq-Datei in 2.4. und frühen 2.6.\n"
> +"                       der /proc/cpufreq-Datei in 2.4. und frühen 2.6.\n"
>  "                       Kernel-Versionen\n"
>  
>  #: utils/cpuidle-info.c:209
> @@ -949,7 +949,7 @@ msgstr ""
>  #~ "  -c CPU, --cpu CPU    CPU number which information shall be determined "
>  #~ "about\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU    Nummer der CPU, über die Informationen "
> +#~ "  -c CPU, --cpu CPU    Nummer der CPU, über die Informationen "
>  #~ "herausgefunden werden sollen\n"
>  
>  #~ msgid ""
> diff --git a/tools/power/cpupower/po/fr.po b/tools/power/cpupower/po/fr.po
> index 245ad20a9bf9..b46ca2548f86 100644
> --- a/tools/power/cpupower/po/fr.po
> +++ b/tools/power/cpupower/po/fr.po
> @@ -212,7 +212,7 @@ msgstr ""
>  #: utils/cpupower.c:91
>  #, c-format
>  msgid "Report errors and bugs to %s, please.\n"
> -msgstr "Veuillez rapportez les erreurs et les bogues à %s, s'il vous plait.\n"
> +msgstr "Veuillez rapportez les erreurs et les bogues à %s, s'il vous plait.\n"
>  
>  #: utils/cpupower.c:114
>  #, c-format
> @@ -227,14 +227,14 @@ msgstr ""
>  #: utils/cpufreq-info.c:31
>  #, c-format
>  msgid "Couldn't count the number of CPUs (%s: %s), assuming 1\n"
> -msgstr "Détermination du nombre de CPUs (%s : %s) impossible.  Assume 1\n"
> +msgstr "Détermination du nombre de CPUs (%s : %s) impossible.  Assume 1\n"
>  
>  #: utils/cpufreq-info.c:63
>  #, c-format
>  msgid ""
>  "          minimum CPU frequency  -  maximum CPU frequency  -  governor\n"
>  msgstr ""
> -"         Fréquence CPU minimale - Fréquence CPU maximale  - régulateur\n"
> +"         Fréquence CPU minimale - Fréquence CPU maximale  - régulateur\n"
>  
>  #: utils/cpufreq-info.c:151
>  #, c-format
> @@ -302,12 +302,12 @@ msgstr "  pilote : %s\n"
>  #: utils/cpufreq-info.c:219
>  #, fuzzy, c-format
>  msgid "  CPUs which run at the same hardware frequency: "
> -msgstr "  CPUs qui doivent changer de fréquences en même temps : "
> +msgstr "  CPUs qui doivent changer de fréquences en même temps : "
>  
>  #: utils/cpufreq-info.c:230
>  #, fuzzy, c-format
>  msgid "  CPUs which need to have their frequency coordinated by software: "
> -msgstr "  CPUs qui doivent changer de fréquences en même temps : "
> +msgstr "  CPUs qui doivent changer de fréquences en même temps : "
>  
>  #: utils/cpufreq-info.c:241
>  #, c-format
> @@ -317,22 +317,22 @@ msgstr ""
>  #: utils/cpufreq-info.c:247
>  #, c-format
>  msgid "  hardware limits: "
> -msgstr "  limitation matérielle : "
> +msgstr "  limitation matérielle : "
>  
>  #: utils/cpufreq-info.c:256
>  #, c-format
>  msgid "  available frequency steps: "
> -msgstr "  plage de fréquence : "
> +msgstr "  plage de fréquence : "
>  
>  #: utils/cpufreq-info.c:269
>  #, c-format
>  msgid "  available cpufreq governors: "
> -msgstr "  régulateurs disponibles : "
> +msgstr "  régulateurs disponibles : "
>  
>  #: utils/cpufreq-info.c:280
>  #, c-format
>  msgid "  current policy: frequency should be within "
> -msgstr "  tactique actuelle : la fréquence doit être comprise entre "
> +msgstr "  tactique actuelle : la fréquence doit être comprise entre "
>  
>  #: utils/cpufreq-info.c:282
>  #, c-format
> @@ -345,18 +345,18 @@ msgid ""
>  "The governor \"%s\" may decide which speed to use\n"
>  "                  within this range.\n"
>  msgstr ""
> -"Le régulateur \"%s\" est libre de choisir la vitesse\n"
> -"                  dans cette plage de fréquences.\n"
> +"Le régulateur \"%s\" est libre de choisir la vitesse\n"
> +"                  dans cette plage de fréquences.\n"
>  
>  #: utils/cpufreq-info.c:293
>  #, c-format
>  msgid "  current CPU frequency is "
> -msgstr "  la fréquence actuelle de ce CPU est "
> +msgstr "  la fréquence actuelle de ce CPU est "
>  
>  #: utils/cpufreq-info.c:296
>  #, c-format
>  msgid " (asserted by call to hardware)"
> -msgstr " (vérifié par un appel direct du matériel)"
> +msgstr " (vérifié par un appel direct du matériel)"
>  
>  #: utils/cpufreq-info.c:304
>  #, c-format
> @@ -377,7 +377,7 @@ msgstr "Options :\n"
>  #: utils/cpufreq-info.c:474
>  #, fuzzy, c-format
>  msgid "  -e, --debug          Prints out debug information [default]\n"
> -msgstr "  -e, --debug          Afficher les informations de déboguage\n"
> +msgstr "  -e, --debug          Afficher les informations de dÃ©boguage\n"
>  
>  #: utils/cpufreq-info.c:475
>  #, c-format
> @@ -385,8 +385,8 @@ msgid ""
>  "  -f, --freq           Get frequency the CPU currently runs at, according\n"
>  "                       to the cpufreq core *\n"
>  msgstr ""
> -"  -f, --freq           Obtenir la fréquence actuelle du CPU selon le point\n"
> -"                       de vue du coeur du système de cpufreq *\n"
> +"  -f, --freq           Obtenir la frÃ©quence actuelle du CPU selon le point\n"
> +"                       de vue du coeur du systÃ¨me de cpufreq *\n"
>  
>  #: utils/cpufreq-info.c:477
>  #, c-format
> @@ -394,8 +394,8 @@ msgid ""
>  "  -w, --hwfreq         Get frequency the CPU currently runs at, by reading\n"
>  "                       it from hardware (only available to root) *\n"
>  msgstr ""
> -"  -w, --hwfreq         Obtenir la fréquence actuelle du CPU directement par\n"
> -"                       le matériel (doit être root) *\n"
> +"  -w, --hwfreq         Obtenir la frÃ©quence actuelle du CPU directement par\n"
> +"                       le matÃ©riel (doit Ãªtre root) *\n"
>  
>  #: utils/cpufreq-info.c:479
>  #, c-format
> @@ -403,13 +403,13 @@ msgid ""
>  "  -l, --hwlimits       Determine the minimum and maximum CPU frequency "
>  "allowed *\n"
>  msgstr ""
> -"  -l, --hwlimits       Affiche les fréquences minimales et maximales du CPU "
> +"  -l, --hwlimits       Affiche les frÃ©quences minimales et maximales du CPU "
>  "*\n"
>  
>  #: utils/cpufreq-info.c:480
>  #, c-format
>  msgid "  -d, --driver         Determines the used cpufreq kernel driver *\n"
> -msgstr "  -d, --driver         Affiche le pilote cpufreq utilisé *\n"
> +msgstr "  -d, --driver         Affiche le pilote cpufreq utilisÃ© *\n"
>  
>  #: utils/cpufreq-info.c:481
>  #, c-format
> @@ -420,7 +420,7 @@ msgstr "  -p, --policy         Affiche la tactique actuelle de cpufreq *\n"
>  #, c-format
>  msgid "  -g, --governors      Determines available cpufreq governors *\n"
>  msgstr ""
> -"  -g, --governors      Affiche les régulateurs disponibles de cpufreq *\n"
> +"  -g, --governors      Affiche les rÃ©gulateurs disponibles de cpufreq *\n"
>  
>  #: utils/cpufreq-info.c:483
>  #, fuzzy, c-format
> @@ -429,7 +429,7 @@ msgid ""
>  "frequency *\n"
>  msgstr ""
>  "  -a, --affected-cpus   Affiche quels sont les CPUs qui doivent changer de\n"
> -"                        fréquences en même temps *\n"
> +"                        frÃ©quences en mÃªme temps *\n"
>  
>  #: utils/cpufreq-info.c:484
>  #, fuzzy, c-format
> @@ -438,7 +438,7 @@ msgid ""
>  "                       coordinated by software *\n"
>  msgstr ""
>  "  -a, --affected-cpus   Affiche quels sont les CPUs qui doivent changer de\n"
> -"                        fréquences en même temps *\n"
> +"                        frÃ©quences en mÃªme temps *\n"
>  
>  #: utils/cpufreq-info.c:486
>  #, c-format
> @@ -453,7 +453,7 @@ msgid ""
>  "  -y, --latency        Determines the maximum latency on CPU frequency "
>  "changes *\n"
>  msgstr ""
> -"  -l, --hwlimits       Affiche les fréquences minimales et maximales du CPU "
> +"  -l, --hwlimits       Affiche les frÃ©quences minimales et maximales du CPU "
>  "*\n"
>  
>  #: utils/cpufreq-info.c:488
> @@ -469,7 +469,7 @@ msgid ""
>  "                       interface in 2.4. and early 2.6. kernels\n"
>  msgstr ""
>  "  -o, --proc           Affiche les informations en utilisant l'interface\n"
> -"                       fournie par /proc/cpufreq, présente dans les "
> +"                       fournie par /proc/cpufreq, prÃ©sente dans les "
>  "versions\n"
>  "                       2.4 et les anciennes versions 2.6 du noyau\n"
>  
> @@ -485,7 +485,7 @@ msgstr ""
>  #: utils/cpufreq-info.c:492 utils/cpuidle-info.c:152
>  #, c-format
>  msgid "  -h, --help           Prints out this screen\n"
> -msgstr "  -h, --help           affiche l'aide-mémoire\n"
> +msgstr "  -h, --help           affiche l'aide-mÃ©moire\n"
>  
>  #: utils/cpufreq-info.c:495
>  #, c-format
> @@ -493,8 +493,8 @@ msgid ""
>  "If no argument or only the -c, --cpu parameter is given, debug output about\n"
>  "cpufreq is printed which is useful e.g. for reporting bugs.\n"
>  msgstr ""
> -"Par défaut, les informations de déboguage seront affichées si aucun\n"
> -"argument, ou bien si seulement l'argument -c (--cpu) est donné, afin de\n"
> +"Par dÃ©faut, les informations de dÃ©boguage seront affichÃ©es si aucun\n"
> +"argument, ou bien si seulement l'argument -c (--cpu) est donnÃ©, afin de\n"
>  "faciliter les rapports de bogues par exemple\n"
>  
>  #: utils/cpufreq-info.c:497
> @@ -517,8 +517,8 @@ msgid ""
>  "You can't specify more than one --cpu parameter and/or\n"
>  "more than one output-specific argument\n"
>  msgstr ""
> -"On ne peut indiquer plus d'un paramètre --cpu, tout comme l'on ne peut\n"
> -"spécifier plus d'un argument de formatage\n"
> +"On ne peut indiquer plus d'un paramÃ¨tre --cpu, tout comme l'on ne peut\n"
> +"spÃ©cifier plus d'un argument de formatage\n"
>  
>  #: utils/cpufreq-info.c:600 utils/cpufreq-set.c:82 utils/cpupower-set.c:42
>  #: utils/cpupower-info.c:42 utils/cpuidle-info.c:213
> @@ -529,7 +529,7 @@ msgstr "option invalide\n"
>  #: utils/cpufreq-info.c:617
>  #, c-format
>  msgid "couldn't analyze CPU %d as it doesn't seem to be present\n"
> -msgstr "analyse du CPU %d impossible puisqu'il ne semble pas être présent\n"
> +msgstr "analyse du CPU %d impossible puisqu'il ne semble pas Ãªtre prÃ©sent\n"
>  
>  #: utils/cpufreq-info.c:620 utils/cpupower-info.c:142
>  #, c-format
> @@ -547,8 +547,8 @@ msgid ""
>  "  -d FREQ, --min FREQ      new minimum CPU frequency the governor may "
>  "select\n"
>  msgstr ""
> -"  -d FREQ, --min FREQ       nouvelle fréquence minimale du CPU à utiliser\n"
> -"                            par le régulateur\n"
> +"  -d FREQ, --min FREQ       nouvelle frÃ©quence minimale du CPU Ã  utiliser\n"
> +"                            par le rÃ©gulateur\n"
>  
>  #: utils/cpufreq-set.c:28
>  #, c-format
> @@ -556,13 +556,13 @@ msgid ""
>  "  -u FREQ, --max FREQ      new maximum CPU frequency the governor may "
>  "select\n"
>  msgstr ""
> -"  -u FREQ, --max FREQ       nouvelle fréquence maximale du CPU à utiliser\n"
> -"                            par le régulateur\n"
> +"  -u FREQ, --max FREQ       nouvelle frÃ©quence maximale du CPU Ã  utiliser\n"
> +"                            par le rÃ©gulateur\n"
>  
>  #: utils/cpufreq-set.c:29
>  #, c-format
>  msgid "  -g GOV, --governor GOV   new cpufreq governor\n"
> -msgstr "  -g GOV, --governor GOV   active le régulateur GOV\n"
> +msgstr "  -g GOV, --governor GOV   active le rÃ©gulateur GOV\n"
>  
>  #: utils/cpufreq-set.c:30
>  #, c-format
> @@ -570,9 +570,9 @@ msgid ""
>  "  -f FREQ, --freq FREQ     specific frequency to be set. Requires userspace\n"
>  "                           governor to be available and loaded\n"
>  msgstr ""
> -"  -f FREQ, --freq FREQ     fixe la fréquence du processeur à FREQ. Il faut\n"
> -"                           que le régulateur « userspace » soit disponible \n"
> -"                           et activé.\n"
> +"  -f FREQ, --freq FREQ     fixe la frÃ©quence du processeur Ã  FREQ. Il faut\n"
> +"                           que le rÃ©gulateur Â« userspace Â» soit disponible \n"
> +"                           et activÃ©.\n"
>  
>  #: utils/cpufreq-set.c:32
>  #, c-format
> @@ -582,7 +582,7 @@ msgstr ""
>  #: utils/cpufreq-set.c:33 utils/cpupower-set.c:28 utils/cpupower-info.c:27
>  #, fuzzy, c-format
>  msgid "  -h, --help               Prints out this screen\n"
> -msgstr "  -h, --help           affiche l'aide-mémoire\n"
> +msgstr "  -h, --help           affiche l'aide-mÃ©moire\n"
>  
>  #: utils/cpufreq-set.c:35
>  #, fuzzy, c-format
> @@ -602,11 +602,11 @@ msgid ""
>  "   (FREQuency in kHz =^ Hz * 0.001 =^ MHz * 1000 =^ GHz * 1000000).\n"
>  msgstr ""
>  "Remarque :\n"
> -"1. Le CPU numéro 0 sera utilisé par défaut si -c (ou --cpu) est omis ;\n"
> -"2. l'argument -f FREQ (ou --freq FREQ) ne peut être utilisé qu'avec --cpu ;\n"
> -"3. on pourra préciser l'unité des fréquences en postfixant sans aucune "
> +"1. Le CPU numÃ©ro 0 sera utilisÃ© par dÃ©faut si -c (ou --cpu) est omis ;\n"
> +"2. l'argument -f FREQ (ou --freq FREQ) ne peut Ãªtre utilisÃ© qu'avec --cpu ;\n"
> +"3. on pourra prÃ©ciser l'unitÃ© des frÃ©quences en postfixant sans aucune "
>  "espace\n"
> -"   les valeurs par hz, kHz (par défaut), MHz, GHz ou THz\n"
> +"   les valeurs par hz, kHz (par dÃ©faut), MHz, GHz ou THz\n"
>  "   (kHz =^ Hz * 0.001 =^ MHz * 1000 =^ GHz * 1000000).\n"
>  
>  #: utils/cpufreq-set.c:57
> @@ -622,21 +622,21 @@ msgid ""
>  "frequency\n"
>  "   or because the userspace governor isn't loaded?\n"
>  msgstr ""
> -"En ajustant les nouveaux paramètres, une erreur est apparue. Les sources\n"
> +"En ajustant les nouveaux paramÃ¨tres, une erreur est apparue. Les sources\n"
>  "d'erreur typique sont :\n"
> -"- droit d'administration insuffisant (êtes-vous root ?) ;\n"
> -"- le régulateur choisi n'est pas disponible, ou bien n'est pas disponible "
> +"- droit d'administration insuffisant (Ãªtes-vous root ?) ;\n"
> +"- le rÃ©gulateur choisi n'est pas disponible, ou bien n'est pas disponible "
>  "en\n"
>  "  tant que module noyau ;\n"
>  "- la tactique n'est pas disponible ;\n"
> -"- vous voulez utiliser l'option -f/--freq, mais le régulateur « userspace »\n"
> -"  n'est pas disponible, par exemple parce que le matériel ne le supporte\n"
> -"  pas, ou bien n'est tout simplement pas chargé.\n"
> +"- vous voulez utiliser l'option -f/--freq, mais le rÃ©gulateur Â« userspace Â»\n"
> +"  n'est pas disponible, par exemple parce que le matÃ©riel ne le supporte\n"
> +"  pas, ou bien n'est tout simplement pas chargÃ©.\n"
>  
>  #: utils/cpufreq-set.c:170
>  #, c-format
>  msgid "wrong, unknown or unhandled CPU?\n"
> -msgstr "CPU inconnu ou non supporté ?\n"
> +msgstr "CPU inconnu ou non supportÃ© ?\n"
>  
>  #: utils/cpufreq-set.c:302
>  #, c-format
> @@ -653,7 +653,7 @@ msgid ""
>  "At least one parameter out of -f/--freq, -d/--min, -u/--max, and\n"
>  "-g/--governor must be passed\n"
>  msgstr ""
> -"L'un de ces paramètres est obligatoire : -f/--freq, -d/--min, -u/--max et\n"
> +"L'un de ces paramÃ¨tres est obligatoire : -f/--freq, -d/--min, -u/--max et\n"
>  "-g/--governor\n"
>  
>  #: utils/cpufreq-set.c:347
> @@ -810,7 +810,7 @@ msgstr ""
>  #: utils/cpuidle-info.c:48
>  #, fuzzy, c-format
>  msgid "Available idle states:"
> -msgstr "  plage de fréquence : "
> +msgstr "  plage de frÃ©quence : "
>  
>  #: utils/cpuidle-info.c:71
>  #, c-format
> @@ -911,7 +911,7 @@ msgstr "Usage : cpufreq-info [options]\n"
>  #: utils/cpuidle-info.c:149
>  #, fuzzy, c-format
>  msgid "  -s, --silent         Only show general C-state information\n"
> -msgstr "  -e, --debug          Afficher les informations de déboguage\n"
> +msgstr "  -e, --debug          Afficher les informations de dÃ©boguage\n"
>  
>  #: utils/cpuidle-info.c:150
>  #, fuzzy, c-format
> @@ -921,7 +921,7 @@ msgid ""
>  "                       interface in older kernels\n"
>  msgstr ""
>  "  -o, --proc           Affiche les informations en utilisant l'interface\n"
> -"                       fournie par /proc/cpufreq, présente dans les "
> +"                       fournie par /proc/cpufreq, prÃ©sente dans les "
>  "versions\n"
>  "                       2.4 et les anciennes versions 2.6 du noyau\n"
>  
> @@ -929,19 +929,19 @@ msgstr ""
>  #, fuzzy, c-format
>  msgid "You can't specify more than one output-specific argument\n"
>  msgstr ""
> -"On ne peut indiquer plus d'un paramètre --cpu, tout comme l'on ne peut\n"
> -"spécifier plus d'un argument de formatage\n"
> +"On ne peut indiquer plus d'un paramÃ¨tre --cpu, tout comme l'on ne peut\n"
> +"spÃ©cifier plus d'un argument de formatage\n"
>  
>  #~ msgid ""
>  #~ "  -c CPU, --cpu CPU    CPU number which information shall be determined "
>  #~ "about\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU    Numéro du CPU pour lequel l'information sera "
> -#~ "affichée\n"
> +#~ "  -c CPU, --cpu CPU    NumÃ©ro du CPU pour lequel l'information sera "
> +#~ "affichÃ©e\n"
>  
>  #~ msgid ""
>  #~ "  -c CPU, --cpu CPU        number of CPU where cpufreq settings shall be "
>  #~ "modified\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU        numéro du CPU à prendre en compte pour les\n"
> +#~ "  -c CPU, --cpu CPU        numÃ©ro du CPU Ã  prendre en compte pour les\n"
>  #~ "                           changements\n"
> -- 
> 2.18.0
> 
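
The `+` lines in the quoted fr.po hunks above show `frÃ©quence` where the `-` lines show `fréquence`. That pattern is what UTF-8 bytes look like when they are displayed as ISO-8859-1, so the new text is presumably valid UTF-8 that was mis-rendered somewhere in the mail path. A minimal Python sketch of the effect (illustrative only, not part of the patch; the variable names are made up for this example):

```python
# Illustrative only: reproduces the garbling visible in the quoted hunks.
original = "fréquence"

utf8_bytes = original.encode("utf-8")          # 'é' becomes two bytes: C3 A9
latin1_bytes = original.encode("iso-8859-1")   # 'é' is the single byte E9

# Decoding UTF-8 bytes with the wrong charset yields the garbled form.
garbled = utf8_bytes.decode("iso-8859-1")

# Reversing the mistake recovers the intended text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(repaired)
```

The same round trip explains the other garbled sequences in the hunks, since every accented letter in the old encoding becomes a two-byte UTF-8 sequence.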

* Re: [PATCH v07 6/9] pmt/numa: Disable arch_update_cpu_topology during CPU readd
From: Nathan Fontenot @ 2018-07-24 20:38 UTC (permalink / raw)
  To: Michael Bringmann, linuxppc-dev; +Cc: John Allen, Tyrel Datwyler, Thomas Falcon
In-Reply-To: <2c5645d4-2911-5e49-5e41-7966fb909142@linux.vnet.ibm.com>

On 07/13/2018 03:18 PM, Michael Bringmann wrote:
> pmt/numa: Disable arch_update_cpu_topology during post migration
> CPU readd updates when evaluating device-tree changes after LPM
> to avoid thread deadlocks trying to update node assignments.
> System timing between all of the threads and timers restarted in
> a migrated system overlapped frequently allowing tasks to start
> acquiring resources (get_online_cpus) needed by rebuild_sched_domains.
> Defer the operation of that function until after the CPU readd has
> completed.
> 
> Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/pseries/hotplug-cpu.c |    9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 1906ee57..df1791b 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -26,6 +26,7 @@
>   #include <linux/sched.h>	/* for idle_task_exit */
>   #include <linux/sched/hotplug.h>
>   #include <linux/cpu.h>
> +#include <linux/cpuset.h>
>   #include <linux/of.h>
>   #include <linux/slab.h>
>   #include <asm/prom.h>
> @@ -684,9 +685,15 @@ static int dlpar_cpu_readd_by_index(u32 drc_index)
> 
>   	pr_info("Attempting to re-add CPU, drc index %x\n", drc_index);
> 
> +	arch_update_cpu_topology_suspend();
>   	rc = dlpar_cpu_remove_by_index(drc_index, false);
> -	if (!rc)
> +	arch_update_cpu_topology_resume();
> +
> +	if (!rc) {
> +		arch_update_cpu_topology_suspend();
>   		rc = dlpar_cpu_add(drc_index, false);
> +		arch_update_cpu_topology_resume();
> +	}
> 

A couple of questions... Why not disable across the entire remove and add
operations instead of disabling for each operation?

Also, what about other CPU add/remove routines, do they need to do
similar disabling?

-Nathan

>   	if (rc)
>   		pr_info("Failed to update cpu at drc_index %lx\n",
> 

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Andrew Morton @ 2018-07-24 21:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Joe Perches, Samuel Ortiz, David S. Miller, Rob Herring,
	Michael Ellerman, Jonathan Cameron, linux-wireless, netdev,
	devicetree, linux-kernel, linux-arm-kernel, linux-crypto,
	linuxppc-dev, linux-iio, linux-pm, lvs-devel, netfilter-devel,
	coreteam
In-Reply-To: <20180724111600.4158975-1-arnd@arndb.de>

On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann <arnd@arndb.de> wrote:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.

Was "consistency" the only rationale?  The discussion is now outside my
memory horizon but I thought there were other reasons.

Will we be getting a checkpatch rule to keep things this way?
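
A minimal sketch of the kind of check being asked about, assuming the rule only needs to flag lines that do not decode as UTF-8. It is written in Python for illustration and is not checkpatch's actual implementation; the helper name `non_utf8_lines` is hypothetical.

```python
def non_utf8_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Line 2 carries an ISO-8859-1 'é' (byte E9); line 3 carries the UTF-8 form.
sample = b"plain ASCII\nRen\xe9 Moll\nRen\xc3\xa9 Moll\n"
print(non_utf8_lines(sample))
```

Plain ASCII passes unchanged because it is a subset of UTF-8, so a check like this would only fire on legacy 8-bit text.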

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Randy Dunlap @ 2018-07-24 21:02 UTC (permalink / raw)
  To: Andrew Morton, Arnd Bergmann
  Cc: Joe Perches, Samuel Ortiz, David S. Miller, Rob Herring,
	Michael Ellerman, Jonathan Cameron, linux-wireless, netdev,
	devicetree, linux-kernel, linux-arm-kernel, linux-crypto,
	linuxppc-dev, linux-iio, linux-pm, lvs-devel, netfilter-devel,
	coreteam
In-Reply-To: <20180724140010.e24a9964fd340afe2d98a994@linux-foundation.org>

On 07/24/2018 02:00 PM, Andrew Morton wrote:
> On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> 
>> Almost all files in the kernel are either plain text or UTF-8
>> encoded. A couple however are ISO_8859-1, usually just a few
>> characters in C comments, for historic reasons.
>>
>> This converts them all to UTF-8 for consistency.
> 
> Was "consistency" the only rationale?  The discussion is now outside my
> memory horizon but I thought there were other reasons.

kconfig tools prefer ASCII or utf-8.

email tools probably likewise.

user sanity?

> Will we be getting a checkpatch rule to keep things this way?



-- 
~Randy

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Jonathan Cameron @ 2018-07-24 21:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andrew Morton, Joe Perches, Samuel Ortiz, David S. Miller,
	Rob Herring, Michael Ellerman, linux-wireless, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-crypto, linuxppc-dev,
	linux-iio, linux-pm, lvs-devel, netfilter-devel, coreteam
In-Reply-To: <20180724111600.4158975-1-arnd@arndb.de>

On Tue, 24 Jul 2018 13:13:25 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
For IIO, Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks for tidying this up.

Jonathan

> ---
>  .../devicetree/bindings/net/nfc/pn544.txt     |   2 +-
>  arch/arm/boot/dts/sun4i-a10-inet97fv2.dts     |   2 +-
>  arch/arm/crypto/sha256_glue.c                 |   2 +-
>  arch/arm/crypto/sha256_neon_glue.c            |   4 +-
>  drivers/crypto/vmx/ghashp8-ppc.pl             |  12 +-
>  drivers/iio/dac/ltc2632.c                     |   2 +-
>  drivers/power/reset/ltc2952-poweroff.c        |   4 +-
>  kernel/events/callchain.c                     |   2 +-
>  net/netfilter/ipvs/Kconfig                    |   8 +-
>  net/netfilter/ipvs/ip_vs_mh.c                 |   4 +-
>  tools/power/cpupower/po/de.po                 |  44 +++----
>  tools/power/cpupower/po/fr.po                 | 120 +++++++++---------
>  12 files changed, 103 insertions(+), 103 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> index 538a86f7b2b0..72593f056b75 100644
> --- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> @@ -2,7 +2,7 @@
>  
>  Required properties:
>  - compatible: Should be "nxp,pn544-i2c".
> -- clock-frequency: I_C work frequency.
> +- clock-frequency: I²C work frequency.
>  - reg: address on the bus
>  - interrupt-parent: phandle for the interrupt gpio controller
>  - interrupts: GPIO interrupt to which the chip is connected
> diff --git a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> index 5d096528e75a..71c27ea0b53e 100644
> --- a/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> +++ b/arch/arm/boot/dts/sun4i-a10-inet97fv2.dts
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright 2014 Open Source Support GmbH
>   *
> - * David Lanzend_rfer <david.lanzendoerfer@o2s.ch>
> + * David Lanzendörfer <david.lanzendoerfer@o2s.ch>
>   *
>   * This file is dual-licensed: you can use it either under the terms
>   * of the GPL or the X11 license, at your option. Note that this dual
> diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
> index bf8ccff2c9d0..0ae900e778f3 100644
> --- a/arch/arm/crypto/sha256_glue.c
> +++ b/arch/arm/crypto/sha256_glue.c
> @@ -2,7 +2,7 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using optimized ARM assembler and NEON instructions.
>   *
> - * Copyright _ 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha256_ssse3_glue.c:
>   *   Copyright (C) 2013 Intel Corporation
> diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
> index 9bbee56fbdc8..1d82c6cd31a4 100644
> --- a/arch/arm/crypto/sha256_neon_glue.c
> +++ b/arch/arm/crypto/sha256_neon_glue.c
> @@ -2,10 +2,10 @@
>   * Glue code for the SHA256 Secure Hash Algorithm assembly implementation
>   * using NEON instructions.
>   *
> - * Copyright _ 2015 Google Inc.
> + * Copyright © 2015 Google Inc.
>   *
>   * This file is based on sha512_neon_glue.c:
> - *   Copyright _ 2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
> + *   Copyright © 2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
>   *
>   * This program is free software; you can redistribute it and/or modify it
>   * under the terms of the GNU General Public License as published by the Free
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>  	 le?vperm	$IN,$IN,$IN,$lemask
>  	vxor		$zero,$zero,$zero
>  
> -	vpmsumd		$Xl,$IN,$Hl		# H.lo_Xi.lo
> -	vpmsumd		$Xm,$IN,$H		# H.hi_Xi.lo+H.lo_Xi.hi
> -	vpmsumd		$Xh,$IN,$Hh		# H.hi_Xi.hi
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align	5
>  Loop:
>  	 subic		$len,$len,16
> -	vpmsumd		$Xl,$IN,$Hl		# H.lo_Xi.lo
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
>  	 subfe.		r0,r0,r0		# borrow?-1:0
> -	vpmsumd		$Xm,$IN,$H		# H.hi_Xi.lo+H.lo_Xi.hi
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
>  	 and		r0,r0,$len
> -	vpmsumd		$Xh,$IN,$Hh		# H.hi_Xi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  	 add		$inp,$inp,r0
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase
> diff --git a/drivers/iio/dac/ltc2632.c b/drivers/iio/dac/ltc2632.c
> index cca278eaa138..885105135580 100644
> --- a/drivers/iio/dac/ltc2632.c
> +++ b/drivers/iio/dac/ltc2632.c
> @@ -1,7 +1,7 @@
>  /*
>   * LTC2632 Digital to analog convertors spi driver
>   *
> - * Copyright 2017 Maxime Roussin-B_langer
> + * Copyright 2017 Maxime Roussin-Bélanger
>   * expanded by Silvan Murer <silvan.murer@gmail.com>
>   *
>   * Licensed under the GPL-2.
> diff --git a/drivers/power/reset/ltc2952-poweroff.c b/drivers/power/reset/ltc2952-poweroff.c
> index 6b911b6b10a6..c484584745bc 100644
> --- a/drivers/power/reset/ltc2952-poweroff.c
> +++ b/drivers/power/reset/ltc2952-poweroff.c
> @@ -2,7 +2,7 @@
>   * LTC2952 (PowerPath) driver
>   *
>   * Copyright (C) 2014, Xsens Technologies BV <info@xsens.com>
> - * Maintainer: Ren_ Moll <linux@r-moll.nl>
> + * Maintainer: René Moll <linux@r-moll.nl>
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -319,6 +319,6 @@ static struct platform_driver ltc2952_poweroff_driver = {
>  
>  module_platform_driver(ltc2952_poweroff_driver);
>  
> -MODULE_AUTHOR("Ren_ Moll <rene.moll@xsens.com>");
> +MODULE_AUTHOR("René Moll <rene.moll@xsens.com>");
>  MODULE_DESCRIPTION("LTC PowerPath power-off driver");
>  MODULE_LICENSE("GPL v2");
> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index c187aa3df3c8..24a77c34e9ad 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c
> @@ -4,7 +4,7 @@
>   *  Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
>   *  Copyright (C) 2008-2011 Red Hat, Inc., Ingo Molnar
>   *  Copyright (C) 2008-2011 Red Hat, Inc., Peter Zijlstra
> - *  Copyright  _  2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
> + *  Copyright  ©  2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
>   *
>   * For licensing details see kernel-base/COPYING
>   */
> diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
> index 05dc1b77e466..cad48d07c818 100644
> --- a/net/netfilter/ipvs/Kconfig
> +++ b/net/netfilter/ipvs/Kconfig
> @@ -296,10 +296,10 @@ config IP_VS_MH_TAB_INDEX
>  	  stored in a hash table. This table is assigned by a preference
>  	  list of the positions to each destination until all slots in
>  	  the table are filled. The index determines the prime for size of
> -	  the table as_251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
> -	  65521 or 131071._When using weights to allow destinations to
> -	  receive more connections,_the table is assigned an amount
> -	  proportional to the weights specified._The table needs to be large
> +	  the table as 251, 509, 1021, 2039, 4093, 8191, 16381, 32749,
> +	  65521 or 131071. When using weights to allow destinations to
> +	  receive more connections, the table is assigned an amount
> +	  proportional to the weights specified. The table needs to be large
>  	  enough to effectively fit all the destinations multiplied by their
>  	  respective weights.
>  
> diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> index 0f795b186eb3..94d9d349ebb0 100644
> --- a/net/netfilter/ipvs/ip_vs_mh.c
> +++ b/net/netfilter/ipvs/ip_vs_mh.c
> @@ -5,10 +5,10 @@
>   *
>   */
>  
> -/* The mh algorithm is to assign_a preference list of all the lookup
> +/* The mh algorithm is to assign a preference list of all the lookup
>   * table positions to each destination and populate the table with
>   * the most-preferred position of destinations. Then it is to select
> - * destination with the hash key of source IP address_through looking
> + * destination with the hash key of source IP address through looking
>   * up a the lookup table.
>   *
>   * The algorithm is detailed in:
> diff --git a/tools/power/cpupower/po/de.po b/tools/power/cpupower/po/de.po
> index 78c09e51663a..840c17cc450a 100644
> --- a/tools/power/cpupower/po/de.po
> +++ b/tools/power/cpupower/po/de.po
> @@ -323,12 +323,12 @@ msgstr "  Hardwarebedingte Grenzen der Taktfrequenz: "
>  #: utils/cpufreq-info.c:256
>  #, c-format
>  msgid "  available frequency steps: "
> -msgstr "  m_gliche Taktfrequenzen: "
> +msgstr "  mögliche Taktfrequenzen: "
>  
>  #: utils/cpufreq-info.c:269
>  #, c-format
>  msgid "  available cpufreq governors: "
> -msgstr "  m_gliche Regler: "
> +msgstr "  mögliche Regler: "
>  
>  #: utils/cpufreq-info.c:280
>  #, c-format
> @@ -381,7 +381,7 @@ msgstr "Optionen:\n"
>  msgid "  -e, --debug          Prints out debug information [default]\n"
>  msgstr ""
>  "  -e, --debug          Erzeugt detaillierte Informationen, hilfreich\n"
> -"                       zum Aufsp_ren von Fehlern\n"
> +"                       zum Aufspüren von Fehlern\n"
>  
>  #: utils/cpufreq-info.c:475
>  #, c-format
> @@ -424,7 +424,7 @@ msgstr "  -p, --policy         Findet die momentane Taktik heraus *\n"
>  #: utils/cpufreq-info.c:482
>  #, c-format
>  msgid "  -g, --governors      Determines available cpufreq governors *\n"
> -msgstr "  -g, --governors      Erzeugt eine Liste mit verf_gbaren Reglern *\n"
> +msgstr "  -g, --governors      Erzeugt eine Liste mit verfügbaren Reglern *\n"
>  
>  #: utils/cpufreq-info.c:483
>  #, c-format
> @@ -450,7 +450,7 @@ msgstr ""
>  #, c-format
>  msgid "  -s, --stats          Shows cpufreq statistics if available\n"
>  msgstr ""
> -"  -s, --stats          Zeigt, sofern m_glich, Statistiken _ber cpufreq an.\n"
> +"  -s, --stats          Zeigt, sofern möglich, Statistiken über cpufreq an.\n"
>  
>  #: utils/cpufreq-info.c:487
>  #, c-format
> @@ -473,9 +473,9 @@ msgid ""
>  "cpufreq\n"
>  "                       interface in 2.4. and early 2.6. kernels\n"
>  msgstr ""
> -"  -o, --proc           Erzeugt Informationen in einem _hnlichem Format zu "
> +"  -o, --proc           Erzeugt Informationen in einem ähnlichem Format zu "
>  "dem\n"
> -"                       der /proc/cpufreq-Datei in 2.4. und fr_hen 2.6.\n"
> +"                       der /proc/cpufreq-Datei in 2.4. und frühen 2.6.\n"
>  "                       Kernel-Versionen\n"
>  
>  #: utils/cpufreq-info.c:491
> @@ -491,7 +491,7 @@ msgstr ""
>  #: utils/cpufreq-info.c:492 utils/cpuidle-info.c:152
>  #, c-format
>  msgid "  -h, --help           Prints out this screen\n"
> -msgstr "  -h, --help           Gibt diese Kurz_bersicht aus\n"
> +msgstr "  -h, --help           Gibt diese Kurzübersicht aus\n"
>  
>  #: utils/cpufreq-info.c:495
>  #, c-format
> @@ -501,7 +501,7 @@ msgid ""
>  msgstr ""
>  "Sofern kein anderer Parameter als '-c, --cpu' angegeben wird, liefert "
>  "dieses\n"
> -"Programm Informationen, die z.B. zum Berichten von Fehlern n_tzlich sind.\n"
> +"Programm Informationen, die z.B. zum Berichten von Fehlern nützlich sind.\n"
>  
>  #: utils/cpufreq-info.c:497
>  #, c-format
> @@ -557,7 +557,7 @@ msgid ""
>  "select\n"
>  msgstr ""
>  "  -d FREQ, --min FREQ      neue minimale Taktfrequenz, die der Regler\n"
> -"                           ausw_hlen darf\n"
> +"                           auswählen darf\n"
>  
>  #: utils/cpufreq-set.c:28
>  #, c-format
> @@ -566,7 +566,7 @@ msgid ""
>  "select\n"
>  msgstr ""
>  "  -u FREQ, --max FREQ      neue maximale Taktfrequenz, die der Regler\n"
> -"                           ausw_hlen darf\n"
> +"                           auswählen darf\n"
>  
>  #: utils/cpufreq-set.c:29
>  #, c-format
> @@ -579,20 +579,20 @@ msgid ""
>  "  -f FREQ, --freq FREQ     specific frequency to be set. Requires userspace\n"
>  "                           governor to be available and loaded\n"
>  msgstr ""
> -"  -f FREQ, --freq FREQ     setze exakte Taktfrequenz. Ben_tigt den Regler\n"
> +"  -f FREQ, --freq FREQ     setze exakte Taktfrequenz. Benötigt den Regler\n"
>  "                           'userspace'.\n"
>  
>  #: utils/cpufreq-set.c:32
>  #, c-format
>  msgid "  -r, --related            Switches all hardware-related CPUs\n"
>  msgstr ""
> -"  -r, --related            Setze Werte f_r alle CPUs, deren Taktfrequenz\n"
> +"  -r, --related            Setze Werte für alle CPUs, deren Taktfrequenz\n"
>  "                           hardwarebedingt identisch ist.\n"
>  
>  #: utils/cpufreq-set.c:33 utils/cpupower-set.c:28 utils/cpupower-info.c:27
>  #, c-format
>  msgid "  -h, --help               Prints out this screen\n"
> -msgstr "  -h, --help               Gibt diese Kurz_bersicht aus\n"
> +msgstr "  -h, --help               Gibt diese Kurzübersicht aus\n"
>  
>  #: utils/cpufreq-set.c:35
>  #, fuzzy, c-format
> @@ -618,8 +618,8 @@ msgstr ""
>  "   angenommen\n"
>  "2. Der Parameter -f bzw. --freq kann mit keinem anderen als dem Parameter\n"
>  "   -c bzw. --cpu kombiniert werden\n"
> -"3. FREQuenzen k_nnen in Hz, kHz (Standard), MHz, GHz oder THz eingegeben\n"
> -"   werden, indem der Wert und unmittelbar anschlie_end (ohne Leerzeichen!)\n"
> +"3. FREQuenzen können in Hz, kHz (Standard), MHz, GHz oder THz eingegeben\n"
> +"   werden, indem der Wert und unmittelbar anschließend (ohne Leerzeichen!)\n"
>  "   die Einheit angegeben werden. (Bsp: 1GHz )\n"
>  "   (FREQuenz in kHz =^ MHz * 1000 =^ GHz * 1000000).\n"
>  
> @@ -638,7 +638,7 @@ msgid ""
>  msgstr ""
>  "Beim Einstellen ist ein Fehler aufgetreten. Typische Fehlerquellen sind:\n"
>  "- nicht ausreichende Rechte (Administrator)\n"
> -"- der Regler ist nicht verf_gbar bzw. nicht geladen\n"
> +"- der Regler ist nicht verfügbar bzw. nicht geladen\n"
>  "- die angegebene Taktik ist inkorrekt\n"
>  "- eine spezifische Frequenz wurde angegeben, aber der Regler 'userspace'\n"
>  "  kann entweder hardwarebedingt nicht genutzt werden oder ist nicht geladen\n"
> @@ -821,7 +821,7 @@ msgstr ""
>  #: utils/cpuidle-info.c:48
>  #, fuzzy, c-format
>  msgid "Available idle states:"
> -msgstr "  m_gliche Taktfrequenzen: "
> +msgstr "  m=C3=B6gliche Taktfrequenzen: "
> =20
>  #: utils/cpuidle-info.c:71
>  #, c-format
> @@ -924,7 +924,7 @@ msgstr "Aufruf: cpufreq-info [Optionen]\n"
>  msgid "  -s, --silent         Only show general C-state information\n"
>  msgstr ""
>  "  -e, --debug          Erzeugt detaillierte Informationen, hilfreich\n"
> -"                       zum Aufsp_ren von Fehlern\n"
> +"                       zum Aufsp=C3=BCren von Fehlern\n"
> =20
>  #: utils/cpuidle-info.c:150
>  #, fuzzy, c-format
> @@ -933,9 +933,9 @@ msgid ""
>  "acpi/processor/*/power\n"
>  "                       interface in older kernels\n"
>  msgstr ""
> -"  -o, --proc           Erzeugt Informationen in einem _hnlichem Format =
zu "
> +"  -o, --proc           Erzeugt Informationen in einem =C3=A4hnlichem Fo=
rmat zu "
>  "dem\n"
> -"                       der /proc/cpufreq-Datei in 2.4. und fr_hen 2.6.\=
n"
> +"                       der /proc/cpufreq-Datei in 2.4. und fr=C3=BChen =
2.6.\n"
>  "                       Kernel-Versionen\n"
> =20
>  #: utils/cpuidle-info.c:209
> @@ -949,7 +949,7 @@ msgstr ""
>  #~ "  -c CPU, --cpu CPU    CPU number which information shall be determi=
ned "
>  #~ "about\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU    Nummer der CPU, _ber die Informationen "
> +#~ "  -c CPU, --cpu CPU    Nummer der CPU, =C3=BCber die Informationen "
>  #~ "herausgefunden werden sollen\n"
> =20
>  #~ msgid ""
> diff --git a/tools/power/cpupower/po/fr.po b/tools/power/cpupower/po/fr.po
> index 245ad20a9bf9..b46ca2548f86 100644
> --- a/tools/power/cpupower/po/fr.po
> +++ b/tools/power/cpupower/po/fr.po
> @@ -212,7 +212,7 @@ msgstr ""
>  #: utils/cpupower.c:91
>  #, c-format
>  msgid "Report errors and bugs to %s, please.\n"
> -msgstr "Veuillez rapportez les erreurs et les bogues _ %s, s'il vous pla=
it.\n"
> +msgstr "Veuillez rapportez les erreurs et les bogues =C3=A0 %s, s'il vou=
s plait.\n"
> =20
>  #: utils/cpupower.c:114
>  #, c-format
> @@ -227,14 +227,14 @@ msgstr ""
>  #: utils/cpufreq-info.c:31
>  #, c-format
>  msgid "Couldn't count the number of CPUs (%s: %s), assuming 1\n"
> -msgstr "D_termination du nombre de CPUs (%s : %s) impossible.  Assume 1\=
n"
> +msgstr "D=C3=A9termination du nombre de CPUs (%s : %s) impossible.  Assu=
me 1\n"
> =20
>  #: utils/cpufreq-info.c:63
>  #, c-format
>  msgid ""
>  "          minimum CPU frequency  -  maximum CPU frequency  -  governor\=
n"
>  msgstr ""
> -"         Fr_quence CPU minimale - Fr_quence CPU maximale  - r_gulateur\=
n"
> +"         Fr=C3=A9quence CPU minimale - Fr=C3=A9quence CPU maximale  - r=
=C3=A9gulateur\n"
> =20
>  #: utils/cpufreq-info.c:151
>  #, c-format
> @@ -302,12 +302,12 @@ msgstr "  pilote : %s\n"
>  #: utils/cpufreq-info.c:219
>  #, fuzzy, c-format
>  msgid "  CPUs which run at the same hardware frequency: "
> -msgstr "  CPUs qui doivent changer de fr_quences en m_me temps : "
> +msgstr "  CPUs qui doivent changer de fr=C3=A9quences en m=C3=AAme temps=
 : "
> =20
>  #: utils/cpufreq-info.c:230
>  #, fuzzy, c-format
>  msgid "  CPUs which need to have their frequency coordinated by software=
: "
> -msgstr "  CPUs qui doivent changer de fr_quences en m_me temps : "
> +msgstr "  CPUs qui doivent changer de fr=C3=A9quences en m=C3=AAme temps=
 : "
> =20
>  #: utils/cpufreq-info.c:241
>  #, c-format
> @@ -317,22 +317,22 @@ msgstr ""
>  #: utils/cpufreq-info.c:247
>  #, c-format
>  msgid "  hardware limits: "
> -msgstr "  limitation mat_rielle : "
> +msgstr "  limitation mat=C3=A9rielle : "
> =20
>  #: utils/cpufreq-info.c:256
>  #, c-format
>  msgid "  available frequency steps: "
> -msgstr "  plage de fr_quence : "
> +msgstr "  plage de fr=C3=A9quence : "
> =20
>  #: utils/cpufreq-info.c:269
>  #, c-format
>  msgid "  available cpufreq governors: "
> -msgstr "  r_gulateurs disponibles : "
> +msgstr "  r=C3=A9gulateurs disponibles : "
> =20
>  #: utils/cpufreq-info.c:280
>  #, c-format
>  msgid "  current policy: frequency should be within "
> -msgstr "  tactique actuelle : la fr_quence doit _tre comprise entre "
> +msgstr "  tactique actuelle : la fr=C3=A9quence doit =C3=AAtre comprise =
entre "
> =20
>  #: utils/cpufreq-info.c:282
>  #, c-format
> @@ -345,18 +345,18 @@ msgid ""
>  "The governor \"%s\" may decide which speed to use\n"
>  "                  within this range.\n"
>  msgstr ""
> -"Le r_gulateur \"%s\" est libre de choisir la vitesse\n"
> -"                  dans cette plage de fr_quences.\n"
> +"Le r=C3=A9gulateur \"%s\" est libre de choisir la vitesse\n"
> +"                  dans cette plage de fr=C3=A9quences.\n"
> =20
>  #: utils/cpufreq-info.c:293
>  #, c-format
>  msgid "  current CPU frequency is "
> -msgstr "  la fr_quence actuelle de ce CPU est "
> +msgstr "  la fr=C3=A9quence actuelle de ce CPU est "
> =20
>  #: utils/cpufreq-info.c:296
>  #, c-format
>  msgid " (asserted by call to hardware)"
> -msgstr " (v_rifi_ par un appel direct du mat_riel)"
> +msgstr " (v=C3=A9rifi=C3=A9 par un appel direct du mat=C3=A9riel)"
> =20
>  #: utils/cpufreq-info.c:304
>  #, c-format
> @@ -377,7 +377,7 @@ msgstr "Options :\n"
>  #: utils/cpufreq-info.c:474
>  #, fuzzy, c-format
>  msgid "  -e, --debug          Prints out debug information [default]\n"
> -msgstr "  -e, --debug          Afficher les informations de d_boguage\n"
> +msgstr "  -e, --debug          Afficher les informations de d=C3=A9bogua=
ge\n"
> =20
>  #: utils/cpufreq-info.c:475
>  #, c-format
> @@ -385,8 +385,8 @@ msgid ""
>  "  -f, --freq           Get frequency the CPU currently runs at, accordi=
ng\n"
>  "                       to the cpufreq core *\n"
>  msgstr ""
> -"  -f, --freq           Obtenir la fr_quence actuelle du CPU selon le po=
int\n"
> -"                       de vue du coeur du syst_me de cpufreq *\n"
> +"  -f, --freq           Obtenir la fr=C3=A9quence actuelle du CPU selon =
le point\n"
> +"                       de vue du coeur du syst=C3=A8me de cpufreq *\n"
> =20
>  #: utils/cpufreq-info.c:477
>  #, c-format
> @@ -394,8 +394,8 @@ msgid ""
>  "  -w, --hwfreq         Get frequency the CPU currently runs at, by read=
ing\n"
>  "                       it from hardware (only available to root) *\n"
>  msgstr ""
> -"  -w, --hwfreq         Obtenir la fr_quence actuelle du CPU directement=
 par\n"
> -"                       le mat_riel (doit _tre root) *\n"
> +"  -w, --hwfreq         Obtenir la fr=C3=A9quence actuelle du CPU direct=
ement par\n"
> +"                       le mat=C3=A9riel (doit =C3=AAtre root) *\n"
> =20
>  #: utils/cpufreq-info.c:479
>  #, c-format
> @@ -403,13 +403,13 @@ msgid ""
>  "  -l, --hwlimits       Determine the minimum and maximum CPU frequency "
>  "allowed *\n"
>  msgstr ""
> -"  -l, --hwlimits       Affiche les fr_quences minimales et maximales du=
 CPU "
> +"  -l, --hwlimits       Affiche les fr=C3=A9quences minimales et maximal=
es du CPU "
>  "*\n"
> =20
>  #: utils/cpufreq-info.c:480
>  #, c-format
>  msgid "  -d, --driver         Determines the used cpufreq kernel driver =
*\n"
> -msgstr "  -d, --driver         Affiche le pilote cpufreq utilis_ *\n"
> +msgstr "  -d, --driver         Affiche le pilote cpufreq utilis=C3=A9 *\=
n"
> =20
>  #: utils/cpufreq-info.c:481
>  #, c-format
> @@ -420,7 +420,7 @@ msgstr "  -p, --policy         Affiche la tactique ac=
tuelle de cpufreq *\n"
>  #, c-format
>  msgid "  -g, --governors      Determines available cpufreq governors *\n"
>  msgstr ""
> -"  -g, --governors      Affiche les r_gulateurs disponibles de cpufreq *=
\n"
> +"  -g, --governors      Affiche les r=C3=A9gulateurs disponibles de cpuf=
req *\n"
> =20
>  #: utils/cpufreq-info.c:483
>  #, fuzzy, c-format
> @@ -429,7 +429,7 @@ msgid ""
>  "frequency *\n"
>  msgstr ""
>  "  -a, --affected-cpus   Affiche quels sont les CPUs qui doivent changer=
 de\n"
> -"                        fr_quences en m_me temps *\n"
> +"                        fr=C3=A9quences en m=C3=AAme temps *\n"
> =20
>  #: utils/cpufreq-info.c:484
>  #, fuzzy, c-format
> @@ -438,7 +438,7 @@ msgid ""
>  "                       coordinated by software *\n"
>  msgstr ""
>  "  -a, --affected-cpus   Affiche quels sont les CPUs qui doivent changer=
 de\n"
> -"                        fr_quences en m_me temps *\n"
> +"                        fr=C3=A9quences en m=C3=AAme temps *\n"
> =20
>  #: utils/cpufreq-info.c:486
>  #, c-format
> @@ -453,7 +453,7 @@ msgid ""
>  "  -y, --latency        Determines the maximum latency on CPU frequency "
>  "changes *\n"
>  msgstr ""
> -"  -l, --hwlimits       Affiche les fr_quences minimales et maximales du=
 CPU "
> +"  -l, --hwlimits       Affiche les fr=C3=A9quences minimales et maximal=
es du CPU "
>  "*\n"
> =20
>  #: utils/cpufreq-info.c:488
> @@ -469,7 +469,7 @@ msgid ""
>  "                       interface in 2.4. and early 2.6. kernels\n"
>  msgstr ""
>  "  -o, --proc           Affiche les informations en utilisant l'interfac=
e\n"
> -"                       fournie par /proc/cpufreq, pr_sente dans les "
> +"                       fournie par /proc/cpufreq, pr=C3=A9sente dans le=
s "
>  "versions\n"
>  "                       2.4 et les anciennes versions 2.6 du noyau\n"
> =20
> @@ -485,7 +485,7 @@ msgstr ""
>  #: utils/cpufreq-info.c:492 utils/cpuidle-info.c:152
>  #, c-format
>  msgid "  -h, --help           Prints out this screen\n"
> -msgstr "  -h, --help           affiche l'aide-m_moire\n"
> +msgstr "  -h, --help           affiche l'aide-m=C3=A9moire\n"
> =20
>  #: utils/cpufreq-info.c:495
>  #, c-format
> @@ -493,8 +493,8 @@ msgid ""
>  "If no argument or only the -c, --cpu parameter is given, debug output a=
bout\n"
>  "cpufreq is printed which is useful e.g. for reporting bugs.\n"
>  msgstr ""
> -"Par d_faut, les informations de d_boguage seront affich_es si aucun\n"
> -"argument, ou bien si seulement l'argument -c (--cpu) est donn_, afin de=
\n"
> +"Par d=C3=A9faut, les informations de d=C3=A9boguage seront affich=C3=A9=
es si aucun\n"
> +"argument, ou bien si seulement l'argument -c (--cpu) est donn=C3=A9, af=
in de\n"
>  "faciliter les rapports de bogues par exemple\n"
> =20
>  #: utils/cpufreq-info.c:497
> @@ -517,8 +517,8 @@ msgid ""
>  "You can't specify more than one --cpu parameter and/or\n"
>  "more than one output-specific argument\n"
>  msgstr ""
> -"On ne peut indiquer plus d'un param_tre --cpu, tout comme l'on ne peut\=
n"
> -"sp_cifier plus d'un argument de formatage\n"
> +"On ne peut indiquer plus d'un param=C3=A8tre --cpu, tout comme l'on ne =
peut\n"
> +"sp=C3=A9cifier plus d'un argument de formatage\n"
> =20
>  #: utils/cpufreq-info.c:600 utils/cpufreq-set.c:82 utils/cpupower-set.c:=
42
>  #: utils/cpupower-info.c:42 utils/cpuidle-info.c:213
> @@ -529,7 +529,7 @@ msgstr "option invalide\n"
>  #: utils/cpufreq-info.c:617
>  #, c-format
>  msgid "couldn't analyze CPU %d as it doesn't seem to be present\n"
> -msgstr "analyse du CPU %d impossible puisqu'il ne semble pas _tre pr_sen=
t\n"
> +msgstr "analyse du CPU %d impossible puisqu'il ne semble pas =C3=AAtre p=
r=C3=A9sent\n"
> =20
>  #: utils/cpufreq-info.c:620 utils/cpupower-info.c:142
>  #, c-format
> @@ -547,8 +547,8 @@ msgid ""
>  "  -d FREQ, --min FREQ      new minimum CPU frequency the governor may "
>  "select\n"
>  msgstr ""
> -"  -d FREQ, --min FREQ       nouvelle fr_quence minimale du CPU _ utilis=
er\n"
> -"                            par le r_gulateur\n"
> +"  -d FREQ, --min FREQ       nouvelle fr=C3=A9quence minimale du CPU =C3=
=A0 utiliser\n"
> +"                            par le r=C3=A9gulateur\n"
> =20
>  #: utils/cpufreq-set.c:28
>  #, c-format
> @@ -556,13 +556,13 @@ msgid ""
>  "  -u FREQ, --max FREQ      new maximum CPU frequency the governor may "
>  "select\n"
>  msgstr ""
> -"  -u FREQ, --max FREQ       nouvelle fr_quence maximale du CPU _ utilis=
er\n"
> -"                            par le r_gulateur\n"
> +"  -u FREQ, --max FREQ       nouvelle fr=C3=A9quence maximale du CPU =C3=
=A0 utiliser\n"
> +"                            par le r=C3=A9gulateur\n"
> =20
>  #: utils/cpufreq-set.c:29
>  #, c-format
>  msgid "  -g GOV, --governor GOV   new cpufreq governor\n"
> -msgstr "  -g GOV, --governor GOV   active le r_gulateur GOV\n"
> +msgstr "  -g GOV, --governor GOV   active le r=C3=A9gulateur GOV\n"
> =20
>  #: utils/cpufreq-set.c:30
>  #, c-format
> @@ -570,9 +570,9 @@ msgid ""
>  "  -f FREQ, --freq FREQ     specific frequency to be set. Requires users=
pace\n"
>  "                           governor to be available and loaded\n"
>  msgstr ""
> -"  -f FREQ, --freq FREQ     fixe la fr_quence du processeur _ FREQ. Il f=
aut\n"
> -"                           que le r_gulateur _ userspace _ soit disponi=
ble \n"
> -"                           et activ_.\n"
> +"  -f FREQ, --freq FREQ     fixe la fr=C3=A9quence du processeur =C3=A0 =
FREQ. Il faut\n"
> +"                           que le r=C3=A9gulateur =C2=AB userspace =C2=
=BB soit disponible \n"
> +"                           et activ=C3=A9.\n"
> =20
>  #: utils/cpufreq-set.c:32
>  #, c-format
> @@ -582,7 +582,7 @@ msgstr ""
>  #: utils/cpufreq-set.c:33 utils/cpupower-set.c:28 utils/cpupower-info.c:=
27
>  #, fuzzy, c-format
>  msgid "  -h, --help               Prints out this screen\n"
> -msgstr "  -h, --help           affiche l'aide-m_moire\n"
> +msgstr "  -h, --help           affiche l'aide-m=C3=A9moire\n"
> =20
>  #: utils/cpufreq-set.c:35
>  #, fuzzy, c-format
> @@ -602,11 +602,11 @@ msgid ""
>  "   (FREQuency in kHz =3D^ Hz * 0.001 =3D^ MHz * 1000 =3D^ GHz * 1000000=
).\n"
>  msgstr ""
>  "Remarque :\n"
> -"1. Le CPU num_ro 0 sera utilis_ par d_faut si -c (ou --cpu) est omis ;\=
n"
> -"2. l'argument -f FREQ (ou --freq FREQ) ne peut _tre utilis_ qu'avec --c=
pu ;\n"
> -"3. on pourra pr_ciser l'unit_ des fr_quences en postfixant sans aucune "
> +"1. Le CPU num=C3=A9ro 0 sera utilis=C3=A9 par d=C3=A9faut si -c (ou --c=
pu) est omis ;\n"
> +"2. l'argument -f FREQ (ou --freq FREQ) ne peut =C3=AAtre utilis=C3=A9 q=
u'avec --cpu ;\n"
> +"3. on pourra pr=C3=A9ciser l'unit=C3=A9 des fr=C3=A9quences en postfixa=
nt sans aucune "
>  "espace\n"
> -"   les valeurs par hz, kHz (par d_faut), MHz, GHz ou THz\n"
> +"   les valeurs par hz, kHz (par d=C3=A9faut), MHz, GHz ou THz\n"
>  "   (kHz =3D^ Hz * 0.001 =3D^ MHz * 1000 =3D^ GHz * 1000000).\n"
> =20
>  #: utils/cpufreq-set.c:57
> @@ -622,21 +622,21 @@ msgid ""
>  "frequency\n"
>  "   or because the userspace governor isn't loaded?\n"
>  msgstr ""
> -"En ajustant les nouveaux param_tres, une erreur est apparue. Les source=
s\n"
> +"En ajustant les nouveaux param=C3=A8tres, une erreur est apparue. Les s=
ources\n"
>  "d'erreur typique sont :\n"
> -"- droit d'administration insuffisant (_tes-vous root ?) ;\n"
> -"- le r_gulateur choisi n'est pas disponible, ou bien n'est pas disponib=
le "
> +"- droit d'administration insuffisant (=C3=AAtes-vous root ?) ;\n"
> +"- le r=C3=A9gulateur choisi n'est pas disponible, ou bien n'est pas dis=
ponible "
>  "en\n"
>  "  tant que module noyau ;\n"
>  "- la tactique n'est pas disponible ;\n"
> -"- vous voulez utiliser l'option -f/--freq, mais le r_gulateur _ userspa=
ce _\n"
> -"  n'est pas disponible, par exemple parce que le mat_riel ne le support=
e\n"
> -"  pas, ou bien n'est tout simplement pas charg_.\n"
> +"- vous voulez utiliser l'option -f/--freq, mais le r=C3=A9gulateur =C2=
=AB userspace =C2=BB\n"
> +"  n'est pas disponible, par exemple parce que le mat=C3=A9riel ne le su=
pporte\n"
> +"  pas, ou bien n'est tout simplement pas charg=C3=A9.\n"
> =20
>  #: utils/cpufreq-set.c:170
>  #, c-format
>  msgid "wrong, unknown or unhandled CPU?\n"
> -msgstr "CPU inconnu ou non support_ ?\n"
> +msgstr "CPU inconnu ou non support=C3=A9 ?\n"
> =20
>  #: utils/cpufreq-set.c:302
>  #, c-format
> @@ -653,7 +653,7 @@ msgid ""
>  "At least one parameter out of -f/--freq, -d/--min, -u/--max, and\n"
>  "-g/--governor must be passed\n"
>  msgstr ""
> -"L'un de ces param_tres est obligatoire : -f/--freq, -d/--min, -u/--max =
et\n"
> +"L'un de ces param=C3=A8tres est obligatoire : -f/--freq, -d/--min, -u/-=
-max et\n"
>  "-g/--governor\n"
> =20
>  #: utils/cpufreq-set.c:347
> @@ -810,7 +810,7 @@ msgstr ""
>  #: utils/cpuidle-info.c:48
>  #, fuzzy, c-format
>  msgid "Available idle states:"
> -msgstr "  plage de fr_quence : "
> +msgstr "  plage de fr=C3=A9quence : "
> =20
>  #: utils/cpuidle-info.c:71
>  #, c-format
> @@ -911,7 +911,7 @@ msgstr "Usage : cpufreq-info [options]\n"
>  #: utils/cpuidle-info.c:149
>  #, fuzzy, c-format
>  msgid "  -s, --silent         Only show general C-state information\n"
> -msgstr "  -e, --debug          Afficher les informations de d_boguage\n"
> +msgstr "  -e, --debug          Afficher les informations de d=C3=A9bogua=
ge\n"
> =20
>  #: utils/cpuidle-info.c:150
>  #, fuzzy, c-format
> @@ -921,7 +921,7 @@ msgid ""
>  "                       interface in older kernels\n"
>  msgstr ""
>  "  -o, --proc           Affiche les informations en utilisant l'interfac=
e\n"
> -"                       fournie par /proc/cpufreq, pr_sente dans les "
> +"                       fournie par /proc/cpufreq, pr=C3=A9sente dans le=
s "
>  "versions\n"
>  "                       2.4 et les anciennes versions 2.6 du noyau\n"
> =20
> @@ -929,19 +929,19 @@ msgstr ""
>  #, fuzzy, c-format
>  msgid "You can't specify more than one output-specific argument\n"
>  msgstr ""
> -"On ne peut indiquer plus d'un param_tre --cpu, tout comme l'on ne peut\=
n"
> -"sp_cifier plus d'un argument de formatage\n"
> +"On ne peut indiquer plus d'un param=C3=A8tre --cpu, tout comme l'on ne =
peut\n"
> +"sp=C3=A9cifier plus d'un argument de formatage\n"
> =20
>  #~ msgid ""
>  #~ "  -c CPU, --cpu CPU    CPU number which information shall be determi=
ned "
>  #~ "about\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU    Num_ro du CPU pour lequel l'information sera "
> -#~ "affich_e\n"
> +#~ "  -c CPU, --cpu CPU    Num=C3=A9ro du CPU pour lequel l'information =
sera "
> +#~ "affich=C3=A9e\n"
> =20
>  #~ msgid ""
>  #~ "  -c CPU, --cpu CPU        number of CPU where cpufreq settings shal=
l be "
>  #~ "modified\n"
>  #~ msgstr ""
> -#~ "  -c CPU, --cpu CPU        num_ro du CPU _ prendre en compte pour le=
s\n"
> +#~ "  -c CPU, --cpu CPU        num=C3=A9ro du CPU =C3=A0 prendre en comp=
te pour les\n"
>  #~ "                           changements\n"

^ permalink raw reply

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Joe Perches @ 2018-07-25  0:13 UTC (permalink / raw)
  To: Andrew Morton, Arnd Bergmann
  Cc: Samuel Ortiz, David S. Miller, Rob Herring, Michael Ellerman,
	Jonathan Cameron, linux-wireless, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-crypto, linuxppc-dev,
	linux-iio, linux-pm, lvs-devel, netfilter-devel, coreteam
In-Reply-To: <20180724140010.e24a9964fd340afe2d98a994@linux-foundation.org>

On Tue, 2018-07-24 at 14:00 -0700, Andrew Morton wrote:
> On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > Almost all files in the kernel are either plain text or UTF-8
> > encoded. A couple however are ISO_8859-1, usually just a few
> > characters in a C comments, for historic reasons.
> > This converts them all to UTF-8 for consistency.
[]
> Will we be getting a checkpatch rule to keep things this way?

How would that be done?

^ permalink raw reply

* Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives
From: Paul Burton @ 2018-07-25  0:34 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: linux, catalin.marinas, will.deacon, tony.luck, fenghua.yu, ralf,
	jhogan, jejb, deller, benh, paulus, mpe, ysato, dalias, davem,
	tglx, mingo, hpa, x86, arnd, linux-arm-kernel, linux-kernel,
	linux-ia64, linux-mips, linux-parisc, linuxppc-dev, linux-sh,
	sparclinux, linux-arch, Naoya Horiguchi, Mike Kravetz
In-Reply-To: <20180705110716.3919-1-alex@ghiti.fr>

Hi Alexandre,

On Thu, Jul 05, 2018 at 11:07:05AM +0000, Alexandre Ghiti wrote:
> In order to reduce copy/paste of functions across architectures and then
> make riscv hugetlb port (and future ports) simpler and smaller, this
> patchset intends to factorize the numerous hugetlb primitives that are
> defined across all the architectures.
> 
> Except for prepare_hugepage_range, this patchset moves the versions that
> are just pass-through to standard pte primitives into
> asm-generic/hugetlb.h by using the same #ifdef semantic that can be
> found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
> 
> s390 architecture has not been tackled in this serie since it does not
> use asm-generic/hugetlb.h at all.
> powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).
> 
> This patchset has been compiled on x86 only. 

For MIPS these look good - I don't see any issues & they pass a build
test (using cavium_octeon_defconfig which enables huge pages), so:

    Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts

Thanks,
    Paul

^ permalink raw reply

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Andrew Morton @ 2018-07-25  0:55 UTC (permalink / raw)
  To: Joe Perches
  Cc: Arnd Bergmann, Samuel Ortiz, David S. Miller, Rob Herring,
	Michael Ellerman, Jonathan Cameron, linux-wireless, netdev,
	devicetree, linux-kernel, linux-arm-kernel, linux-crypto,
	linuxppc-dev, linux-iio, linux-pm, lvs-devel, netfilter-devel,
	coreteam
In-Reply-To: <aa39b6d15f44555cb79d5ebdef74aeab19003d6a.camel@perches.com>

On Tue, 24 Jul 2018 17:13:20 -0700 Joe Perches <joe@perches.com> wrote:

> On Tue, 2018-07-24 at 14:00 -0700, Andrew Morton wrote:
> > On Tue, 24 Jul 2018 13:13:25 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > > Almost all files in the kernel are either plain text or UTF-8
> > > encoded. A couple however are ISO_8859-1, usually just a few
> > > characters in a C comments, for historic reasons.
> > > This converts them all to UTF-8 for consistency.
> []
> > Will we be getting a checkpatch rule to keep things this way?
> 
> How would that be done?

I'm using this, seems to work.

        if ! file "$p" | grep -q -P ", ASCII text|, UTF-8 Unicode text"
        then
                echo "$p: weird charset"
        fi
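
A minimal sketch of the same check that does not depend on the output
format of file(1), assuming that "decodes strictly as ASCII or UTF-8" is
an acceptable stand-in for what the grep above matches. The names
classify_encoding() and weird_files() are made up for illustration, and
this is not a checkpatch rule:

```python
def classify_encoding(data: bytes) -> str:
    """Return "ascii", "utf-8", or "other" for a blob of file contents."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Typically a legacy 8-bit encoding such as ISO_8859-1.
        return "other"


def weird_files(paths):
    """Yield the paths whose contents are neither ASCII nor UTF-8."""
    for path in paths:
        with open(path, "rb") as f:
            if classify_encoding(f.read()) == "other":
                yield path
```

Some ISO_8859-1 byte sequences happen to be valid UTF-8, so this is a
heuristic in the same way the file(1) test is.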

^ permalink raw reply

* Re: [PATCH v7 4/4] kexec_file: Load kernel at top of system RAM if required
From: Baoquan He @ 2018-07-25  2:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, robh+dt, dan.j.williams, nicolas.pitre, josh,
	fengguang.wu, bp, andy.shevchenko, patrik.r.jakobsson, airlied,
	kys, haiyangz, sthemmin, dmitry.torokhov, frowand.list,
	keith.busch, jonathan.derrick, lorenzo.pieralisi, bhelgaas, tglx,
	brijesh.singh, jglisse, thomas.lendacky, gregkh, baiyaowei,
	richard.weiyang, devel, linux-input, linux-nvdimm, devicetree,
	linux-pci, ebiederm, vgoyal, dyoung, yinghai, monstr, davem,
	chris, jcmvbkbc, gustavo, maarten.lankhorst, seanpaul,
	linux-parisc, linuxppc-dev, kexec
In-Reply-To: <20180719124444.c893712cca2e6f2649d1bc0d@linux-foundation.org>

Hi Andrew,

On 07/19/18 at 12:44pm, Andrew Morton wrote:
> On Thu, 19 Jul 2018 23:17:53 +0800 Baoquan He <bhe@redhat.com> wrote:
> > > As far as I can tell, the above is the whole reason for the patchset,
> > > yes?  To avoid confusing users.
> > 
> > 
> > In fact, it's not just trying to avoid confusing users. Kexec loading
> > and kexec_file loading are just do the same thing in essence. Just we
> > need do kernel image verification on uefi system, have to port kexec
> > loading code to kernel. 
> > 
> > Kexec has been a formal feature in our distro, and customers owning
> > those kind of very large machine can make use of this feature to speed
> > up the reboot process. On uefi machine, the kexec_file loading will
> > search place to put kernel under 4G from top to down. As we know, the
> > 1st 4G space is DMA32 ZONE, dma, pci mmcfg, bios etc all try to consume
> > it. It may have possibility to not be able to find a usable space for
> > kernel/initrd. From the top down of the whole memory space, we don't
> > have this worry. 
> > 
> > And at the first post, I just posted below with AKASHI's
> > walk_system_ram_res_rev() version. Later you suggested to use
> > list_head to link child sibling of resource, see what the code change
> > looks like.
> > http://lkml.kernel.org/r/20180322033722.9279-1-bhe@redhat.com
> > 
> > Then I posted v2
> > http://lkml.kernel.org/r/20180408024724.16812-1-bhe@redhat.com
> > Rob Herring mentioned that other components which has this tree struct
> > have planned to do the same thing, replacing the singly linked list with
> > list_head to link resource child sibling. Just quote Rob's words as
> > below. I think this could be another reason.
> > 
> > ~~~~~ From Rob
> > The DT struct device_node also has the same tree structure with
> > parent, child, sibling pointers and converting to list_head had been
> > on the todo list for a while. ACPI also has some tree walking
> > functions (drivers/acpi/acpica/pstree.c). Perhaps there should be a
> > common tree struct and helpers defined either on top of list_head or a
> > ~~~~~
> > new struct if that saves some size.
> 
> Please let's get all this into the changelogs?

Sorry for the late reply; I was busy with some urgent customer hotplug
issues.

I am rewriting all the change logs and the cover letter, and while doing
so I found that I was wrong about the 2nd reason. The current
kexec_file_load calls kexec_locate_mem_hole() to walk all system RAM
regions; if a region is larger than the kernel or initrd, it searches
for a position in that region from the top down. Since kexec jumps to
the 2nd kernel and does not need to preserve the 1st kernel's data, we
can always find usable space under 4G to load the kexec kernel/initrd.

So the only reason for this patch is to stay consistent with kexec_load
and avoid confusion.

And since x86 5-level paging mode has been added, we have another issue
with top-down searching across the whole system RAM: we support
switching dynamically between 4-level and 5-level paging. Namely, with a
kernel compiled with 5-level support, we can add 'no5lvl' to force
4-level. Consider jumping from a 5-level kernel to a 4-level kernel: we
load the kernel at the top of system RAM in 5-level paging mode, which
might be above 64TB, and then try to jump to a 4-level kernel whose
upper limit is 64TB. For this case, we need to add a limit on where the
kexec kernel is loaded when running a 5-level kernel.
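
To illustrate the limit described above, here is a rough sketch of a
top-down search with an upper bound. It is not the kernel's
kexec_locate_mem_hole(); find_hole_top_down(), its parameters, and the
region layout are hypothetical, and the 64TB constant is simply the
4-level limit mentioned above:

```python
TIB = 1 << 40
LIMIT_4LEVEL = 64 * TIB  # assumed ceiling for a 4-level paging kernel


def find_hole_top_down(regions, size, align, limit=None):
    """Return the highest aligned start where `size` bytes fit, or None.

    regions: iterable of (start, end) pairs, with `end` inclusive.
    limit:   optional exclusive upper bound for the end of the placement.
    """
    # Visit the highest region first, mirroring a top-down walk.
    for start, end in sorted(regions, reverse=True):
        top = end + 1  # exclusive end of this region
        if limit is not None:
            top = min(top, limit)
        if top - start < size:
            continue  # region (after clamping) is too small
        candidate = (top - size) // align * align  # round down to alignment
        if candidate >= start:
            return candidate
    return None
```

Without the limit, a machine with RAM above 64TB would place the image
there; passing limit=LIMIT_4LEVEL keeps the placement reachable by a
4-level kernel.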

All this mess makes me hesitate to choose a delicate method. Maybe I
should drop this patchset.

> 
> > > 
> > > Is that sufficient?  Can we instead simplify their lives by providing
> > > better documentation or informative printks or better Kconfig text,
> > > etc?
> > > 
> > > And who *are* the people who are performing this configuration?  Random
> > > system administrators?  Linux distro engineers?  If the latter then
> > > they presumably aren't easily confused!
> > 
> > Kexec was invented for kernel developer to speed up their kernel
> > rebooting. Now high end sever admin, kernel developer and QE are also
> > keen to use it to reboot large box for faster feature testing, bug
> > debugging. Kernel dev could know this well, about kernel loading
> > position, admin or QE might not be aware of it very well. 
> > 
> > > 
> > > In other words, I'm trying to understand how much benefit this patchset
> > > will provide to our users as a whole.
> > 
> > Understood. The list_head replacing patch truly involes too many code
> > changes, it's risky. I am willing to try any idea from reviewers, won't
> > persuit they have to be accepted finally. If don't have a try, we don't
> > know what it looks like, and what impact it may have. I am fine to take
> > AKASHI's simple version of walk_system_ram_res_rev() to lower risk, even
> > though it could be a little bit low efficient.
> 
> The larger patch produces a better result.  We can handle it ;)

On this point, if we stop changing the kexec top-down searching code,
I am not sure whether we should still post the list_head replacement
patches separately.

Thanks
Baoquan

^ permalink raw reply

* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Anshuman Khandual @ 2018-07-25  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david
In-Reply-To: <20180723120511-mutt-send-email-mst@kernel.org>

On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
>>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>>>> This patch series is the follow up on the discussions we had before about
>>>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>>>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>>>> were suggestions about doing away with two different paths of transactions
>>>> with the host/QEMU, first being the direct GPA and the other being the DMA
>>>> API based translations.
>>>>
>>>> First patch attempts to create a direct GPA mapping based DMA operations
>>>> structure called 'virtio_direct_dma_ops' with exact same implementation
>>>> of the direct GPA path which virtio core currently has but just wrapped in
>>>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>>>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>>>> existing semantics. The second patch does exactly that inside the function
>>>> virtio_finalize_features(). The third patch removes the default direct GPA
>>>> path from virtio core forcing it to use DMA API callbacks for all devices.
>>>> Now with that change, every device must have a DMA operations structure
>>>> associated with it. The fourth patch adds an additional hook which gives
>>>> the platform an opportunity to do yet another override if required. This
>>>> platform hook can be used on POWER Ultravisor based protected guests to
>>>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>>>> in the above mentioned thread how host is allowed to access only parts of
>>>> the guest GPA range) bounce buffering into the shared memory for all I/O
>>>> scatter gather buffers to be consumed on the host side.
>>>>
>>>> Please go through these patches and review whether this approach broadly
>>>> makes sense. I will appreciate suggestions, inputs, comments regarding
>>>> the patches or the approach in general. Thank you.
>>> I like how patches 1-3 look. Could you test performance
>>> with/without to see whether the extra indirection through
>>> use of DMA ops causes a measurable slow-down?
>>
>> I ran this simple DD command 10 times where /dev/vda is a virtio block
>> device of 10GB size.
>>
>> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
>>
>> With and without patches bandwidth which has a bit wide range does not
>> look that different from each other.
>>
>> Without patches
>> ===============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
>>
>>
>> With patches
>> ============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s
>>
>> Does this look okay ?
> 
> You want to test IOPS with lots of small writes and using
> raw ramdisk on host.

Hello Michael,

I have conducted the following experiments and here are the results.

TEST SETUP
==========

A virtio block disk is attached to the guest as follows.


    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' ioeventfd='off'/>
      <source file='/mnt/disk2.img'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

On the host, the back end is a QEMU raw image on a tmpfs file system.

disk:

-rw-r--r-- 1 libvirt-qemu kvm  5.0G Jul 24 06:26 disk2.img

mount:

size=21G on /mnt type tmpfs (rw,relatime,size=22020096k)

TEST CONFIG
===========

FIO (https://linux.die.net/man/1/fio) is being run with and without
the patches.

Read test config:

[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=read
unlink=1
iodepth=256


Write test config:

[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=write
unlink=1
iodepth=256

The virtio block device comes up as /dev/vda on the guest with

/sys/block/vda/queue/nr_requests=128

TEST RESULTS
============

Without the patches
-------------------

Read test:

Run status group 0 (all jobs):
   READ: bw=550MiB/s (577MB/s), 33.2MiB/s-35.6MiB/s (34.9MB/s-37.4MB/s), io=161GiB (173GB), run=300001-300009msec

Disk stats (read/write):
  vda: ios=42249926/0, merge=0/0, ticks=1499920/0, in_queue=35672384, util=100.00%


Write test:

Run status group 0 (all jobs):
  WRITE: bw=514MiB/s (539MB/s), 31.5MiB/s-34.6MiB/s (33.0MB/s-36.2MB/s), io=151GiB (162GB), run=300001-300009msec

Disk stats (read/write):
  vda: ios=29/39459261, merge=0/0, ticks=0/1570580, in_queue=35745992, util=100.00%

With the patches
----------------

Read test:

Run status group 0 (all jobs):
   READ: bw=572MiB/s (600MB/s), 35.0MiB/s-37.2MiB/s (36.7MB/s-38.0MB/s), io=168GiB (180GB), run=300001-300006msec

Disk stats (read/write):
  vda: ios=43917611/0, merge=0/0, ticks=1934268/0, in_queue=35531688, util=100.00%
  
Write test:

Run status group 0 (all jobs):
  WRITE: bw=546MiB/s (572MB/s), 33.7MiB/s-35.0MiB/s (35.3MB/s-36.7MB/s), io=160GiB (172GB), run=300001-300007msec

Disk stats (read/write):
  vda: ios=14/41893878, merge=0/0, ticks=8/2107816, in_queue=35535716, util=100.00%

Results with and without the patches are similar.
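
As a rough cross-check of that summary, the fio figures above can be
reduced to IOPS and relative change. This is only a sketch and was not
part of the test run: the helper names are made up for illustration, and
the runtime is assumed to be a round 300 s (fio reported 300001-300009
msec). The I/O counts are the vda "ios" values from the disk stats above.

```c
#include <stdio.h>

/* Hypothetical helpers, for illustration only. */

/* I/O operations per second over the given runtime. */
double iops(unsigned long long ios, double runtime_s)
{
	return (double)ios / runtime_s;
}

/* Bandwidth in MiB/s implied by an IOPS figure at a fixed block size. */
double bw_mib_per_s(double iops_value, unsigned int block_bytes)
{
	return iops_value * block_bytes / (1024.0 * 1024.0);
}

/* Relative change from 'before' to 'after', in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print IOPS, implied 4k bandwidth and relative change for one test. */
void report(const char *name, unsigned long long ios_without,
	    unsigned long long ios_with)
{
	const double runtime_s = 300.0;	/* assumption: runtime=5m, rounded */
	double before = iops(ios_without, runtime_s);
	double after = iops(ios_with, runtime_s);

	printf("%-5s %.0f -> %.0f IOPS (%.0f -> %.0f MiB/s at bs=4k), %+.1f%%\n",
	       name, before, after,
	       bw_mib_per_s(before, 4096), bw_mib_per_s(after, 4096),
	       percent_change(before, after));
}
```

Calling report("read", 42249926ULL, 43917611ULL) and
report("write", 39459261ULL, 41893878ULL) puts the change at roughly
4% for reads and 6% for writes, both in favour of the patched kernel,
and the bandwidth implied by IOPS times bs=4k matches the bw= values
fio printed.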

^ permalink raw reply

* Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8
From: Michael Ellerman @ 2018-07-25  4:20 UTC (permalink / raw)
  To: Arnd Bergmann, Andrew Morton
  Cc: Joe Perches, Arnd Bergmann, Samuel Ortiz, David S. Miller,
	Rob Herring, Jonathan Cameron, linux-wireless, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-crypto, linuxppc-dev,
	linux-iio, linux-pm, lvs-devel, netfilter-devel, coreteam
In-Reply-To: <20180724111600.4158975-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> writes:

> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in a C comments, for historic reasons.
>
> This converts them all to UTF-8 for consistency.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
...
>  drivers/crypto/vmx/ghashp8-ppc.pl             |  12 +-
...
> diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
> index f746af271460..38b06503ede0 100644
> --- a/drivers/crypto/vmx/ghashp8-ppc.pl
> +++ b/drivers/crypto/vmx/ghashp8-ppc.pl
> @@ -129,9 +129,9 @@ $code=<<___;
>  	 le?vperm	$IN,$IN,$IN,$lemask
>  	vxor		$zero,$zero,$zero
>  
> -	vpmsumd		$Xl,$IN,$Hl		# H.loXi.lo
> -	vpmsumd		$Xm,$IN,$H		# H.hiXi.lo+H.loXi.hi
> -	vpmsumd		$Xh,$IN,$Hh		# H.hiXi.hi
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase
>  
> @@ -187,11 +187,11 @@ $code=<<___;
>  .align	5
>  Loop:
>  	 subic		$len,$len,16
> -	vpmsumd		$Xl,$IN,$Hl		# H.loXi.lo
> +	vpmsumd		$Xl,$IN,$Hl		# H.lo·Xi.lo
>  	 subfe.		r0,r0,r0		# borrow?-1:0
> -	vpmsumd		$Xm,$IN,$H		# H.hiXi.lo+H.loXi.hi
> +	vpmsumd		$Xm,$IN,$H		# H.hi·Xi.lo+H.lo·Xi.hi
>  	 and		r0,r0,$len
> -	vpmsumd		$Xh,$IN,$Hh		# H.hiXi.hi
> +	vpmsumd		$Xh,$IN,$Hh		# H.hi·Xi.hi
>  	 add		$inp,$inp,r0
>  
>  	vpmsumd		$t2,$Xl,$xC2		# 1st phase

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

^ permalink raw reply

* Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices
From: Anshuman Khandual @ 2018-07-25  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel,
	virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren,
	david
In-Reply-To: <3dd36d8e-3bc8-91ba-cf6d-939495439d2c@linux.vnet.ibm.com>

On 07/23/2018 07:46 AM, Anshuman Khandual wrote:
> On 07/20/2018 06:45 PM, Michael S. Tsirkin wrote:
>> On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>>> Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
>>> virito devices
>>
>> s/virito/virtio/
> 
> Oops, will fix it. Thanks for pointing out.
> 
>>
>>> This adds a hook which a platform can define in order to allow it to
>>> override virtio device's DMA OPS irrespective of whether it has the
>>> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
>>> bounce-buffering of data on the new secure pSeries platform, currently
>>> under development, where a KVM host cannot access all of the memory
>>> space of a secure KVM guest.  The host can only access the pages which
>>> the guest has explicitly requested to be shared with the host, thus
>>> the virtio implementation in the guest has to copy data to and from
>>> shared pages.
>>>
>>> With this hook, the platform code in the secure guest can force the
>>> use of swiotlb for virtio buffers, with a back-end for swiotlb which
>>> will use a pool of pre-allocated shared pages.  Thus all data being
>>> sent or received by virtio devices will be copied through pages which
>>> the host has access to.
>>>
>>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
>>>  arch/powerpc/platforms/pseries/iommu.c | 6 ++++++
>>>  drivers/virtio/virtio.c                | 7 +++++++
>>>  3 files changed, 19 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
>>> index 8fa3945..bc5a9d3 100644
>>> --- a/arch/powerpc/include/asm/dma-mapping.h
>>> +++ b/arch/powerpc/include/asm/dma-mapping.h
>>> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>>>  
>>>  #endif /* __KERNEL__ */
>>>  #endif	/* _ASM_DMA_MAPPING_H */
>>> +
>>> +#define platform_override_dma_ops platform_override_dma_ops
>>> +
>>> +struct virtio_device;
>>> +
>>> +extern void platform_override_dma_ops(struct virtio_device *vdev);
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 06f0296..5773bc7 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -38,6 +38,7 @@
>>>  #include <linux/of.h>
>>>  #include <linux/iommu.h>
>>>  #include <linux/rculist.h>
>>> +#include <linux/virtio.h>
>>>  #include <asm/io.h>
>>>  #include <asm/prom.h>
>>>  #include <asm/rtas.h>
>>> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>>>  __setup("multitce=", disable_multitce);
>>>  
>>>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
>>> +
>>> +void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +	/* Override vdev->parent.dma_ops if required */
>>> +}
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 6b13987..432c332 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>>>  
>>>  const struct dma_map_ops virtio_direct_dma_ops;
>>>  
>>> +#ifndef platform_override_dma_ops
>>> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>  int virtio_finalize_features(struct virtio_device *dev)
>>>  {
>>>  	int ret = dev->config->finalize_features(dev);
>>> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>>>  	if (virtio_has_iommu_quirk(dev))
>>>  		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>>  
>>> +	platform_override_dma_ops(dev);
>>
>> Is there a single place where virtio_has_iommu_quirk is called now?
> 
> Not other than this one. But in the proposed implementation of
> platform_override_dma_ops on powerpc, we will again check on
> virtio_has_iommu_quirk before overriding it with SWIOTLB.
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
>         if (is_ultravisor_platform() && virtio_has_iommu_quirk(vdev))
>                 set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> }
> 
>> If so, we could put this into virtio_has_iommu_quirk then.
> 
> Did you mean platform_override_dma_ops instead? If so, yes, that
> is possible. The default implementation of platform_override_dma_ops
> should just check the VIRTIO_F_IOMMU_PLATFORM feature and override
> with virtio_direct_dma_ops, but an arch implementation can check
> whatever else it would like and override appropriately.
> 
> Default platform_override_dma_ops will be like this
> 
> #ifndef platform_override_dma_ops
> static inline void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		set_dma_ops(vdev->dev.parent, &virtio_direct_dma_ops);
> }
> #endif
> 
> Proposed powerpc implementation will be like this instead
> 
> void platform_override_dma_ops(struct virtio_device *vdev)
> {
> 	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
> 		return;
> 
> 	if (is_ultravisor_platform())
> 		set_dma_ops(vdev->dev.parent, &swiotlb_dma_ops);
> 	else
> 		set_dma_ops(vdev->dev.parent, &virtio_direct_dma_ops);
> }
> 
> There is a redundant definition of virtio_has_iommu_quirk in the tools
> directory (tools/virtio/linux/virtio_config.h) which does not seem to
> be used any where. I guess that can be removed without problem.

Does this sound okay? It would merge patches 3 and 4 into a single one.
On the other hand, it also passes the responsibility of dealing with the
VIRTIO_F_IOMMU_PLATFORM flag to the architecture callback, which might
be a bit problematic. Keeping VIRTIO_F_IOMMU_PLATFORM handling in the
virtio core at least makes sure that the device has working DMA ops to
fall back on if the arch hook somehow fails to take care of it.
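
The second option (core keeps the flag handling, the hook only overrides)
can be sketched in plain userspace C. This is a mock for discussion, not
kernel code: struct mock_vdev, the mock_* ops objects and the function
names are all hypothetical stand-ins for struct virtio_device,
virtio_direct_dma_ops, swiotlb_dma_ops and platform_override_dma_ops().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; illustration only. */
struct mock_dma_ops {
	const char *name;
};

struct mock_vdev {
	bool iommu_platform;	/* VIRTIO_F_IOMMU_PLATFORM negotiated? */
	bool secure_guest;	/* stands in for is_ultravisor_platform() */
	const struct mock_dma_ops *dma_ops;	/* NULL: platform default */
};

const struct mock_dma_ops mock_direct_ops = { "direct" };
const struct mock_dma_ops mock_swiotlb_ops = { "swiotlb" };

/* Arch hook: only upgrades the quirk case to bounce buffering. */
void mock_platform_override_dma_ops(struct mock_vdev *vdev)
{
	if (!vdev->iommu_platform && vdev->secure_guest)
		vdev->dma_ops = &mock_swiotlb_ops;
}

/*
 * Core path: the flag is handled here first, so a device always ends up
 * with usable ops even when the hook is absent (NULL) or does nothing.
 */
void mock_finalize_features(struct mock_vdev *vdev,
			    void (*hook)(struct mock_vdev *))
{
	if (!vdev->iommu_platform)
		vdev->dma_ops = &mock_direct_ops;

	if (hook)
		hook(vdev);
}
```

With this split, a missing or empty hook still leaves the direct ops in
place for the quirk case, which is the fallback property mentioned above.
Moving the flag check into the hook would make that depend on every arch
implementation getting it right.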

^ permalink raw reply

* Re: [PATCH v3 1/1] KVM: PPC: Book3S HV: pack VCORE IDs to access full VCPU ID space
From: Sam Bobroff @ 2018-07-25  5:26 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc, david, clg
In-Reply-To: <20180723054337.GA29207@fergus>

On Mon, Jul 23, 2018 at 03:43:37PM +1000, Paul Mackerras wrote:
> On Thu, Jul 19, 2018 at 12:25:10PM +1000, Sam Bobroff wrote:
> > From: Sam Bobroff <sam.bobroff@au1.ibm.com>
> > 
> > It is not currently possible to create the full number of possible
> > VCPUs (KVM_MAX_VCPUS) on Power9 with KVM-HV when the guest uses fewer
> > threads per core than its core stride (or "VSMT mode"). This is
> > because the VCORE ID and XIVE offsets grow beyond KVM_MAX_VCPUS
> > even though the VCPU ID is less than KVM_MAX_VCPU_ID.
> > 
> > To address this, "pack" the VCORE ID and XIVE offsets by using
> > knowledge of the way the VCPU IDs will be used when there are less
> > guest threads per core than the core stride. The primary thread of
> > each core will always be used first. Then, if the guest uses more than
> > one thread per core, these secondary threads will sequentially follow
> > the primary in each core.
> > 
> > So, the only way an ID above KVM_MAX_VCPUS can be seen, is if the
> > VCPUs are being spaced apart, so at least half of each core is empty
> > and IDs between KVM_MAX_VCPUS and (KVM_MAX_VCPUS * 2) can be mapped
> > into the second half of each core (4..7, in an 8-thread core).
> > 
> > Similarly, if IDs above KVM_MAX_VCPUS * 2 are seen, at least 3/4 of
> > each core is being left empty, and we can map down into the second and
> > third quarters of each core (2, 3 and 5, 6 in an 8-thread core).
> > 
> > Lastly, if IDs above KVM_MAX_VCPUS * 4 are seen, only the primary
> > threads are being used and 7/8 of the core is empty, allowing use of
> > the 1, 3, 5 and 7 thread slots.
> > 
> > (Strides less than 8 are handled similarly.)
> > 
> > This allows the VCORE ID or offset to be calculated quickly from the
> > VCPU ID or XIVE server numbers, without access to the VCPU structure.
> > 
> > Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
> 
> I have some comments relating to the situation where the stride
> (i.e. kvm->arch.emul_smt_mode) is less than 8; see below.
> 
> [snip]
> > +static inline u32 kvmppc_pack_vcpu_id(struct kvm *kvm, u32 id)
> > +{
> > +	const int block_offsets[MAX_SMT_THREADS] = {0, 4, 2, 6, 1, 3, 5, 7};
> 
> This needs to be {0, 4, 2, 6, 1, 5, 3, 7} (with the 3 and 5 swapped
> from what you have) for the case when stride == 4 and block == 3.  In
> that case we need block_offsets[block] to be 3; if it is 5, then we
> will collide with the case where block == 2 for the next virtual core.

Agh! Yes it does.

> > +	int stride = kvm->arch.emul_smt_mode;
> > +	int block = (id / KVM_MAX_VCPUS) * (MAX_SMT_THREADS / stride);
> > +	u32 packed_id;
> > +
> > +	BUG_ON(block >= MAX_SMT_THREADS);
> > +	packed_id = (id % KVM_MAX_VCPUS) + block_offsets[block];
> > +	BUG_ON(packed_id >= KVM_MAX_VCPUS);
> > +	return packed_id;
> > +}
> > +
> >  #endif /* __ASM_KVM_BOOK3S_H__ */
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index de686b340f4a..363c2fb0d89e 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -1816,7 +1816,7 @@ static int threads_per_vcore(struct kvm *kvm)
> >  	return threads_per_subcore;
> >  }
> >  
> > -static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> > +static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int id)
> >  {
> >  	struct kvmppc_vcore *vcore;
> >  
> > @@ -1830,7 +1830,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
> >  	init_swait_queue_head(&vcore->wq);
> >  	vcore->preempt_tb = TB_NIL;
> >  	vcore->lpcr = kvm->arch.lpcr;
> > -	vcore->first_vcpuid = core * kvm->arch.smt_mode;
> > +	vcore->first_vcpuid = id;
> >  	vcore->kvm = kvm;
> >  	INIT_LIST_HEAD(&vcore->preempt_list);
> >  
> > @@ -2048,12 +2048,18 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
> >  	mutex_lock(&kvm->lock);
> >  	vcore = NULL;
> >  	err = -EINVAL;
> > -	core = id / kvm->arch.smt_mode;
> > +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +		BUG_ON(kvm->arch.smt_mode != 1);
> > +		core = kvmppc_pack_vcpu_id(kvm, id);
> 
> We now have a way for userspace to trigger a BUG_ON, as far as I can
> see.  The only check on id up to this point is that it is less than
> KVM_MAX_VCPU_ID, which means that the BUG_ON(block >= MAX_SMT_THREADS)
> can be triggered, if kvm->arch.emul_smt_mode < MAX_SMT_THREADS, by
> giving an id that is greater than or equal to KVM_MAX_VCPUS *
> kvm->arch.emul_smt_mode.
> 
> > +	} else {
> > +		core = id / kvm->arch.smt_mode;
> > +	}
> >  	if (core < KVM_MAX_VCORES) {
> >  		vcore = kvm->arch.vcores[core];
> > +		BUG_ON(cpu_has_feature(CPU_FTR_ARCH_300) && vcore);
> 
> Doesn't this just mean that userspace has chosen an id big enough to
> cause a collision in the output space of kvmppc_pack_vcpu_id()?  How
> is this not user-triggerable?
> 
> Paul.

Yep, good point. Particularly when dealing with a malicious userspace
that won't follow QEMU's allocation pattern.

I'll re-work it and re-post. I'll discuss the changes in the next
version.

Thanks for the review!

Sam.

^ permalink raw reply
