* Re: [PATCH v5 1/6] iommu: add generic boot option iommu.dma_mode
From: Leizhen (ThunderTown) @ 2019-04-23 2:45 UTC
To: Joerg Roedel
Cc: linux-ia64, Sebastian Ott, linux-doc, Hanjun Guo, Heiko Carstens,
Paul Mackerras, H . Peter Anvin, linux-s390, Jonathan Corbet,
Jean-Philippe Brucker, x86, Ingo Molnar, Fenghua Yu, Will Deacon,
John Garry, linuxppc-dev, Borislav Petkov, Thomas Gleixner,
Gerald Schaefer, Tony Luck, David Woodhouse, linux-kernel, iommu,
Martin Schwidefsky, Robin Murphy
In-Reply-To: <20190412111649.GK4518@8bytes.org>
On 2019/4/12 19:16, Joerg Roedel wrote:
> On Tue, Apr 09, 2019 at 08:53:03PM +0800, Zhen Lei wrote:
>> +static int __init iommu_dma_mode_setup(char *str)
>> +{
>> + if (!str)
>> + goto fail;
>> +
>> + if (!strncmp(str, "passthrough", 11))
>> + iommu_default_dma_mode = IOMMU_DMA_MODE_PASSTHROUGH;
>> + else if (!strncmp(str, "lazy", 4))
>> + iommu_default_dma_mode = IOMMU_DMA_MODE_LAZY;
>> + else if (!strncmp(str, "strict", 6))
>> + iommu_default_dma_mode = IOMMU_DMA_MODE_STRICT;
>> + else
>> + goto fail;
>> +
>> + pr_info("Force dma mode to be %d\n", iommu_default_dma_mode);
>
> Printing a number is not very descriptive or helpful to the user. Please
> print the name of the mode instead.
OK, thanks. I have dropped the iommu.dma_mode boot option, following
Robin and Will's suggestion, so this code will be removed in v6.
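For illustration only, Joerg's request could be met with a small name
table. What follows is a self-contained userspace sketch, not code from
the patch: the enum values and the helpers dma_mode_name(),
dma_mode_parse() and dma_mode_format() are assumptions made up for this
example, and the parser does an exact match rather than the strncmp()
prefix match used in the quoted hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the IOMMU_DMA_MODE_* constants in the patch. */
enum dma_mode {
	DMA_MODE_PASSTHROUGH,
	DMA_MODE_LAZY,
	DMA_MODE_STRICT,
	DMA_MODE_COUNT,
};

/* One table serves both parsing and printing, so the two cannot drift apart. */
static const char *const dma_mode_names[DMA_MODE_COUNT] = {
	[DMA_MODE_PASSTHROUGH] = "passthrough",
	[DMA_MODE_LAZY]        = "lazy",
	[DMA_MODE_STRICT]      = "strict",
};

/* Return the printable name of a mode, or "unknown" if out of range. */
static const char *dma_mode_name(int mode)
{
	if (mode < 0 || mode >= DMA_MODE_COUNT)
		return "unknown";
	return dma_mode_names[mode];
}

/* Parse an option string; returns 0 and stores the mode, or -1 on failure. */
static int dma_mode_parse(const char *str, int *mode)
{
	int i;

	if (!str)
		return -1;
	for (i = 0; i < DMA_MODE_COUNT; i++) {
		if (!strcmp(str, dma_mode_names[i])) {
			*mode = i;
			return 0;
		}
	}
	return -1;
}

/* Format the message with the mode name instead of its number. */
static int dma_mode_format(char *buf, size_t len, int mode)
{
	assert(buf != NULL);
	return snprintf(buf, len, "Force dma mode to be %s\n",
			dma_mode_name(mode));
}
```

In the kernel the same table would presumably feed pr_info() directly.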
>
>
> Regards,
>
> Joerg
>
> .
>
--
Thanks!
Best Regards
* Re: [PATCH kernel RFC 0/2] powerpc/ioda2: An attempt to allow DMA masks between 32 and 59
From: Russell Currey @ 2019-04-23 0:58 UTC
To: Alexey Kardashevskiy, linuxppc-dev
Cc: Alistair Popple, Oliver O'Halloran, David Gibson
In-Reply-To: <20190412064408.85399-1-aik@ozlabs.ru>
On Fri, 2019-04-12 at 16:44 +1000, Alexey Kardashevskiy wrote:
> This is an attempt to allow DMA mask 40 or similar which are not large
> enough to use either a PHB3 bypass mode or a sketchy bypass.
>
> This is based on sha1
> 582549e3fbe1 Linus Torvalds Merge tag 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
>
> Please comment. Thanks.
Seems to uniformly fail on POWER8 and sometimes on P9 too.
On P8:
[ 2.423206] Failed to allocate a TCE memory, level shift=26
[ 2.423351] pci 0001:03 : [PE# fd] Failed to create 32-bit TCE table, err -12
On P9:
[ 0.303055] pci 0003:01 : [PE# 1fd] Setting up 32-bit TCE table at 0..80000000
[ 0.303119] Failed to allocate a TCE memory, level shift=30
[ 0.303147] pci 0003:01 : [PE# 1fd] Failed to create 32-bit TCE table, err -12
Is it selecting the wrong TCE size?
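As a rough sanity check on those numbers: assuming that "level shift"
is the log2 of the number of bytes requested for one level of the TCE
table (an assumption, the log itself does not say so), the sizes can be
worked out as below. The helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by a given shift: 2^shift. */
static uint64_t bytes_for_shift(unsigned int shift)
{
	assert(shift < 64);
	return (uint64_t)1 << shift;
}

/* The same value expressed in MiB (only meaningful for shift >= 20). */
static uint64_t mib_for_shift(unsigned int shift)
{
	assert(shift >= 20);
	return bytes_for_shift(shift) >> 20;
}
```

Under that assumption shift=26 is a 64 MiB request and shift=30 a 1 GiB
request. If each level is allocated as one contiguous block, err -12
(-ENOMEM) would not be surprising for either.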
> Alexey Kardashevskiy (2):
>   powerpc/powernv/ioda: Allocate TCE table levels on demand for default
>     DMA window
>   powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU
>     pages
>
>  arch/powerpc/include/asm/iommu.h              |  8 ++-
>  arch/powerpc/platforms/powernv/pci.h          |  2 +-
>  arch/powerpc/kernel/iommu.c                   | 58 +++++++++++++------
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 19 +++---
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++++-
>  5 files changed, 66 insertions(+), 35 deletions(-)
* Re: [PATCH v12 23/31] mm: don't do swap readahead during speculative page fault
From: Jerome Glisse @ 2019-04-22 21:36 UTC
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, Vinayak Menon, vinayak menon, akpm, Tim Chen,
haren
In-Reply-To: <20190416134522.17540-24-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:14PM +0200, Laurent Dufour wrote:
> Vinayak Menon faced a panic because one thread was page faulting a page in
> swap, while another one was mprotecting a part of the VMA leading to a VMA
> split.
> This raise a panic in swap_vma_readahead() because the VMA's boundaries
> were not more matching the faulting address.
>
> To avoid this, if the page is not found in the swap, the speculative page
> fault is aborted to retry a regular page fault.
>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Note that you should also skip non-swap entries in do_swap_page() when
doing a speculative page fault; at the very least you need to handle the
is_device_private_entry() case.
But this should either be part of patch 22 or another patch to fix the
swap case.
> ---
> mm/memory.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6e6bf61c0e5c..1991da97e2db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2900,6 +2900,17 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> lru_cache_add_anon(page);
> swap_readpage(page, true);
> }
> + } else if (vmf->flags & FAULT_FLAG_SPECULATIVE) {
> + /*
> + * Don't try readahead during a speculative page fault
> + * as the VMA's boundaries may change in our back.
> + * If the page is not in the swap cache and synchronous
> + * read is disabled, fall back to the regular page
> + * fault mechanism.
> + */
> + delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + ret = VM_FAULT_RETRY;
> + goto out;
> } else {
> page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> vmf);
> --
> 2.21.0
>
* Re: [PATCH v12 00/31] Speculative page faults
From: Michel Lespinasse @ 2019-04-22 21:29 UTC
To: Laurent Dufour
Cc: Jan Kara, sergey.senozhatsky.work, Peter Zijlstra, Will Deacon,
Michal Hocko, linux-mm, Paul Mackerras, Punit Agrawal,
H. Peter Anvin, Mike Rapoport, Alexei Starovoitov,
Andrea Arcangeli, Andi Kleen, Minchan Kim, aneesh.kumar, x86,
Matthew Wilcox, Daniel Jordan, Ingo Molnar, David Rientjes,
Paul E. McKenney, Haiyan Song, Nick Piggin, sj38.park,
Jerome Glisse, dave, kemi.wang, Kirill A. Shutemov,
Thomas Gleixner, zhong jiang, Ganesh Mahendran, Yang Shi,
linuxppc-dev, LKML, Sergey Senozhatsky, vinayak menon,
Andrew Morton, Tim Chen, haren
In-Reply-To: <20190416134522.17540-1-ldufour@linux.ibm.com>
Hi Laurent,
Thanks a lot for copying me on this patchset. It took me a few days to
go through it - I had not been following the previous iterations of
this series so I had to catch up. I will be sending comments for
individual commits, but before that I would like to discuss the series
as a whole.
I think these changes are a big step in the right direction. My main
reservation about them is that they are additive - adding some complexity
for speculative page faults - and I wonder if it'd be possible, over the
long term, to replace the existing complexity we have in mmap_sem retry
mechanisms instead of adding to it. This is not something that should
block your progress, but I think it would be good, as we introduce spf,
to evaluate whether we could eventually get all the way to removing the
mmap_sem retry mechanism, or if we will actually have to keep both.
The proposed spf mechanism only handles anon vmas. Is there a
fundamental reason why it couldn't handle mapped files too ?
My understanding is that the mechanism of verifying the vma after
taking back the ptl at the end of the fault would work there too ?
The file has to stay referenced during the fault, but holding the vma's
refcount could be made to cover that ? the vm_file refcount would have
to be released in __free_vma() instead of remove_vma; I'm not quite sure
if that has more implications than I realize ?
The proposed spf mechanism only works at the pte level after the page
tables have already been created. The non-spf page fault path takes the
mm->page_table_lock to protect against concurrent page table allocation
by multiple page faults; I think unmapping/freeing page tables could
be done under mm->page_table_lock too so that spf could implement
allocating new page tables by verifying the vma after taking the
mm->page_table_lock ?
The proposed spf mechanism depends on ARCH_HAS_PTE_SPECIAL.
I am not sure what is the issue there - is this due to the vma->vm_start
and vma->vm_pgoff reads in *__vm_normal_page() ?
My last potential concern is about performance. The numbers you have
look great, but I worry about potential regressions in PF performance
for threaded processes that don't currently encounter contention
(i.e. there may be just one thread actually doing all the work while
the others are blocked). I think one good proxy for measuring that
would be to measure a single threaded workload - kernbench would be
fine - without the special-case optimization in patch 22 where
handle_speculative_fault() immediately aborts in the single-threaded case.
Reviewed-by: Michel Lespinasse <walken@google.com>
This is for the series as a whole; I expect to do another review pass on
individual commits in the series when we have agreement on the toplevel
stuff (I noticed a few things like out-of-date commit messages but that's
really minor stuff).
I want to add a note about mmap_sem. In the past there have been
discussions about replacing it with an interval lock, but these never
went anywhere, mostly because such mechanisms were
too expensive to use in the page fault path. I think adding the spf
mechanism would invite us to revisit this issue - interval locks may
be a great way to avoid blocking between unrelated mmap_sem writers
(for example, do not delay stack creation for new threads while a
large mmap or munmap may be going on), and probably also to handle
mmap_sem readers that can't easily use the spf mechanism (for example,
gup callers which make use of the returned vmas). But again that is a
separate topic to explore which doesn't have to get resolved before
spf goes in.
* Re: [PATCH v12 22/31] mm: provide speculative fault infrastructure
From: Jerome Glisse @ 2019-04-22 21:26 UTC
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-23-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:13PM +0200, Laurent Dufour wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Provide infrastructure to do a speculative fault (not holding
> mmap_sem).
>
> The not holding of mmap_sem means we can race against VMA
> change/removal and page-table destruction. We use the SRCU VMA freeing
> to keep the VMA around. We use the VMA seqcount to detect change
> (including umapping / page-table deletion) and we use gup_fast() style
> page-table walking to deal with page-table races.
>
> Once we've obtained the page and are ready to update the PTE, we
> validate if the state we started the fault with is still valid, if
> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
> PTE and we're done.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> [Manage the newly introduced pte_spinlock() for speculative page
> fault to fail if the VMA is touched in our back]
> [Rename vma_is_dead() to vma_has_changed() and declare it here]
> [Fetch p4d and pud]
> [Set vmd.sequence in __handle_mm_fault()]
> [Abort speculative path when handle_userfault() has to be called]
> [Add additional VMA's flags checks in handle_speculative_fault()]
> [Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
> [Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
> [Remove warning comment about waiting for !seq&1 since we don't want
> to wait]
> [Remove warning about no huge page support, mention it explictly]
> [Don't call do_fault() in the speculative path as __do_fault() calls
> vma->vm_ops->fault() which may want to release mmap_sem]
> [Only vm_fault pointer argument for vma_has_changed()]
> [Fix check against huge page, calling pmd_trans_huge()]
> [Use READ_ONCE() when reading VMA's fields in the speculative path]
> [Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
> processing done in vm_normal_page()]
> [Check that vma->anon_vma is already set when starting the speculative
> path]
> [Check for memory policy as we can't support MPOL_INTERLEAVE case due to
> the processing done in mpol_misplaced()]
> [Don't support VMA growing up or down]
> [Move check on vm_sequence just before calling handle_pte_fault()]
> [Don't build SPF services if !CONFIG_SPECULATIVE_PAGE_FAULT]
> [Add mem cgroup oom check]
> [Use READ_ONCE to access p*d entries]
> [Replace deprecated ACCESS_ONCE() by READ_ONCE() in vma_has_changed()]
> [Don't fetch pte again in handle_pte_fault() when running the speculative
> path]
> [Check PMD against concurrent collapsing operation]
> [Try spin lock the pte during the speculative path to avoid deadlock with
> other CPU's invalidating the TLB and requiring this CPU to catch the
> inter processor's interrupt]
> [Move define of FAULT_FLAG_SPECULATIVE here]
> [Introduce __handle_speculative_fault() and add a check against
> mm->mm_users in handle_speculative_fault() defined in mm.h]
> [Abort if vm_ops->fault is set instead of checking only vm_ops]
> [Use find_vma_rcu() and call put_vma() when we are done with the VMA]
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
A few comments and questions for this one, see below.
> ---
> include/linux/hugetlb_inline.h | 2 +-
> include/linux/mm.h | 30 +++
> include/linux/pagemap.h | 4 +-
> mm/internal.h | 15 ++
> mm/memory.c | 344 ++++++++++++++++++++++++++++++++-
> 5 files changed, 389 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
> index 0660a03d37d9..9e25283d6fc9 100644
> --- a/include/linux/hugetlb_inline.h
> +++ b/include/linux/hugetlb_inline.h
> @@ -8,7 +8,7 @@
>
> static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
> {
> - return !!(vma->vm_flags & VM_HUGETLB);
> + return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
> }
>
> #else
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f761a9c65c74..ec609cbad25a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -381,6 +381,7 @@ extern pgprot_t protection_map[16];
> #define FAULT_FLAG_USER 0x40 /* The fault originated in userspace */
> #define FAULT_FLAG_REMOTE 0x80 /* faulting for non current tsk/mm */
> #define FAULT_FLAG_INSTRUCTION 0x100 /* The fault was during an instruction fetch */
> +#define FAULT_FLAG_SPECULATIVE 0x200 /* Speculative fault, not holding mmap_sem */
>
> #define FAULT_FLAG_TRACE \
> { FAULT_FLAG_WRITE, "WRITE" }, \
> @@ -409,6 +410,10 @@ struct vm_fault {
> gfp_t gfp_mask; /* gfp mask to be used for allocations */
> pgoff_t pgoff; /* Logical page offset based on vma */
> unsigned long address; /* Faulting virtual address */
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + unsigned int sequence;
> + pmd_t orig_pmd; /* value of PMD at the time of fault */
> +#endif
> pmd_t *pmd; /* Pointer to pmd entry matching
> * the 'address' */
> pud_t *pud; /* Pointer to pud entry matching
> @@ -1524,6 +1529,31 @@ int invalidate_inode_page(struct page *page);
> #ifdef CONFIG_MMU
> extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
> unsigned long address, unsigned int flags);
> +
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +extern vm_fault_t __handle_speculative_fault(struct mm_struct *mm,
> + unsigned long address,
> + unsigned int flags);
> +static inline vm_fault_t handle_speculative_fault(struct mm_struct *mm,
> + unsigned long address,
> + unsigned int flags)
> +{
> + /*
> + * Try speculative page fault for multithreaded user space task only.
> + */
> + if (!(flags & FAULT_FLAG_USER) || atomic_read(&mm->mm_users) == 1)
> + return VM_FAULT_RETRY;
> + return __handle_speculative_fault(mm, address, flags);
> +}
> +#else
> +static inline vm_fault_t handle_speculative_fault(struct mm_struct *mm,
> + unsigned long address,
> + unsigned int flags)
> +{
> + return VM_FAULT_RETRY;
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
> extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> unsigned long address, unsigned int fault_flags,
> bool *unlocked);
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 2e8438a1216a..2fcfaa910007 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -457,8 +457,8 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
> pgoff_t pgoff;
> if (unlikely(is_vm_hugetlb_page(vma)))
> return linear_hugepage_index(vma, address);
> - pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
> - pgoff += vma->vm_pgoff;
> + pgoff = (address - READ_ONCE(vma->vm_start)) >> PAGE_SHIFT;
> + pgoff += READ_ONCE(vma->vm_pgoff);
> return pgoff;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 1e368e4afe3c..ed91b199cb8c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -58,6 +58,21 @@ static inline void put_vma(struct vm_area_struct *vma)
> extern struct vm_area_struct *find_vma_rcu(struct mm_struct *mm,
> unsigned long addr);
>
> +
> +static inline bool vma_has_changed(struct vm_fault *vmf)
> +{
> + int ret = RB_EMPTY_NODE(&vmf->vma->vm_rb);
> + unsigned int seq = READ_ONCE(vmf->vma->vm_sequence.sequence);
> +
> + /*
> + * Matches both the wmb in write_seqlock_{begin,end}() and
> + * the wmb in vma_rb_erase().
> + */
> + smp_rmb();
> +
> + return ret || seq != vmf->sequence;
> +}
> +
> #else /* CONFIG_SPECULATIVE_PAGE_FAULT */
>
> static inline void get_vma(struct vm_area_struct *vma)
> diff --git a/mm/memory.c b/mm/memory.c
> index 46f877b6abea..6e6bf61c0e5c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -522,7 +522,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> if (page)
> dump_page(page, "bad pte");
> pr_alert("addr:%p vm_flags:%08lx anon_vma:%p mapping:%p index:%lx\n",
> - (void *)addr, vma->vm_flags, vma->anon_vma, mapping, index);
> + (void *)addr, READ_ONCE(vma->vm_flags), vma->anon_vma,
> + mapping, index);
> pr_alert("file:%pD fault:%pf mmap:%pf readpage:%pf\n",
> vma->vm_file,
> vma->vm_ops ? vma->vm_ops->fault : NULL,
> @@ -2082,6 +2083,118 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
> }
> EXPORT_SYMBOL_GPL(apply_to_page_range);
>
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +static bool pte_spinlock(struct vm_fault *vmf)
> +{
> + bool ret = false;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + pmd_t pmdval;
> +#endif
> +
> + /* Check if vma is still valid */
> + if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
> + vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> + spin_lock(vmf->ptl);
> + return true;
> + }
> +
> +again:
> + local_irq_disable();
> + if (vma_has_changed(vmf))
> + goto out;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + /*
> + * We check if the pmd value is still the same to ensure that there
> + * is not a huge collapse operation in progress in our back.
> + */
> + pmdval = READ_ONCE(*vmf->pmd);
> + if (!pmd_same(pmdval, vmf->orig_pmd))
> + goto out;
> +#endif
> +
> + vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> + if (unlikely(!spin_trylock(vmf->ptl))) {
> + local_irq_enable();
> + goto again;
> + }
Do we want to keep retrying to take the spinlock indefinitely? Shouldn't
it be limited? If we fail a few times it is probably better to give
up on that speculative page fault.
So maybe putting everything within a for (i = 0; i < MAX_TRY; ++i) loop
would be cleaner.
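To make the suggestion concrete, here is a minimal userspace sketch of
a bounded trylock loop. It uses a pthread mutex as a stand-in for the
PTL, and both bounded_trylock() and the value of MAX_TRY are
assumptions for illustration, not something taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_TRY 4	/* hypothetical bound, would need tuning */

/*
 * Try to take the lock at most max_tries times.
 * Returns true with the lock held, or false so that the caller can
 * give up and fall back to the regular (non-speculative) path.
 */
static bool bounded_trylock(pthread_mutex_t *lock, int max_tries)
{
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		if (pthread_mutex_trylock(lock) == 0)
			return true;
		/*
		 * The kernel loop would re-enable IRQs and re-check the
		 * VMA here before the next attempt.
		 */
	}
	return false;
}
```

On failure the speculative path would return VM_FAULT_RETRY, exactly as
it already does when vma_has_changed() reports a change.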
> +
> + if (vma_has_changed(vmf)) {
> + spin_unlock(vmf->ptl);
> + goto out;
> + }
> +
> + ret = true;
> +out:
> + local_irq_enable();
> + return ret;
> +}
> +
> +static bool pte_map_lock(struct vm_fault *vmf)
> +{
> + bool ret = false;
> + pte_t *pte;
> + spinlock_t *ptl;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + pmd_t pmdval;
> +#endif
> +
> + if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
> + vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> + vmf->address, &vmf->ptl);
> + return true;
> + }
> +
> + /*
> + * The first vma_has_changed() guarantees the page-tables are still
> + * valid, having IRQs disabled ensures they stay around, hence the
> + * second vma_has_changed() to make sure they are still valid once
> + * we've got the lock. After that a concurrent zap_pte_range() will
> + * block on the PTL and thus we're safe.
> + */
> +again:
> + local_irq_disable();
> + if (vma_has_changed(vmf))
> + goto out;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + /*
> + * We check if the pmd value is still the same to ensure that there
> + * is not a huge collapse operation in progress in our back.
> + */
> + pmdval = READ_ONCE(*vmf->pmd);
> + if (!pmd_same(pmdval, vmf->orig_pmd))
> + goto out;
> +#endif
> +
> + /*
> + * Same as pte_offset_map_lock() except that we call
> + * spin_trylock() in place of spin_lock() to avoid race with
> + * unmap path which may have the lock and wait for this CPU
> + * to invalidate TLB but this CPU has irq disabled.
> + * Since we are in a speculative patch, accept it could fail
> + */
> + ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> + pte = pte_offset_map(vmf->pmd, vmf->address);
> + if (unlikely(!spin_trylock(ptl))) {
> + pte_unmap(pte);
> + local_irq_enable();
> + goto again;
> + }
Same comment as above: shouldn't this be limited to a maximum number of
retries?
> +
> + if (vma_has_changed(vmf)) {
> + pte_unmap_unlock(pte, ptl);
> + goto out;
> + }
> +
> + vmf->pte = pte;
> + vmf->ptl = ptl;
> + ret = true;
> +out:
> + local_irq_enable();
> + return ret;
> +}
> +#else
> static inline bool pte_spinlock(struct vm_fault *vmf)
> {
> vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
> @@ -2095,6 +2208,7 @@ static inline bool pte_map_lock(struct vm_fault *vmf)
> vmf->address, &vmf->ptl);
> return true;
> }
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>
> /*
> * handle_pte_fault chooses page fault handler according to an entry which was
> @@ -2999,6 +3113,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> ret = check_stable_address_space(vma->vm_mm);
> if (ret)
> goto unlock;
> + /*
> + * Don't call the userfaultfd during the speculative path.
> + * We already checked for the VMA to not be managed through
> + * userfaultfd, but it may be set in our back once we have lock
> + * the pte. In such a case we can ignore it this time.
> + */
> + if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> + goto setpte;
I am a bit confused by the comment above: if userfaultfd is set behind
our back, shouldn't the speculative fault abort? So wouldn't the
following be correct:
    if (userfaultfd_missing(vma)) {
        pte_unmap_unlock(vmf->pte, vmf->ptl);
        if (vmf->flags & FAULT_FLAG_SPECULATIVE)
            return VM_FAULT_RETRY;
        ...
> /* Deliver the page fault to userland, check inside PT lock */
> if (userfaultfd_missing(vma)) {
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> @@ -3041,7 +3163,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> goto unlock_and_release;
>
> /* Deliver the page fault to userland, check inside PT lock */
> - if (userfaultfd_missing(vma)) {
> + if (!(vmf->flags & FAULT_FLAG_SPECULATIVE) &&
> + userfaultfd_missing(vma)) {
Same comment as above, but this one seems even more wrong. What I
proposed above looks more correct in both cases, i.e. we still want
to check for userfaultfd, but if we are in a speculative fault then we
just want to abort the speculative fault.
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> mem_cgroup_cancel_charge(page, memcg, false);
> put_page(page);
> @@ -3836,6 +3959,15 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> pte_t entry;
>
> if (unlikely(pmd_none(*vmf->pmd))) {
> + /*
> + * In the case of the speculative page fault handler we abort
> + * the speculative path immediately as the pmd is probably
> + * in the way to be converted in a huge one. We will try
> + * again holding the mmap_sem (which implies that the collapse
> + * operation is done).
> + */
> + if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> + return VM_FAULT_RETRY;
> /*
> * Leave __pte_alloc() until later: because vm_ops->fault may
> * want to allocate huge page, and if we expose page table
> @@ -3843,7 +3975,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> * concurrent faults and from rmap lookups.
> */
> vmf->pte = NULL;
> - } else {
> + } else if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
> /* See comment in pte_alloc_one_map() */
> if (pmd_devmap_trans_unstable(vmf->pmd))
> return 0;
> @@ -3852,6 +3984,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> * pmd from under us anymore at this point because we hold the
> * mmap_sem read mode and khugepaged takes it in write mode.
> * So now it's safe to run pte_offset_map().
> + * This is not applicable to the speculative page fault handler
> + * but in that case, the pte is fetched earlier in
> + * handle_speculative_fault().
> */
> vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
> vmf->orig_pte = *vmf->pte;
> @@ -3874,6 +4009,8 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
> if (!vmf->pte) {
> if (vma_is_anonymous(vmf->vma))
> return do_anonymous_page(vmf);
> + else if (vmf->flags & FAULT_FLAG_SPECULATIVE)
> + return VM_FAULT_RETRY;
Maybe add a small comment noting that speculative page faults do not
apply to file-backed VMAs.
> else
> return do_fault(vmf);
> }
> @@ -3971,6 +4108,9 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> vmf.pmd = pmd_alloc(mm, vmf.pud, address);
> if (!vmf.pmd)
> return VM_FAULT_OOM;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + vmf.sequence = raw_read_seqcount(&vma->vm_sequence);
> +#endif
> if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
> ret = create_huge_pmd(&vmf);
> if (!(ret & VM_FAULT_FALLBACK))
> @@ -4004,6 +4144,204 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> return handle_pte_fault(&vmf);
> }
>
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +/*
> + * Tries to handle the page fault in a speculative way, without grabbing the
> + * mmap_sem.
> + */
> +vm_fault_t __handle_speculative_fault(struct mm_struct *mm,
> + unsigned long address,
> + unsigned int flags)
> +{
> + struct vm_fault vmf = {
> + .address = address,
> + };
> + pgd_t *pgd, pgdval;
> + p4d_t *p4d, p4dval;
> + pud_t pudval;
> + int seq;
> + vm_fault_t ret = VM_FAULT_RETRY;
> + struct vm_area_struct *vma;
> +#ifdef CONFIG_NUMA
> + struct mempolicy *pol;
> +#endif
> +
> + /* Clear flags that may lead to release the mmap_sem to retry */
> + flags &= ~(FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_KILLABLE);
> + flags |= FAULT_FLAG_SPECULATIVE;
> +
> + vma = find_vma_rcu(mm, address);
> + if (!vma)
> + return ret;
> +
> + /* rmb <-> seqlock,vma_rb_erase() */
> + seq = raw_read_seqcount(&vma->vm_sequence);
> + if (seq & 1)
> + goto out_put;
A comment explaining that an odd sequence number means we are racing
with a writer between write_begin and write_end would be welcome above.
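For readers less familiar with sequence counters, the parity rule can
be shown with a simplified userspace model. This is only a sketch built
on C11 atomics: struct seqcount and the helpers below are stand-ins
written for this example, not the kernel's seqcount_t API, and the
memory ordering is illustrative rather than a substitute for the
kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's seqcount_t. */
struct seqcount {
	atomic_uint sequence;
};

/* Writer side: the counter becomes odd when an update starts... */
static void write_begin(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_acq_rel);
}

/* ...and even again once the update is complete. */
static void write_end(struct seqcount *s)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
	assert(old & 1);	/* must pair with a preceding write_begin() */
}

/* Reader side: sample the counter without waiting for a writer. */
static unsigned int read_begin_raw(struct seqcount *s)
{
	return atomic_load_explicit(&s->sequence, memory_order_acquire);
}

/* An odd snapshot means a writer was in progress when we sampled. */
static bool writer_in_progress(unsigned int seq)
{
	return seq & 1;
}

/* True if the protected data may have changed since the snapshot. */
static bool read_retry(struct seqcount *s, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != seq;
}
```

In this model the `seq & 1` test corresponds to writer_in_progress():
the speculative path bails out immediately instead of spinning until
the writer finishes.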
> +
> + /*
> + * Can't call vm_ops service has we don't know what they would do
> + * with the VMA.
> + * This include huge page from hugetlbfs.
> + */
> + if (vma->vm_ops && vma->vm_ops->fault)
> + goto out_put;
> +
> + /*
> + * __anon_vma_prepare() requires the mmap_sem to be held
> + * because vm_next and vm_prev must be safe. This can't be guaranteed
> + * in the speculative path.
> + */
> + if (unlikely(!vma->anon_vma))
> + goto out_put;
Maybe also remind people that once vma->anon_vma is set its value will
not change, and thus we do not need to protect against that (unlike
vm_flags or the other vma fields read below and above).
> +
> + vmf.vma_flags = READ_ONCE(vma->vm_flags);
> + vmf.vma_page_prot = READ_ONCE(vma->vm_page_prot);
> +
> + /* Can't call userland page fault handler in the speculative path */
> + if (unlikely(vmf.vma_flags & VM_UFFD_MISSING))
> + goto out_put;
> +
> + if (vmf.vma_flags & VM_GROWSDOWN || vmf.vma_flags & VM_GROWSUP)
> + /*
> + * This could be detected by the check address against VMA's
> + * boundaries but we want to trace it as not supported instead
> + * of changed.
> + */
> + goto out_put;
> +
> + if (address < READ_ONCE(vma->vm_start)
> + || READ_ONCE(vma->vm_end) <= address)
> + goto out_put;
> +
> + if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> + flags & FAULT_FLAG_INSTRUCTION,
> + flags & FAULT_FLAG_REMOTE)) {
> + ret = VM_FAULT_SIGSEGV;
> + goto out_put;
> + }
> +
> + /* This is one is required to check that the VMA has write access set */
> + if (flags & FAULT_FLAG_WRITE) {
> + if (unlikely(!(vmf.vma_flags & VM_WRITE))) {
> + ret = VM_FAULT_SIGSEGV;
> + goto out_put;
> + }
> + } else if (unlikely(!(vmf.vma_flags & (VM_READ|VM_EXEC|VM_WRITE)))) {
> + ret = VM_FAULT_SIGSEGV;
> + goto out_put;
> + }
> +
> +#ifdef CONFIG_NUMA
> + /*
> + * MPOL_INTERLEAVE implies additional checks in
> + * mpol_misplaced() which are not compatible with the
> + *speculative page fault processing.
> + */
> + pol = __get_vma_policy(vma, address);
> + if (!pol)
> + pol = get_task_policy(current);
> + if (pol && pol->mode == MPOL_INTERLEAVE)
> + goto out_put;
> +#endif
> +
> + /*
> + * Do a speculative lookup of the PTE entry.
> + */
> + local_irq_disable();
> + pgd = pgd_offset(mm, address);
> + pgdval = READ_ONCE(*pgd);
> + if (pgd_none(pgdval) || unlikely(pgd_bad(pgdval)))
> + goto out_walk;
> +
> + p4d = p4d_offset(pgd, address);
> + p4dval = READ_ONCE(*p4d);
> + if (p4d_none(p4dval) || unlikely(p4d_bad(p4dval)))
> + goto out_walk;
> +
> + vmf.pud = pud_offset(p4d, address);
> + pudval = READ_ONCE(*vmf.pud);
> + if (pud_none(pudval) || unlikely(pud_bad(pudval)))
> + goto out_walk;
> +
> + /* Huge pages at PUD level are not supported. */
> + if (unlikely(pud_trans_huge(pudval)))
> + goto out_walk;
> +
> + vmf.pmd = pmd_offset(vmf.pud, address);
> + vmf.orig_pmd = READ_ONCE(*vmf.pmd);
> + /*
> + * pmd_none could mean that a hugepage collapse is in progress
> + * in our back as collapse_huge_page() mark it before
> + * invalidating the pte (which is done once the IPI is catched
> + * by all CPU and we have interrupt disabled).
> + * For this reason we cannot handle THP in a speculative way since we
> + * can't safely identify an in progress collapse operation done in our
> + * back on that PMD.
> + * Regarding the order of the following checks, see comment in
> + * pmd_devmap_trans_unstable()
> + */
> + if (unlikely(pmd_devmap(vmf.orig_pmd) ||
> + pmd_none(vmf.orig_pmd) || pmd_trans_huge(vmf.orig_pmd) ||
> + is_swap_pmd(vmf.orig_pmd)))
> + goto out_walk;
> +
> + /*
> + * The above does not allocate/instantiate page-tables because doing so
> + * would lead to the possibility of instantiating page-tables after
> + * free_pgtables() -- and consequently leaking them.
> + *
> + * The result is that we take at least one !speculative fault per PMD
> + * in order to instantiate it.
> + */
> +
> + vmf.pte = pte_offset_map(vmf.pmd, address);
> + vmf.orig_pte = READ_ONCE(*vmf.pte);
> + barrier(); /* See comment in handle_pte_fault() */
> + if (pte_none(vmf.orig_pte)) {
> + pte_unmap(vmf.pte);
> + vmf.pte = NULL;
> + }
> +
> + vmf.vma = vma;
> + vmf.pgoff = linear_page_index(vma, address);
> + vmf.gfp_mask = __get_fault_gfp_mask(vma);
> + vmf.sequence = seq;
> + vmf.flags = flags;
> +
> + local_irq_enable();
> +
> + /*
> + * We need to re-validate the VMA after checking the bounds, otherwise
> + * we might have a false positive on the bounds.
> + */
> + if (read_seqcount_retry(&vma->vm_sequence, seq))
> + goto out_put;
> +
> + mem_cgroup_enter_user_fault();
> + ret = handle_pte_fault(&vmf);
> + mem_cgroup_exit_user_fault();
> +
> + put_vma(vma);
> +
> + /*
> + * The task may have entered a memcg OOM situation but
> + * if the allocation error was handled gracefully (no
> + * VM_FAULT_OOM), there is no need to kill anything.
> + * Just clean up the OOM state peacefully.
> + */
> + if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
> + mem_cgroup_oom_synchronize(false);
> + return ret;
> +
> +out_walk:
> + local_irq_enable();
> +out_put:
> + put_vma(vma);
> + return ret;
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
> /*
> * By the time we get here, we already hold the mm semaphore
> *
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH v12 21/31] mm: Introduce find_vma_rcu()
From: Jerome Glisse @ 2019-04-22 20:57 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-22-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:12PM +0200, Laurent Dufour wrote:
> This allows searching for a VMA structure without holding the mmap_sem.
>
> The search is repeated while the mm seqlock is changing and until we find
> a valid VMA.
>
> While under the RCU protection, a reference is taken on the VMA, so the
> caller must call put_vma() once it no longer needs the VMA structure.
>
> At the time a VMA is inserted in the MM RB tree, in vma_rb_insert(), a
> reference is taken to the VMA by calling get_vma().
>
> When removing a VMA from the MM RB tree, the VMA is not released immediately
> but at the end of the RCU grace period through vm_rcu_put(). This ensures
> that the VMA remains allocated until the end of the RCU grace period.
>
> Since the vm_file pointer, if valid, is released in put_vma(), there is no
> guarantee that the file pointer will be valid on the returned VMA.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Minor comments about a comment (I love recursion :)), see below.
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/mm_types.h | 1 +
> mm/internal.h | 5 ++-
> mm/mmap.c | 76 ++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6a6159e11a3f..9af6694cb95d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -287,6 +287,7 @@ struct vm_area_struct {
>
> #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> atomic_t vm_ref_count;
> + struct rcu_head vm_rcu;
> #endif
> struct rb_node vm_rb;
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 302382bed406..1e368e4afe3c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -55,7 +55,10 @@ static inline void put_vma(struct vm_area_struct *vma)
> __free_vma(vma);
> }
>
> -#else
> +extern struct vm_area_struct *find_vma_rcu(struct mm_struct *mm,
> + unsigned long addr);
> +
> +#else /* CONFIG_SPECULATIVE_PAGE_FAULT */
>
> static inline void get_vma(struct vm_area_struct *vma)
> {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index c106440dcae7..34bf261dc2c8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -179,6 +179,18 @@ static inline void mm_write_sequnlock(struct mm_struct *mm)
> {
> write_sequnlock(&mm->mm_seq);
> }
> +
> +static void __vm_rcu_put(struct rcu_head *head)
> +{
> + struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
> + vm_rcu);
> + put_vma(vma);
> +}
> +static void vm_rcu_put(struct vm_area_struct *vma)
> +{
> + VM_BUG_ON_VMA(!RB_EMPTY_NODE(&vma->vm_rb), vma);
> + call_rcu(&vma->vm_rcu, __vm_rcu_put);
> +}
> #else
> static inline void mm_write_seqlock(struct mm_struct *mm)
> {
> @@ -190,6 +202,8 @@ static inline void mm_write_sequnlock(struct mm_struct *mm)
>
> void __free_vma(struct vm_area_struct *vma)
> {
> + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT))
> + VM_BUG_ON_VMA(!RB_EMPTY_NODE(&vma->vm_rb), vma);
> mpol_put(vma_policy(vma));
> vm_area_free(vma);
> }
> @@ -197,11 +211,24 @@ void __free_vma(struct vm_area_struct *vma)
> /*
> * Close a vm structure and free it, returning the next.
> */
> -static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> +static struct vm_area_struct *__remove_vma(struct vm_area_struct *vma)
> {
> struct vm_area_struct *next = vma->vm_next;
>
> might_sleep();
> + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT) &&
> + !RB_EMPTY_NODE(&vma->vm_rb)) {
> + /*
> + * If the VMA is still linked in the RB tree, we must release
> + * that reference by calling put_vma().
> + * This should only happen when called from exit_mmap().
> + * We forcely clear the node to satisfy the chec in
^
Typo: chec -> check
> + * __free_vma(). This is safe since the RB tree is not walked
> + * anymore.
> + */
> + RB_CLEAR_NODE(&vma->vm_rb);
> + put_vma(vma);
> + }
> if (vma->vm_ops && vma->vm_ops->close)
> vma->vm_ops->close(vma);
> if (vma->vm_file)
> @@ -211,6 +238,13 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> return next;
> }
>
> +static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> +{
> + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT))
> + VM_BUG_ON_VMA(!RB_EMPTY_NODE(&vma->vm_rb), vma);
Please add a comment here explaining the BUG_ON so people can understand
what went wrong if it triggers. For instance:
/*
* remove_vma() should be called only once a vma has been removed from the
* rbtree, at which point vma->vm_rb is an empty node. The exception is when
* vmas are destroyed through exit_mmap(), in which case we do not bother
* updating the rbtree (see comment in __remove_vma()).
*/
> + return __remove_vma(vma);
> +}
> +
> static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
> struct list_head *uf);
> SYSCALL_DEFINE1(brk, unsigned long, brk)
> @@ -475,7 +509,7 @@ static inline void vma_rb_insert(struct vm_area_struct *vma,
>
> /* All rb_subtree_gap values must be consistent prior to insertion */
> validate_mm_rb(root, NULL);
> -
> + get_vma(vma);
> rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
> }
>
> @@ -491,6 +525,14 @@ static void __vma_rb_erase(struct vm_area_struct *vma, struct mm_struct *mm)
> mm_write_seqlock(mm);
> rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
> mm_write_sequnlock(mm); /* wmb */
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + /*
> + * Ensure the removal is complete before clearing the node.
> + * Matched by vma_has_changed()/handle_speculative_fault().
> + */
> + RB_CLEAR_NODE(&vma->vm_rb);
> + vm_rcu_put(vma);
> +#endif
> }
>
> static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma,
> @@ -2331,6 +2373,34 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>
> EXPORT_SYMBOL(find_vma);
>
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +/*
> + * Like find_vma() but under the protection of RCU and the mm sequence counter.
> + * The vma returned has to be released by the caller through the call to
> + * put_vma()
> + */
> +struct vm_area_struct *find_vma_rcu(struct mm_struct *mm, unsigned long addr)
> +{
> + struct vm_area_struct *vma = NULL;
> + unsigned int seq;
> +
> + do {
> + if (vma)
> + put_vma(vma);
> +
> + seq = read_seqbegin(&mm->mm_seq);
> +
> + rcu_read_lock();
> + vma = find_vma(mm, addr);
> + if (vma)
> + get_vma(vma);
> + rcu_read_unlock();
> + } while (read_seqretry(&mm->mm_seq, seq));
> +
> + return vma;
> +}
> +#endif
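For anyone following along, the retry loop above can be modelled in userspace. This is only a sketch under stated assumptions: every toy_* name is made up for illustration and is not a kernel API, the "tree" is a single slot, there is a single writer, and the RCU grace period that keeps the VMA allocated between the lookup and get_vma() is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct toy_vma {
	unsigned long start, end;
	atomic_int ref_count;
};

struct toy_mm {
	atomic_uint seq;		/* even: stable, odd: writer active */
	struct toy_vma *_Atomic vma;	/* single-slot stand-in for the RB tree */
};

static void toy_get(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->ref_count, 1);
}

/* Returns 1 when the last reference was dropped (a real put would free). */
static int toy_put(struct toy_vma *vma)
{
	int old = atomic_fetch_sub(&vma->ref_count, 1);

	assert(old > 0);
	return old == 1;
}

static unsigned int toy_read_begin(struct toy_mm *mm)
{
	unsigned int seq;

	/* Wait until no writer is in its critical section. */
	while ((seq = atomic_load_explicit(&mm->seq, memory_order_acquire)) & 1)
		;
	return seq;
}

static int toy_read_retry(struct toy_mm *mm, unsigned int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&mm->seq, memory_order_relaxed) != seq;
}

/* Single-writer update: bump to odd, publish, bump back to even. */
static void toy_write(struct toy_mm *mm, struct toy_vma *new_vma)
{
	atomic_fetch_add_explicit(&mm->seq, 1, memory_order_acq_rel);
	atomic_store(&mm->vma, new_vma);
	atomic_fetch_add_explicit(&mm->seq, 1, memory_order_release);
}

/* Same shape as the loop in the patch: look up, take a ref, retry on a race. */
static struct toy_vma *toy_find(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma = NULL;
	unsigned int seq;

	do {
		if (vma)
			toy_put(vma);	/* drop the ref from a raced attempt */

		seq = toy_read_begin(mm);

		vma = atomic_load(&mm->vma);
		if (vma && addr >= vma->end)
			vma = NULL;	/* like find_vma(): need vm_end > addr */
		if (vma)
			toy_get(vma);
	} while (toy_read_retry(mm, seq));

	return vma;
}
```

The point of the model is only the ordering: the reference is taken inside the sequence-protected window, and a reference taken during a raced attempt is dropped before retrying.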
> +
> /*
> * Same as find_vma, but also return a pointer to the previous VMA in *pprev.
> */
> @@ -3231,7 +3301,7 @@ void exit_mmap(struct mm_struct *mm)
> while (vma) {
> if (vma->vm_flags & VM_ACCOUNT)
> nr_accounted += vma_pages(vma);
> - vma = remove_vma(vma);
> + vma = __remove_vma(vma);
> }
> vm_unacct_memory(nr_accounted);
> }
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH] [v2] x86/mpx: fix recursive munmap() corruption
From: Yang Shi @ 2019-04-22 17:08 UTC (permalink / raw)
To: Dave Hansen, linux-kernel
Cc: linux-arch, hjl.tools, mhocko, rguenther, richard, gxt, jdike,
x86, linux-um, stable, luto, linux-mm, paulus, akpm, linuxppc-dev,
vbabka, anton.ivanov
In-Reply-To: <20190419194747.5E1AD6DC@viggo.jf.intel.com>
On 4/19/19 12:47 PM, Dave Hansen wrote:
> Changes from v1:
> * Fix compile errors on UML and non-x86 arches
> * Clarify commit message and Fixes about the origin of the
> bug and add the impact to powerpc / uml / unicore32
>
> --
>
> This is a bit of a mess, to put it mildly. But, it's a bug
> that only seems to have showed up in 4.20 but wasn't noticed
> until now because nobody uses MPX.
>
> MPX has the arch_unmap() hook inside of munmap() because MPX
> uses bounds tables that protect other areas of memory. When
> memory is unmapped, there is also a need to unmap the MPX
> bounds tables. Barring this, unused bounds tables can eat 80%
> of the address space.
>
> But, the recursive do_munmap() that gets called via arch_unmap()
> wreaks havoc with __do_munmap()'s state. It can result in
> freeing populated page tables, accessing bogus VMA state,
> double-freed VMAs and more.
>
> To fix this, call arch_unmap() before __do_munmap() has a chance
> to do anything meaningful. Also, remove the 'vma' argument
> and force the MPX code to do its own, independent VMA lookup.
>
> == UML / unicore32 impact ==
>
> Remove unused 'vma' argument to arch_unmap(). No functional
> change.
>
> I compile tested this on UML but not unicore32.
>
> == powerpc impact ==
>
> powerpc uses arch_unmap() well to watch for munmap() on the
> VDSO and zeroes out 'current->mm->context.vdso_base'. Moving
> arch_unmap() makes this happen earlier in __do_munmap(). But,
> 'vdso_base' seems to only be used in perf and in the signal
> delivery that happens near the return to userspace. I can not
> find any likely impact to powerpc, other than the zeroing
> happening a little earlier.
>
> powerpc does not use the 'vma' argument and is unaffected by
> its removal.
>
> I compile-tested a 64-bit powerpc defconfig.
>
> == x86 impact ==
>
> For the common success case this is functionally identical to
> what was there before. For the munmap() failure case, it's
> possible that some MPX tables will be zapped for memory that
> continues to be in use. But, this is an extraordinarily
> unlikely scenario and the harm would be that MPX provides no
> protection since the bounds table got reset (zeroed).
>
> I can't imagine anyone doing this:
>
> ptr = mmap();
> // use ptr
> ret = munmap(ptr);
> if (ret)
> // oh, there was an error, I'll
> // keep using ptr.
>
> Because if you're doing munmap(), you are *done* with the
> memory. There's probably no good data in there _anyway_.
>
> This passes the original reproducer from Richard Biener as
> well as the existing mpx selftests/.
>
> ====
>
> The long story:
>
> munmap() has a couple of pieces:
> 1. Find the affected VMA(s)
> 2. Split the start/end one(s) if necessary
> 3. Pull the VMAs out of the rbtree
> 4. Actually zap the memory via unmap_region(), including
> freeing page tables (or queueing them to be freed).
> 5. Fixup some of the accounting (like fput()) and actually
> free the VMA itself.
>
> This specific ordering was actually introduced by:
>
> dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
>
> during the 4.20 merge window. The previous __do_munmap() code
> was actually safe because the only thing after arch_unmap() was
> remove_vma_list(). arch_unmap() could not see 'vma' in the
> rbtree because it was detached, so it is not even capable of
> doing operations unsafe for remove_vma_list()'s use of 'vma'.
>
> Richard Biener reported a test that shows this in dmesg:
>
> [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
> [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
>
> What triggered this was the recursive do_munmap() called via
> arch_unmap(). It was freeing page tables that had not been
> properly zapped.
>
> But, the problem was bigger than this. For one, arch_unmap()
> can free VMAs. But, the calling __do_munmap() has variables
> that *point* to VMAs and obviously can't handle them just
> getting freed while the pointer is still in use.
>
> I tried a couple of things here. First, I tried to fix the page
> table freeing problem in isolation, but I then found the VMA
> issue. I also tried having the MPX code return a flag if it
> modified the rbtree which would force __do_munmap() to re-walk
> to restart. That spiralled out of control in complexity pretty
> fast.
>
> Just moving arch_unmap() and accepting that the bonkers failure
> case might eat some bounds tables seems like the simplest viable
> fix.
>
> This was also reported in the following kernel bugzilla entry:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=203123
>
> There are some reports that dd2283f2605 ("mm: mmap: zap pages
> with read mmap_sem in munmap") triggered this issue. While that
> commit certainly made the issues easier to hit, I believe the
> fundamental issue has been with us as long as MPX itself, thus
> the Fixes: tag below is for one of the original MPX commits.
>
> Reported-by: Richard Biener <rguenther@suse.de>
> Reported-by: H.J. Lu <hjl.tools@gmail.com>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: x86@kernel.org
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: stable@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-um@lists.infradead.org
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: linux-arch@vger.kernel.org
> Cc: Guan Xuetao <gxt@pku.edu.cn>
> Cc: Jeff Dike <jdike@addtoit.com>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
>
> ---
>
> b/arch/powerpc/include/asm/mmu_context.h | 1 -
> b/arch/um/include/asm/mmu_context.h | 1 -
> b/arch/unicore32/include/asm/mmu_context.h | 1 -
> b/arch/x86/include/asm/mmu_context.h | 6 +++---
> b/arch/x86/include/asm/mpx.h | 5 ++---
> b/arch/x86/mm/mpx.c | 10 ++++++----
> b/include/asm-generic/mm_hooks.h | 1 -
> b/mm/mmap.c | 15 ++++++++-------
> 8 files changed, 19 insertions(+), 21 deletions(-)
>
> diff -puN mm/mmap.c~mpx-rss-pass-no-vma mm/mmap.c
> --- a/mm/mmap.c~mpx-rss-pass-no-vma 2019-04-19 09:31:09.851509404 -0700
> +++ b/mm/mmap.c 2019-04-19 09:31:09.864509404 -0700
> @@ -2730,9 +2730,17 @@ int __do_munmap(struct mm_struct *mm, un
> return -EINVAL;
>
> len = PAGE_ALIGN(len);
> + end = start + len;
> if (len == 0)
> return -EINVAL;
>
> + /*
> + * arch_unmap() might do unmaps itself. It must be called
> + * and finish any rbtree manipulation before this code
> + * runs and also starts to manipulate the rbtree.
> + */
> + arch_unmap(mm, start, end);
> +
> /* Find the first overlapping VMA */
> vma = find_vma(mm, start);
> if (!vma)
> @@ -2741,7 +2749,6 @@ int __do_munmap(struct mm_struct *mm, un
> /* we have start < vma->vm_end */
>
> /* if it doesn't overlap, we have nothing.. */
> - end = start + len;
> if (vma->vm_start >= end)
> return 0;
>
> @@ -2811,12 +2818,6 @@ int __do_munmap(struct mm_struct *mm, un
> /* Detach vmas from rbtree */
> detach_vmas_to_be_unmapped(mm, vma, prev, end);
>
> - /*
> - * mpx unmap needs to be called with mmap_sem held for write.
> - * It is safe to call it before unmap_region().
> - */
> - arch_unmap(mm, vma, start, end);
> -
> if (downgrade)
> downgrade_write(&mm->mmap_sem);
Thanks for debugging this. The change looks good to me.

Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
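For readers who want to see why the ordering matters, here is a toy userspace model of the hunk above. It is a sketch under stated assumptions, not the kernel code: the toy_* names are invented, the address space is a sorted array of ranges, and the "bounds table" for a range is assumed to live at the range's addresses shifted right by 4, purely so that the hook has something to remove.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical model: an address space is a sorted array of [start, end). */
struct toy_range {
	unsigned long start, end;
};

struct toy_space {
	struct toy_range r[TOY_MAX];
	size_t n;
};

/* Remove every range fully inside [start, end) and compact the array. */
static size_t toy_remove(struct toy_space *s, unsigned long start,
			 unsigned long end)
{
	size_t i, kept = 0, removed;

	assert(s->n <= TOY_MAX);
	for (i = 0; i < s->n; i++) {
		if (s->r[i].start >= start && s->r[i].end <= end)
			continue;
		s->r[kept++] = s->r[i];
	}
	removed = s->n - kept;
	s->n = kept;
	return removed;
}

/*
 * Stand-in for arch_unmap(): it performs an unmap of its own (the
 * made-up "table" range), which shifts entries in the array.
 */
static void toy_arch_unmap(struct toy_space *s, unsigned long start,
			   unsigned long end)
{
	toy_remove(s, start >> 4, end >> 4);
}

/* Returns the number of ranges removed for the caller, or -1 if invalid. */
static int toy_unmap(struct toy_space *s, unsigned long start,
		     unsigned long end)
{
	if (start >= end)
		return -1;

	/* Hook first: it may remove ranges and compact the array itself. */
	toy_arch_unmap(s, start, end);

	/* Only now look anything up, so no index predates the hook. */
	return (int)toy_remove(s, start, end);
}
```

Had toy_unmap() located its range first and called the hook afterwards, the index it held would have been stale once the hook compacted the array, which is the same class of problem as __do_munmap() holding VMA pointers across arch_unmap().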
>
> diff -puN arch/x86/include/asm/mmu_context.h~mpx-rss-pass-no-vma arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~mpx-rss-pass-no-vma 2019-04-19 09:31:09.853509404 -0700
> +++ b/arch/x86/include/asm/mmu_context.h 2019-04-19 09:31:09.865509404 -0700
> @@ -277,8 +277,8 @@ static inline void arch_bprm_mm_init(str
> mpx_mm_init(mm);
> }
>
> -static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> +static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
> + unsigned long end)
> {
> /*
> * mpx_notify_unmap() goes and reads a rarely-hot
> @@ -298,7 +298,7 @@ static inline void arch_unmap(struct mm_
> * consistently wrong.
> */
> if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
> - mpx_notify_unmap(mm, vma, start, end);
> + mpx_notify_unmap(mm, start, end);
> }
>
> /*
> diff -puN include/asm-generic/mm_hooks.h~mpx-rss-pass-no-vma include/asm-generic/mm_hooks.h
> --- a/include/asm-generic/mm_hooks.h~mpx-rss-pass-no-vma 2019-04-19 09:31:09.856509404 -0700
> +++ b/include/asm-generic/mm_hooks.h 2019-04-19 09:31:09.865509404 -0700
> @@ -18,7 +18,6 @@ static inline void arch_exit_mmap(struct
> }
>
> static inline void arch_unmap(struct mm_struct *mm,
> - struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> }
> diff -puN arch/x86/mm/mpx.c~mpx-rss-pass-no-vma arch/x86/mm/mpx.c
> --- a/arch/x86/mm/mpx.c~mpx-rss-pass-no-vma 2019-04-19 09:31:09.858509404 -0700
> +++ b/arch/x86/mm/mpx.c 2019-04-19 09:31:09.866509404 -0700
> @@ -881,9 +881,10 @@ static int mpx_unmap_tables(struct mm_st
> * the virtual address region start...end have already been split if
> * necessary, and the 'vma' is the first vma in this range (start -> end).
> */
> -void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end)
> +void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
> + unsigned long end)
> {
> + struct vm_area_struct *vma;
> int ret;
>
> /*
> @@ -902,11 +903,12 @@ void mpx_notify_unmap(struct mm_struct *
> * which should not occur normally. Being strict about it here
> * helps ensure that we do not have an exploitable stack overflow.
> */
> - do {
> + vma = find_vma(mm, start);
> + while (vma && vma->vm_start < end) {
> if (vma->vm_flags & VM_MPX)
> return;
> vma = vma->vm_next;
> - } while (vma && vma->vm_start < end);
> + }
>
> ret = mpx_unmap_tables(mm, start, end);
> if (ret)
> diff -puN arch/x86/include/asm/mpx.h~mpx-rss-pass-no-vma arch/x86/include/asm/mpx.h
> --- a/arch/x86/include/asm/mpx.h~mpx-rss-pass-no-vma 2019-04-19 09:31:09.860509404 -0700
> +++ b/arch/x86/include/asm/mpx.h 2019-04-19 09:31:09.866509404 -0700
> @@ -78,8 +78,8 @@ static inline void mpx_mm_init(struct mm
> */
> mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
> }
> -void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> - unsigned long start, unsigned long end);
> +void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
> + unsigned long end);
>
> unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
> unsigned long flags);
> @@ -100,7 +100,6 @@ static inline void mpx_mm_init(struct mm
> {
> }
> static inline void mpx_notify_unmap(struct mm_struct *mm,
> - struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> }
> diff -puN arch/um/include/asm/mmu_context.h~mpx-rss-pass-no-vma arch/um/include/asm/mmu_context.h
> --- a/arch/um/include/asm/mmu_context.h~mpx-rss-pass-no-vma 2019-04-19 09:42:05.789507768 -0700
> +++ b/arch/um/include/asm/mmu_context.h 2019-04-19 09:42:57.962507638 -0700
> @@ -22,7 +22,6 @@ static inline int arch_dup_mmap(struct m
> }
> extern void arch_exit_mmap(struct mm_struct *mm);
> static inline void arch_unmap(struct mm_struct *mm,
> - struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> }
> diff -puN arch/unicore32/include/asm/mmu_context.h~mpx-rss-pass-no-vma arch/unicore32/include/asm/mmu_context.h
> --- a/arch/unicore32/include/asm/mmu_context.h~mpx-rss-pass-no-vma 2019-04-19 09:42:06.189507767 -0700
> +++ b/arch/unicore32/include/asm/mmu_context.h 2019-04-19 09:43:25.425507569 -0700
> @@ -88,7 +88,6 @@ static inline int arch_dup_mmap(struct m
> }
>
> static inline void arch_unmap(struct mm_struct *mm,
> - struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> }
> diff -puN arch/powerpc/include/asm/mmu_context.h~mpx-rss-pass-no-vma arch/powerpc/include/asm/mmu_context.h
> --- a/arch/powerpc/include/asm/mmu_context.h~mpx-rss-pass-no-vma 2019-04-19 09:42:06.388507766 -0700
> +++ b/arch/powerpc/include/asm/mmu_context.h 2019-04-19 09:43:27.392507564 -0700
> @@ -237,7 +237,6 @@ extern void arch_exit_mmap(struct mm_str
> #endif
>
> static inline void arch_unmap(struct mm_struct *mm,
> - struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> if (start <= mm->context.vdso_base && mm->context.vdso_base < end)
> _
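One more note on the mpx_notify_unmap() hunk: since the function now does its own lookup, the first VMA can be NULL, which is presumably why the do/while became a plain while. A small userspace sketch of that loop shape follows; the toy_* names and the 'flagged' field are hypothetical stand-ins (for find_vma(), vm_next and VM_MPX) and not the real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA on a sorted singly linked list. */
struct toy_area {
	unsigned long start, end;
	int flagged;			/* plays the role of VM_MPX */
	struct toy_area *next;
};

/* Like find_vma(): first area whose end is above addr, or NULL. */
static struct toy_area *toy_first(struct toy_area *head, unsigned long addr)
{
	while (head && head->end <= addr)
		head = head->next;
	return head;
}

/* Returns 1 if any area overlapping [start, end) carries the flag. */
static int toy_range_has_flag(struct toy_area *head, unsigned long start,
			      unsigned long end)
{
	struct toy_area *area;

	assert(start <= end);

	/*
	 * The lookup may return NULL, so the condition has to be tested
	 * before the first dereference: while, not do/while.
	 */
	area = toy_first(head, start);
	while (area && area->start < end) {
		if (area->flagged)
			return 1;
		area = area->next;
	}
	return 0;
}
```

With a caller-supplied, known-valid first element a do/while is fine; with an independent lookup it would dereference NULL whenever nothing lies at or above 'start'.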
^ permalink raw reply
* [PATCH v2 18/79] docs: kbuild: convert docs to ReST and rename to *.rst
From: Mauro Carvalho Chehab @ 2019-04-22 13:27 UTC (permalink / raw)
To: Linux Doc Mailing List
Cc: linux-wireless, linux-fbdev, Emmanuel Grumbach, Stanislaw Gruszka,
Greg Kroah-Hartman, bridge, Palmer Dabbelt, alsa-devel, dri-devel,
Ofer Levi, Masahiro Yamada, Harry Wei, Paul Mackerras,
Mauro Carvalho Chehab, linux-kbuild, linux-riscv, Vincent Chen,
Aurelien Jacquiot, Jonas Bonn, Alex Shi, linux-c6x-dev,
linux-scsi, Jonathan Corbet, Bartlomiej Zolnierkiewicz, netdev,
Marek Vasut, coreteam, Federico Vaga, Mark Salter,
Alexey Kuznetsov, linux-snps-arc, Roopa Prabhu, Pablo Neira Ayuso,
devel, Albert Ou, Johannes Berg, Intel Linux Wireless,
Nikolay Aleksandrov, James E.J. Bottomley, Jozsef Kadlecsik,
linuxppc-dev, Mauro Carvalho Chehab, openrisc, Greentime Hu,
linux-mtd, Takashi Iwai, Jaroslav Kysela, Stafford Horne,
Stefan Kristiansson, Kalle Valo, Jon Maloy, Michal Simek,
Michal Marek, tipc-discussion, Teddy Wang, Martin K. Petersen,
Boris Brezillon, Hideaki YOSHIFUJI, Vineet Gupta, linux-usb,
Florian Westphal, linux-kernel, Sudip Mukherjee,
Miguel Ojeda Sandonis, netfilter-devel, Richard Weinberger,
Ying Xue, Luca Coelho, Brian Norris, David Woodhouse,
David S. Miller
In-Reply-To: <cover.1555938375.git.mchehab+samsung@kernel.org>
The kbuild documentation clearly shows that the documents
there are written at different times: some use markdown,
some use their own peculiar logic to split sections.
Convert everything to ReST without affecting too much
the author's style and avoiding adding unneeded markups.
The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
Documentation/admin-guide/README.rst | 2 +-
| 5 +-
Documentation/kbuild/index.rst | 27 +
Documentation/kbuild/issues.rst | 11 +
.../kbuild/{kbuild.txt => kbuild.rst} | 119 ++--
...nfig-language.txt => kconfig-language.rst} | 232 ++++----
...anguage.txt => kconfig-macro-language.rst} | 37 +-
.../kbuild/{kconfig.txt => kconfig.rst} | 136 +++--
.../kbuild/{makefiles.txt => makefiles.rst} | 518 +++++++++++-------
.../kbuild/{modules.txt => modules.rst} | 168 +++---
Documentation/kernel-hacking/hacking.rst | 4 +-
Documentation/process/coding-style.rst | 2 +-
Documentation/process/submit-checklist.rst | 2 +-
.../it_IT/kernel-hacking/hacking.rst | 4 +-
.../it_IT/process/submit-checklist.rst | 2 +-
.../zh_CN/process/coding-style.rst | 2 +-
.../zh_CN/process/submit-checklist.rst | 2 +-
Kconfig | 2 +-
arch/arc/plat-eznps/Kconfig | 2 +-
arch/c6x/Kconfig | 2 +-
arch/microblaze/Kconfig.debug | 2 +-
arch/microblaze/Kconfig.platform | 2 +-
arch/nds32/Kconfig | 2 +-
arch/openrisc/Kconfig | 2 +-
arch/powerpc/sysdev/Kconfig | 2 +-
arch/riscv/Kconfig | 2 +-
drivers/auxdisplay/Kconfig | 2 +-
drivers/firmware/Kconfig | 2 +-
drivers/mtd/devices/Kconfig | 2 +-
drivers/net/ethernet/smsc/Kconfig | 6 +-
drivers/net/wireless/intel/iwlegacy/Kconfig | 4 +-
drivers/net/wireless/intel/iwlwifi/Kconfig | 2 +-
drivers/parport/Kconfig | 2 +-
drivers/scsi/Kconfig | 4 +-
drivers/staging/sm750fb/Kconfig | 2 +-
drivers/usb/misc/Kconfig | 4 +-
drivers/video/fbdev/Kconfig | 14 +-
net/bridge/netfilter/Kconfig | 2 +-
net/ipv4/netfilter/Kconfig | 2 +-
net/ipv6/netfilter/Kconfig | 2 +-
net/netfilter/Kconfig | 16 +-
net/tipc/Kconfig | 2 +-
scripts/Kbuild.include | 4 +-
scripts/Makefile.host | 2 +-
scripts/kconfig/symbol.c | 2 +-
.../tests/err_recursive_dep/expected_stderr | 14 +-
sound/oss/dmasound/Kconfig | 6 +-
47 files changed, 826 insertions(+), 561 deletions(-)
rename Documentation/kbuild/{headers_install.txt => headers_install.rst} (96%)
create mode 100644 Documentation/kbuild/index.rst
create mode 100644 Documentation/kbuild/issues.rst
rename Documentation/kbuild/{kbuild.txt => kbuild.rst} (72%)
rename Documentation/kbuild/{kconfig-language.txt => kconfig-language.rst} (85%)
rename Documentation/kbuild/{kconfig-macro-language.txt => kconfig-macro-language.rst} (94%)
rename Documentation/kbuild/{kconfig.txt => kconfig.rst} (80%)
rename Documentation/kbuild/{makefiles.txt => makefiles.rst} (84%)
rename Documentation/kbuild/{modules.txt => modules.rst} (84%)
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index a582c780c3bd..cc6151fc0845 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -227,7 +227,7 @@ Configuring the kernel
"make tinyconfig" Configure the tiniest possible kernel.
You can find more information on using the Linux kernel config tools
- in Documentation/kbuild/kconfig.txt.
+ in Documentation/kbuild/kconfig.rst.
- NOTES on ``make config``:
diff --git a/Documentation/kbuild/headers_install.txt b/Documentation/kbuild/headers_install.rst
similarity index 96%
rename from Documentation/kbuild/headers_install.txt
rename to Documentation/kbuild/headers_install.rst
index f0153adb95e2..1ab7294e41ac 100644
--- a/Documentation/kbuild/headers_install.txt
+++ b/Documentation/kbuild/headers_install.rst
@@ -1,3 +1,4 @@
+=============================================
Exporting kernel headers for use by userspace
=============================================
@@ -22,14 +23,14 @@ older kernel.
The "make headers_install" command can be run in the top level directory of the
kernel source code (or using a standard out-of-tree build). It takes two
-optional arguments:
+optional arguments::
make headers_install ARCH=i386 INSTALL_HDR_PATH=/usr
ARCH indicates which architecture to produce headers for, and defaults to the
current architecture. The linux/asm directory of the exported kernel headers
is platform-specific, to see a complete list of supported architectures use
-the command:
+the command::
ls -d include/asm-* | sed 's/.*-//'
diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
new file mode 100644
index 000000000000..42d4cbe4460c
--- /dev/null
+++ b/Documentation/kbuild/index.rst
@@ -0,0 +1,27 @@
+:orphan:
+
+===================
+Kernel Build System
+===================
+
+.. toctree::
+ :maxdepth: 1
+
+ kconfig-language
+ kconfig-macro-language
+
+ kbuild
+ kconfig
+ makefiles
+ modules
+
+ headers_install
+
+ issues
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/kbuild/issues.rst b/Documentation/kbuild/issues.rst
new file mode 100644
index 000000000000..9fdded4b681c
--- /dev/null
+++ b/Documentation/kbuild/issues.rst
@@ -0,0 +1,11 @@
+Recursion issue #1
+------------------
+
+ .. include:: Kconfig.recursion-issue-01
+ :literal:
+
+Recursion issue #2
+------------------
+
+ .. include:: Kconfig.recursion-issue-02
+ :literal:
diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.rst
similarity index 72%
rename from Documentation/kbuild/kbuild.txt
rename to Documentation/kbuild/kbuild.rst
index 8a3830b39c7d..ef6298ba4410 100644
--- a/Documentation/kbuild/kbuild.txt
+++ b/Documentation/kbuild/kbuild.rst
@@ -1,96 +1,108 @@
+======
+Kbuild
+======
+
+
Output files
+============
modules.order
---------------------------------------------------
+-------------
This file records the order in which modules appear in Makefiles. This
is used by modprobe to deterministically resolve aliases that match
multiple modules.
modules.builtin
---------------------------------------------------
+---------------
This file lists all modules that are built into the kernel. This is used
by modprobe to not fail when trying to load something builtin.
Environment variables
+=====================
KCPPFLAGS
---------------------------------------------------
+---------
Additional options to pass when preprocessing. The preprocessing options
will be used in all cases where kbuild does preprocessing including
building C files and assembler files.
KAFLAGS
---------------------------------------------------
+-------
Additional options to the assembler (for built-in and modules).
AFLAGS_MODULE
---------------------------------------------------
+-------------
Additional module specific options to use for $(AS).
AFLAGS_KERNEL
---------------------------------------------------
+-------------
Additional options for $(AS) when used for assembler
code for code that is compiled as built-in.
KCFLAGS
---------------------------------------------------
+-------
Additional options to the C compiler (for built-in and modules).
CFLAGS_KERNEL
---------------------------------------------------
+-------------
Additional options for $(CC) when used to compile
code that is compiled as built-in.
CFLAGS_MODULE
---------------------------------------------------
+-------------
Additional module specific options to use for $(CC).
LDFLAGS_MODULE
---------------------------------------------------
+--------------
Additional options used for $(LD) when linking modules.
HOSTCFLAGS
---------------------------------------------------
+----------
Additional flags to be passed to $(HOSTCC) when building host programs.
HOSTCXXFLAGS
---------------------------------------------------
+------------
Additional flags to be passed to $(HOSTCXX) when building host programs.
HOSTLDFLAGS
---------------------------------------------------
+-----------
Additional flags to be passed when linking host programs.
HOSTLDLIBS
---------------------------------------------------
+----------
Additional libraries to link against when building host programs.
KBUILD_KCONFIG
---------------------------------------------------
+--------------
Set the top-level Kconfig file to the value of this environment
variable. The default name is "Kconfig".
KBUILD_VERBOSE
---------------------------------------------------
+--------------
Set the kbuild verbosity. Can be assigned same values as "V=...".
+
See make help for the full list.
+
Setting "V=..." takes precedence over KBUILD_VERBOSE.
KBUILD_EXTMOD
---------------------------------------------------
+-------------
Set the directory to look for the kernel source when building external
modules.
+
Setting "M=..." takes precedence over KBUILD_EXTMOD.
KBUILD_OUTPUT
---------------------------------------------------
+-------------
Specify the output directory when building the kernel.
+
The output directory can also be specified using "O=...".
+
Setting "O=..." takes precedence over KBUILD_OUTPUT.
KBUILD_DEBARCH
---------------------------------------------------
+--------------
For the deb-pkg target, allows overriding the normal heuristics deployed by
deb-pkg. Normally deb-pkg attempts to guess the right architecture based on
the UTS_MACHINE variable, and on some architectures also the kernel config.
@@ -98,44 +110,48 @@ The value of KBUILD_DEBARCH is assumed (not checked) to be a valid Debian
architecture.
ARCH
---------------------------------------------------
+----
Set ARCH to the architecture to be built.
+
In most cases the name of the architecture is the same as the
directory name found in the arch/ directory.
+
But some architectures such as x86 and sparc have aliases.
-x86: i386 for 32 bit, x86_64 for 64 bit
-sh: sh for 32 bit, sh64 for 64 bit
-sparc: sparc32 for 32 bit, sparc64 for 64 bit
+
+- x86: i386 for 32 bit, x86_64 for 64 bit
+- sh: sh for 32 bit, sh64 for 64 bit
+- sparc: sparc32 for 32 bit, sparc64 for 64 bit
CROSS_COMPILE
---------------------------------------------------
+-------------
Specify an optional fixed part of the binutils filename.
CROSS_COMPILE can be a part of the filename or the full path.
CROSS_COMPILE is also used for ccache in some setups.
CF
---------------------------------------------------
+--
Additional options for sparse.
-CF is often used on the command-line like this:
+
+CF is often used on the command-line like this::
make CF=-Wbitwise C=2
INSTALL_PATH
---------------------------------------------------
+------------
INSTALL_PATH specifies where to place the updated kernel and system map
images. Default is /boot, but you can set it to other values.
INSTALLKERNEL
---------------------------------------------------
+-------------
Install script called when using "make install".
The default name is "installkernel".
The script will be called with the following arguments:
- $1 - kernel version
- $2 - kernel image file
- $3 - kernel map file
- $4 - default install path (use root directory if blank)
+ - $1 - kernel version
+ - $2 - kernel image file
+ - $3 - kernel map file
+ - $4 - default install path (use root directory if blank)
The implementation of "make install" is architecture specific
and it may differ from the above.
@@ -144,32 +160,33 @@ INSTALLKERNEL is provided to enable the possibility to
specify a custom installer when cross compiling a kernel.
MODLIB
---------------------------------------------------
+------
Specify where to install modules.
-The default value is:
+The default value is::
$(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
The value can be overridden in which case the default value is ignored.
INSTALL_MOD_PATH
---------------------------------------------------
+----------------
INSTALL_MOD_PATH specifies a prefix to MODLIB for module directory
relocations required by build roots. This is not defined in the
makefile but the argument can be passed to make if needed.
INSTALL_MOD_STRIP
---------------------------------------------------
+-----------------
INSTALL_MOD_STRIP, if defined, will cause modules to be
stripped after they are installed. If INSTALL_MOD_STRIP is '1', then
the default option --strip-debug will be used. Otherwise,
INSTALL_MOD_STRIP value will be used as the options to the strip command.
INSTALL_HDR_PATH
---------------------------------------------------
+----------------
INSTALL_HDR_PATH specifies where to install user space headers when
executing "make headers_*".
-The default value is:
+
+The default value is::
$(objtree)/usr
@@ -179,65 +196,65 @@ The output directory is often set using "O=..." on the commandline.
The value can be overridden in which case the default value is ignored.
KBUILD_SIGN_PIN
---------------------------------------------------
+---------------
This variable allows a passphrase or PIN to be passed to the sign-file
utility when signing kernel modules, if the private key requires such.
KBUILD_MODPOST_WARN
---------------------------------------------------
+-------------------
KBUILD_MODPOST_WARN can be set to avoid errors in case of undefined
symbols in the final module linking stage. It changes such errors
into warnings.
KBUILD_MODPOST_NOFINAL
---------------------------------------------------
+----------------------
KBUILD_MODPOST_NOFINAL can be set to skip the final link of modules.
This is solely useful to speed up test compiles.
KBUILD_EXTRA_SYMBOLS
---------------------------------------------------
+--------------------
For modules that use symbols from other modules.
See more details in modules.txt.
ALLSOURCE_ARCHS
---------------------------------------------------
+---------------
For tags/TAGS/cscope targets, you can specify more than one arch
-to be included in the databases, separated by blank space. E.g.:
+to be included in the databases, separated by blank space. E.g.::
$ make ALLSOURCE_ARCHS="x86 mips arm" tags
-To get all available archs you can also specify all. E.g.:
+To get all available archs you can also specify all. E.g.::
$ make ALLSOURCE_ARCHS=all tags
KBUILD_ENABLE_EXTRA_GCC_CHECKS
---------------------------------------------------
+------------------------------
If enabled over the make command line with "W=1", it turns on additional
gcc -W... options for more extensive build-time checking.
KBUILD_BUILD_TIMESTAMP
---------------------------------------------------
+----------------------
Setting this to a date string overrides the timestamp used in the
UTS_VERSION definition (uname -v in the running kernel). The value has to
be a string that can be passed to date -d. The default value
is the output of the date command at one point during build.
KBUILD_BUILD_USER, KBUILD_BUILD_HOST
---------------------------------------------------
+------------------------------------
These two variables allow to override the user@host string displayed during
boot and in /proc/version. The default value is the output of the commands
whoami and host, respectively.
KBUILD_LDS
---------------------------------------------------
+----------
The linker script with full path. Assigned by the top-level Makefile.
KBUILD_VMLINUX_OBJS
---------------------------------------------------
+-------------------
All object files for vmlinux. They are linked to vmlinux in the same
order as listed in KBUILD_VMLINUX_OBJS.
KBUILD_VMLINUX_LIBS
---------------------------------------------------
+-------------------
All .a "lib" files for vmlinux. KBUILD_VMLINUX_OBJS and KBUILD_VMLINUX_LIBS
together specify all the object files used to link vmlinux.
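The kbuild.rst hunks above repeat one rule three times: a value given on the
make command line ("V=...", "M=...", "O=...") takes precedence over the
matching environment variable (KBUILD_VERBOSE, KBUILD_EXTMOD, KBUILD_OUTPUT).
A minimal sketch of that rule follows. It is only a model for illustration:
the function name effective_setting and the dictionaries are hypothetical and
are not part of kbuild.

```python
# Hypothetical model of "command line beats environment"; not kbuild code.
def effective_setting(cmdline, environ, cmd_key, env_key, default=None):
    """Return the value kbuild would honour under the documented precedence."""
    if cmd_key in cmdline:
        # "O=...", "M=..." or "V=..." was passed to make: it wins.
        return cmdline[cmd_key]
    # Otherwise fall back to the environment variable, then to the default.
    return environ.get(env_key, default)


# Both are set, so the command-line value is used.
print(effective_setting({"O": "build"}, {"KBUILD_OUTPUT": "out"},
                        "O", "KBUILD_OUTPUT"))
# Only the environment variable is set.
print(effective_setting({}, {"KBUILD_OUTPUT": "out"}, "O", "KBUILD_OUTPUT"))
```

The same lookup applies to the V/KBUILD_VERBOSE and M/KBUILD_EXTMOD pairs by
changing the two key arguments.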
diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.rst
similarity index 85%
rename from Documentation/kbuild/kconfig-language.txt
rename to Documentation/kbuild/kconfig-language.rst
index 864e740811da..2bc8a7803365 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.rst
@@ -1,8 +1,12 @@
+================
+Kconfig Language
+================
+
Introduction
------------
The configuration database is a collection of configuration options
-organized in a tree structure:
+organized in a tree structure::
+- Code maturity level options
| +- Prompt for development and/or incomplete code/drivers
@@ -25,9 +29,9 @@ Menu entries
------------
Most entries define a config option; all other entries help to organize
-them. A single configuration option is defined like this:
+them. A single configuration option is defined like this::
-config MODVERSIONS
+ config MODVERSIONS
bool "Set version information on all module symbols"
depends on MODULES
help
@@ -52,10 +56,12 @@ applicable everywhere (see syntax).
Every config option must have a type. There are only two basic types:
tristate and string; the other types are based on these two. The type
definition optionally accepts an input prompt, so these two examples
- are equivalent:
+ are equivalent::
bool "Networking support"
- and
+
+ and::
+
bool
prompt "Networking support"
@@ -98,8 +104,10 @@ applicable everywhere (see syntax).
d) Hardware or infrastructure that everybody expects, such as CONFIG_NET
or CONFIG_BLOCK. These are rare exceptions.
-- type definition + default value:
+- type definition + default value::
+
"def_bool"/"def_tristate" <expr> ["if" <expr>]
+
This is a shorthand notation for a type definition plus a value.
Optionally dependencies for this default value can be added with "if".
@@ -107,11 +115,13 @@ applicable everywhere (see syntax).
This defines a dependency for this menu entry. If multiple
dependencies are defined, they are connected with '&&'. Dependencies
are applied to all other options within this menu entry (which also
- accept an "if" expression), so these two examples are equivalent:
+ accept an "if" expression), so these two examples are equivalent::
bool "foo" if BAR
default y if BAR
- and
+
+ and::
+
depends on BAR
bool "foo"
default y
@@ -124,6 +134,7 @@ applicable everywhere (see syntax).
times, the limit is set to the largest selection.
Reverse dependencies can only be used with boolean or tristate
symbols.
+
Note:
select should be used with care. select will force
a symbol to a value without visiting the dependencies.
@@ -139,24 +150,26 @@ applicable everywhere (see syntax).
symbol except that the "implied" symbol's value may still be set to n
from a direct dependency or with a visible prompt.
- Given the following example:
+ Given the following example::
- config FOO
+ config FOO
tristate
imply BAZ
- config BAZ
+ config BAZ
tristate
depends on BAR
The following values are possible:
+ === === ============= ==============
FOO BAR BAZ's default choice for BAZ
- --- --- ------------- --------------
+ === === ============= ==============
n y n N/m/y
m y m M/y/n
y y y Y/n
y n * N
+ === === ============= ==============
This is useful e.g. with multiple drivers that want to indicate their
ability to hook into a secondary subsystem while allowing the user to
@@ -208,9 +221,9 @@ Menu dependencies
Dependencies define the visibility of a menu entry and can also reduce
the input range of tristate symbols. The tristate logic used in the
expressions uses one more state than normal boolean logic to express the
-module state. Dependency expressions have the following syntax:
+module state. Dependency expressions have the following syntax::
-<expr> ::= <symbol> (1)
+ <expr> ::= <symbol> (1)
<symbol> '=' <symbol> (2)
<symbol> '!=' <symbol> (3)
<symbol1> '<' <symbol2> (4)
@@ -222,7 +235,7 @@ module state. Dependency expressions have the following syntax:
<expr> '&&' <expr> (7)
<expr> '||' <expr> (8)
-Expressions are listed in decreasing order of precedence.
+Expressions are listed in decreasing order of precedence.
(1) Convert the symbol into an expression. Boolean and tristate symbols
are simply converted into the respective expression values. All
@@ -255,15 +268,15 @@ Menu structure
--------------
The position of a menu entry in the tree is determined in two ways. First
-it can be specified explicitly:
+it can be specified explicitly::
-menu "Network device support"
+ menu "Network device support"
depends on NET
-config NETDEVICES
+ config NETDEVICES
...
-endmenu
+ endmenu
All entries within the "menu" ... "endmenu" block become a submenu of
"Network device support". All subentries inherit the dependencies from
@@ -275,17 +288,18 @@ dependencies. If a menu entry somehow depends on the previous entry, it
can be made a submenu of it. First, the previous (parent) symbol must
be part of the dependency list and then one of these two conditions
must be true:
+
- the child entry must become invisible, if the parent is set to 'n'
-- the child entry must only be visible, if the parent is visible
+- the child entry must only be visible, if the parent is visible::
-config MODULES
+ config MODULES
bool "Enable loadable module support"
-config MODVERSIONS
+ config MODVERSIONS
bool "Set version information on all module symbols"
depends on MODULES
-comment "module support disabled"
+ comment "module support disabled"
depends on !MODULES
MODVERSIONS directly depends on MODULES, this means it's only visible if
@@ -299,6 +313,7 @@ Kconfig syntax
The configuration file describes a series of menu entries, where every
line starts with a keyword (except help texts). The following keywords
end a menu entry:
+
- config
- menuconfig
- choice/endchoice
@@ -306,17 +321,17 @@ end a menu entry:
- menu/endmenu
- if/endif
- source
+
The first five also start the definition of a menu entry.
-config:
-
+config::
"config" <symbol>
<config options>
This defines a config symbol <symbol> and accepts any of above
attributes as options.
-menuconfig:
+menuconfig::
"menuconfig" <symbol>
<config options>
@@ -325,43 +340,43 @@ hint to front ends, that all suboptions should be displayed as a
separate list of options. To make sure all the suboptions will really
show up under the menuconfig entry and not outside of it, every item
from the <config options> list must depend on the menuconfig symbol.
-In practice, this is achieved by using one of the next two constructs:
+In practice, this is achieved by using one of the next two constructs::
-(1):
-menuconfig M
-if M
- config C1
- config C2
-endif
+ (1):
+ menuconfig M
+ if M
+ config C1
+ config C2
+ endif
-(2):
-menuconfig M
-config C1
- depends on M
-config C2
- depends on M
+ (2):
+ menuconfig M
+ config C1
+ depends on M
+ config C2
+ depends on M
In the following examples (3) and (4), C1 and C2 still have the M
dependency, but will not appear under menuconfig M anymore, because
-of C0, which doesn't depend on M:
+of C0, which doesn't depend on M::
-(3):
-menuconfig M
- config C0
-if M
- config C1
- config C2
-endif
+ (3):
+ menuconfig M
+ config C0
+ if M
+ config C1
+ config C2
+ endif
-(4):
-menuconfig M
-config C0
-config C1
- depends on M
-config C2
- depends on M
+ (4):
+ menuconfig M
+ config C0
+ config C1
+ depends on M
+ config C2
+ depends on M
-choices:
+choices::
"choice" [symbol]
<choice options>
@@ -387,7 +402,7 @@ definitions of that choice. If a [symbol] is associated to the choice,
then you may define the same choice (i.e. with the same entries) in another
place.
-comment:
+comment::
"comment" <prompt>
<comment options>
@@ -396,7 +411,7 @@ This defines a comment which is displayed to the user during the
configuration process and is also echoed to the output files. The only
possible options are dependencies.
-menu:
+menu::
"menu" <prompt>
<menu options>
@@ -407,7 +422,7 @@ This defines a menu block, see "Menu structure" above for more
information. The only possible options are dependencies and "visible"
attributes.
-if:
+if::
"if" <expr>
<if block>
@@ -416,13 +431,13 @@ if:
This defines an if block. The dependency expression <expr> is appended
to all enclosed menu entries.
-source:
+source::
"source" <prompt>
This reads the specified configuration file. This file is always parsed.
-mainmenu:
+mainmenu::
"mainmenu" <prompt>
@@ -452,20 +467,21 @@ that is defined in a common Kconfig file and selected by the relevant
architectures.
An example is the generic IOMAP functionality.
-We would in lib/Kconfig see:
+We would in lib/Kconfig see::
-# Generic IOMAP is used to ...
-config HAVE_GENERIC_IOMAP
+ # Generic IOMAP is used to ...
+ config HAVE_GENERIC_IOMAP
-config GENERIC_IOMAP
+ config GENERIC_IOMAP
depends on HAVE_GENERIC_IOMAP && FOO
-And in lib/Makefile we would see:
-obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
+And in lib/Makefile we would see::
-For each architecture using the generic IOMAP functionality we would see:
+ obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
-config X86
+For each architecture using the generic IOMAP functionality we would see::
+
+ config X86
select ...
select HAVE_GENERIC_IOMAP
select ...
@@ -484,25 +500,25 @@ Adding features that need compiler support
There are several features that need compiler support. The recommended way
to describe the dependency on the compiler feature is to use "depends on"
-followed by a test macro.
+followed by a test macro::
-config STACKPROTECTOR
+ config STACKPROTECTOR
bool "Stack Protector buffer overflow detection"
depends on $(cc-option,-fstack-protector)
...
If you need to expose a compiler capability to makefiles and/or C source files,
-CC_HAS_ is the recommended prefix for the config option.
+`CC_HAS_` is the recommended prefix for the config option::
-config CC_HAS_STACKPROTECTOR_NONE
+ config CC_HAS_STACKPROTECTOR_NONE
def_bool $(cc-option,-fno-stack-protector)
Build as module only
~~~~~~~~~~~~~~~~~~~~
To restrict a component build to module-only, qualify its config symbol
-with "depends on m". E.g.:
+with "depends on m". E.g.::
-config FOO
+ config FOO
depends on BAR && m
limits FOO to module (=m) or disabled (=n).
@@ -529,18 +545,18 @@ Simple Kconfig recursive issue
Read: Documentation/kbuild/Kconfig.recursion-issue-01
-Test with:
+Test with::
-make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-01 allnoconfig
+ make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-01 allnoconfig
Cumulative Kconfig recursive issue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Read: Documentation/kbuild/Kconfig.recursion-issue-02
-Test with:
+Test with::
-make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-02 allnoconfig
+ make KBUILD_KCONFIG=Documentation/kbuild/Kconfig.recursion-issue-02 allnoconfig
Practical solutions to kconfig recursive issue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -551,7 +567,9 @@ historical issues resolved through these different solutions.
a) Remove any superfluous "select FOO" or "depends on FOO"
b) Match dependency semantics:
+
b1) Swap all "select FOO" to "depends on FOO" or,
+
b2) Swap all "depends on FOO" to "select FOO"
The resolution to a) can be tested with the sample Kconfig file
@@ -566,8 +584,9 @@ Documentation/kbuild/Kconfig.recursion-issue-02.
Below is a list of examples of prior fixes for these types of recursive issues;
all errors appear to involve one or more select's and one or more "depends on".
+============ ===================================
commit fix
-====== ===
+============ ===================================
06b718c01208 select A -> depends on A
c22eacfe82f9 depends on A -> depends on B
6a91e854442c select A -> depends on A
@@ -590,6 +609,7 @@ d9f9ab51e55e select A -> depends on A
0c51a4d8abd6 depends on A -> select A (3)
e98062ed6dc4 select A -> depends on A (3)
91e5d284a7f1 select A -> (null)
+============ ===================================
(1) Partial (or no) quote of error.
(2) That seems to be the gist of that fix.
@@ -616,11 +636,11 @@ Semantics of Kconfig
~~~~~~~~~~~~~~~~~~~~
The use of Kconfig is broad, Linux is now only one of Kconfig's users:
-one study has completed a broad analysis of Kconfig use in 12 projects [0].
+one study has completed a broad analysis of Kconfig use in 12 projects [0]_.
Despite its widespread use, and although this document does a reasonable job
in documenting basic Kconfig syntax a more precise definition of Kconfig
semantics is welcomed. One project deduced Kconfig semantics through
-the use of the xconfig configurator [1]. Work should be done to confirm if
+the use of the xconfig configurator [1]_. Work should be done to confirm if
the deduced semantics matches our intended Kconfig design goals.
Having well defined semantics can be useful for tools for practical
@@ -628,42 +648,42 @@ evaluation of depenencies, for instance one such use known case was work to
express in boolean abstraction of the inferred semantics of Kconfig to
translate Kconfig logic into boolean formulas and run a SAT solver on this to
find dead code / features (always inactive), 114 dead features were found in
-Linux using this methodology [1] (Section 8: Threats to validity).
+Linux using this methodology [1]_ (Section 8: Threats to validity).
Confirming this could prove useful as Kconfig stands as one of the the leading
-industrial variability modeling languages [1] [2]. Its study would help
+industrial variability modeling languages [1]_ [2]_. Its study would help
evaluate practical uses of such languages, their use was only theoretical
and real world requirements were not well understood. As it stands though
only reverse engineering techniques have been used to deduce semantics from
-variability modeling languages such as Kconfig [3].
+variability modeling languages such as Kconfig [3]_.
-[0] http://www.eng.uwaterloo.ca/~shshe/kconfig_semantics.pdf
-[1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
-[2] http://gsd.uwaterloo.ca/sites/default/files/ase241-berger_0.pdf
-[3] http://gsd.uwaterloo.ca/sites/default/files/icse2011.pdf
+.. [0] http://www.eng.uwaterloo.ca/~shshe/kconfig_semantics.pdf
+.. [1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
+.. [2] http://gsd.uwaterloo.ca/sites/default/files/ase241-berger_0.pdf
+.. [3] http://gsd.uwaterloo.ca/sites/default/files/icse2011.pdf
Full SAT solver for Kconfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Although SAT solvers [0] haven't yet been used by Kconfig directly, as noted in
-the previous subsection, work has been done however to express in boolean
+Although SAT solvers [4]_ haven't yet been used by Kconfig directly, as noted
+in the previous subsection, work has been done however to express in boolean
abstraction the inferred semantics of Kconfig to translate Kconfig logic into
-boolean formulas and run a SAT solver on it [1]. Another known related project
-is CADOS [2] (former VAMOS [3]) and the tools, mainly undertaker [4], which has
-been introduced first with [5]. The basic concept of undertaker is to exract
-variability models from Kconfig, and put them together with a propositional
-formula extracted from CPP #ifdefs and build-rules into a SAT solver in order
-to find dead code, dead files, and dead symbols. If using a SAT solver is
-desirable on Kconfig one approach would be to evaluate repurposing such efforts
-somehow on Kconfig. There is enough interest from mentors of existing projects
-to not only help advise how to integrate this work upstream but also help
-maintain it long term. Interested developers should visit:
+boolean formulas and run a SAT solver on it [5]_. Another known related project
+is CADOS [6]_ (former VAMOS [7]_) and the tools, mainly undertaker [8]_, which
+has been introduced first with [9]_. The basic concept of undertaker is to
+extract variability models from Kconfig, and put them together with a
+propositional formula extracted from CPP #ifdefs and build-rules into a SAT
+solver in order to find dead code, dead files, and dead symbols. If using a SAT
+solver is desirable on Kconfig one approach would be to evaluate repurposing
+such efforts somehow on Kconfig. There is enough interest from mentors of
+existing projects to not only help advise how to integrate this work upstream
+but also help maintain it long term. Interested developers should visit:
http://kernelnewbies.org/KernelProjects/kconfig-sat
-[0] http://www.cs.cornell.edu/~sabhar/chapters/SATSolvers-KR-Handbook.pdf
-[1] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
-[2] https://cados.cs.fau.de
-[3] https://vamos.cs.fau.de
-[4] https://undertaker.cs.fau.de
-[5] https://www4.cs.fau.de/Publications/2011/tartler_11_eurosys.pdf
+.. [4] http://www.cs.cornell.edu/~sabhar/chapters/SATSolvers-KR-Handbook.pdf
+.. [5] http://gsd.uwaterloo.ca/sites/default/files/vm-2013-berger.pdf
+.. [6] https://cados.cs.fau.de
+.. [7] https://vamos.cs.fau.de
+.. [8] https://undertaker.cs.fau.de
+.. [9] https://www4.cs.fau.de/Publications/2011/tartler_11_eurosys.pdf
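The "Menu dependencies" and "Build as module only" hunks of
kconfig-language.rst rely on tristate logic. A small sketch may help readers
check statements such as "depends on BAR && m limits FOO to module (=m) or
disabled (=n)". It assumes the usual mapping n=0, m=1, y=2 with '!' as 2-x,
'&&' as min and '||' as max. Those rules come from the descriptions of items
(6) to (8) in the full document, which are not visible in the hunks quoted
here. The helper names are hypothetical.

```python
# Tristate values as integers, assuming the n < m < y ordering.
N, M, Y = 0, 1, 2


def t_not(a):
    """'!' <expr>: assumed to be 2 - expr."""
    return 2 - a


def t_and(a, b):
    """<expr> '&&' <expr>: assumed to be the minimum of both sides."""
    return min(a, b)


def t_or(a, b):
    """<expr> '||' <expr>: assumed to be the maximum of both sides."""
    return max(a, b)


def name(v):
    """Map 0/1/2 back to the letters used in .config files."""
    return "nmy"[v]


# "depends on BAR && m": the upper limit for FOO for each value of BAR.
for bar in (N, M, Y):
    print(f"BAR={name(bar)} -> FOO limited to {name(t_and(bar, M))}")
```

Under these assumptions the dependency never evaluates to y, which matches the
statement that FOO can only be m or n.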
diff --git a/Documentation/kbuild/kconfig-macro-language.txt b/Documentation/kbuild/kconfig-macro-language.rst
similarity index 94%
rename from Documentation/kbuild/kconfig-macro-language.txt
rename to Documentation/kbuild/kconfig-macro-language.rst
index 07da2ea68dce..35b3263b7e40 100644
--- a/Documentation/kbuild/kconfig-macro-language.txt
+++ b/Documentation/kbuild/kconfig-macro-language.rst
@@ -1,3 +1,7 @@
+======================
+Kconfig macro language
+======================
+
Concept
-------
@@ -7,7 +11,7 @@ targets and prerequisites. The other is a macro language for performing textual
substitution.
There is clear distinction between the two language stages. For example, you
-can write a makefile like follows:
+can write a makefile like follows::
APP := foo
SRC := foo.c
@@ -17,7 +21,7 @@ can write a makefile like follows:
$(CC) -o $(APP) $(SRC)
The macro language replaces the variable references with their expanded form,
-and handles as if the source file were input like follows:
+and handles as if the source file were input like follows::
foo: foo.c
gcc -o foo foo.c
@@ -26,7 +30,7 @@ Then, Make analyzes the dependency graph and determines the targets to be
updated.
The idea is quite similar in Kconfig - it is possible to describe a Kconfig
-file like this:
+file like this::
CC := gcc
@@ -34,7 +38,7 @@ file like this:
def_bool $(shell, $(srctree)/scripts/gcc-check-foo.sh $(CC))
The macro language in Kconfig processes the source file into the following
-intermediate:
+intermediate::
config CC_HAS_FOO
def_bool y
@@ -69,7 +73,7 @@ variable. The righthand side of += is expanded immediately if the lefthand
side was originally defined as a simple variable. Otherwise, its evaluation is
deferred.
-The variable reference can take parameters, in the following form:
+The variable reference can take parameters, in the following form::
$(name,arg1,arg2,arg3)
@@ -141,7 +145,7 @@ Make vs Kconfig
Kconfig adopts Make-like macro language, but the function call syntax is
slightly different.
-A function call in Make looks like this:
+A function call in Make looks like this::
$(func-name arg1,arg2,arg3)
@@ -149,14 +153,14 @@ The function name and the first argument are separated by at least one
whitespace. Then, leading whitespaces are trimmed from the first argument,
while whitespaces in the other arguments are kept. You need to use a kind of
trick to start the first parameter with spaces. For example, if you want
-to make "info" function print " hello", you can write like follows:
+to make "info" function print " hello", you can write like follows::
empty :=
space := $(empty) $(empty)
$(info $(space)$(space)hello)
Kconfig uses only commas for delimiters, and keeps all whitespaces in the
-function call. Some people prefer putting a space after each comma delimiter:
+function call. Some people prefer putting a space after each comma delimiter::
$(func-name, arg1, arg2, arg3)
@@ -166,7 +170,7 @@ Make - for example, $(subst .c, .o, $(sources)) is a typical mistake; it
replaces ".c" with " .o".
In Make, a user-defined function is referenced by using a built-in function,
-'call', like this:
+'call', like this::
$(call my-func,arg1,arg2,arg3)
@@ -179,12 +183,12 @@ Likewise, $(info hello, world) prints "hello, world" to stdout. You could say
this is _useful_ inconsistency.
In Kconfig, for simpler implementation and grammatical consistency, commas that
-appear in the $( ) context are always delimiters. It means
+appear in the $( ) context are always delimiters. It means::
$(shell, echo hello, world)
is an error because it is passing two parameters where the 'shell' function
-accepts only one. To pass commas in arguments, you can use the following trick:
+accepts only one. To pass commas in arguments, you can use the following trick::
comma := ,
$(shell, echo hello$(comma) world)
@@ -195,7 +199,7 @@ Caveats
A variable (or function) cannot be expanded across tokens. So, you cannot use
a variable as a shorthand for an expression that consists of multiple tokens.
-The following works:
+The following works::
RANGE_MIN := 1
RANGE_MAX := 3
@@ -204,7 +208,7 @@ The following works:
int "foo"
range $(RANGE_MIN) $(RANGE_MAX)
-But, the following does not work:
+But, the following does not work::
RANGES := 1 3
@@ -213,7 +217,7 @@ But, the following does not work:
range $(RANGES)
A variable cannot be expanded to any keyword in Kconfig. The following does
-not work:
+not work::
MY_TYPE := tristate
@@ -223,7 +227,8 @@ not work:
Obviously from the design, $(shell command) is expanded in the textual
substitution phase. You cannot pass symbols to the 'shell' function.
-The following does not work as expected.
+
+The following does not work as expected::
config ENDIAN_FLAG
string
@@ -234,7 +239,7 @@ The following does not work as expected.
def_bool $(shell $(srctree)/scripts/gcc-check-flag ENDIAN_FLAG)
Instead, you can do like follows so that any function call is statically
-expanded.
+expanded::
config CC_HAS_ENDIAN_FLAG
bool
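The "Make vs Kconfig" text above describes two different ways of splitting a
function call into its name and arguments. The following is a rough model of
that description only. It does not handle nested $( ) references, and it
ignores the fact that some Make built-ins such as 'info' treat commas as
plain text. Both function names are hypothetical.

```python
# Rough model of the argument splitting described in the text; not a parser.
def split_make_call(body):
    """Make style: name and first argument are separated by whitespace."""
    name, _, rest = body.partition(" ")
    args = rest.split(",")
    # Leading whitespace is trimmed from the first argument only.
    args[0] = args[0].lstrip()
    return name, args


def split_kconfig_call(body):
    """Kconfig style: commas are the only delimiters, whitespace is kept."""
    parts = body.split(",")
    return parts[0], parts[1:]


# The typical Make mistake: the second argument becomes " .o", not ".o".
print(split_make_call("subst .c, .o, $(sources)"))

# Kconfig sees two arguments here, but 'shell' accepts only one.
print(split_kconfig_call("shell, echo hello, world"))
```

In the second call the comma inside the command is taken as a delimiter, which
is why the text recommends defining "comma := ," and writing $(comma) instead.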
diff --git a/Documentation/kbuild/kconfig.txt b/Documentation/kbuild/kconfig.rst
similarity index 80%
rename from Documentation/kbuild/kconfig.txt
rename to Documentation/kbuild/kconfig.rst
index 68c82914c0f3..88129af7e539 100644
--- a/Documentation/kbuild/kconfig.txt
+++ b/Documentation/kbuild/kconfig.rst
@@ -1,4 +1,8 @@
-This file contains some assistance for using "make *config".
+===================
+Kconfig make config
+===================
+
+This file contains some assistance for using `make *config`.
Use "make help" to list all of the possible configuration targets.
@@ -6,9 +10,8 @@ The xconfig ('qconf'), menuconfig ('mconf'), and nconfig ('nconf')
programs also have embedded help text. Be sure to check that for
navigation, search, and other general help text.
-======================================================================
General
---------------------------------------------------
+-------
New kernel releases often introduce new config symbols. Often more
important, new kernel releases may rename config symbols. When
@@ -17,51 +20,55 @@ this happens, using a previously working .config file and running
for you, so you may find that you need to see what NEW kernel
symbols have been introduced.
-To see a list of new config symbols, use
+To see a list of new config symbols, use::
cp user/some/old.config .config
make listnewconfig
and the config program will list any new symbols, one per line.
-Alternatively, you can use the brute force method:
+Alternatively, you can use the brute force method::
make oldconfig
scripts/diffconfig .config.old .config | less
-______________________________________________________________________
-Environment variables for '*config'
+----------------------------------------------------------------------
+
+Environment variables for `*config`
KCONFIG_CONFIG
---------------------------------------------------
+--------------
This environment variable can be used to specify a default kernel config
file name to override the default name of ".config".
KCONFIG_OVERWRITECONFIG
---------------------------------------------------
+-----------------------
If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
break symlinks when .config is a symlink to somewhere else.
-CONFIG_
---------------------------------------------------
-If you set CONFIG_ in the environment, Kconfig will prefix all symbols
+`CONFIG_`
+---------
+If you set `CONFIG_` in the environment, Kconfig will prefix all symbols
with its value when saving the configuration, instead of using the default,
-"CONFIG_".
+`CONFIG_`.
+
+----------------------------------------------------------------------
-______________________________________________________________________
Environment variables for '{allyes/allmod/allno/rand}config'
KCONFIG_ALLCONFIG
---------------------------------------------------
+-----------------
(partially based on lkml email from/by Rob Landley, re: miniconfig)
+
--------------------------------------------------
+
The allyesconfig/allmodconfig/allnoconfig/randconfig variants can also
use the environment variable KCONFIG_ALLCONFIG as a flag or a filename
that contains config symbols that the user requires to be set to a
specific value. If KCONFIG_ALLCONFIG is used without a filename where
-KCONFIG_ALLCONFIG == "" or KCONFIG_ALLCONFIG == "1", "make *config"
+KCONFIG_ALLCONFIG == "" or KCONFIG_ALLCONFIG == "1", `make *config`
checks for a file named "all{yes/mod/no/def/random}.config"
-(corresponding to the *config command that was used) for symbol values
+(corresponding to the `*config` command that was used) for symbol values
that are to be forced. If this file is not found, it checks for a
file named "all.config" to contain forced values.
@@ -74,43 +81,55 @@ This 'KCONFIG_ALLCONFIG' file is a config file which contains
(usually a subset of all) preset config symbols. These variable
settings are still subject to normal dependency checks.
-Examples:
+Examples::
+
KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
-or
+
+or::
+
KCONFIG_ALLCONFIG=mini.config make allnoconfig
-or
+
+or::
+
make KCONFIG_ALLCONFIG=mini.config allnoconfig
These examples will disable most options (allnoconfig) but enable or
disable the options that are explicitly listed in the specified
mini-config files.
-______________________________________________________________________
+----------------------------------------------------------------------
+
Environment variables for 'randconfig'
KCONFIG_SEED
---------------------------------------------------
+------------
You can set this to the integer value used to seed the RNG, if you want
to somehow debug the behaviour of the kconfig parser/frontends.
If not set, the current time will be used.
KCONFIG_PROBABILITY
---------------------------------------------------
+-------------------
This variable can be used to skew the probabilities. This variable can
be unset or empty, or set to three different formats:
+
+ ======================= ================== =====================
KCONFIG_PROBABILITY y:n split y:m:n split
- -----------------------------------------------------------------
+ ======================= ================== =====================
unset or empty 50 : 50 33 : 33 : 34
N N : 100-N N/2 : N/2 : 100-N
[1] N:M N+M : 100-(N+M) N : M : 100-(N+M)
[2] N:M:L N : 100-N M : L : 100-(M+L)
+ ======================= ================== =====================
where N, M and L are integers (in base 10) in the range [0,100], and so
that:
+
[1] N+M is in the range [0,100]
+
[2] M+L is in the range [0,100]
-Examples:
+Examples::
+
KCONFIG_PROBABILITY=10
10% of booleans will be set to 'y', 90% to 'n'
5% of tristates will be set to 'y', 5% to 'm', 90% to 'n'
@@ -121,34 +140,36 @@ Examples:
10% of booleans will be set to 'y', 90% to 'n'
15% of tristates will be set to 'y', 15% to 'm', 70% to 'n'
-______________________________________________________________________
+----------------------------------------------------------------------
+
Environment variables for 'syncconfig'
KCONFIG_NOSILENTUPDATE
---------------------------------------------------
+----------------------
If this variable has a non-blank value, it prevents silent kernel
config updates (requires explicit updates).
KCONFIG_AUTOCONFIG
---------------------------------------------------
+------------------
This environment variable can be set to specify the path & name of the
"auto.conf" file. Its default value is "include/config/auto.conf".
KCONFIG_TRISTATE
---------------------------------------------------
+----------------
This environment variable can be set to specify the path & name of the
"tristate.conf" file. Its default value is "include/config/tristate.conf".
KCONFIG_AUTOHEADER
---------------------------------------------------
+------------------
This environment variable can be set to specify the path & name of the
"autoconf.h" (header) file.
Its default value is "include/generated/autoconf.h".
-======================================================================
+----------------------------------------------------------------------
+
menuconfig
---------------------------------------------------
+----------
SEARCHING for CONFIG symbols
@@ -158,7 +179,8 @@ Searching in menuconfig:
names, so you have to know something close to what you are
looking for.
- Example:
+ Example::
+
/hotplug
This lists all config symbols that contain "hotplug",
e.g., HOTPLUG_CPU, MEMORY_HOTPLUG.
@@ -166,48 +188,55 @@ Searching in menuconfig:
For search help, enter / followed by TAB-TAB (to highlight
<Help>) and Enter. This will tell you that you can also use
regular expressions (regexes) in the search string, so if you
- are not interested in MEMORY_HOTPLUG, you could try
+ are not interested in MEMORY_HOTPLUG, you could try::
/^hotplug
When searching, symbols are sorted thus:
+
- first, exact matches, sorted alphabetically (an exact match
is when the search matches the complete symbol name);
- then, other matches, sorted alphabetically.
+
For example: ^ATH.K matches:
+
ATH5K ATH9K ATH5K_AHB ATH5K_DEBUG [...] ATH6KL ATH6KL_DEBUG
[...] ATH9K_AHB ATH9K_BTCOEX_SUPPORT ATH9K_COMMON [...]
+
of which only ATH5K and ATH9K match exactly and so are sorted
first (and in alphabetical order), then come all other symbols,
sorted in alphabetical order.
-______________________________________________________________________
+----------------------------------------------------------------------
+
User interface options for 'menuconfig'
MENUCONFIG_COLOR
---------------------------------------------------
+----------------
It is possible to select different color themes using the variable
-MENUCONFIG_COLOR. To select a theme use:
+MENUCONFIG_COLOR. To select a theme use::
make MENUCONFIG_COLOR=<theme> menuconfig
-Available themes are:
- mono => selects colors suitable for monochrome displays
- blackbg => selects a color scheme with black background
- classic => theme with blue background. The classic look
- bluetitle => a LCD friendly version of classic. (default)
+Available themes are::
+
+ - mono => selects colors suitable for monochrome displays
+ - blackbg => selects a color scheme with black background
+ - classic => theme with blue background. The classic look
+ - bluetitle => an LCD friendly version of classic. (default)
MENUCONFIG_MODE
---------------------------------------------------
+---------------
This mode shows all sub-menus in one large tree.
-Example:
+Example::
+
make MENUCONFIG_MODE=single_menu menuconfig
+----------------------------------------------------------------------
-======================================================================
nconfig
---------------------------------------------------
+-------
nconfig is an alternate text-based configurator. It lists function
keys across the bottom of the terminal (window) that execute commands.
@@ -231,16 +260,16 @@ Searching in nconfig:
given string or regular expression (regex).
NCONFIG_MODE
---------------------------------------------------
+------------
This mode shows all sub-menus in one large tree.
-Example:
+Example::
make NCONFIG_MODE=single_menu nconfig
+----------------------------------------------------------------------
-======================================================================
xconfig
---------------------------------------------------
+-------
Searching in xconfig:
@@ -260,13 +289,12 @@ Searching in xconfig:
to return to the main menu.
-======================================================================
+----------------------------------------------------------------------
+
gconfig
---------------------------------------------------
+-------
Searching in gconfig:
There is no search command in gconfig. However, gconfig does
have several different viewing choices, modes, and options.
-
-###
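
Reviewer note (not part of the patch): the KCONFIG_PROBABILITY table in the
kconfig.rst hunk above is compact enough that a worked sketch may help when
checking the converted table against the prose. The helper below is
hypothetical; the name kconfig_probability() and its return shape are
assumptions made for illustration, and it is not the kconfig implementation.
It only encodes the four table rows. For an odd N in the single-value format
the table says "N/2 : N/2", which cannot be split evenly, so giving the
leftover point to 'y' is an assumption here as well.

```python
def kconfig_probability(value=None):
    """Return ((y, n), (y, m, n)) percentage splits for a KCONFIG_PROBABILITY string.

    Hypothetical helper that mirrors the documentation table only.
    """
    # Row "unset or empty": fixed defaults from the table.
    if not value:
        return (50, 50), (33, 33, 34)

    parts = [int(p, 10) for p in value.split(":")]
    if any(p < 0 or p > 100 for p in parts):
        raise ValueError("each field must be in the range [0,100]")

    if len(parts) == 1:
        # Row "N": booleans N : 100-N, tristates N/2 : N/2 : 100-N.
        n = parts[0]
        m_share = n // 2
        y_share = n - m_share  # assumption: the odd point goes to 'y'
        return (n, 100 - n), (y_share, m_share, 100 - n)

    if len(parts) == 2:
        # Row "[1] N:M": N+M must be in the range [0,100].
        n, m = parts
        if n + m > 100:
            raise ValueError("N+M must be in the range [0,100]")
        return (n + m, 100 - (n + m)), (n, m, 100 - (n + m))

    if len(parts) == 3:
        # Row "[2] N:M:L": M+L must be in the range [0,100].
        n, m, l = parts
        if m + l > 100:
            raise ValueError("M+L must be in the range [0,100]")
        return (n, 100 - n), (m, l, 100 - (m + l))

    raise ValueError("expected N, N:M or N:M:L")


# Matches the first documented example: 10% 'y' / 90% 'n' for booleans,
# 5% 'y' / 5% 'm' / 90% 'n' for tristates.
print(kconfig_probability("10"))  # ((10, 90), (5, 5, 90))
```

With a three-field value such as "10:15:15" the sketch yields 10/90 for
booleans and 15/15/70 for tristates, which agrees with the second pair of
percentages quoted in the Examples block.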
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.rst
similarity index 84%
rename from Documentation/kbuild/makefiles.txt
rename to Documentation/kbuild/makefiles.rst
index 03c065855eaf..9274cdcc9bd2 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.rst
@@ -1,8 +1,10 @@
+======================
Linux Kernel Makefiles
+======================
This document describes the Linux kernel Makefiles.
-=== Table of Contents
+.. Table of Contents
=== 1 Overview
=== 2 Who does what
@@ -54,9 +56,10 @@ This document describes the Linux kernel Makefiles.
=== 10 Credits
=== 11 TODO
-=== 1 Overview
+1 Overview
+==========
-The Makefiles have five parts:
+The Makefiles have five parts::
Makefile the top Makefile.
.config the kernel configuration file.
@@ -85,7 +88,8 @@ scripts/Makefile.* contains all the definitions/rules etc. that
are used to build the kernel based on the kbuild makefiles.
-=== 2 Who does what
+2 Who does what
+===============
People have four different relationships with the kernel Makefiles.
@@ -110,7 +114,8 @@ These people need to know about all aspects of the kernel Makefiles.
This document is aimed towards normal developers and arch developers.
-=== 3 The kbuild files
+3 The kbuild files
+==================
Most Makefiles within the kernel are kbuild Makefiles that use the
kbuild infrastructure. This chapter introduces the syntax used in the
@@ -122,7 +127,8 @@ file will be used.
Section 3.1 "Goal definitions" is a quick intro, further chapters provide
more details, with real examples.
---- 3.1 Goal definitions
+3.1 Goal definitions
+--------------------
Goal definitions are the main part (heart) of the kbuild Makefile.
These lines define the files to be built, any special compilation
@@ -130,7 +136,8 @@ more details, with real examples.
The most simple kbuild makefile contains one line:
- Example:
+ Example::
+
obj-y += foo.o
This tells kbuild that there is one object in that directory, named
@@ -139,14 +146,16 @@ more details, with real examples.
If foo.o shall be built as a module, the variable obj-m is used.
Therefore the following pattern is often used:
- Example:
+ Example::
+
obj-$(CONFIG_FOO) += foo.o
$(CONFIG_FOO) evaluates to either y (for built-in) or m (for module).
If CONFIG_FOO is neither y nor m, then the file will not be compiled
nor linked.
---- 3.2 Built-in object goals - obj-y
+3.2 Built-in object goals - obj-y
+---------------------------------
The kbuild Makefile specifies object files for vmlinux
in the $(obj-y) lists. These lists depend on the kernel
@@ -167,14 +176,16 @@ more details, with real examples.
order may e.g. change the order in which your SCSI
controllers are detected, and thus your disks are renumbered.
- Example:
+ Example::
+
#drivers/isdn/i4l/Makefile
# Makefile for the kernel ISDN subsystem and device drivers.
# Each configuration option enables a list of files.
obj-$(CONFIG_ISDN_I4L) += isdn.o
obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
---- 3.3 Loadable module goals - obj-m
+3.3 Loadable module goals - obj-m
+---------------------------------
$(obj-m) specifies object files which are built as loadable
kernel modules.
@@ -183,7 +194,8 @@ more details, with real examples.
files. In the case of one source file, the kbuild makefile
simply adds the file to $(obj-m).
- Example:
+ Example::
+
#drivers/isdn/i4l/Makefile
obj-$(CONFIG_ISDN_PPP_BSDCOMP) += isdn_bsdcomp.o
@@ -195,7 +207,8 @@ more details, with real examples.
module from, so you have to tell it by setting a $(<module_name>-y)
variable.
- Example:
+ Example::
+
#drivers/isdn/i4l/Makefile
obj-$(CONFIG_ISDN_I4L) += isdn.o
isdn-y := isdn_net_lib.o isdn_v110.o isdn_common.o
@@ -205,10 +218,11 @@ more details, with real examples.
"$(LD) -r" on the list of these files to generate isdn.o.
Due to kbuild recognizing $(<module_name>-y) for composite objects,
- you can use the value of a CONFIG_ symbol to optionally include an
+ you can use the value of a `CONFIG_` symbol to optionally include an
object file as part of a composite object.
- Example:
+ Example::
+
#fs/ext2/Makefile
obj-$(CONFIG_EXT2_FS) += ext2.o
ext2-y := balloc.o dir.o file.o ialloc.o inode.o ioctl.o \
@@ -225,12 +239,14 @@ more details, with real examples.
kbuild will build an ext2.o file for you out of the individual
parts and then link this into built-in.a, as you would expect.
---- 3.4 Objects which export symbols
+3.4 Objects which export symbols
+--------------------------------
No special notation is required in the makefiles for
modules exporting symbols.
---- 3.5 Library file goals - lib-y
+3.5 Library file goals - lib-y
+------------------------------
Objects listed with obj-* are used for modules, or
combined in a built-in.a for that specific directory.
@@ -247,18 +263,21 @@ more details, with real examples.
and to be part of a library. Therefore the same directory
may contain both a built-in.a and a lib.a file.
- Example:
+ Example::
+
#arch/x86/lib/Makefile
lib-y := delay.o
This will create a library lib.a based on delay.o. For kbuild to
actually recognize that there is a lib.a being built, the directory
shall be listed in libs-y.
+
See also "6.4 List directories to visit when descending".
- Use of lib-y is normally restricted to lib/ and arch/*/lib.
+ Use of lib-y is normally restricted to `lib/` and `arch/*/lib`.
---- 3.6 Descending down in directories
+3.6 Descending down in directories
+----------------------------------
A Makefile is only responsible for building objects in its own
directory. Files in subdirectories should be taken care of by
@@ -270,7 +289,8 @@ more details, with real examples.
ext2 lives in a separate directory, and the Makefile present in fs/
tells kbuild to descend down using the following assignment.
- Example:
+ Example::
+
#fs/Makefile
obj-$(CONFIG_EXT2_FS) += ext2/
@@ -281,11 +301,12 @@ more details, with real examples.
the directory, it is the Makefile in the subdirectory that
specifies what is modular and what is built-in.
- It is good practice to use a CONFIG_ variable when assigning directory
+ It is good practice to use a `CONFIG_` variable when assigning directory
names. This allows kbuild to totally skip the directory if the
- corresponding CONFIG_ option is neither 'y' nor 'm'.
+ corresponding `CONFIG_` option is neither 'y' nor 'm'.
---- 3.7 Compilation flags
+3.7 Compilation flags
+---------------------
ccflags-y, asflags-y and ldflags-y
These three flags apply only to the kbuild makefile in which they
@@ -297,7 +318,8 @@ more details, with real examples.
ccflags-y specifies options for compiling with $(CC).
- Example:
+ Example::
+
# drivers/acpi/acpica/Makefile
ccflags-y := -Os -D_LINUX -DBUILDING_ACPICA
ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT
@@ -308,13 +330,15 @@ more details, with real examples.
asflags-y specifies options for assembling with $(AS).
- Example:
+ Example::
+
#arch/sparc/kernel/Makefile
asflags-y := -ansi
ldflags-y specifies options for linking with $(LD).
- Example:
+ Example::
+
#arch/cris/boot/compressed/Makefile
ldflags-y += -T $(srctree)/$(src)/decompress_$(arch-y).lds
@@ -325,18 +349,19 @@ more details, with real examples.
Options specified using subdir-* are added to the commandline before
the options specified using the non-subdir variants.
- Example:
+ Example::
+
subdir-ccflags-y := -Werror
CFLAGS_$@, AFLAGS_$@
-
CFLAGS_$@ and AFLAGS_$@ only apply to commands in current
kbuild makefile.
$(CFLAGS_$@) specifies per-file options for $(CC). The $@
part has a literal value which specifies the file that it is for.
- Example:
+ Example::
+
# drivers/scsi/Makefile
CFLAGS_aha152x.o = -DAHA152X_STAT -DAUTOCONF
CFLAGS_gdth.o = # -DDEBUG_GDTH=2 -D__SERIAL__ -D__COM2__ \
@@ -347,24 +372,27 @@ more details, with real examples.
$(AFLAGS_$@) is a similar feature for source files in assembly
languages.
- Example:
+ Example::
+
# arch/arm/kernel/Makefile
AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
AFLAGS_iwmmxt.o := -Wa,-mcpu=iwmmxt
---- 3.9 Dependency tracking
+3.9 Dependency tracking
+-----------------------
Kbuild tracks dependencies on the following:
- 1) All prerequisite files (both *.c and *.h)
- 2) CONFIG_ options used in all prerequisite files
+ 1) All prerequisite files (both `*.c` and `*.h`)
+ 2) `CONFIG_` options used in all prerequisite files
3) Command-line used to compile target
Thus, if you change an option to $(CC) all affected files will
be re-compiled.
---- 3.10 Special Rules
+3.10 Special Rules
+------------------
Special rules are used when the kbuild infrastructure does
not provide the required support. A typical example is
@@ -379,43 +407,47 @@ more details, with real examples.
Two variables are used when defining special rules:
- $(src)
- $(src) is a relative path which points to the directory
- where the Makefile is located. Always use $(src) when
- referring to files located in the src tree.
+ $(src)
+ $(src) is a relative path which points to the directory
+ where the Makefile is located. Always use $(src) when
+ referring to files located in the src tree.
- $(obj)
- $(obj) is a relative path which points to the directory
- where the target is saved. Always use $(obj) when
- referring to generated files.
+ $(obj)
+ $(obj) is a relative path which points to the directory
+ where the target is saved. Always use $(obj) when
+ referring to generated files.
+
+ Example::
- Example:
#drivers/scsi/Makefile
$(obj)/53c8xx_d.h: $(src)/53c7,8xx.scr $(src)/script_asm.pl
$(CPP) -DCHIP=810 - < $< | ... $(src)/script_asm.pl
- This is a special rule, following the normal syntax
- required by make.
- The target file depends on two prerequisite files. References
- to the target file are prefixed with $(obj), references
- to prerequisites are referenced with $(src) (because they are not
- generated files).
-
- $(kecho)
- echoing information to user in a rule is often a good practice
- but when execution "make -s" one does not expect to see any output
- except for warnings/errors.
- To support this kbuild defines $(kecho) which will echo out the
- text following $(kecho) to stdout except if "make -s" is used.
-
- Example:
+ This is a special rule, following the normal syntax
+ required by make.
+
+ The target file depends on two prerequisite files. References
+ to the target file are prefixed with $(obj), references
+ to prerequisites are prefixed with $(src) (because they are not
+ generated files).
+
+ $(kecho)
+ echoing information to user in a rule is often a good practice
+ but when executing "make -s" one does not expect to see any output
+ except for warnings/errors.
+ To support this kbuild defines $(kecho) which will echo out the
+ text following $(kecho) to stdout except if "make -s" is used.
+
+ Example::
+
#arch/blackfin/boot/Makefile
$(obj)/vmImage: $(obj)/vmlinux.gz
$(call if_changed,uimage)
@$(kecho) 'Kernel: $@ is ready'
---- 3.11 $(CC) support functions
+3.11 $(CC) support functions
+----------------------------
The kernel may be built with several different versions of
$(CC), each supporting a unique set of features and options.
@@ -425,10 +457,11 @@ more details, with real examples.
as-option
as-option is used to check if $(CC) -- when used to compile
- assembler (*.S) files -- supports the given option. An optional
+ assembler (`*.S`) files -- supports the given option. An optional
second option may be specified if the first option is not supported.
- Example:
+ Example::
+
#arch/sh/Makefile
cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),)
@@ -442,7 +475,8 @@ more details, with real examples.
supports the given option. An optional second option may be
specified if first option are not supported.
- Example:
+ Example::
+
#arch/x86/kernel/Makefile
vsyscall-flags += $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
@@ -461,7 +495,8 @@ more details, with real examples.
cc-option is used to check if $(CC) supports a given option, and if
not supported to use an optional second option.
- Example:
+ Example::
+
#arch/x86/Makefile
cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586)
@@ -475,7 +510,8 @@ more details, with real examples.
cc-option-yn is used to check if gcc supports a given option
and return 'y' if supported, otherwise 'n'.
- Example:
+ Example::
+
#arch/ppc/Makefile
biarch := $(call cc-option-yn, -m32)
aflags-$(biarch) += -a32
@@ -493,7 +529,8 @@ more details, with real examples.
because gcc 4.4 and later accept any unknown -Wno-* option and only
warn about it if there is another warning in the source file.
- Example:
+ Example::
+
KBUILD_CFLAGS += $(call cc-disable-warning, unused-but-set-variable)
In the above example, -Wno-unused-but-set-variable will be added to
@@ -504,7 +541,8 @@ more details, with real examples.
if version expression is true, or the fifth (if given) if the version
expression is false.
- Example:
+ Example::
+
#fs/reiserfs/Makefile
ccflags-y := $(call cc-ifversion, -lt, 0402, -O1)
@@ -529,7 +567,8 @@ more details, with real examples.
build (host arch is different from target arch). And if CROSS_COMPILE
is already set then leave it with the old value.
- Example:
+ Example::
+
#arch/m68k/Makefile
ifneq ($(SUBARCH),$(ARCH))
ifeq ($(CROSS_COMPILE),)
@@ -537,7 +576,8 @@ more details, with real examples.
endif
endif
---- 3.12 $(LD) support functions
+3.12 $(LD) support functions
+----------------------------
ld-option
ld-option is used to check if $(LD) supports the supplied option.
@@ -545,12 +585,14 @@ more details, with real examples.
The second argument is an optional option that can be used if the
first option is not supported by $(LD).
- Example:
+ Example::
+
#Makefile
LDFLAGS_vmlinux += $(call ld-option, -X)
-=== 4 Host Program support
+4 Host Program support
+======================
Kbuild supports building executables on the host for use during the
compilation stage.
@@ -564,21 +606,24 @@ This can be done in two ways. Either add the dependency in a rule,
or utilise the variable $(always).
Both possibilities are described in the following.
---- 4.1 Simple Host Program
+4.1 Simple Host Program
+-----------------------
In some cases there is a need to compile and run a program on the
computer where the build is running.
The following line tells kbuild that the program bin2hex shall be
built on the build host.
- Example:
+ Example::
+
hostprogs-y := bin2hex
Kbuild assumes in the above example that bin2hex is made from a single
c-source file named bin2hex.c located in the same directory as
the Makefile.
---- 4.2 Composite Host Programs
+4.2 Composite Host Programs
+---------------------------
Host programs can be made up based on composite objects.
The syntax used to define composite objects for host programs is
@@ -586,7 +631,8 @@ Both possibilities are described in the following.
$(<executable>-objs) lists all objects used to link the final
executable.
- Example:
+ Example::
+
#scripts/lxdialog/Makefile
hostprogs-y := lxdialog
lxdialog-objs := checklist.o lxdialog.o
@@ -594,16 +640,19 @@ Both possibilities are described in the following.
Objects with extension .o are compiled from the corresponding .c
files. In the above example, checklist.c is compiled to checklist.o
and lxdialog.c is compiled to lxdialog.o.
+
Finally, the two .o files are linked to the executable, lxdialog.
Note: The syntax <executable>-y is not permitted for host-programs.
---- 4.3 Using C++ for host programs
+4.3 Using C++ for host programs
+-------------------------------
kbuild offers support for host programs written in C++. This was
introduced solely to support kconfig, and is not recommended
for general use.
- Example:
+ Example::
+
#scripts/kconfig/Makefile
hostprogs-y := qconf
qconf-cxxobjs := qconf.o
@@ -614,13 +663,15 @@ Both possibilities are described in the following.
If qconf is composed of a mixture of .c and .cc files, then an
additional line can be used to identify this.
- Example:
+ Example::
+
#scripts/kconfig/Makefile
hostprogs-y := qconf
qconf-cxxobjs := qconf.o
qconf-objs := check.o
---- 4.4 Controlling compiler options for host programs
+4.4 Controlling compiler options for host programs
+--------------------------------------------------
When compiling host programs, it is possible to set specific flags.
The programs will always be compiled utilising $(HOSTCC) passed
@@ -628,27 +679,31 @@ Both possibilities are described in the following.
To set flags that will take effect for all host programs created
in that Makefile, use the variable HOST_EXTRACFLAGS.
- Example:
+ Example::
+
#scripts/lxdialog/Makefile
HOST_EXTRACFLAGS += -I/usr/include/ncurses
To set specific flags for a single file the following construction
is used:
- Example:
+ Example::
+
#arch/ppc64/boot/Makefile
HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE)
It is also possible to specify additional options to the linker.
- Example:
+ Example::
+
#scripts/kconfig/Makefile
HOSTLDLIBS_qconf := -L$(QTDIR)/lib
When linking qconf, it will be passed the extra option
"-L$(QTDIR)/lib".
---- 4.5 When host programs are actually built
+4.5 When host programs are actually built
+-----------------------------------------
Kbuild will only build host-programs when they are referenced
as a prerequisite.
@@ -656,7 +711,8 @@ Both possibilities are described in the following.
(1) List the prerequisite explicitly in a special rule.
- Example:
+ Example::
+
#drivers/pci/Makefile
hostprogs-y := gen-devlist
$(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist
@@ -667,11 +723,13 @@ Both possibilities are described in the following.
the host programs in special rules must be prefixed with $(obj).
(2) Use $(always)
+
When there is no suitable special rule, and the host program
shall be built when a makefile is entered, the $(always)
variable shall be used.
- Example:
+ Example::
+
#scripts/lxdialog/Makefile
hostprogs-y := lxdialog
always := $(hostprogs-y)
@@ -679,11 +737,13 @@ Both possibilities are described in the following.
This will tell kbuild to build lxdialog even if not referenced in
any rule.
---- 4.6 Using hostprogs-$(CONFIG_FOO)
+4.6 Using hostprogs-$(CONFIG_FOO)
+---------------------------------
A typical pattern in a Kbuild file looks like this:
- Example:
+ Example::
+
#scripts/Makefile
hostprogs-$(CONFIG_KALLSYMS) += kallsyms
@@ -693,7 +753,8 @@ Both possibilities are described in the following.
like hostprogs-y. But only hostprogs-y is recommended to be used
when no CONFIG symbols are involved.
-=== 5 Kbuild clean infrastructure
+5 Kbuild clean infrastructure
+=============================
"make clean" deletes most generated files in the obj tree where the kernel
is compiled. This includes generated files such as host programs.
@@ -705,7 +766,8 @@ generated by kbuild are deleted all over the kernel src tree when
Additional files can be specified in kbuild makefiles by use of $(clean-files).
- Example:
+ Example::
+
#lib/Makefile
clean-files := crc32table.h
@@ -715,7 +777,8 @@ Makefile, except if prefixed with $(objtree).
To delete a directory hierarchy use:
- Example:
+ Example::
+
#scripts/package/Makefile
clean-dirs := $(objtree)/debian/
@@ -725,7 +788,8 @@ subdirectories.
To exclude certain files from make clean, use the $(no-clean-files) variable.
This is only a special case used in the top level Kbuild file:
- Example:
+ Example::
+
#Kbuild
no-clean-files := $(bounds-file) $(offsets-file)
@@ -733,7 +797,8 @@ Usually kbuild descends down in subdirectories due to "obj-* := dir/",
but in the architecture makefiles where the kbuild infrastructure
is not sufficient this sometimes needs to be explicit.
- Example:
+ Example::
+
#arch/x86/boot/Makefile
subdir- := compressed/
@@ -743,7 +808,8 @@ directory compressed/ when "make clean" is executed.
To support the clean infrastructure in the Makefiles that build the
final bootimage there is an optional target named archclean:
- Example:
+ Example::
+
#arch/x86/Makefile
archclean:
$(Q)$(MAKE) $(clean)=arch/x86/boot
@@ -759,7 +825,8 @@ is not operational at that point.
Note 2: All directories listed in core-y, libs-y, drivers-y and net-y will
be visited during "make clean".
-=== 6 Architecture Makefiles
+6 Architecture Makefiles
+========================
The top level Makefile sets up the environment and does the preparation,
before starting to descend down in the individual directories.
@@ -770,6 +837,7 @@ To do so, arch/$(ARCH)/Makefile sets up a number of variables and defines
a few targets.
When kbuild executes, the following steps are followed (roughly):
+
1) Configuration of the kernel => produce .config
2) Store kernel version in include/linux/version.h
3) Updating all other prerequisites to the target prepare:
@@ -787,37 +855,45 @@ When kbuild executes, the following steps are followed (roughly):
- Preparing initrd images and the like
---- 6.1 Set variables to tweak the build to the architecture
+6.1 Set variables to tweak the build to the architecture
+--------------------------------------------------------
- LDFLAGS Generic $(LD) options
+ LDFLAGS
+ Generic $(LD) options
Flags used for all invocations of the linker.
Often specifying the emulation is sufficient.
- Example:
+ Example::
+
#arch/s390/Makefile
LDFLAGS := -m elf_s390
+
Note: ldflags-y can be used to further customise
the flags used. See chapter 3.7.
- LDFLAGS_vmlinux Options for $(LD) when linking vmlinux
+ LDFLAGS_vmlinux
+ Options for $(LD) when linking vmlinux
LDFLAGS_vmlinux is used to specify additional flags to pass to
the linker when linking the final vmlinux image.
LDFLAGS_vmlinux uses the LDFLAGS_$@ support.
- Example:
+ Example::
+
#arch/x86/Makefile
LDFLAGS_vmlinux := -e stext
- OBJCOPYFLAGS objcopy flags
+ OBJCOPYFLAGS
+ objcopy flags
When $(call if_changed,objcopy) is used to translate a .o file,
the flags specified in OBJCOPYFLAGS will be used.
$(call if_changed,objcopy) is often used to generate raw binaries on
vmlinux.
- Example:
+ Example::
+
#arch/s390/Makefile
OBJCOPYFLAGS := -O binary
@@ -828,30 +904,34 @@ When kbuild executes, the following steps are followed (roughly):
In this example, the binary $(obj)/image is a binary version of
vmlinux. The usage of $(call if_changed,xxx) will be described later.
- KBUILD_AFLAGS $(AS) assembler flags
+ KBUILD_AFLAGS
+ $(AS) assembler flags
Default value - see top level Makefile
Append or modify as required per architecture.
- Example:
+ Example::
+
#arch/sparc64/Makefile
KBUILD_AFLAGS += -m64 -mcpu=ultrasparc
- KBUILD_CFLAGS $(CC) compiler flags
+ KBUILD_CFLAGS
+ $(CC) compiler flags
Default value - see top level Makefile
Append or modify as required per architecture.
Often, the KBUILD_CFLAGS variable depends on the configuration.
- Example:
+ Example::
+
#arch/x86/boot/compressed/Makefile
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
KBUILD_CFLAGS += $(cflags-y)
Many arch Makefiles dynamically run the target C compiler to
- probe supported options:
+ probe supported options::
#arch/x86/Makefile
@@ -867,32 +947,39 @@ When kbuild executes, the following steps are followed (roughly):
The first example utilises the trick that a config option expands
to 'y' when selected.
- KBUILD_AFLAGS_KERNEL $(AS) options specific for built-in
+ KBUILD_AFLAGS_KERNEL
+ $(AS) options specific for built-in
$(KBUILD_AFLAGS_KERNEL) contains extra C compiler flags used to compile
resident kernel code.
- KBUILD_AFLAGS_MODULE Options for $(AS) when building modules
+ KBUILD_AFLAGS_MODULE
+ Options for $(AS) when building modules
$(KBUILD_AFLAGS_MODULE) is used to add arch-specific options that
are used for $(AS).
+
From commandline AFLAGS_MODULE shall be used (see kbuild.txt).
- KBUILD_CFLAGS_KERNEL $(CC) options specific for built-in
+ KBUILD_CFLAGS_KERNEL
+ $(CC) options specific for built-in
$(KBUILD_CFLAGS_KERNEL) contains extra C compiler flags used to compile
resident kernel code.
- KBUILD_CFLAGS_MODULE Options for $(CC) when building modules
+ KBUILD_CFLAGS_MODULE
+ Options for $(CC) when building modules
$(KBUILD_CFLAGS_MODULE) is used to add arch-specific options that
are used for $(CC).
From commandline CFLAGS_MODULE shall be used (see kbuild.txt).
- KBUILD_LDFLAGS_MODULE Options for $(LD) when linking modules
+ KBUILD_LDFLAGS_MODULE
+ Options for $(LD) when linking modules
$(KBUILD_LDFLAGS_MODULE) is used to add arch-specific options
used when linking modules. This is often a linker script.
+
From commandline LDFLAGS_MODULE shall be used (see kbuild.txt).
KBUILD_ARFLAGS Options for $(AR) when creating archives
@@ -908,7 +995,8 @@ When kbuild executes, the following steps are followed (roughly):
means for an architecture to override the defaults.
---- 6.2 Add prerequisites to archheaders:
+6.2 Add prerequisites to archheaders
+------------------------------------
The archheaders: rule is used to generate header files that
may be installed into user space by "make header_install" or
@@ -921,13 +1009,15 @@ When kbuild executes, the following steps are followed (roughly):
architecture itself.
---- 6.3 Add prerequisites to archprepare:
+6.3 Add prerequisites to archprepare
+------------------------------------
The archprepare: rule is used to list prerequisites that need to be
built before starting to descend down in the subdirectories.
This is usually used for header files containing assembler constants.
- Example:
+ Example::
+
#arch/arm/Makefile
archprepare: maketools
@@ -937,7 +1027,8 @@ When kbuild executes, the following steps are followed (roughly):
generating offset header files.
---- 6.4 List directories to visit when descending
+6.4 List directories to visit when descending
+---------------------------------------------
An arch Makefile cooperates with the top Makefile to define variables
which specify how to build the vmlinux file. Note that there is no
@@ -945,28 +1036,34 @@ When kbuild executes, the following steps are followed (roughly):
machinery is all architecture-independent.
- head-y, init-y, core-y, libs-y, drivers-y, net-y
+ head-y, init-y, core-y, libs-y, drivers-y, net-y
+ $(head-y) lists objects to be linked first in vmlinux.
- $(head-y) lists objects to be linked first in vmlinux.
- $(libs-y) lists directories where a lib.a archive can be located.
- The rest list directories where a built-in.a object file can be
- located.
+ $(libs-y) lists directories where a lib.a archive can be located.
- $(init-y) objects will be located after $(head-y).
- Then the rest follows in this order:
- $(core-y), $(libs-y), $(drivers-y) and $(net-y).
+ The rest list directories where a built-in.a object file can be
+ located.
- The top level Makefile defines values for all generic directories,
- and arch/$(ARCH)/Makefile only adds architecture-specific directories.
+ $(init-y) objects will be located after $(head-y).
+
+ Then the rest follows in this order:
+
+ $(core-y), $(libs-y), $(drivers-y) and $(net-y).
+
+ The top level Makefile defines values for all generic directories,
+ and arch/$(ARCH)/Makefile only adds architecture-specific
+ directories.
+
+ Example::
- Example:
#arch/sparc64/Makefile
core-y += arch/sparc64/kernel/
libs-y += arch/sparc64/prom/ arch/sparc64/lib/
drivers-$(CONFIG_OPROFILE) += arch/sparc64/oprofile/
---- 6.5 Architecture-specific boot images
+6.5 Architecture-specific boot images
+-------------------------------------
An arch Makefile specifies goals that take the vmlinux file, compress
it, wrap it in bootstrapping code, and copy the resulting files
@@ -984,7 +1081,8 @@ When kbuild executes, the following steps are followed (roughly):
arch/$(ARCH)/Makefile, and use the full path when calling down
into the arch/$(ARCH)/boot/Makefile.
- Example:
+ Example::
+
#arch/x86/Makefile
boot := arch/x86/boot
bzImage: vmlinux
@@ -997,7 +1095,8 @@ When kbuild executes, the following steps are followed (roughly):
but executing "make help" will list all relevant targets.
To support this, $(archhelp) must be defined.
- Example:
+ Example::
+
#arch/x86/Makefile
define archhelp
echo '* bzImage - Image (arch/$(ARCH)/boot/bzImage)'
@@ -1011,25 +1110,30 @@ When kbuild executes, the following steps are followed (roughly):
Add a new prerequisite to all: to select a default goal different
from vmlinux.
- Example:
+ Example::
+
#arch/x86/Makefile
all: bzImage
When "make" is executed without arguments, bzImage will be built.
---- 6.6 Building non-kbuild targets
+6.6 Building non-kbuild targets
+-------------------------------
extra-y
-
extra-y specifies additional targets created in the current
- directory, in addition to any targets specified by obj-*.
+ directory, in addition to any targets specified by `obj-*`.
Listing all targets in extra-y is required for two purposes:
+
1) Enable kbuild to check changes in command lines
+
- When $(call if_changed,xxx) is used
+
2) kbuild knows what files to delete during "make clean"
- Example:
+ Example::
+
#arch/x86/kernel/Makefile
extra-y := head.o init_task.o
@@ -1037,16 +1141,17 @@ When kbuild executes, the following steps are followed (roughly):
shall be built, but shall not be linked as part of built-in.a.
---- 6.7 Commands useful for building a boot image
+6.7 Commands useful for building a boot image
+---------------------------------------------
- Kbuild provides a few macros that are useful when building a
- boot image.
+ Kbuild provides a few macros that are useful when building a
+ boot image.
if_changed
-
if_changed is the infrastructure used for the following commands.
- Usage:
+ Usage::
+
target: source(s) FORCE
$(call if_changed,ld/objcopy/gzip/...)
@@ -1064,12 +1169,16 @@ When kbuild executes, the following steps are followed (roughly):
Note: It is a typical mistake to forget the FORCE prerequisite.
Another common pitfall is that whitespace is sometimes
significant; for instance, the below will fail (note the extra space
- after the comma):
+ after the comma)::
+
target: source(s) FORCE
- #WRONG!# $(call if_changed, ld/objcopy/gzip/...)
- Note: if_changed should not be used more than once per target.
+ **WRONG!** $(call if_changed, ld/objcopy/gzip/...)
+
+ Note:
+ if_changed should not be used more than once per target.
It stores the executed command in a corresponding .cmd
file and multiple calls would result in overwrites and
unwanted results when the target is up to date and only the
tests on changed commands trigger execution of commands.
@@ -1077,7 +1186,8 @@ When kbuild executes, the following steps are followed (roughly):
ld
Link target. Often, LDFLAGS_$@ is used to set specific options to ld.
- Example:
+ Example::
+
#arch/x86/boot/Makefile
LDFLAGS_bootsect := -Ttext 0x0 -s --oformat binary
LDFLAGS_setup := -Ttext 0x0 -s --oformat binary -e begtext
@@ -1091,12 +1201,15 @@ When kbuild executes, the following steps are followed (roughly):
LDFLAGS_$@ syntax - one for each potential target.
$(targets) are assigned all potential targets, by which kbuild knows
the targets and will:
+
1) check for commandline changes
2) delete target during make clean
The ": %: %.o" part of the prerequisite is a shorthand that
frees us from listing the setup.o and bootsect.o files.
- Note: It is a common mistake to forget the "targets :=" assignment,
+
+ Note:
+ It is a common mistake to forget the "targets :=" assignment,
resulting in the target file being recompiled for no
obvious reason.
@@ -1108,7 +1221,8 @@ When kbuild executes, the following steps are followed (roughly):
gzip
Compress target. Use maximum compression to compress target.
- Example:
+ Example::
+
#arch/x86/boot/compressed/Makefile
$(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
$(call if_changed,gzip)
@@ -1119,26 +1233,30 @@ When kbuild executes, the following steps are followed (roughly):
in an init section in the image. Platform code *must* copy the
blob to non-init memory prior to calling unflatten_device_tree().
- To use this command, simply add *.dtb into obj-y or targets, or make
- some other target depend on %.dtb
+ To use this command, simply add `*.dtb` into obj-y or targets, or make
+ some other target depend on `%.dtb`
- A central rule exists to create $(obj)/%.dtb from $(src)/%.dts;
+ A central rule exists to create `$(obj)/%.dtb` from `$(src)/%.dts`;
architecture Makefiles do no need to explicitly write out that rule.
- Example:
+ Example::
+
targets += $(dtb-y)
DTC_FLAGS ?= -p 1024
---- 6.8 Custom kbuild commands
+6.8 Custom kbuild commands
+--------------------------
When kbuild is executing with KBUILD_VERBOSE=0, then only a shorthand
of a command is normally displayed.
To enable this behaviour for custom commands kbuild requires
- two variables to be set:
- quiet_cmd_<command> - what shall be echoed
- cmd_<command> - the command to execute
+ two variables to be set::
+
+ quiet_cmd_<command> - what shall be echoed
+ cmd_<command> - the command to execute
+
+ Example::
- Example:
#
quiet_cmd_image = BUILD $@
cmd_image = $(obj)/tools/build $(BUILDFLAGS) \
@@ -1149,9 +1267,9 @@ When kbuild executes, the following steps are followed (roughly):
$(call if_changed,image)
@echo 'Kernel: $@ is ready'
- When updating the $(obj)/bzImage target, the line
+ When updating the $(obj)/bzImage target, the line:
- BUILD arch/x86/boot/bzImage
+ BUILD arch/x86/boot/bzImage
will be displayed with "make KBUILD_VERBOSE=0".
@@ -1162,9 +1280,10 @@ When kbuild executes, the following steps are followed (roughly):
arch/$(ARCH)/kernel/vmlinux.lds is used.
The script is a preprocessed variant of the file vmlinux.lds.S
located in the same directory.
- kbuild knows .lds files and includes a rule *lds.S -> *lds.
+ kbuild knows .lds files and includes a rule `*lds.S` -> `*lds`.
+
+ Example::
- Example:
#arch/x86/kernel/Makefile
always := vmlinux.lds
@@ -1176,17 +1295,19 @@ When kbuild executes, the following steps are followed (roughly):
The assignment to $(CPPFLAGS_vmlinux.lds) tells kbuild to use the
specified options when building the target vmlinux.lds.
- When building the *.lds target, kbuild uses the variables:
- KBUILD_CPPFLAGS : Set in top-level Makefile
- cppflags-y : May be set in the kbuild makefile
- CPPFLAGS_$(@F) : Target-specific flags.
- Note that the full filename is used in this
- assignment.
+ When building the `*.lds` target, kbuild uses the variables::
- The kbuild infrastructure for *lds files is used in several
+ KBUILD_CPPFLAGS : Set in top-level Makefile
+ cppflags-y : May be set in the kbuild makefile
+ CPPFLAGS_$(@F) : Target-specific flags.
+ Note that the full filename is used in this
+ assignment.
+
+ The kbuild infrastructure for `*lds` files is used in several
architecture-specific files.
---- 6.10 Generic header files
+6.10 Generic header files
+-------------------------
The directory include/asm-generic contains the header files
that may be shared between individual architectures.
@@ -1194,7 +1315,8 @@ When kbuild executes, the following steps are followed (roughly):
to list the file in the Kbuild file.
See "7.2 generic-y" for further info on syntax etc.
---- 6.11 Post-link pass
+6.11 Post-link pass
+-------------------
If the file arch/xxx/Makefile.postlink exists, this makefile
will be invoked for post-link objects (vmlinux and modules.ko)
@@ -1209,15 +1331,17 @@ When kbuild executes, the following steps are followed (roughly):
For example, powerpc uses this to check relocation sanity of
the linked vmlinux file.
-=== 7 Kbuild syntax for exported headers
+7 Kbuild syntax for exported headers
+====================================
The kernel includes a set of headers that is exported to userspace.
Many headers can be exported as-is but other headers require a
minimal pre-processing before they are ready for user-space.
The pre-processing does:
+
- drop kernel-specific annotations
- drop include of compiler.h
-- drop all sections that are kernel internal (guarded by ifdef __KERNEL__)
+- drop all sections that are kernel internal (guarded by `ifdef __KERNEL__`)
All headers under include/uapi/, include/generated/uapi/,
arch/<arch>/include/uapi/ and arch/<arch>/include/generated/uapi/
@@ -1227,40 +1351,45 @@ A Kbuild file may be defined under arch/<arch>/include/uapi/asm/ and
arch/<arch>/include/asm/ to list asm files coming from asm-generic.
See subsequent chapter for the syntax of the Kbuild file.
---- 7.1 no-export-headers
+7.1 no-export-headers
+---------------------
no-export-headers is essentially used by include/uapi/linux/Kbuild to
avoid exporting specific headers (e.g. kvm.h) on architectures that do
not support it. It should be avoided as much as possible.
---- 7.2 generic-y
+7.2 generic-y
+-------------
If an architecture uses a verbatim copy of a header from
include/asm-generic then this is listed in the file
arch/$(ARCH)/include/asm/Kbuild like this:
- Example:
+ Example::
+
#arch/x86/include/asm/Kbuild
generic-y += termios.h
generic-y += rtc.h
During the prepare phase of the build a wrapper include
- file is generated in the directory:
+ file is generated in the directory::
arch/$(ARCH)/include/generated/asm
When a header is exported where the architecture uses
the generic header a similar wrapper is generated as part
- of the set of exported headers in the directory:
+ of the set of exported headers in the directory::
usr/include/asm
The generated wrapper will in both cases look like the following:
- Example: termios.h
+ Example: termios.h::
+
#include <asm-generic/termios.h>
---- 7.3 generated-y
+7.3 generated-y
+---------------
If an architecture generates other header files alongside generic-y
wrappers, generated-y specifies them.
@@ -1268,11 +1397,13 @@ See subsequent chapter for the syntax of the Kbuild file.
This prevents them being treated as stale asm-generic wrappers and
removed.
- Example:
+ Example::
+
#arch/x86/include/asm/Kbuild
generated-y += syscalls_32.h
---- 7.4 mandatory-y
+7.4 mandatory-y
+---------------
mandatory-y is essentially used by include/(uapi/)asm-generic/Kbuild
to define the minimum set of ASM headers that all architectures must have.
@@ -1284,12 +1415,12 @@ See subsequent chapter for the syntax of the Kbuild file.
The convention is to list one subdir per line and
preferably in alphabetic order.
-=== 8 Kbuild Variables
+8 Kbuild Variables
+==================
The top Makefile exports the following variables:
VERSION, PATCHLEVEL, SUBLEVEL, EXTRAVERSION
-
These variables define the current kernel version. A few arch
Makefiles actually use these values directly; they should use
$(KERNELRELEASE) instead.
@@ -1303,32 +1434,28 @@ The top Makefile exports the following variables:
such as "-pre4", and is often blank.
KERNELRELEASE
-
$(KERNELRELEASE) is a single string such as "2.4.0-pre4", suitable
for constructing installation directory names or showing in
version strings. Some arch Makefiles use it for this purpose.
ARCH
-
This variable defines the target architecture, such as "i386",
"arm", or "sparc". Some kbuild Makefiles test $(ARCH) to
determine which files to compile.
By default, the top Makefile sets $(ARCH) to be the same as the
host system architecture. For a cross build, a user may
- override the value of $(ARCH) on the command line:
+ override the value of $(ARCH) on the command line::
make ARCH=m68k ...
INSTALL_PATH
-
This variable defines a place for the arch Makefiles to install
the resident kernel image and System.map file.
Use this for architecture-specific install targets.
INSTALL_MOD_PATH, MODLIB
-
$(INSTALL_MOD_PATH) specifies a prefix to $(MODLIB) for module
installation. This variable is not defined in the Makefile but
may be passed in by the user if desired.
@@ -1339,7 +1466,6 @@ The top Makefile exports the following variables:
override this value on the command line if desired.
INSTALL_MOD_STRIP
-
If this variable is specified, it will cause modules to be stripped
after they are installed. If INSTALL_MOD_STRIP is '1', then the
default option --strip-debug will be used. Otherwise, the
@@ -1347,7 +1473,8 @@ The top Makefile exports the following variables:
command.
-=== 9 Makefile language
+9 Makefile language
+===================
The kernel Makefiles are designed to be run with GNU Make. The Makefiles
use only the documented features of GNU Make, but they do use many
@@ -1366,18 +1493,17 @@ time the left-hand side is used.
There are some cases where "=" is appropriate. Usually, though, ":="
is the right choice.
-=== 10 Credits
+10 Credits
+==========
-Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net>
-Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
-Updates by Sam Ravnborg <sam@ravnborg.org>
-Language QA by Jan Engelhardt <jengelh@gmx.de>
+- Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net>
+- Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
+- Updates by Sam Ravnborg <sam@ravnborg.org>
+- Language QA by Jan Engelhardt <jengelh@gmx.de>
-=== 11 TODO
+11 TODO
+=======
- Describe how kbuild supports shipped files with _shipped.
- Generating offset header files.
- Add more variables to section 7?
-
-
-
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.rst
similarity index 84%
rename from Documentation/kbuild/modules.txt
rename to Documentation/kbuild/modules.rst
index 80295c613e37..24e763482650 100644
--- a/Documentation/kbuild/modules.txt
+++ b/Documentation/kbuild/modules.rst
@@ -1,8 +1,10 @@
+=========================
Building External Modules
+=========================
This document describes how to build an out-of-tree kernel module.
-=== Table of Contents
+.. Table of Contents
=== 1 Introduction
=== 2 How to Build External Modules
@@ -31,7 +33,8 @@ This document describes how to build an out-of-tree kernel module.
-=== 1. Introduction
+1. Introduction
+===============
"kbuild" is the build system used by the Linux kernel. Modules must use
kbuild to stay compatible with changes in the build infrastructure and
@@ -48,7 +51,8 @@ easily accomplished, and a complete example will be presented in
section 3.
-=== 2. How to Build External Modules
+2. How to Build External Modules
+================================
To build external modules, you must have a prebuilt kernel available
that contains the configuration and header files used in the build.
@@ -65,25 +69,27 @@ NOTE: "modules_prepare" will not build Module.symvers even if
CONFIG_MODVERSIONS is set; therefore, a full kernel build needs to be
executed to make module versioning work.
---- 2.1 Command Syntax
+2.1 Command Syntax
+------------------
- The command to build an external module is:
+ The command to build an external module is::
$ make -C <path_to_kernel_src> M=$PWD
The kbuild system knows that an external module is being built
due to the "M=<dir>" option given in the command.
- To build against the running kernel use:
+ To build against the running kernel use::
$ make -C /lib/modules/`uname -r`/build M=$PWD
Then to install the module(s) just built, add the target
- "modules_install" to the command:
+ "modules_install" to the command::
$ make -C /lib/modules/`uname -r`/build M=$PWD modules_install
---- 2.2 Options
+2.2 Options
+-----------
($KDIR refers to the path of the kernel source directory.)
@@ -100,7 +106,8 @@ executed to make module versioning work.
directory where the external module (kbuild file) is
located.
---- 2.3 Targets
+2.3 Targets
+-----------
When building an external module, only a subset of the "make"
targets are available.
@@ -130,26 +137,29 @@ executed to make module versioning work.
help
List the available targets for external modules.
---- 2.4 Building Separate Files
+2.4 Building Separate Files
+---------------------------
It is possible to build single files that are part of a module.
This works equally well for the kernel, a module, and even for
external modules.
- Example (The module foo.ko, consist of bar.o and baz.o):
+ Example (The module foo.ko consists of bar.o and baz.o)::
+
make -C $KDIR M=$PWD bar.lst
make -C $KDIR M=$PWD baz.o
make -C $KDIR M=$PWD foo.ko
make -C $KDIR M=$PWD ./
-=== 3. Creating a Kbuild File for an External Module
+3. Creating a Kbuild File for an External Module
+================================================
In the last section we saw the command to build a module for the
running kernel. The module is not actually built, however, because a
build file is required. Contained in this file will be the name of
the module(s) being built, along with the list of requisite source
-files. The file may be as simple as a single line:
+files. The file may be as simple as a single line::
obj-m := <module_name>.o
@@ -157,15 +167,15 @@ The kbuild system will build <module_name>.o from <module_name>.c,
and, after linking, will result in the kernel module <module_name>.ko.
The above line can be put in either a "Kbuild" file or a "Makefile."
When the module is built from multiple sources, an additional line is
-needed listing the files:
+needed listing the files::
<module_name>-y := <src1>.o <src2>.o ...
NOTE: Further documentation describing the syntax used by kbuild is
-located in Documentation/kbuild/makefiles.txt.
+located in Documentation/kbuild/makefiles.rst.
The examples below demonstrate how to create a build file for the
-module 8123.ko, which is built from the following files:
+module 8123.ko, which is built from the following files::
8123_if.c
8123_if.h
@@ -181,7 +191,8 @@ module 8123.ko, which is built from the following files:
but should be filtered out from kbuild due to possible name
clashes.
- Example 1:
+ Example 1::
+
--> filename: Makefile
ifneq ($(KERNELRELEASE),)
# kbuild part of makefile
@@ -209,14 +220,16 @@ module 8123.ko, which is built from the following files:
line; the second pass is by the kbuild system, which is
initiated by the parameterized "make" in the default target.
---- 3.2 Separate Kbuild File and Makefile
+3.2 Separate Kbuild File and Makefile
+-------------------------------------
In newer versions of the kernel, kbuild will first look for a
file named "Kbuild," and only if that is not found, will it
then look for a makefile. Utilizing a "Kbuild" file allows us
to split up the makefile from example 1 into two files:
- Example 2:
+ Example 2::
+
--> filename: Kbuild
obj-m := 8123.o
8123-y := 8123_if.o 8123_pci.o 8123_bin.o
@@ -238,7 +251,8 @@ module 8123.ko, which is built from the following files:
The next example shows a backward compatible version.
- Example 3:
+ Example 3::
+
--> filename: Kbuild
obj-m := 8123.o
8123-y := 8123_if.o 8123_pci.o 8123_bin.o
@@ -266,7 +280,8 @@ module 8123.ko, which is built from the following files:
makefiles, to be used when the "make" and kbuild parts are
split into separate files.
---- 3.3 Binary Blobs
+3.3 Binary Blobs
+----------------
Some external modules need to include an object file as a blob.
kbuild has support for this, but requires the blob file to be
@@ -277,7 +292,7 @@ module 8123.ko, which is built from the following files:
Throughout this section, 8123_bin.o_shipped has been used to
build the kernel module 8123.ko; it has been included as
- 8123_bin.o.
+ 8123_bin.o::
8123-y := 8123_if.o 8123_pci.o 8123_bin.o
@@ -285,11 +300,12 @@ module 8123.ko, which is built from the following files:
files and the binary file, kbuild will pick up different rules
when creating the object file for the module.
---- 3.4 Building Multiple Modules
+3.4 Building Multiple Modules
+-----------------------------
kbuild supports building multiple modules with a single build
file. For example, if you wanted to build two modules, foo.ko
- and bar.ko, the kbuild lines would be:
+ and bar.ko, the kbuild lines would be::
obj-m := foo.o bar.o
foo-y := <foo_srcs>
@@ -298,7 +314,8 @@ module 8123.ko, which is built from the following files:
It is that simple!
-=== 4. Include Files
+4. Include Files
+================
Within the kernel, header files are kept in standard locations
according to the following rule:
@@ -310,22 +327,25 @@ according to the following rule:
of the kernel that are located in different directories, then
the file is placed in include/linux/.
- NOTE: There are two notable exceptions to this rule: larger
- subsystems have their own directory under include/, such as
- include/scsi; and architecture specific headers are located
- under arch/$(ARCH)/include/.
+ NOTE:
+ There are two notable exceptions to this rule: larger
+ subsystems have their own directory under include/, such as
+ include/scsi; and architecture specific headers are located
+ under arch/$(ARCH)/include/.
---- 4.1 Kernel Includes
+4.1 Kernel Includes
+-------------------
To include a header file located under include/linux/, simply
- use:
+ use::
#include <linux/module.h>
kbuild will add options to "gcc" so the relevant directories
are searched.
---- 4.2 Single Subdirectory
+4.2 Single Subdirectory
+-----------------------
External modules tend to place header files in a separate
include/ directory where their source is located, although this
@@ -334,7 +354,7 @@ according to the following rule:
Using the example from section 3, if we moved 8123_if.h to a
subdirectory named include, the resulting kbuild file would
- look like:
+ look like::
--> filename: Kbuild
obj-m := 8123.o
@@ -346,23 +366,24 @@ according to the following rule:
the path. This is a limitation of kbuild: there must be no
space present.
---- 4.3 Several Subdirectories
+4.3 Several Subdirectories
+--------------------------
kbuild can handle files that are spread over several directories.
- Consider the following example:
+ Consider the following example::
- .
- |__ src
- | |__ complex_main.c
- | |__ hal
- | |__ hardwareif.c
- | |__ include
- | |__ hardwareif.h
- |__ include
- |__ complex.h
+ .
+ |__ src
+ | |__ complex_main.c
+ | |__ hal
+ | |__ hardwareif.c
+ | |__ include
+ | |__ hardwareif.h
+ |__ include
+ |__ complex.h
To build the module complex.ko, we then need the following
- kbuild file:
+ kbuild file::
--> filename: Kbuild
obj-m := complex.o
@@ -385,7 +406,8 @@ according to the following rule:
file is located.
-=== 5. Module Installation
+5. Module Installation
+======================
Modules which are included in the kernel are installed in the
directory:
@@ -396,11 +418,12 @@ And external modules are installed in:
/lib/modules/$(KERNELRELEASE)/extra/
---- 5.1 INSTALL_MOD_PATH
+5.1 INSTALL_MOD_PATH
+--------------------
Above are the default directories but as always some level of
customization is possible. A prefix can be added to the
- installation path using the variable INSTALL_MOD_PATH:
+ installation path using the variable INSTALL_MOD_PATH::
$ make INSTALL_MOD_PATH=/frodo modules_install
=> Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel/
@@ -410,20 +433,22 @@ And external modules are installed in:
calling "make." This has effect when installing both in-tree
and out-of-tree modules.
---- 5.2 INSTALL_MOD_DIR
+5.2 INSTALL_MOD_DIR
+-------------------
External modules are by default installed to a directory under
/lib/modules/$(KERNELRELEASE)/extra/, but you may wish to
locate modules for a specific functionality in a separate
directory. For this purpose, use INSTALL_MOD_DIR to specify an
- alternative name to "extra."
+ alternative name to "extra"::
$ make INSTALL_MOD_DIR=gandalf -C $KDIR \
M=$PWD modules_install
=> Install dir: /lib/modules/$(KERNELRELEASE)/gandalf/
-=== 6. Module Versioning
+6. Module Versioning
+====================
Module versioning is enabled by the CONFIG_MODVERSIONS tag, and is used
as a simple ABI consistency check. A CRC value of the full prototype
@@ -435,14 +460,16 @@ module.
Module.symvers contains a list of all exported symbols from a kernel
build.
---- 6.1 Symbols From the Kernel (vmlinux + modules)
+6.1 Symbols From the Kernel (vmlinux + modules)
+-----------------------------------------------
During a kernel build, a file named Module.symvers will be
generated. Module.symvers contains all exported symbols from
the kernel and compiled modules. For each symbol, the
corresponding CRC value is also stored.
- The syntax of the Module.symvers file is:
+ The syntax of the Module.symvers file is::
+
<CRC> <Symbol> <module>
0x2d036834 scsi_remove_host drivers/scsi/scsi_mod
@@ -451,10 +478,12 @@ build.
would read 0x00000000.
Module.symvers serves two purposes:
+
1) It lists all exported symbols from vmlinux and all modules.
2) It lists the CRC if CONFIG_MODVERSIONS is enabled.
---- 6.2 Symbols and External Modules
+6.2 Symbols and External Modules
+--------------------------------
When building an external module, the build system needs access
to the symbols from the kernel to check if all external symbols
@@ -481,17 +510,17 @@ build.
foo.ko needs symbols from bar.ko, you can use a
common top-level kbuild file so both modules are
compiled in the same build. Consider the following
- directory layout:
+ directory layout::
- ./foo/ <= contains foo.ko
- ./bar/ <= contains bar.ko
+ ./foo/ <= contains foo.ko
+ ./bar/ <= contains bar.ko
- The top-level kbuild file would then look like:
+ The top-level kbuild file would then look like::
- #./Kbuild (or ./Makefile):
- obj-y := foo/ bar/
+ #./Kbuild (or ./Makefile):
+ obj-y := foo/ bar/
- And executing
+ And executing::
$ make -C $KDIR M=$PWD
@@ -518,14 +547,16 @@ build.
initialization of its symbol tables.
-=== 7. Tips & Tricks
+7. Tips & Tricks
+================
---- 7.1 Testing for CONFIG_FOO_BAR
+7.1 Testing for CONFIG_FOO_BAR
+------------------------------
- Modules often need to check for certain CONFIG_ options to
+ Modules often need to check for certain `CONFIG_` options to
decide if a specific feature is included in the module. In
- kbuild this is done by referencing the CONFIG_ variable
- directly.
+ kbuild this is done by referencing the `CONFIG_` variable
+ directly::
#fs/ext2/Makefile
obj-$(CONFIG_EXT2_FS) += ext2.o
@@ -534,8 +565,7 @@ build.
ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o
External modules have traditionally used "grep" to check for
- specific CONFIG_ settings directly in .config. This usage is
+ specific `CONFIG_` settings directly in .config. This usage is
broken. As introduced before, external modules should use
kbuild for building and can therefore use the same methods as
- in-tree modules when testing for CONFIG_ definitions.
-
+ in-tree modules when testing for `CONFIG_` definitions.
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index d824e4feaff3..5891a701a159 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -718,7 +718,7 @@ make a neat patch, there's administrative work to be done:
- Usually you want a configuration option for your kernel hack. Edit
``Kconfig`` in the appropriate directory. The Config language is
simple to use by cut and paste, and there's complete documentation in
- ``Documentation/kbuild/kconfig-language.txt``.
+ ``Documentation/kbuild/kconfig-language.rst``.
In your description of the option, make sure you address both the
expert user and the user who knows nothing about your feature.
@@ -728,7 +728,7 @@ make a neat patch, there's administrative work to be done:
- Edit the ``Makefile``: the CONFIG variables are exported here so you
can usually just add a "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
- is documented in ``Documentation/kbuild/makefiles.txt``.
+ is documented in ``Documentation/kbuild/makefiles.rst``.
- Put yourself in ``CREDITS`` if you've done something noteworthy,
usually beyond a single file (your name should be at the top of the
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index fa864a51e6ea..f4a2198187f9 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -686,7 +686,7 @@ filesystems) should advertise this prominently in their prompt string::
...
For full documentation on the configuration files, see the file
-Documentation/kbuild/kconfig-language.txt.
+Documentation/kbuild/kconfig-language.rst.
11) Data structures
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index c88867b173d9..365efc9e4aa8 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -39,7 +39,7 @@ and elsewhere regarding submitting Linux kernel patches.
6) Any new or modified ``CONFIG`` options do not muck up the config menu and
default to off unless they meet the exception criteria documented in
- ``Documentation/kbuild/kconfig-language.txt`` Menu attributes: default value.
+ ``Documentation/kbuild/kconfig-language.rst`` Menu attributes: default value.
7) All new ``Kconfig`` options have help text.
diff --git a/Documentation/translations/it_IT/kernel-hacking/hacking.rst b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
index 7178e517af0a..24c592852bf1 100644
--- a/Documentation/translations/it_IT/kernel-hacking/hacking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/hacking.rst
@@ -755,7 +755,7 @@ anche per avere patch pulite, c'è del lavoro amministrativo da fare:
- Solitamente vorrete un'opzione di configurazione per la vostra modifica
al kernel. Modificate ``Kconfig`` nella cartella giusta. Il linguaggio
Config è facile con copia ed incolla, e c'è una completa documentazione
- nel file ``Documentation/kbuild/kconfig-language.txt``.
+ nel file ``Documentation/kbuild/kconfig-language.rst``.
Nella descrizione della vostra opzione, assicuratevi di parlare sia agli
utenti esperti sia agli utente che non sanno nulla del vostro lavoro.
@@ -767,7 +767,7 @@ anche per avere patch pulite, c'è del lavoro amministrativo da fare:
- Modificate il file ``Makefile``: le variabili CONFIG sono esportate qui,
quindi potete solitamente aggiungere una riga come la seguete
"obj-$(CONFIG_xxx) += xxx.o". La sintassi è documentata nel file
- ``Documentation/kbuild/makefiles.txt``.
+ ``Documentation/kbuild/makefiles.rst``.
- Aggiungete voi stessi in ``CREDITS`` se avete fatto qualcosa di notevole,
solitamente qualcosa che supera il singolo file (comunque il vostro nome
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
index 70e65a7b3620..ea74cae958d7 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -43,7 +43,7 @@ sottomissione delle patch, in particolare
6) Le opzioni ``CONFIG``, nuove o modificate, non scombussolano il menu
di configurazione e sono preimpostate come disabilitate a meno che non
- soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.txt``
+ soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.rst``
alla punto "Voci di menu: valori predefiniti".
7) Tutte le nuove opzioni ``Kconfig`` hanno un messaggio di aiuto.
diff --git a/Documentation/translations/zh_CN/process/coding-style.rst b/Documentation/translations/zh_CN/process/coding-style.rst
index 5479c591c2f7..4f6237392e65 100644
--- a/Documentation/translations/zh_CN/process/coding-style.rst
+++ b/Documentation/translations/zh_CN/process/coding-style.rst
@@ -599,7 +599,7 @@ Documentation/doc-guide/ 和 scripts/kernel-doc 以获得详细信息。
depends on ADFS_FS
...
-要查看配置文件的完整文档,请看 Documentation/kbuild/kconfig-language.txt。
+要查看配置文件的完整文档,请看 Documentation/kbuild/kconfig-language.rst。
11) 数据结构
diff --git a/Documentation/translations/zh_CN/process/submit-checklist.rst b/Documentation/translations/zh_CN/process/submit-checklist.rst
index 89061aa8fdbe..f4785d2b0491 100644
--- a/Documentation/translations/zh_CN/process/submit-checklist.rst
+++ b/Documentation/translations/zh_CN/process/submit-checklist.rst
@@ -38,7 +38,7 @@ Linux内核补丁提交清单
违规行为。
6) 任何新的或修改过的 ``CONFIG`` 选项都不会弄脏配置菜单,并默认为关闭,除非
- 它们符合 ``Documentation/kbuild/kconfig-language.txt`` 中记录的异常条件,
+ 它们符合 ``Documentation/kbuild/kconfig-language.rst`` 中记录的异常条件,
菜单属性:默认值.
7) 所有新的 ``kconfig`` 选项都有帮助文本。
diff --git a/Kconfig b/Kconfig
index 48a80beab685..a73589e6a72b 100644
--- a/Kconfig
+++ b/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
mainmenu "Linux/$(ARCH) $(KERNELVERSION) Kernel Configuration"
diff --git a/arch/arc/plat-eznps/Kconfig b/arch/arc/plat-eznps/Kconfig
index 2eaecfb063a7..a376a50d3fea 100644
--- a/arch/arc/plat-eznps/Kconfig
+++ b/arch/arc/plat-eznps/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
menuconfig ARC_PLAT_EZNPS
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index eeb0471268a0..c5e6b70e1510 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
config C6X
diff --git a/arch/microblaze/Kconfig.debug b/arch/microblaze/Kconfig.debug
index dc2e3c45e8a2..40e4bfbc8f5a 100644
--- a/arch/microblaze/Kconfig.debug
+++ b/arch/microblaze/Kconfig.debug
@@ -1,5 +1,5 @@
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
config TRACE_IRQFLAGS_SUPPORT
def_bool y
diff --git a/arch/microblaze/Kconfig.platform b/arch/microblaze/Kconfig.platform
index 7361974417dc..139a5e592af7 100644
--- a/arch/microblaze/Kconfig.platform
+++ b/arch/microblaze/Kconfig.platform
@@ -1,5 +1,5 @@
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
# Platform selection Kconfig menu for MicroBlaze targets
#
diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 55559ca0efe4..8ae70fca2010 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -1,6 +1,6 @@
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
config NDS32
diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig
index 7cfb20555b10..bf326f0edd2f 100644
--- a/arch/openrisc/Kconfig
+++ b/arch/openrisc/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
config OPENRISC
diff --git a/arch/powerpc/sysdev/Kconfig b/arch/powerpc/sysdev/Kconfig
index e0dbec780fe9..d23288c4abf6 100644
--- a/arch/powerpc/sysdev/Kconfig
+++ b/arch/powerpc/sysdev/Kconfig
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
config PPC4xx_PCI_EXPRESS
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e66745decea1..355c2dab07b1 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -1,6 +1,6 @@
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
config 64BIT
diff --git a/drivers/auxdisplay/Kconfig b/drivers/auxdisplay/Kconfig
index c52c738e554a..dd61fdd400f0 100644
--- a/drivers/auxdisplay/Kconfig
+++ b/drivers/auxdisplay/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
# Auxiliary display drivers configuration.
#
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 11fda9eb2466..14b0c99dc843 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -1,6 +1,6 @@
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
menu "Firmware Drivers"
diff --git a/drivers/mtd/devices/Kconfig b/drivers/mtd/devices/Kconfig
index 7fcdaf6c279d..84ed5e02b484 100644
--- a/drivers/mtd/devices/Kconfig
+++ b/drivers/mtd/devices/Kconfig
@@ -47,7 +47,7 @@ config MTD_MS02NV
If you want to compile this driver as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
The module will be called ms02-nv.
config MTD_DATAFLASH
diff --git a/drivers/net/ethernet/smsc/Kconfig b/drivers/net/ethernet/smsc/Kconfig
index 79612060d0ba..61aeb39361c5 100644
--- a/drivers/net/ethernet/smsc/Kconfig
+++ b/drivers/net/ethernet/smsc/Kconfig
@@ -48,7 +48,7 @@ config SMC91X
This driver is also available as a module ( = code which can be
inserted in and removed from the running kernel whenever you want).
The module will be called smc91x. If you want to compile it as a
- module, say M here and read <file:Documentation/kbuild/modules.txt>.
+ module, say M here and read <file:Documentation/kbuild/modules.rst>.
config PCMCIA_SMC91C92
tristate "SMC 91Cxx PCMCIA support"
@@ -85,7 +85,7 @@ config SMC911X
This driver is also available as a module. The module will be
called smc911x. If you want to compile it as a module, say M
- here and read <file:Documentation/kbuild/modules.txt>
+ here and read <file:Documentation/kbuild/modules.rst>
config SMSC911X
tristate "SMSC LAN911x/LAN921x families embedded ethernet support"
@@ -120,6 +120,6 @@ config SMSC9420
This driver is also available as a module. The module will be
called smsc9420. If you want to compile it as a module, say M
- here and read <file:Documentation/kbuild/modules.txt>
+ here and read <file:Documentation/kbuild/modules.rst>
endif # NET_VENDOR_SMSC
diff --git a/drivers/net/wireless/intel/iwlegacy/Kconfig b/drivers/net/wireless/intel/iwlegacy/Kconfig
index fb919727b8bb..9a6a7e9cae1c 100644
--- a/drivers/net/wireless/intel/iwlegacy/Kconfig
+++ b/drivers/net/wireless/intel/iwlegacy/Kconfig
@@ -31,7 +31,7 @@ config IWL4965
If you want to compile the driver as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
- say M here and read <file:Documentation/kbuild/modules.txt>. The
+ say M here and read <file:Documentation/kbuild/modules.rst>. The
module will be called iwl4965.
config IWL3945
@@ -57,7 +57,7 @@ config IWL3945
If you want to compile the driver as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
- say M here and read <file:Documentation/kbuild/modules.txt>. The
+ say M here and read <file:Documentation/kbuild/modules.rst>. The
module will be called iwl3945.
menu "iwl3945 / iwl4965 Debugging Options"
diff --git a/drivers/net/wireless/intel/iwlwifi/Kconfig b/drivers/net/wireless/intel/iwlwifi/Kconfig
index 83d5bceea08f..3368dbb0bd7e 100644
--- a/drivers/net/wireless/intel/iwlwifi/Kconfig
+++ b/drivers/net/wireless/intel/iwlwifi/Kconfig
@@ -39,7 +39,7 @@ config IWLWIFI
If you want to compile the driver as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
- say M here and read <file:Documentation/kbuild/modules.txt>. The
+ say M here and read <file:Documentation/kbuild/modules.rst>. The
module will be called iwlwifi.
if IWLWIFI
diff --git a/drivers/parport/Kconfig b/drivers/parport/Kconfig
index a97f4eada60b..bd89ba53ad0e 100644
--- a/drivers/parport/Kconfig
+++ b/drivers/parport/Kconfig
@@ -1,6 +1,6 @@
#
# For a description of the syntax of this configuration file,
-# see Documentation/kbuild/kconfig-language.txt.
+# see Documentation/kbuild/kconfig-language.rst.
#
# Parport configuration.
#
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index d528018e6fa8..2482344cfa80 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -182,7 +182,7 @@ config CHR_DEV_SCH
If you want to compile this as a module ( = code which can be
inserted in and removed from the running kernel whenever you want),
- say M here and read <file:Documentation/kbuild/modules.txt> and
+ say M here and read <file:Documentation/kbuild/modules.rst> and
<file:Documentation/scsi/scsi.txt>. The module will be called ch.o.
If unsure, say N.
@@ -1473,7 +1473,7 @@ config ZFCP
This driver is also available as a module. This module will be
called zfcp. If you want to compile it as a module, say M here
- and read <file:Documentation/kbuild/modules.txt>.
+ and read <file:Documentation/kbuild/modules.rst>.
config SCSI_PMCRAID
tristate "PMC SIERRA Linux MaxRAID adapter support"
diff --git a/drivers/staging/sm750fb/Kconfig b/drivers/staging/sm750fb/Kconfig
index fb5a086bf9b1..8c0d8a873d5b 100644
--- a/drivers/staging/sm750fb/Kconfig
+++ b/drivers/staging/sm750fb/Kconfig
@@ -12,4 +12,4 @@ config FB_SM750
This driver is also available as a module. The module will be
called sm750fb. If you want to compile it as a module, say M
- here and read <file:Documentation/kbuild/modules.txt>.
+ here and read <file:Documentation/kbuild/modules.rst>.
diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig
index be04c117fe80..bf43e6bb8a6c 100644
--- a/drivers/usb/misc/Kconfig
+++ b/drivers/usb/misc/Kconfig
@@ -16,7 +16,7 @@ config USB_EMI62
This code is also available as a module ( = code which can be
inserted in and removed from the running kernel whenever you want).
The module will be called audio. If you want to compile it as a
- module, say M here and read <file:Documentation/kbuild/modules.txt>.
+ module, say M here and read <file:Documentation/kbuild/modules.rst>.
config USB_EMI26
tristate "EMI 2|6 USB Audio interface support"
@@ -67,7 +67,7 @@ config USB_LEGOTOWER
inserted in and removed from the running kernel whenever you want).
The module will be called legousbtower. If you want to compile it as
a module, say M here and read
- <file:Documentation/kbuild/modules.txt>.
+ <file:Documentation/kbuild/modules.rst>.
config USB_LCD
tristate "USB LCD driver support"
diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 9fdae1d10cbc..adf91bff213a 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -290,7 +290,7 @@ config FB_ARMCLCD
If you want to compile this as a module (=code which can be
inserted into and removed from the running kernel), say M
- here and read <file:Documentation/kbuild/modules.txt>. The module
+ here and read <file:Documentation/kbuild/modules.rst>. The module
will be called amba-clcd.
config FB_ACORN
@@ -1755,7 +1755,7 @@ config FB_PXA
This driver is also available as a module ( = code which can be
inserted and removed from the running kernel whenever you want). The
module will be called pxafb. If you want to compile it as a module,
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
If unsure, say N.
@@ -1836,7 +1836,7 @@ config FB_W100
This driver is also available as a module ( = code which can be
inserted and removed from the running kernel whenever you want). The
module will be called w100fb. If you want to compile it as a module,
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
If unsure, say N.
@@ -1865,7 +1865,7 @@ config FB_TMIO
This driver is also available as a module ( = code which can be
inserted and removed from the running kernel whenever you want). The
module will be called tmiofb. If you want to compile it as a module,
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
If unsure, say N.
@@ -1911,7 +1911,7 @@ config FB_S3C2410
This driver is also available as a module ( = code which can be
inserted and removed from the running kernel whenever you want). The
module will be called s3c2410fb. If you want to compile it as a module,
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
If unsure, say N.
config FB_S3C2410_DEBUG
@@ -1948,7 +1948,7 @@ config FB_SM501
This driver is also available as a module ( = code which can be
inserted and removed from the running kernel whenever you want). The
module will be called sm501fb. If you want to compile it as a module,
- say M here and read <file:Documentation/kbuild/modules.txt>.
+ say M here and read <file:Documentation/kbuild/modules.rst>.
If unsure, say N.
@@ -2292,7 +2292,7 @@ config FB_SM712
This driver is also available as a module. The module will be
called sm712fb. If you want to compile it as a module, say M
- here and read <file:Documentation/kbuild/modules.txt>.
+ here and read <file:Documentation/kbuild/modules.rst>.
source "drivers/video/fbdev/omap/Kconfig"
source "drivers/video/fbdev/omap2/Kconfig"
diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig
index 9a0159aebe1a..1643984f7064 100644
--- a/net/bridge/netfilter/Kconfig
+++ b/net/bridge/netfilter/Kconfig
@@ -113,7 +113,7 @@ config BRIDGE_EBT_LIMIT
equivalent of the iptables limit match.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config BRIDGE_EBT_MARK
tristate "ebt: mark filter support"
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 1412b029f37f..a104bdec97e7 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -307,7 +307,7 @@ config IP_NF_RAW
and OUTPUT chains.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
# security table for MAC policy
config IP_NF_SECURITY
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 086fc669279e..c47b2197997c 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -240,7 +240,7 @@ config IP6_NF_RAW
and OUTPUT chains.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
# security table for MAC policy
config IP6_NF_SECURITY
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 02b281d3c167..1f4a4d9f80b4 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1055,7 +1055,7 @@ config NETFILTER_XT_TARGET_TRACE
the tables, chains, rules.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_TARGET_SECMARK
tristate '"SECMARK" target support'
@@ -1114,7 +1114,7 @@ config NETFILTER_XT_MATCH_ADDRTYPE
eg. UNICAST, LOCAL, BROADCAST, ...
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_BPF
tristate '"bpf" match support'
@@ -1159,7 +1159,7 @@ config NETFILTER_XT_MATCH_COMMENT
comments in your iptables ruleset.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_CONNBYTES
tristate '"connbytes" per-connection counter match support'
@@ -1170,7 +1170,7 @@ config NETFILTER_XT_MATCH_CONNBYTES
number of bytes and/or packets for each direction within a connection.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_CONNLABEL
tristate '"connlabel" match support'
@@ -1236,7 +1236,7 @@ config NETFILTER_XT_MATCH_DCCP
and DCCP flags.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_DEVGROUP
tristate '"devgroup" match support'
@@ -1472,7 +1472,7 @@ config NETFILTER_XT_MATCH_QUOTA
byte counter.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_RATEEST
tristate '"rateest" match support'
@@ -1496,7 +1496,7 @@ config NETFILTER_XT_MATCH_REALM
in tc world.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_RECENT
tristate '"recent" match support'
@@ -1518,7 +1518,7 @@ config NETFILTER_XT_MATCH_SCTP
and SCTP chunk types.
If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+ <file:Documentation/kbuild/modules.rst>. If unsure, say `N'.
config NETFILTER_XT_MATCH_SOCKET
tristate '"socket" match support'
diff --git a/net/tipc/Kconfig b/net/tipc/Kconfig
index e450212121d2..7d70db9ae120 100644
--- a/net/tipc/Kconfig
+++ b/net/tipc/Kconfig
@@ -16,7 +16,7 @@ menuconfig TIPC
This protocol support is also available as a module ( = code which
can be inserted in and removed from the running kernel whenever you
want). The module will be called tipc. If you want to compile it
- as a module, say M here and read <file:Documentation/kbuild/modules.txt>.
+ as a module, say M here and read <file:Documentation/kbuild/modules.rst>.
If in doubt, say N.
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 7484b9d8272f..9cf6a00f5711 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -67,7 +67,7 @@ endef
######
# gcc support functions
-# See documentation in Documentation/kbuild/makefiles.txt
+# See documentation in Documentation/kbuild/makefiles.rst
# cc-cross-prefix
# Usage: CROSS_COMPILE := $(call cc-cross-prefix, m68k-linux-gnu- m68k-linux-)
@@ -217,7 +217,7 @@ objectify = $(foreach o,$(1),$(if $(filter /%,$(o)),$(o),$(obj)/$(o)))
# if_changed_dep - as if_changed, but uses fixdep to reveal dependencies
# including used config symbols
# if_changed_rule - as if_changed but execute rule instead
-# See Documentation/kbuild/makefiles.txt for more info
+# See Documentation/kbuild/makefiles.rst for more info
ifneq ($(KBUILD_NOCMDDEP),1)
# Check if both arguments are the same including their order. Result is empty
diff --git a/scripts/Makefile.host b/scripts/Makefile.host
index 73b804197fca..d2e0b739b075 100644
--- a/scripts/Makefile.host
+++ b/scripts/Makefile.host
@@ -6,7 +6,7 @@
#
# Both C and C++ are supported, but preferred language is C for such utilities.
#
-# Sample syntax (see Documentation/kbuild/makefiles.txt for reference)
+# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
# hostprogs-y := bin2hex
# Will compile bin2hex.c and create an executable named bin2hex
#
diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c
index 1f9266dadedf..09fd6fa18e1a 100644
--- a/scripts/kconfig/symbol.c
+++ b/scripts/kconfig/symbol.c
@@ -1114,7 +1114,7 @@ static void sym_check_print_recursive(struct symbol *last_sym)
}
fprintf(stderr,
- "For a resolution refer to Documentation/kbuild/kconfig-language.txt\n"
+ "For a resolution refer to Documentation/kbuild/kconfig-language.rst\n"
"subsection \"Kconfig recursive dependency limitations\"\n"
"\n");
diff --git a/scripts/kconfig/tests/err_recursive_dep/expected_stderr b/scripts/kconfig/tests/err_recursive_dep/expected_stderr
index 84679b104655..c9f4abf9a791 100644
--- a/scripts/kconfig/tests/err_recursive_dep/expected_stderr
+++ b/scripts/kconfig/tests/err_recursive_dep/expected_stderr
@@ -1,38 +1,38 @@
Kconfig:11:error: recursive dependency detected!
Kconfig:11: symbol B is selected by B
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:5:error: recursive dependency detected!
Kconfig:5: symbol A depends on A
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:17:error: recursive dependency detected!
Kconfig:17: symbol C1 depends on C2
Kconfig:21: symbol C2 depends on C1
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:32:error: recursive dependency detected!
Kconfig:32: symbol D2 is selected by D1
Kconfig:27: symbol D1 depends on D2
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:37:error: recursive dependency detected!
Kconfig:37: symbol E1 depends on E2
Kconfig:42: symbol E2 is implied by E1
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:60:error: recursive dependency detected!
Kconfig:60: symbol G depends on G
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
Kconfig:51:error: recursive dependency detected!
Kconfig:51: symbol F2 depends on F1
Kconfig:49: symbol F1 default value contains F2
-For a resolution refer to Documentation/kbuild/kconfig-language.txt
+For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
diff --git a/sound/oss/dmasound/Kconfig b/sound/oss/dmasound/Kconfig
index f456574a964d..d8f1b726681d 100644
--- a/sound/oss/dmasound/Kconfig
+++ b/sound/oss/dmasound/Kconfig
@@ -10,7 +10,7 @@ config DMASOUND_ATARI
This driver is also available as a module ( = code which can be
inserted in and removed from the running kernel whenever you
want). If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>.
+ <file:Documentation/kbuild/modules.rst>.
config DMASOUND_PAULA
tristate "Amiga DMA sound support"
@@ -24,7 +24,7 @@ config DMASOUND_PAULA
This driver is also available as a module ( = code which can be
inserted in and removed from the running kernel whenever you
want). If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>.
+ <file:Documentation/kbuild/modules.rst>.
config DMASOUND_Q40
tristate "Q40 sound support"
@@ -38,7 +38,7 @@ config DMASOUND_Q40
This driver is also available as a module ( = code which can be
inserted in and removed from the running kernel whenever you
want). If you want to compile it as a module, say M here and read
- <file:Documentation/kbuild/modules.txt>.
+ <file:Documentation/kbuild/modules.rst>.
config DMASOUND
tristate
--
2.20.1
* Re: [PATCH v12 20/31] mm: introduce vma reference counter
From: Jerome Glisse @ 2019-04-22 20:36 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-21-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:11PM +0200, Laurent Dufour wrote:
> The final goal is to be able to use a VMA structure without holding the
> mmap_sem and to be sure that the structure will not be freed behind our back.
>
> The lockless use of the VMA will be done through RCU protection and thus a
> dedicated freeing service is required to manage it asynchronously.
>
> As reported in a 2010 thread [1], this may impact file handling when a
> file is still referenced while the mapping is no longer there. As the final
> goal is to handle anonymous VMAs in a speculative way and not file-backed
> mappings, we can close and free the file pointer synchronously, as
> soon as we are guaranteed not to use it without holding the mmap_sem. As a
> minimal sanity measure, the vm_file pointer is cleared once
> the file pointer is put.
>
> [1] https://lore.kernel.org/linux-mm/20100104182429.833180340@chello.nl/
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Using kref would have been better from my POV even with RCU freeing
but anyway:
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/mm.h | 4 ++++
> include/linux/mm_types.h | 3 +++
> mm/internal.h | 27 +++++++++++++++++++++++++++
> mm/mmap.c | 13 +++++++++----
> 4 files changed, 43 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f14b2c9ddfd4..f761a9c65c74 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -529,6 +529,9 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> vma->vm_mm = mm;
> vma->vm_ops = &dummy_vm_ops;
> INIT_LIST_HEAD(&vma->anon_vma_chain);
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + atomic_set(&vma->vm_ref_count, 1);
> +#endif
> }
>
> static inline void vma_set_anonymous(struct vm_area_struct *vma)
> @@ -1418,6 +1421,7 @@ static inline void INIT_VMA(struct vm_area_struct *vma)
> INIT_LIST_HEAD(&vma->anon_vma_chain);
> #ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> seqcount_init(&vma->vm_sequence);
> + atomic_set(&vma->vm_ref_count, 1);
> #endif
> }
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 24b3f8ce9e42..6a6159e11a3f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -285,6 +285,9 @@ struct vm_area_struct {
> /* linked list of VM areas per task, sorted by address */
> struct vm_area_struct *vm_next, *vm_prev;
>
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + atomic_t vm_ref_count;
> +#endif
> struct rb_node vm_rb;
>
> /*
> diff --git a/mm/internal.h b/mm/internal.h
> index 9eeaf2b95166..302382bed406 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -40,6 +40,33 @@ void page_writeback_init(void);
>
> vm_fault_t do_swap_page(struct vm_fault *vmf);
>
> +
> +extern void __free_vma(struct vm_area_struct *vma);
> +
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +static inline void get_vma(struct vm_area_struct *vma)
> +{
> + atomic_inc(&vma->vm_ref_count);
> +}
> +
> +static inline void put_vma(struct vm_area_struct *vma)
> +{
> + if (atomic_dec_and_test(&vma->vm_ref_count))
> + __free_vma(vma);
> +}
> +
> +#else
> +
> +static inline void get_vma(struct vm_area_struct *vma)
> +{
> +}
> +
> +static inline void put_vma(struct vm_area_struct *vma)
> +{
> + __free_vma(vma);
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
> void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> unsigned long floor, unsigned long ceiling);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f7f6027a7dff..c106440dcae7 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -188,6 +188,12 @@ static inline void mm_write_sequnlock(struct mm_struct *mm)
> }
> #endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
>
> +void __free_vma(struct vm_area_struct *vma)
> +{
> + mpol_put(vma_policy(vma));
> + vm_area_free(vma);
> +}
> +
> /*
> * Close a vm structure and free it, returning the next.
> */
> @@ -200,8 +206,8 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
> vma->vm_ops->close(vma);
> if (vma->vm_file)
> fput(vma->vm_file);
> - mpol_put(vma_policy(vma));
> - vm_area_free(vma);
> + vma->vm_file = NULL;
> + put_vma(vma);
> return next;
> }
>
> @@ -990,8 +996,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> if (next->anon_vma)
> anon_vma_merge(vma, next);
> mm->map_count--;
> - mpol_put(vma_policy(next));
> - vm_area_free(next);
> + put_vma(next);
> /*
> * In mprotect's case 6 (see comments on vma_merge),
> * we must remove another next too. It would clutter
> --
> 2.21.0
>
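For readers following the get_vma()/put_vma() pair in the quoted patch, the pattern is a plain "drop the last reference, then free" counter. Below is a minimal userspace sketch of that pattern, assuming C11 atomics in place of the kernel's atomic_t. The names struct object, object_new(), object_get() and object_put() are hypothetical and are not part of the patch. The freed_flag member exists only so that a caller can observe the release.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted structure such as a VMA. */
struct object {
	atomic_int ref_count;	/* starts at 1: the creator holds a reference */
	bool *freed_flag;	/* illustration only: set when the object is released */
};

/* Allocate an object holding one initial reference (like vma_init() in the patch). */
static struct object *object_new(bool *freed_flag)
{
	struct object *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->ref_count, 1);
	obj->freed_flag = freed_flag;
	return obj;
}

/* Take an extra reference; the caller must already hold one. */
static void object_get(struct object *obj)
{
	atomic_fetch_add(&obj->ref_count, 1);
}

/*
 * Drop a reference. atomic_fetch_sub() returns the previous value, so a
 * result of 1 means this call dropped the last reference; that mirrors
 * the atomic_dec_and_test() check in put_vma().
 * Returns true if the object was freed by this call.
 */
static bool object_put(struct object *obj)
{
	assert(atomic_load(&obj->ref_count) > 0);
	if (atomic_fetch_sub(&obj->ref_count, 1) == 1) {
		*obj->freed_flag = true;
		free(obj);
		return true;
	}
	return false;
}
```

The kernel's kref, which the review comment mentions, wraps the same idea behind kref_get() and kref_put(), with the release function passed as a callback. The sketch does not model the RCU-deferred freeing that the series relies on.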
* Re: [PATCH v12 19/31] mm: protect the RB tree with a sequence lock
From: Jerome Glisse @ 2019-04-22 20:33 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-20-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:10PM +0200, Laurent Dufour wrote:
> Introduce a per-mm_struct seqlock, the mm_seq field, to protect changes
> made to the MM RB tree. This allows walking the RB tree without grabbing
> the mmap_sem; once the walk is done, the sequence counter is checked to
> confirm that it was stable during the walk.
>
> The mm seqlock is held while inserting entries into and removing entries
> from the MM RB tree. Later in this series, it will be checked when looking
> for a VMA without holding the mmap_sem.
>
> This is based on the initial work from Peter Zijlstra:
> https://lore.kernel.org/linux-mm/20100104182813.479668508@chello.nl/
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/mm_types.h | 3 +++
> kernel/fork.c | 3 +++
> mm/init-mm.c | 3 +++
> mm/mmap.c | 48 +++++++++++++++++++++++++++++++---------
> 4 files changed, 46 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index e78f72eb2576..24b3f8ce9e42 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -358,6 +358,9 @@ struct mm_struct {
> struct {
> struct vm_area_struct *mmap; /* list of VMAs */
> struct rb_root mm_rb;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + seqlock_t mm_seq;
> +#endif
> u64 vmacache_seqnum; /* per-thread vmacache */
> #ifdef CONFIG_MMU
> unsigned long (*get_unmapped_area) (struct file *filp,
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2992d2c95256..3a1739197ebc 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1008,6 +1008,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
> mm->mmap = NULL;
> mm->mm_rb = RB_ROOT;
> mm->vmacache_seqnum = 0;
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + seqlock_init(&mm->mm_seq);
> +#endif
> atomic_set(&mm->mm_users, 1);
> atomic_set(&mm->mm_count, 1);
> init_rwsem(&mm->mmap_sem);
> diff --git a/mm/init-mm.c b/mm/init-mm.c
> index a787a319211e..69346b883a4e 100644
> --- a/mm/init-mm.c
> +++ b/mm/init-mm.c
> @@ -27,6 +27,9 @@
> */
> struct mm_struct init_mm = {
> .mm_rb = RB_ROOT,
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> + .mm_seq = __SEQLOCK_UNLOCKED(init_mm.mm_seq),
> +#endif
> .pgd = swapper_pg_dir,
> .mm_users = ATOMIC_INIT(2),
> .mm_count = ATOMIC_INIT(1),
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 13460b38b0fb..f7f6027a7dff 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -170,6 +170,24 @@ void unlink_file_vma(struct vm_area_struct *vma)
> }
> }
>
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +static inline void mm_write_seqlock(struct mm_struct *mm)
> +{
> + write_seqlock(&mm->mm_seq);
> +}
> +static inline void mm_write_sequnlock(struct mm_struct *mm)
> +{
> + write_sequnlock(&mm->mm_seq);
> +}
> +#else
> +static inline void mm_write_seqlock(struct mm_struct *mm)
> +{
> +}
> +static inline void mm_write_sequnlock(struct mm_struct *mm)
> +{
> +}
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +
> /*
> * Close a vm structure and free it, returning the next.
> */
> @@ -445,26 +463,32 @@ static void vma_gap_update(struct vm_area_struct *vma)
> }
>
> static inline void vma_rb_insert(struct vm_area_struct *vma,
> - struct rb_root *root)
> + struct mm_struct *mm)
> {
> + struct rb_root *root = &mm->mm_rb;
> +
> /* All rb_subtree_gap values must be consistent prior to insertion */
> validate_mm_rb(root, NULL);
>
> rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
> }
>
> -static void __vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
> +static void __vma_rb_erase(struct vm_area_struct *vma, struct mm_struct *mm)
> {
> + struct rb_root *root = &mm->mm_rb;
> +
> /*
> * Note rb_erase_augmented is a fairly large inline function,
> * so make sure we instantiate it only once with our desired
> * augmented rbtree callbacks.
> */
> + mm_write_seqlock(mm);
> rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
> + mm_write_sequnlock(mm); /* wmb */
> }
>
> static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma,
> - struct rb_root *root,
> + struct mm_struct *mm,
> struct vm_area_struct *ignore)
> {
> /*
> @@ -472,21 +496,21 @@ static __always_inline void vma_rb_erase_ignore(struct vm_area_struct *vma,
> * with the possible exception of the "next" vma being erased if
> * next->vm_start was reduced.
> */
> - validate_mm_rb(root, ignore);
> + validate_mm_rb(&mm->mm_rb, ignore);
>
> - __vma_rb_erase(vma, root);
> + __vma_rb_erase(vma, mm);
> }
>
> static __always_inline void vma_rb_erase(struct vm_area_struct *vma,
> - struct rb_root *root)
> + struct mm_struct *mm)
> {
> /*
> * All rb_subtree_gap values must be consistent prior to erase,
> * with the possible exception of the vma being erased.
> */
> - validate_mm_rb(root, vma);
> + validate_mm_rb(&mm->mm_rb, vma);
>
> - __vma_rb_erase(vma, root);
> + __vma_rb_erase(vma, mm);
> }
>
> /*
> @@ -601,10 +625,12 @@ void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma,
> * immediately update the gap to the correct value. Finally we
> * rebalance the rbtree after all augmented values have been set.
> */
> + mm_write_seqlock(mm);
> rb_link_node(&vma->vm_rb, rb_parent, rb_link);
> vma->rb_subtree_gap = 0;
> vma_gap_update(vma);
> - vma_rb_insert(vma, &mm->mm_rb);
> + vma_rb_insert(vma, mm);
> + mm_write_sequnlock(mm);
> }
>
> static void __vma_link_file(struct vm_area_struct *vma)
> @@ -680,7 +706,7 @@ static __always_inline void __vma_unlink_common(struct mm_struct *mm,
> {
> struct vm_area_struct *next;
>
> - vma_rb_erase_ignore(vma, &mm->mm_rb, ignore);
> + vma_rb_erase_ignore(vma, mm, ignore);
> next = vma->vm_next;
> if (has_prev)
> prev->vm_next = next;
> @@ -2674,7 +2700,7 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
> insertion_point = (prev ? &prev->vm_next : &mm->mmap);
> vma->vm_prev = NULL;
> do {
> - vma_rb_erase(vma, &mm->mm_rb);
> + vma_rb_erase(vma, mm);
> mm->map_count--;
> tail_vma = vma;
> vma = vma->vm_next;
> --
> 2.21.0
>
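For reference, the read side that this write-side locking is meant to enable can be sketched in userspace C. This is only an illustration under stated assumptions: it uses C11 atomics rather than the kernel's seqlock_t, it shows the counter protocol only (not the barriers needed around the protected data), and the names toy_seq, toy_write_begin(), toy_write_end(), toy_read_begin() and toy_read_retry() are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->mm_seq; not the kernel's seqlock_t. */
struct toy_seq {
	atomic_uint sequence;	/* odd while a writer is inside */
};

/* Writer side: the counter is odd for the duration of the update. */
static void toy_write_begin(struct toy_seq *s)
{
	unsigned int old = atomic_fetch_add_explicit(&s->sequence, 1,
						     memory_order_acq_rel);

	assert((old & 1) == 0);	/* sketch assumes writers are serialized */
}

static void toy_write_end(struct toy_seq *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Reader side: sample the counter, do the walk, then check it again. */
static unsigned int toy_read_begin(struct toy_seq *s)
{
	return atomic_load_explicit(&s->sequence, memory_order_acquire);
}

static bool toy_read_retry(struct toy_seq *s, unsigned int start)
{
	/* Retry if a writer was active at the start or ran during the walk. */
	return (start & 1) ||
	       atomic_load_explicit(&s->sequence, memory_order_acquire) != start;
}
```

A lockless walker would call toy_read_begin(), walk the tree, and discard the result whenever toy_read_retry() returns true, which is the "double check" described in the commit message.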
^ permalink raw reply
* Re: [PATCH v12 18/31] mm: protect against PTE changes done by dup_mmap()
From: Jerome Glisse @ 2019-04-22 20:32 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, Vinayak Menon, vinayak menon, akpm, Tim Chen,
haren
In-Reply-To: <20190416134522.17540-19-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:09PM +0200, Laurent Dufour wrote:
> Vinayak Menon and Ganesh Mahendran reported that the following scenario may
> lead to a thread being blocked due to data corruption:
>
> CPU 1                       CPU 2                       CPU 3
> Process 1,                  Process 1,                  Process 1,
> Thread A                    Thread B                    Thread C
>
> while (1) {                 while (1) {                 while(1) {
>   pthread_mutex_lock(l)       pthread_mutex_lock(l)       fork
>   pthread_mutex_unlock(l)     pthread_mutex_unlock(l)   }
> }                           }
>
> In detail, this happens because:
>
> CPU 1                       CPU 2                       CPU 3
> fork()
> copy_pte_range()
>   set PTE rdonly
> go to next VMA...
>  .                          PTE is seen rdonly          PTE still writable
>  .                          thread is writing to page
>  .                          -> page fault
>  .                            copy the page             Thread writes to page
>  .                            .                         -> no page fault
>  .                            update the PTE
>  .                            flush TLB for that PTE
> flush TLB                                               PTE are now rdonly
Should the fork() be on CPU 3 here, to be consistent with the first diagram?
(Just to make it easier to read and go from one to the other, since a thread
can move from one CPU to another.)
>
> So the write done by the CPU 3 is interfering with the page copy operation
> done by CPU 2, leading to the data corruption.
>
> To avoid this we mark all the VMA involved in the COW mechanism as changing
> by calling vm_write_begin(). This ensures that the speculative page fault
> handler will not try to handle a fault on these pages.
> The marker is set until the TLB is flushed, ensuring that all the CPUs will
> now see the PTE as not writable.
> Once the TLB is flushed, the marker is removed by calling vm_write_end().
>
> The variable last is used to keep track of the latest VMA marked, to
> handle the error path where only some of the VMAs may have been marked.
>
> Since multiple VMA from the same mm may have the sequence count increased
> during this process, the use of the vm_raw_write_begin/end() is required to
> avoid lockdep false warning messages.
>
> Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
A minor comment (see below)
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> kernel/fork.c | 30 ++++++++++++++++++++++++++++--
> 1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f8dae021c2e5..2992d2c95256 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -462,7 +462,7 @@ EXPORT_SYMBOL(free_task);
> static __latent_entropy int dup_mmap(struct mm_struct *mm,
> struct mm_struct *oldmm)
> {
> - struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
> + struct vm_area_struct *mpnt, *tmp, *prev, **pprev, *last = NULL;
> struct rb_node **rb_link, *rb_parent;
> int retval;
> unsigned long charge;
> @@ -581,8 +581,18 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> rb_parent = &tmp->vm_rb;
>
> mm->map_count++;
> - if (!(tmp->vm_flags & VM_WIPEONFORK))
> + if (!(tmp->vm_flags & VM_WIPEONFORK)) {
> + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT)) {
> + /*
> + * Mark this VMA as changing to prevent the
> + * speculative page fault handler from processing
> + * it until the TLBs are flushed below.
> + */
> + last = mpnt;
> + vm_raw_write_begin(mpnt);
> + }
> retval = copy_page_range(mm, oldmm, mpnt);
> + }
>
> if (tmp->vm_ops && tmp->vm_ops->open)
> tmp->vm_ops->open(tmp);
> @@ -595,6 +605,22 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> out:
> up_write(&mm->mmap_sem);
> flush_tlb_mm(oldmm);
> +
> + if (IS_ENABLED(CONFIG_SPECULATIVE_PAGE_FAULT)) {
You do not need to check for CONFIG_SPECULATIVE_PAGE_FAULT, as last
will always be NULL if it is not enabled, but maybe the compiler will
miss the optimization opportunity if you only have the for() loop
below.
> + /*
> + * Since the TLB has been flushed, we can safely unmark the
> + * copied VMAs and allow the speculative page fault handler to
> + * process them again.
> + * Walk back the VMA list from the last marked VMA.
> + */
> + for (; last; last = last->vm_prev) {
> + if (last->vm_flags & VM_DONTCOPY)
> + continue;
> + if (!(last->vm_flags & VM_WIPEONFORK))
> + vm_raw_write_end(last);
> + }
> + }
> +
> up_write(&oldmm->mmap_sem);
> dup_userfaultfd_complete(&uf);
> fail_uprobe_end:
> --
> 2.21.0
>
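The mark-then-unmark flow in this patch can be modelled outside the kernel. The sketch below is a toy model, not the kernel code: toy_vma, TOY_DONTCOPY, TOY_WIPEONFORK, toy_mark() and toy_unmark() are hypothetical names, a plain counter stands in for the VMA sequence count, and the limit argument only simulates an error that stops the forward loop early.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DONTCOPY   0x1u	/* hypothetical flag values */
#define TOY_WIPEONFORK 0x2u

struct toy_vma {
	unsigned int flags;
	unsigned int seq;	/* odd means "marked as changing" */
	struct toy_vma *next, *prev;
};

static void toy_write_begin(struct toy_vma *v) { v->seq++; }
static void toy_write_end(struct toy_vma *v)   { v->seq++; }

/*
 * Forward pass: mark every VMA that would be copied, visiting at most
 * 'limit' VMAs to mimic an error in the middle of the loop.
 * Returns the last VMA that was marked, or NULL if none was.
 */
static struct toy_vma *toy_mark(struct toy_vma *head, int limit)
{
	struct toy_vma *last = NULL;

	for (struct toy_vma *v = head; v && limit > 0; v = v->next, limit--) {
		if (v->flags & TOY_DONTCOPY)
			continue;
		if (!(v->flags & TOY_WIPEONFORK)) {
			last = v;
			toy_write_begin(v);
		}
	}
	return last;
}

/* Backward pass, run after the (imaginary) TLB flush. */
static void toy_unmark(struct toy_vma *last)
{
	for (; last; last = last->prev) {
		if (last->flags & TOY_DONTCOPY)
			continue;
		if (!(last->flags & TOY_WIPEONFORK))
			toy_write_end(last);
	}
}
```

Because the backward pass applies the same two flag tests as the forward pass, every counter that was made odd is made even again, including when the forward pass stopped early.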
^ permalink raw reply
* Re: [PATCH v12 17/31] mm: introduce __page_add_new_anon_rmap()
From: Jerome Glisse @ 2019-04-22 20:18 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-18-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:08PM +0200, Laurent Dufour wrote:
> When dealing with the speculative page fault handler, we may race with a
> VMA being split or merged. In this case the vma->vm_start and vma->vm_end
> fields may not match the address at which the page fault is occurring.
>
> This can only happen when the VMA is split, but in that case the
> anon_vma pointer of the new VMA will be the same as the original one,
> because in __split_vma the new->anon_vma is set to src->anon_vma when
> *new = *vma.
>
> So even if the VMA boundaries are not correct, the anon_vma pointer is
> still valid.
>
> If the VMA has been merged, then the VMA in which it has been merged
> must have the same anon_vma pointer otherwise the merge can't be done.
>
> So in all cases we know that the anon_vma is valid: we have checked
> before starting the speculative page fault that the anon_vma pointer is
> valid for this VMA, and since there is an anon_vma, a page has been
> backed at some point, so before the VMA is cleaned the page table lock
> would have to be grabbed to clear the PTE, and the anon_vma field is
> checked once the PTE is locked.
>
> This patch introduces a new __page_add_new_anon_rmap() service which
> doesn't check the VMA boundaries, and creates a new inline wrapper
> which does the check.
>
> When called from a page fault handler, if this is not a speculative one,
> there is a guarantee that vm_start and vm_end match the faulting address,
> so this check is useless. In the context of the speculative page fault
> handler, this check may be wrong but anon_vma is still valid as explained
> above.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/rmap.h | 12 ++++++++++--
> mm/memory.c | 8 ++++----
> mm/rmap.c | 5 ++---
> 3 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 988d176472df..a5d282573093 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -174,8 +174,16 @@ void page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long, bool);
> void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long, int);
> -void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> - unsigned long, bool);
> +void __page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> + unsigned long, bool);
> +static inline void page_add_new_anon_rmap(struct page *page,
> + struct vm_area_struct *vma,
> + unsigned long address, bool compound)
> +{
> + VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> + __page_add_new_anon_rmap(page, vma, address, compound);
> +}
> +
> void page_add_file_rmap(struct page *, bool);
> void page_remove_rmap(struct page *, bool);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index be93f2c8ebe0..46f877b6abea 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2347,7 +2347,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> * thread doing COW.
> */
> ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
> - page_add_new_anon_rmap(new_page, vma, vmf->address, false);
> + __page_add_new_anon_rmap(new_page, vma, vmf->address, false);
> mem_cgroup_commit_charge(new_page, memcg, false, false);
> __lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
> /*
> @@ -2897,7 +2897,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>
> /* ksm created a completely new copy */
> if (unlikely(page != swapcache && swapcache)) {
> - page_add_new_anon_rmap(page, vma, vmf->address, false);
> + __page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> } else {
> @@ -3049,7 +3049,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> }
>
> inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> - page_add_new_anon_rmap(page, vma, vmf->address, false);
> + __page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> setpte:
> @@ -3328,7 +3328,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> /* copy-on-write page */
> if (write && !(vmf->vma_flags & VM_SHARED)) {
> inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> - page_add_new_anon_rmap(page, vma, vmf->address, false);
> + __page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> } else {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e5dfe2ae6b0d..2148e8ce6e34 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1140,7 +1140,7 @@ void do_page_add_anon_rmap(struct page *page,
> }
>
> /**
> - * page_add_new_anon_rmap - add pte mapping to a new anonymous page
> + * __page_add_new_anon_rmap - add pte mapping to a new anonymous page
> * @page: the page to add the mapping to
> * @vma: the vm area in which the mapping is added
> * @address: the user virtual address mapped
> @@ -1150,12 +1150,11 @@ void do_page_add_anon_rmap(struct page *page,
> * This means the inc-and-test can be bypassed.
> * Page does not have to be locked.
> */
> -void page_add_new_anon_rmap(struct page *page,
> +void __page_add_new_anon_rmap(struct page *page,
> struct vm_area_struct *vma, unsigned long address, bool compound)
> {
> int nr = compound ? hpage_nr_pages(page) : 1;
>
> - VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
> __SetPageSwapBacked(page);
> if (compound) {
> VM_BUG_ON_PAGE(!PageTransHuge(page), page);
> --
> 2.21.0
>
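The split between a checked wrapper and an unchecked worker can be shown with a small standalone sketch. Everything here is hypothetical and only mirrors the shape of the change: toy_vma, toy_add_rmap(), toy_add_rmap_nocheck() and the toy_mapped counter are illustration names, and assert() stands in for VM_BUG_ON_VMA().

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vma { unsigned long start, end; };	/* covers [start, end) */

static unsigned long toy_mapped;	/* calls so far, illustration only */

/* Unchecked variant: the caller vouches for the VMA by other means. */
static void toy_add_rmap_nocheck(const struct toy_vma *vma, unsigned long addr)
{
	(void)vma;
	(void)addr;
	toy_mapped++;
}

static bool toy_addr_in_vma(const struct toy_vma *vma, unsigned long addr)
{
	return addr >= vma->start && addr < vma->end;
}

/* Checked variant: same work, plus the boundary sanity check. */
static inline void toy_add_rmap(const struct toy_vma *vma, unsigned long addr)
{
	assert(toy_addr_in_vma(vma, addr));
	toy_add_rmap_nocheck(vma, addr);
}
```

Callers that hold the lock keep using the checked form, while a speculative caller, which may see stale boundaries, would use the unchecked one.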
^ permalink raw reply
* Re: [PATCH v12 16/31] mm: introduce __vm_normal_page()
From: Jerome Glisse @ 2019-04-22 20:15 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-17-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:07PM +0200, Laurent Dufour wrote:
> When dealing with the speculative fault path we should use the VMA's field
> cached value stored in the vm_fault structure.
>
> Currently vm_normal_page() is using the pointer to the VMA to fetch the
> vm_flags value. This patch provides a new __vm_normal_page() which
> receives the vm_flags value as a parameter.
>
> Note: The speculative path is turned on for architectures providing
> support for the special PTE flag. So only the first block of
> vm_normal_page() is used during the speculative path.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/mm.h | 18 +++++++++++++++---
> mm/memory.c | 21 ++++++++++++---------
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f465bb2b049e..f14b2c9ddfd4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1421,9 +1421,21 @@ static inline void INIT_VMA(struct vm_area_struct *vma)
> #endif
> }
>
> -struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> - pte_t pte, bool with_public_device);
> -#define vm_normal_page(vma, addr, pte) _vm_normal_page(vma, addr, pte, false)
> +struct page *__vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> + pte_t pte, bool with_public_device,
> + unsigned long vma_flags);
> +static inline struct page *_vm_normal_page(struct vm_area_struct *vma,
> + unsigned long addr, pte_t pte,
> + bool with_public_device)
> +{
> + return __vm_normal_page(vma, addr, pte, with_public_device,
> + vma->vm_flags);
> +}
> +static inline struct page *vm_normal_page(struct vm_area_struct *vma,
> + unsigned long addr, pte_t pte)
> +{
> + return _vm_normal_page(vma, addr, pte, false);
> +}
>
> struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> pmd_t pmd);
> diff --git a/mm/memory.c b/mm/memory.c
> index 85ec5ce5c0a8..be93f2c8ebe0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -533,7 +533,8 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> }
>
> /*
> - * vm_normal_page -- This function gets the "struct page" associated with a pte.
> + * __vm_normal_page -- This function gets the "struct page" associated with
> + * a pte.
> *
> * "Special" mappings do not wish to be associated with a "struct page" (either
> * it doesn't exist, or it exists but they don't want to touch it). In this
> @@ -574,8 +575,9 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> * PFNMAP mappings in order to support COWable mappings.
> *
> */
> -struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> - pte_t pte, bool with_public_device)
> +struct page *__vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> + pte_t pte, bool with_public_device,
> + unsigned long vma_flags)
> {
> unsigned long pfn = pte_pfn(pte);
>
> @@ -584,7 +586,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> goto check_pfn;
> if (vma->vm_ops && vma->vm_ops->find_special_page)
> return vma->vm_ops->find_special_page(vma, addr);
> - if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> + if (vma_flags & (VM_PFNMAP | VM_MIXEDMAP))
> return NULL;
> if (is_zero_pfn(pfn))
> return NULL;
> @@ -620,8 +622,8 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>
> /* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
>
> - if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
> - if (vma->vm_flags & VM_MIXEDMAP) {
> + if (unlikely(vma_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
> + if (vma_flags & VM_MIXEDMAP) {
> if (!pfn_valid(pfn))
> return NULL;
> goto out;
> @@ -630,7 +632,7 @@ struct page *_vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> off = (addr - vma->vm_start) >> PAGE_SHIFT;
> if (pfn == vma->vm_pgoff + off)
> return NULL;
> - if (!is_cow_mapping(vma->vm_flags))
> + if (!is_cow_mapping(vma_flags))
> return NULL;
> }
> }
> @@ -2532,7 +2534,8 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
>
> - vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
> + vmf->page = __vm_normal_page(vma, vmf->address, vmf->orig_pte, false,
> + vmf->vma_flags);
> if (!vmf->page) {
> /*
> * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
> @@ -3706,7 +3709,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
> update_mmu_cache(vma, vmf->address, vmf->pte);
>
> - page = vm_normal_page(vma, vmf->address, pte);
> + page = __vm_normal_page(vma, vmf->address, pte, false, vmf->vma_flags);
> if (!page) {
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return 0;
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH v12 15/31] mm: introduce __lru_cache_add_active_or_unevictable
From: Jerome Glisse @ 2019-04-22 20:11 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-16-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:06PM +0200, Laurent Dufour wrote:
> The speculative page fault handler, which runs without holding the
> mmap_sem, calls lru_cache_add_active_or_unevictable(), but vm_flags
> is not guaranteed to remain constant.
> Introduce __lru_cache_add_active_or_unevictable(), which takes the vma
> flags value as a parameter instead of the vma pointer.
>
> Acked-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/swap.h | 10 ++++++++--
> mm/memory.c | 8 ++++----
> mm/swap.c | 6 +++---
> 3 files changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 4bfb5c4ac108..d33b94eb3c69 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -343,8 +343,14 @@ extern void deactivate_file_page(struct page *page);
> extern void mark_page_lazyfree(struct page *page);
> extern void swap_setup(void);
>
> -extern void lru_cache_add_active_or_unevictable(struct page *page,
> - struct vm_area_struct *vma);
> +extern void __lru_cache_add_active_or_unevictable(struct page *page,
> + unsigned long vma_flags);
> +
> +static inline void lru_cache_add_active_or_unevictable(struct page *page,
> + struct vm_area_struct *vma)
> +{
> + return __lru_cache_add_active_or_unevictable(page, vma->vm_flags);
> +}
>
> /* linux/mm/vmscan.c */
> extern unsigned long zone_reclaimable_pages(struct zone *zone);
> diff --git a/mm/memory.c b/mm/memory.c
> index 56802850e72c..85ec5ce5c0a8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2347,7 +2347,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
> ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
> page_add_new_anon_rmap(new_page, vma, vmf->address, false);
> mem_cgroup_commit_charge(new_page, memcg, false, false);
> - lru_cache_add_active_or_unevictable(new_page, vma);
> + __lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
> /*
> * We call the notify macro here because, when using secondary
> * mmu page tables (such as kvm shadow page tables), we want the
> @@ -2896,7 +2896,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> if (unlikely(page != swapcache && swapcache)) {
> page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> - lru_cache_add_active_or_unevictable(page, vma);
> + __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> } else {
> do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
> mem_cgroup_commit_charge(page, memcg, true, false);
> @@ -3048,7 +3048,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> - lru_cache_add_active_or_unevictable(page, vma);
> + __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> setpte:
> set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
>
> @@ -3327,7 +3327,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
> inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
> page_add_new_anon_rmap(page, vma, vmf->address, false);
> mem_cgroup_commit_charge(page, memcg, false, false);
> - lru_cache_add_active_or_unevictable(page, vma);
> + __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
> } else {
> inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
> page_add_file_rmap(page, false);
> diff --git a/mm/swap.c b/mm/swap.c
> index 3a75722e68a9..a55f0505b563 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -450,12 +450,12 @@ void lru_cache_add(struct page *page)
> * directly back onto it's zone's unevictable list, it does NOT use a
> * per cpu pagevec.
> */
> -void lru_cache_add_active_or_unevictable(struct page *page,
> - struct vm_area_struct *vma)
> +void __lru_cache_add_active_or_unevictable(struct page *page,
> + unsigned long vma_flags)
> {
> VM_BUG_ON_PAGE(PageLRU(page), page);
>
> - if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
> + if (likely((vma_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
> SetPageActive(page);
> else if (!TestSetPageMlocked(page)) {
> /*
> --
> 2.21.0
>
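Why passing a flags value instead of the vma pointer matters can be seen in a standalone sketch. The names below (toy_vma, toy_fault, TOY_LOCKED, TOY_SPECIAL, toy_pick_list() and friends) are hypothetical and the flag values are arbitrary; only the comparison against the locked and special bits mirrors the test in the patch.

```c
#include <assert.h>

#define TOY_LOCKED  0x1ul	/* hypothetical flag values */
#define TOY_SPECIAL 0x2ul

struct toy_vma { unsigned long flags; };

/* The fault descriptor keeps its own copy of the flags. */
struct toy_fault {
	struct toy_vma *vma;
	unsigned long vma_flags;
};

enum toy_list { TOY_ACTIVE, TOY_UNEVICTABLE };

/* Flags-based variant: works purely from the value it is given. */
static enum toy_list toy_pick_list_flags(unsigned long vma_flags)
{
	if ((vma_flags & (TOY_LOCKED | TOY_SPECIAL)) != TOY_LOCKED)
		return TOY_ACTIVE;
	return TOY_UNEVICTABLE;
}

/* Pointer-based wrapper for callers that know the VMA cannot change. */
static inline enum toy_list toy_pick_list(const struct toy_vma *vma)
{
	return toy_pick_list_flags(vma->flags);
}

/* Snapshot the flags once, when the fault is set up. */
static void toy_fault_init(struct toy_fault *f, struct toy_vma *vma)
{
	f->vma = vma;
	f->vma_flags = vma->flags;
}
```

If the flags of the VMA change after toy_fault_init(), the flags-based call still decides from the snapshot, whereas the pointer-based wrapper would see the new value.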
^ permalink raw reply
* Re: [PATCH v12 14/31] mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()
From: Jerome Glisse @ 2019-04-22 20:09 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-15-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:05PM +0200, Laurent Dufour wrote:
> migrate_misplaced_page() is only called during the page fault handling so
> it's better to pass the pointer to the struct vm_fault instead of the vma.
>
> This way during the speculative page fault path the saved vma->vm_flags
> could be used.
>
> Acked-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/migrate.h | 4 ++--
> mm/memory.c | 2 +-
> mm/migrate.c | 4 ++--
> 3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index e13d9bf2f9a5..0197e40325f8 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -125,14 +125,14 @@ static inline void __ClearPageMovable(struct page *page)
> #ifdef CONFIG_NUMA_BALANCING
> extern bool pmd_trans_migrating(pmd_t pmd);
> extern int migrate_misplaced_page(struct page *page,
> - struct vm_area_struct *vma, int node);
> + struct vm_fault *vmf, int node);
> #else
> static inline bool pmd_trans_migrating(pmd_t pmd)
> {
> return false;
> }
> static inline int migrate_misplaced_page(struct page *page,
> - struct vm_area_struct *vma, int node)
> + struct vm_fault *vmf, int node)
> {
> return -EAGAIN; /* can't migrate now */
> }
> diff --git a/mm/memory.c b/mm/memory.c
> index d0de58464479..56802850e72c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3747,7 +3747,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> }
>
> /* Migrate to the requested node */
> - migrated = migrate_misplaced_page(page, vma, target_nid);
> + migrated = migrate_misplaced_page(page, vmf, target_nid);
> if (migrated) {
> page_nid = target_nid;
> flags |= TNF_MIGRATED;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a9138093a8e2..633bd9abac54 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1938,7 +1938,7 @@ bool pmd_trans_migrating(pmd_t pmd)
> * node. Caller is expected to have an elevated reference count on
> * the page that will be dropped by this function before returning.
> */
> -int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> +int migrate_misplaced_page(struct page *page, struct vm_fault *vmf,
> int node)
> {
> pg_data_t *pgdat = NODE_DATA(node);
> @@ -1951,7 +1951,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> * with execute permissions as they are probably shared libraries.
> */
> if (page_mapcount(page) != 1 && page_is_file_cache(page) &&
> - (vma->vm_flags & VM_EXEC))
> + (vmf->vma_flags & VM_EXEC))
> goto out;
>
> /*
> --
> 2.21.0
>
^ permalink raw reply
* Re: [PATCH v12 13/31] mm: cache some VMA fields in the vm_fault structure
From: Jerome Glisse @ 2019-04-22 20:06 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-14-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:04PM +0200, Laurent Dufour wrote:
> When handling a speculative page fault, the vma->vm_flags and
> vma->vm_page_prot fields are read once the page table lock is released, so
> there is no longer any guarantee that these fields will not change behind
> our back. They will be saved in the vm_fault structure before the VMA is
> checked for changes.
>
> In detail, when we deal with a speculative page fault, the mmap_sem is
> not taken, so parallel VMA changes can occur. When a VMA change is
> done which will impact the page fault processing, we assume that the VMA
> sequence counter will be changed. In the page fault processing, at the
> time the PTE is locked, we check the VMA sequence counter to detect
> changes done behind our back. If no change is detected we can continue.
> But this doesn't prevent the VMA from being changed behind our back while
> the PTE is locked. So VMA fields which are used while the PTE is locked
> must be saved to ensure that we are using *static* values. This is
> important since the PTE changes will be made with regard to these VMA
> fields and they need to be consistent. This concerns the vma->vm_flags and
> vma->vm_page_prot VMA fields.
>
> This patch also sets the fields in hugetlb_no_page() and
> __collapse_huge_page_swapin() even if it is not needed by the callee.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
I am unsure about one thing (see below), so you might need to update
it, but that would not change the structure of the patch, thus:
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> ---
> include/linux/mm.h | 10 +++++++--
> mm/huge_memory.c | 6 +++---
> mm/hugetlb.c | 2 ++
> mm/khugepaged.c | 2 ++
> mm/memory.c | 53 ++++++++++++++++++++++++----------------------
> mm/migrate.c | 2 +-
> 6 files changed, 44 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5d45b7d8718d..f465bb2b049e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -439,6 +439,12 @@ struct vm_fault {
> * page table to avoid allocation from
> * atomic context.
> */
> + /*
> + * These entries are required when handling speculative page fault.
> + * This way the page handling is done using consistent field values.
> + */
> + unsigned long vma_flags;
> + pgprot_t vma_page_prot;
> };
>
> /* page entry size for vm->huge_fault() */
> @@ -781,9 +787,9 @@ void free_compound_page(struct page *page);
> * pte_mkwrite. But get_user_pages can cause write faults for mappings
> * that do not have writing enabled, when used by access_process_vm.
> */
> -static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
> +static inline pte_t maybe_mkwrite(pte_t pte, unsigned long vma_flags)
> {
> - if (likely(vma->vm_flags & VM_WRITE))
> + if (likely(vma_flags & VM_WRITE))
> pte = pte_mkwrite(pte);
> return pte;
> }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 823688414d27..865886a689ee 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1244,8 +1244,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
>
> for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
> pte_t entry;
> - entry = mk_pte(pages[i], vma->vm_page_prot);
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + entry = mk_pte(pages[i], vmf->vma_page_prot);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vmf->vma_flags);
> memcg = (void *)page_private(pages[i]);
> set_page_private(pages[i], 0);
> page_add_new_anon_rmap(pages[i], vmf->vma, haddr, false);
> @@ -2228,7 +2228,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = pte_swp_mksoft_dirty(entry);
> } else {
> entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
> - entry = maybe_mkwrite(entry, vma);
> + entry = maybe_mkwrite(entry, vma->vm_flags);
> if (!write)
> entry = pte_wrprotect(entry);
> if (!young)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 109f5de82910..13246da4bc50 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3812,6 +3812,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> .vma = vma,
> .address = haddr,
> .flags = flags,
> + .vma_flags = vma->vm_flags,
> + .vma_page_prot = vma->vm_page_prot,
Shouldn't you use READ_ONCE() here? I doubt the compiler will do anything
creative with struct initialization, but since you are using WRITE_ONCE()
to update those fields, pairing the reads with READ_ONCE() wherever the
mmap_sem is not always taken might make sense.
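A minimal userspace sketch of the pairing suggested above, not kernel
code. TOY_READ_ONCE()/TOY_WRITE_ONCE() are simplified volatile-cast
stand-ins assumed for illustration (the kernel's real macros are more
elaborate, and __typeof__ is a GCC/Clang extension); toy_vma, toy_fault
and toy_fault_snapshot() are hypothetical names:

```c
#include <assert.h>

/* Simplified stand-ins (assumption): force exactly one access per use. */
#define TOY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define TOY_VM_WRITE 0x2UL

/* Hypothetical, cut-down model of the fields discussed in this patch. */
struct toy_vma {
	unsigned long vm_flags;
	unsigned long vm_page_prot;
};

struct toy_fault {
	unsigned long vma_flags;     /* snapshot of vma->vm_flags */
	unsigned long vma_page_prot; /* snapshot of vma->vm_page_prot */
};

/* Take the snapshot with single, untorn loads of each field. */
static struct toy_fault toy_fault_snapshot(struct toy_vma *vma)
{
	struct toy_fault vmf = {
		.vma_flags     = TOY_READ_ONCE(vma->vm_flags),
		.vma_page_prot = TOY_READ_ONCE(vma->vm_page_prot),
	};

	return vmf;
}
```

The only point is that each field is loaded once into the snapshot, so
later decisions use the saved copy even if the VMA is updated afterwards.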
> /*
> * Hard to debug if it ends up being
> * used by a callee that assumes
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6a0cbca3885e..42469037240a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -888,6 +888,8 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> .flags = FAULT_FLAG_ALLOW_RETRY,
> .pmd = pmd,
> .pgoff = linear_page_index(vma, address),
> + .vma_flags = vma->vm_flags,
> + .vma_page_prot = vma->vm_page_prot,
Same as above.
[...]
> return VM_FAULT_FALLBACK;
> @@ -3924,6 +3925,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> .flags = flags,
> .pgoff = linear_page_index(vma, address),
> .gfp_mask = __get_fault_gfp_mask(vma),
> + .vma_flags = vma->vm_flags,
> + .vma_page_prot = vma->vm_page_prot,
Same as above
> };
> unsigned int dirty = flags & FAULT_FLAG_WRITE;
> struct mm_struct *mm = vma->vm_mm;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f2ecc2855a12..a9138093a8e2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -240,7 +240,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
> */
> entry = pte_to_swp_entry(*pvmw.pte);
> if (is_write_migration_entry(entry))
> - pte = maybe_mkwrite(pte, vma);
> + pte = maybe_mkwrite(pte, vma->vm_flags);
>
> if (unlikely(is_zone_device_page(new))) {
> if (is_device_private_page(new)) {
> --
> 2.21.0
>
* Re: [PATCH v12 12/31] mm: protect SPF handler against anon_vma changes
From: Jerome Glisse @ 2019-04-22 19:53 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-13-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:03PM +0200, Laurent Dufour wrote:
> The speculative page fault handler must be protected against anon_vma
> changes. This is because page_add_new_anon_rmap() is called during the
> speculative path.
>
> In addition, don't try a speculative page fault if the VMA doesn't have
> an anon_vma structure allocated, because its allocation should be
> protected by the mmap_sem.
>
> In __vma_adjust() when importer->anon_vma is set, there is no need to
> protect against speculative page faults since the speculative page fault
> is aborted if vma->anon_vma is not set.
>
> When calling page_add_new_anon_rmap(), vma->anon_vma is necessarily
> valid since we checked for it when locking the pte, and the anon_vma is
> only removed once the pte is unlocked. So even if the speculative page
> fault handler runs concurrently with do_unmap(), this is safe: the pte is
> locked in unmap_region() - through unmap_vmas() - and the anon_vma is
> unlinked later. The vma sequence counter is updated in
> unmap_page_range() before the pte is locked, and again in
> free_pgtables(), so the change is detected when we lock the pte.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
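For readers following the argument in the commit message, a toy,
single-threaded model of the two checks it relies on. All names here
(toy_vma, toy_spf_begin(), toy_spf_still_valid(), toy_unlink_anon_vma())
are hypothetical, and the plain counter stands in for the VMA sequence
count without any of the real memory-ordering concerns:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_anon_vma { int unused; };

struct toy_vma {
	unsigned int seq;              /* bumped before and after a change */
	struct toy_anon_vma *anon_vma;
};

/* Entry check: refuse to speculate without an anon_vma or mid-update. */
static bool toy_spf_begin(const struct toy_vma *vma, unsigned int *seq)
{
	*seq = vma->seq;
	if (*seq & 1)
		return false;          /* a writer is in progress */
	return vma->anon_vma != NULL;
}

/* Recheck, modelling what is done once the pte has been locked. */
static bool toy_spf_still_valid(const struct toy_vma *vma, unsigned int seq)
{
	return vma->seq == seq;
}

/* Writer side, modelled on the hunk below: unlink inside begin/end. */
static void toy_unlink_anon_vma(struct toy_vma *vma)
{
	vma->seq++;                    /* toy vm_write_begin() */
	vma->anon_vma = NULL;
	vma->seq++;                    /* toy vm_write_end() */
}
```

A fault that sampled the counter before the unlink fails the recheck, and
a fault that starts after it is refused at entry because anon_vma is gone.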
> ---
> mm/memory.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 423fa8ea0569..2cf7b6185daa 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -377,7 +377,9 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
> * Hide vma from rmap and truncate_pagecache before freeing
> * pgtables
> */
> + vm_write_begin(vma);
> unlink_anon_vmas(vma);
> + vm_write_end(vma);
> unlink_file_vma(vma);
>
> if (is_vm_hugetlb_page(vma)) {
> @@ -391,7 +393,9 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
> && !is_vm_hugetlb_page(next)) {
> vma = next;
> next = vma->vm_next;
> + vm_write_begin(vma);
> unlink_anon_vmas(vma);
> + vm_write_end(vma);
> unlink_file_vma(vma);
> }
> free_pgd_range(tlb, addr, vma->vm_end,
> --
> 2.21.0
>
* Re: [PATCH v12 11/31] mm: protect mremap() against SPF hanlder
From: Jerome Glisse @ 2019-04-22 19:51 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-12-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:02PM +0200, Laurent Dufour wrote:
> If a thread is remapping an area while another one is faulting on the
> destination area, the SPF handler may fetch the vma from the RB tree before
> the ptes have been moved by the other thread. This means that the moved ptes
> will overwrite those created by the page fault handler, leaking the
> pages.
>
> CPU 1                                 CPU2
> enter mremap()
> unmap the dest area
> copy_vma()                            Enter speculative page fault handler
>    >> at this time the dest area is present in the RB tree
>                                       fetch the vma matching dest area
>                                       create a pte as the VMA matched
>                                       Exit the SPF handler
>                                       <data written in the new page>
> move_ptes()
>    > it is assumed that the dest area is empty,
>    > the move ptes overwrite the page mapped by the CPU2.
>
> To prevent that, when the VMA matching the dest area is extended or created
> by copy_vma(), it should be marked as unavailable to the SPF handler.
> The usual way to do so is to rely on vm_write_begin()/end().
> This is already done in __vma_adjust() called by copy_vma() (through
> vma_merge()). But __vma_adjust() calls vm_write_end() before returning,
> which creates a window for another thread.
> This patch adds a new parameter to vma_merge() which is passed down to
> vma_adjust().
> The assumption is that copy_vma() returns a vma which should be
> released by the caller, by calling vm_raw_write_end(), once the ptes have
> been moved.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Small remark about a comment below, but it can be addressed in a fixup
patch; nothing earth-shattering.
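As an illustration of the window the commit message describes, a toy,
single-threaded sketch of the keep_locked idea. The names (toy_vma,
toy_adjust(), toy_move(), toy_spf_allowed()) are hypothetical and the
odd/even counter is only a stand-in for the real VMA sequence count:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vma { unsigned int seq; };

static void toy_write_begin(struct toy_vma *v) { v->seq++; }
static void toy_write_end(struct toy_vma *v)   { v->seq++; }

/* A speculative fault is only allowed while no write section is open. */
static bool toy_spf_allowed(const struct toy_vma *v)
{
	return (v->seq & 1) == 0;
}

/* Toy __vma_adjust(): optionally leave the write section open. */
static void toy_adjust(struct toy_vma *v, bool keep_locked)
{
	toy_write_begin(v);
	/* ... the VMA would be resized here ... */
	if (!keep_locked)
		toy_write_end(v);
}

/*
 * Toy move_vma(): the destination stays protected from the copy step
 * until the ptes have been moved. Returns true if a speculative fault
 * could have slipped in between the two steps.
 */
static bool toy_move(struct toy_vma *dst)
{
	bool window;

	toy_adjust(dst, true);          /* copy_vma() step */
	window = toy_spf_allowed(dst);  /* checked before the move */
	/* ... the ptes would be moved here ... */
	toy_write_end(dst);             /* caller releases the VMA */
	return window;
}
```

With keep_locked set, the write section opened during the adjust step is
only closed by the caller after the move, so the window never opens.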
> ---
> include/linux/mm.h | 24 ++++++++++++++++-----
> mm/mmap.c | 53 +++++++++++++++++++++++++++++++++++-----------
> mm/mremap.c | 13 ++++++++++++
> 3 files changed, 73 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 906b9e06f18e..5d45b7d8718d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2343,18 +2343,32 @@ void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
>
> /* mmap.c */
> extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
> +
> extern int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
> - struct vm_area_struct *expand);
> + struct vm_area_struct *expand, bool keep_locked);
> +
> static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert)
> {
> - return __vma_adjust(vma, start, end, pgoff, insert, NULL);
> + return __vma_adjust(vma, start, end, pgoff, insert, NULL, false);
> }
> -extern struct vm_area_struct *vma_merge(struct mm_struct *,
> +
> +extern struct vm_area_struct *__vma_merge(struct mm_struct *mm,
> + struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> + unsigned long vm_flags, struct anon_vma *anon, struct file *file,
> + pgoff_t pgoff, struct mempolicy *mpol,
> + struct vm_userfaultfd_ctx uff, bool keep_locked);
> +
> +static inline struct vm_area_struct *vma_merge(struct mm_struct *mm,
> struct vm_area_struct *prev, unsigned long addr, unsigned long end,
> - unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
> - struct mempolicy *, struct vm_userfaultfd_ctx);
> + unsigned long vm_flags, struct anon_vma *anon, struct file *file,
> + pgoff_t off, struct mempolicy *pol, struct vm_userfaultfd_ctx uff)
> +{
> + return __vma_merge(mm, prev, addr, end, vm_flags, anon, file, off,
> + pol, uff, false);
> +}
> +
> extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
> extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
> unsigned long addr, int new_below);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b77ec0149249..13460b38b0fb 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -714,7 +714,7 @@ static inline void __vma_unlink_prev(struct mm_struct *mm,
> */
> int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
> - struct vm_area_struct *expand)
> + struct vm_area_struct *expand, bool keep_locked)
> {
> struct mm_struct *mm = vma->vm_mm;
> struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
> @@ -830,8 +830,12 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>
> importer->anon_vma = exporter->anon_vma;
> error = anon_vma_clone(importer, exporter);
> - if (error)
> + if (error) {
> + if (next && next != vma)
> + vm_raw_write_end(next);
> + vm_raw_write_end(vma);
> return error;
> + }
> }
> }
> again:
> @@ -1025,7 +1029,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>
> if (next && next != vma)
> vm_raw_write_end(next);
> - vm_raw_write_end(vma);
> + if (!keep_locked)
> + vm_raw_write_end(vma);
>
> validate_mm(mm);
>
> @@ -1161,12 +1166,13 @@ can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
> * parameter) may establish ptes with the wrong permissions of NNNN
> * instead of the right permissions of XXXX.
> */
> -struct vm_area_struct *vma_merge(struct mm_struct *mm,
> +struct vm_area_struct *__vma_merge(struct mm_struct *mm,
> struct vm_area_struct *prev, unsigned long addr,
> unsigned long end, unsigned long vm_flags,
> struct anon_vma *anon_vma, struct file *file,
> pgoff_t pgoff, struct mempolicy *policy,
> - struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
> + struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
> + bool keep_locked)
> {
> pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
> struct vm_area_struct *area, *next;
> @@ -1214,10 +1220,11 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
> /* cases 1, 6 */
> err = __vma_adjust(prev, prev->vm_start,
> next->vm_end, prev->vm_pgoff, NULL,
> - prev);
> + prev, keep_locked);
> } else /* cases 2, 5, 7 */
> err = __vma_adjust(prev, prev->vm_start,
> - end, prev->vm_pgoff, NULL, prev);
> + end, prev->vm_pgoff, NULL, prev,
> + keep_locked);
> if (err)
> return NULL;
> khugepaged_enter_vma_merge(prev, vm_flags);
> @@ -1234,10 +1241,12 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
> vm_userfaultfd_ctx)) {
> if (prev && addr < prev->vm_end) /* case 4 */
> err = __vma_adjust(prev, prev->vm_start,
> - addr, prev->vm_pgoff, NULL, next);
> + addr, prev->vm_pgoff, NULL, next,
> + keep_locked);
> else { /* cases 3, 8 */
> err = __vma_adjust(area, addr, next->vm_end,
> - next->vm_pgoff - pglen, NULL, next);
> + next->vm_pgoff - pglen, NULL, next,
> + keep_locked);
> /*
> * In case 3 area is already equal to next and
> * this is a noop, but in case 8 "area" has
> @@ -3259,9 +3268,20 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>
> if (find_vma_links(mm, addr, addr + len, &prev, &rb_link, &rb_parent))
> return NULL; /* should never get here */
> - new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
> - vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
> - vma->vm_userfaultfd_ctx);
> +
> + /* There is 3 cases to manage here in
> + * AAAA AAAA AAAA AAAA
> + * PPPP.... PPPP......NNNN PPPP....NNNN PP........NN
> + * PPPPPPPP(A) PPPP..NNNNNNNN(B) PPPPPPPPPPPP(1) NULL
> + * PPPPPPPPNNNN(2)
> + * PPPPNNNNNNNN(3)
> + *
> + * new_vma == prev in case A,1,2
> + * new_vma == next in case B,3
> + */
> + new_vma = __vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
> + vma->anon_vma, vma->vm_file, pgoff,
> + vma_policy(vma), vma->vm_userfaultfd_ctx, true);
> if (new_vma) {
> /*
> * Source vma may have been merged into new_vma
> @@ -3299,6 +3319,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
> get_file(new_vma->vm_file);
> if (new_vma->vm_ops && new_vma->vm_ops->open)
> new_vma->vm_ops->open(new_vma);
> + /*
> + * As the VMA is linked right now, it may be hit by the
> + * speculative page fault handler. But we don't want it to
> + * to start mapping page in this area until the caller has
> + * potentially move the pte from the moved VMA. To prevent
> + * that we protect it right now, and let the caller unprotect
> + * it once the move is done.
> + */
It would be better to say:
/*
* Block speculative page faults on the new VMA before "linking" it,
* as once it is linked it may be hit by a speculative page fault.
* But we don't want it to start mapping pages in this area until the
* caller has potentially moved the ptes from the moved VMA. To prevent
* that we protect it before linking and let the caller unprotect it
* once the move is done.
*/
> + vm_raw_write_begin(new_vma);
> vma_link(mm, new_vma, prev, rb_link, rb_parent);
> *need_rmap_locks = false;
> }
> diff --git a/mm/mremap.c b/mm/mremap.c
> index fc241d23cd97..ae5c3379586e 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -357,6 +357,14 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> if (!new_vma)
> return -ENOMEM;
>
> + /* new_vma is returned protected by copy_vma, to prevent speculative
> + * page fault to be done in the destination area before we move the pte.
> + * Now, we must also protect the source VMA since we don't want pages
> + * to be mapped in our back while we are copying the PTEs.
> + */
> + if (vma != new_vma)
> + vm_raw_write_begin(vma);
> +
> moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len,
> need_rmap_locks);
> if (moved_len < old_len) {
> @@ -373,6 +381,8 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> */
> move_page_tables(new_vma, new_addr, vma, old_addr, moved_len,
> true);
> + if (vma != new_vma)
> + vm_raw_write_end(vma);
> vma = new_vma;
> old_len = new_len;
> old_addr = new_addr;
> @@ -381,7 +391,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> mremap_userfaultfd_prep(new_vma, uf);
> arch_remap(mm, old_addr, old_addr + old_len,
> new_addr, new_addr + new_len);
> + if (vma != new_vma)
> + vm_raw_write_end(vma);
> }
> + vm_raw_write_end(new_vma);
>
> /* Conceal VM_ACCOUNT so old reservation is not undone */
> if (vm_flags & VM_ACCOUNT) {
> --
> 2.21.0
>
* Re: [PATCH v12 10/31] mm: protect VMA modifications using VMA sequence count
From: Jerome Glisse @ 2019-04-22 19:43 UTC (permalink / raw)
To: Laurent Dufour
Cc: jack, sergey.senozhatsky.work, peterz, Will Deacon, mhocko,
linux-mm, paulus, Punit Agrawal, hpa, Michel Lespinasse,
Alexei Starovoitov, Andrea Arcangeli, ak, Minchan Kim,
aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan, Ingo Molnar,
David Rientjes, paulmck, Haiyan Song, npiggin, sj38.park, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190416134522.17540-11-ldufour@linux.ibm.com>
On Tue, Apr 16, 2019 at 03:45:01PM +0200, Laurent Dufour wrote:
> The VMA sequence count has been introduced to allow fast detection of
> VMA modification when running a page fault handler without holding
> the mmap_sem.
>
> This patch provides protection against the VMA modification done in :
> - madvise()
> - mpol_rebind_policy()
> - vma_replace_policy()
> - change_prot_numa()
> - mlock(), munlock()
> - mprotect()
> - mmap_region()
> - collapse_huge_page()
> - userfaultfd registering services
>
> In addition, VMA fields which will be read during the speculative fault
> path need to be written using WRITE_ONCE() to prevent writes from being
> split and intermediate values from being pushed to other CPUs.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
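For reference, a minimal userspace sketch of the sequence-count pattern
this patch applies around VMA updates. It uses C11 atomics with the
default sequentially consistent ordering for simplicity; the toy_ names
are hypothetical and this is not the kernel's seqcount implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vma {
	atomic_uint seq;                /* odd while an update is in flight */
	_Atomic unsigned long vm_flags;
};

static void toy_vm_write_begin(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->seq, 1);
}

static void toy_vm_write_end(struct toy_vma *vma)
{
	atomic_fetch_add(&vma->seq, 1);
}

/* Writer: every change to a field read speculatively is bracketed. */
static void toy_set_flags(struct toy_vma *vma, unsigned long newflags)
{
	toy_vm_write_begin(vma);
	atomic_store(&vma->vm_flags, newflags);
	toy_vm_write_end(vma);
}

/*
 * Lockless reader: returns false when an update was in flight or
 * completed while reading, in which case the value must not be used.
 */
static bool toy_read_flags(struct toy_vma *vma, unsigned long *flags)
{
	unsigned int start = atomic_load(&vma->seq);

	if (start & 1)
		return false;
	*flags = atomic_load(&vma->vm_flags);
	return atomic_load(&vma->seq) == start;
}
```

A reader that gets false here would presumably give up on the
speculative path and fall back to the regular one under the mmap_sem.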
> ---
> fs/proc/task_mmu.c | 5 ++++-
> fs/userfaultfd.c | 17 ++++++++++++----
> mm/khugepaged.c | 3 +++
> mm/madvise.c | 6 +++++-
> mm/mempolicy.c | 51 ++++++++++++++++++++++++++++++----------------
> mm/mlock.c | 13 +++++++-----
> mm/mmap.c | 28 ++++++++++++++++---------
> mm/mprotect.c | 4 +++-
> mm/swap_state.c | 10 ++++++---
> 9 files changed, 95 insertions(+), 42 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 01d4eb0e6bd1..0864c050b2de 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1162,8 +1162,11 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> goto out_mm;
> }
> for (vma = mm->mmap; vma; vma = vma->vm_next) {
> - vma->vm_flags &= ~VM_SOFTDIRTY;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags,
> + vma->vm_flags & ~VM_SOFTDIRTY);
> vma_set_page_prot(vma);
> + vm_write_end(vma);
> }
> downgrade_write(&mm->mmap_sem);
> break;
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 3b30301c90ec..2e0f98cadd81 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -667,8 +667,11 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
>
> octx = vma->vm_userfaultfd_ctx.ctx;
> if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
> + vm_write_begin(vma);
> vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> - vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
> + WRITE_ONCE(vma->vm_flags,
> + vma->vm_flags & ~(VM_UFFD_WP | VM_UFFD_MISSING));
> + vm_write_end(vma);
> return 0;
> }
>
> @@ -908,8 +911,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
> vma = prev;
> else
> prev = vma;
> - vma->vm_flags = new_flags;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, new_flags);
> vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> + vm_write_end(vma);
> }
> skip_mm:
> up_write(&mm->mmap_sem);
> @@ -1474,8 +1479,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> * the next vma was merged into the current one and
> * the current one has not been updated yet.
> */
> - vma->vm_flags = new_flags;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, new_flags);
> vma->vm_userfaultfd_ctx.ctx = ctx;
> + vm_write_end(vma);
>
> skip:
> prev = vma;
> @@ -1636,8 +1643,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> * the next vma was merged into the current one and
> * the current one has not been updated yet.
> */
> - vma->vm_flags = new_flags;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, new_flags);
> vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
> + vm_write_end(vma);
>
> skip:
> prev = vma;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index a335f7c1fac4..6a0cbca3885e 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1011,6 +1011,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> if (mm_find_pmd(mm, address) != pmd)
> goto out;
>
> + vm_write_begin(vma);
> anon_vma_lock_write(vma->anon_vma);
>
> pte = pte_offset_map(pmd, address);
> @@ -1046,6 +1047,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> spin_unlock(pmd_ptl);
> anon_vma_unlock_write(vma->anon_vma);
> + vm_write_end(vma);
> result = SCAN_FAIL;
> goto out;
> }
> @@ -1081,6 +1083,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> set_pmd_at(mm, address, pmd, _pmd);
> update_mmu_cache_pmd(vma, address, pmd);
> spin_unlock(pmd_ptl);
> + vm_write_end(vma);
>
> *hpage = NULL;
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index a692d2a893b5..6cf07dc546fc 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -184,7 +184,9 @@ static long madvise_behavior(struct vm_area_struct *vma,
> /*
> * vm_flags is protected by the mmap_sem held in write mode.
> */
> - vma->vm_flags = new_flags;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, new_flags);
> + vm_write_end(vma);
> out:
> return error;
> }
> @@ -450,9 +452,11 @@ static void madvise_free_page_range(struct mmu_gather *tlb,
> .private = tlb,
> };
>
> + vm_write_begin(vma);
> tlb_start_vma(tlb, vma);
> walk_page_range(addr, end, &free_walk);
> tlb_end_vma(tlb, vma);
> + vm_write_end(vma);
> }
>
> static int madvise_free_single_vma(struct vm_area_struct *vma,
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 2219e747df49..94c103c5034a 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -380,8 +380,11 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
> struct vm_area_struct *vma;
>
> down_write(&mm->mmap_sem);
> - for (vma = mm->mmap; vma; vma = vma->vm_next)
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + vm_write_begin(vma);
> mpol_rebind_policy(vma->vm_policy, new);
> + vm_write_end(vma);
> + }
> up_write(&mm->mmap_sem);
> }
>
> @@ -575,9 +578,11 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
> {
> int nr_updated;
>
> + vm_write_begin(vma);
> nr_updated = change_protection(vma, addr, end, PAGE_NONE, 0, 1);
> if (nr_updated)
> count_vm_numa_events(NUMA_PTE_UPDATES, nr_updated);
> + vm_write_end(vma);
>
> return nr_updated;
> }
> @@ -683,6 +688,7 @@ static int vma_replace_policy(struct vm_area_struct *vma,
> if (IS_ERR(new))
> return PTR_ERR(new);
>
> + vm_write_begin(vma);
> if (vma->vm_ops && vma->vm_ops->set_policy) {
> err = vma->vm_ops->set_policy(vma, new);
> if (err)
> @@ -690,11 +696,17 @@ static int vma_replace_policy(struct vm_area_struct *vma,
> }
>
> old = vma->vm_policy;
> - vma->vm_policy = new; /* protected by mmap_sem */
> + /*
> + * The speculative page fault handler accesses this field without
> + * holding the mmap_sem.
> + */
> + WRITE_ONCE(vma->vm_policy, new);
> + vm_write_end(vma);
> mpol_put(old);
>
> return 0;
> err_out:
> + vm_write_end(vma);
> mpol_put(new);
> return err;
> }
> @@ -1654,23 +1666,28 @@ COMPAT_SYSCALL_DEFINE4(migrate_pages, compat_pid_t, pid,
> struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
> unsigned long addr)
> {
> - struct mempolicy *pol = NULL;
> + struct mempolicy *pol;
>
> - if (vma) {
> - if (vma->vm_ops && vma->vm_ops->get_policy) {
> - pol = vma->vm_ops->get_policy(vma, addr);
> - } else if (vma->vm_policy) {
> - pol = vma->vm_policy;
> + if (!vma)
> + return NULL;
>
> - /*
> - * shmem_alloc_page() passes MPOL_F_SHARED policy with
> - * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> - * count on these policies which will be dropped by
> - * mpol_cond_put() later
> - */
> - if (mpol_needs_cond_ref(pol))
> - mpol_get(pol);
> - }
> + if (vma->vm_ops && vma->vm_ops->get_policy)
> + return vma->vm_ops->get_policy(vma, addr);
> +
> + /*
> + * This could be called without holding the mmap_sem in the
> + * speculative page fault handler's path.
> + */
> + pol = READ_ONCE(vma->vm_policy);
> + if (pol) {
> + /*
> + * shmem_alloc_page() passes MPOL_F_SHARED policy with
> + * a pseudo vma whose vma->vm_ops=NULL. Take a reference
> + * count on these policies which will be dropped by
> + * mpol_cond_put() later
> + */
> + if (mpol_needs_cond_ref(pol))
> + mpol_get(pol);
> }
>
> return pol;
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 080f3b36415b..f390903d9bbb 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -445,7 +445,9 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
> void munlock_vma_pages_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, vma->vm_flags & VM_LOCKED_CLEAR_MASK);
> + vm_write_end(vma);
>
> while (start < end) {
> struct page *page;
> @@ -569,10 +571,11 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
> * It's okay if try_to_unmap_one unmaps a page just after we
> * set VM_LOCKED, populate_vma_page_range will bring it back.
> */
> -
> - if (lock)
> - vma->vm_flags = newflags;
> - else
> + if (lock) {
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, newflags);
> + vm_write_end(vma);
> + } else
> munlock_vma_pages_range(vma, start, end);
>
> out:
> diff --git a/mm/mmap.c b/mm/mmap.c
> index a4e4d52a5148..b77ec0149249 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -877,17 +877,18 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
> }
>
> if (start != vma->vm_start) {
> - vma->vm_start = start;
> + WRITE_ONCE(vma->vm_start, start);
> start_changed = true;
> }
> if (end != vma->vm_end) {
> - vma->vm_end = end;
> + WRITE_ONCE(vma->vm_end, end);
> end_changed = true;
> }
> - vma->vm_pgoff = pgoff;
> + WRITE_ONCE(vma->vm_pgoff, pgoff);
> if (adjust_next) {
> - next->vm_start += adjust_next << PAGE_SHIFT;
> - next->vm_pgoff += adjust_next;
> + WRITE_ONCE(next->vm_start,
> + next->vm_start + (adjust_next << PAGE_SHIFT));
> + WRITE_ONCE(next->vm_pgoff, next->vm_pgoff + adjust_next);
> }
>
> if (root) {
> @@ -1850,12 +1851,14 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> out:
> perf_event_mmap(vma);
>
> + vm_write_begin(vma);
> vm_stat_account(mm, vm_flags, len >> PAGE_SHIFT);
> if (vm_flags & VM_LOCKED) {
> if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
> is_vm_hugetlb_page(vma) ||
> vma == get_gate_vma(current->mm))
> - vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
> + WRITE_ONCE(vma->vm_flags,
> + vma->vm_flags & VM_LOCKED_CLEAR_MASK);
> else
> mm->locked_vm += (len >> PAGE_SHIFT);
> }
> @@ -1870,9 +1873,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> * then new mapped in-place (which must be aimed as
> * a completely new data area).
> */
> - vma->vm_flags |= VM_SOFTDIRTY;
> + WRITE_ONCE(vma->vm_flags, vma->vm_flags | VM_SOFTDIRTY);
>
> vma_set_page_prot(vma);
> + vm_write_end(vma);
>
> return addr;
>
> @@ -2430,7 +2434,9 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> mm->locked_vm += grow;
> vm_stat_account(mm, vma->vm_flags, grow);
> anon_vma_interval_tree_pre_update_vma(vma);
> - vma->vm_end = address;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_end, address);
> + vm_write_end(vma);
> anon_vma_interval_tree_post_update_vma(vma);
> if (vma->vm_next)
> vma_gap_update(vma->vm_next);
> @@ -2510,8 +2516,10 @@ int expand_downwards(struct vm_area_struct *vma,
> mm->locked_vm += grow;
> vm_stat_account(mm, vma->vm_flags, grow);
> anon_vma_interval_tree_pre_update_vma(vma);
> - vma->vm_start = address;
> - vma->vm_pgoff -= grow;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_start, address);
> + WRITE_ONCE(vma->vm_pgoff, vma->vm_pgoff - grow);
> + vm_write_end(vma);
> anon_vma_interval_tree_post_update_vma(vma);
> vma_gap_update(vma);
> spin_unlock(&mm->page_table_lock);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 65242f1e4457..78fce873ca3a 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -427,12 +427,14 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
> * vm_flags and vm_page_prot are protected by the mmap_sem
> * held in write mode.
> */
> - vma->vm_flags = newflags;
> + vm_write_begin(vma);
> + WRITE_ONCE(vma->vm_flags, newflags);
> dirty_accountable = vma_wants_writenotify(vma, vma->vm_page_prot);
> vma_set_page_prot(vma);
>
> change_protection(vma, start, end, vma->vm_page_prot,
> dirty_accountable, 0);
> + vm_write_end(vma);
>
> /*
> * Private VM_LOCKED VMA becoming writable: trigger COW to avoid major
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index eb714165afd2..c45f9122b457 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -523,7 +523,11 @@ static unsigned long swapin_nr_pages(unsigned long offset)
> * This has been extended to use the NUMA policies from the mm triggering
> * the readahead.
> *
> - * Caller must hold read mmap_sem if vmf->vma is not NULL.
> + * Caller must hold down_read on the vma->vm_mm if vmf->vma is not NULL.
> + * This is needed to ensure the VMA will not be freed in our back. In the case
> + * of the speculative page fault handler, this cannot happen, even if we don't
> + * hold the mmap_sem. Callees are assumed to take care of reading VMA's fields
> + * using READ_ONCE() to read consistent values.
> */
> struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
> struct vm_fault *vmf)
> @@ -624,9 +628,9 @@ static inline void swap_ra_clamp_pfn(struct vm_area_struct *vma,
> unsigned long *start,
> unsigned long *end)
> {
> - *start = max3(lpfn, PFN_DOWN(vma->vm_start),
> + *start = max3(lpfn, PFN_DOWN(READ_ONCE(vma->vm_start)),
> PFN_DOWN(faddr & PMD_MASK));
> - *end = min3(rpfn, PFN_DOWN(vma->vm_end),
> + *end = min3(rpfn, PFN_DOWN(READ_ONCE(vma->vm_end)),
> PFN_DOWN((faddr & PMD_MASK) + PMD_SIZE));
> }
>
> --
> 2.21.0
>
^ permalink raw reply
* [PATCH v3 3/3] ASoC: fsl_sai: Move clock operation to PM runtime
From: Daniel Baluta @ 2019-04-22 19:02 UTC (permalink / raw)
To: broonie@kernel.org
Cc: Daniel Baluta, alsa-devel@alsa-project.org, timur@kernel.org,
Xiubo.Lee@gmail.com, linuxppc-dev@lists.ozlabs.org, S.j. Wang,
tiwai@suse.com, lgirdwood@gmail.com, perex@perex.cz, dl-linux-imx,
festevam@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <20190422190213.30726-1-daniel.baluta@nxp.com>
From: Shengjiu Wang <shengjiu.wang@nxp.com>
Turn the clocks off/on when the device enters suspend/resume. This
helps save power.
As a further optimization, we turn off/on mclk only when SAI
is in master mode because otherwise mclk is externally provided.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
---
sound/soc/fsl/fsl_sai.c | 54 +++++++++++++++++++++++++++++++++--------
1 file changed, 44 insertions(+), 10 deletions(-)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 8623b7f882b9..7fd1a81ec1aa 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -596,15 +596,8 @@ static int fsl_sai_startup(struct snd_pcm_substream *substream,
{
struct fsl_sai *sai = snd_soc_dai_get_drvdata(cpu_dai);
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
- struct device *dev = &sai->pdev->dev;
int ret;
- ret = clk_prepare_enable(sai->bus_clk);
- if (ret) {
- dev_err(dev, "failed to enable bus clock: %d\n", ret);
- return ret;
- }
-
regmap_update_bits(sai->regmap, FSL_SAI_xCR3(tx), FSL_SAI_CR3_TRCE,
FSL_SAI_CR3_TRCE);
@@ -621,8 +614,6 @@ static void fsl_sai_shutdown(struct snd_pcm_substream *substream,
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
regmap_update_bits(sai->regmap, FSL_SAI_xCR3(tx), FSL_SAI_CR3_TRCE, 0);
-
- clk_disable_unprepare(sai->bus_clk);
}
static const struct snd_soc_dai_ops fsl_sai_pcm_dai_ops = {
@@ -932,6 +923,14 @@ static int fsl_sai_runtime_suspend(struct device *dev)
{
struct fsl_sai *sai = dev_get_drvdata(dev);
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_CAPTURE))
+ clk_disable_unprepare(sai->mclk_clk[sai->mclk_id[0]]);
+
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_PLAYBACK))
+ clk_disable_unprepare(sai->mclk_clk[sai->mclk_id[1]]);
+
+ clk_disable_unprepare(sai->bus_clk);
+
regcache_cache_only(sai->regmap, true);
regcache_mark_dirty(sai->regmap);
@@ -941,6 +940,25 @@ static int fsl_sai_runtime_suspend(struct device *dev)
static int fsl_sai_runtime_resume(struct device *dev)
{
struct fsl_sai *sai = dev_get_drvdata(dev);
+ int ret;
+
+ ret = clk_prepare_enable(sai->bus_clk);
+ if (ret) {
+ dev_err(dev, "failed to enable bus clock: %d\n", ret);
+ return ret;
+ }
+
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_PLAYBACK)) {
+ ret = clk_prepare_enable(sai->mclk_clk[sai->mclk_id[1]]);
+ if (ret)
+ goto disable_bus_clk;
+ }
+
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_CAPTURE)) {
+ ret = clk_prepare_enable(sai->mclk_clk[sai->mclk_id[0]]);
+ if (ret)
+ goto disable_tx_clk;
+ }
regcache_cache_only(sai->regmap, false);
regmap_write(sai->regmap, FSL_SAI_TCSR, FSL_SAI_CSR_SR);
@@ -948,7 +966,23 @@ static int fsl_sai_runtime_resume(struct device *dev)
usleep_range(1000, 2000);
regmap_write(sai->regmap, FSL_SAI_TCSR, 0);
regmap_write(sai->regmap, FSL_SAI_RCSR, 0);
- return regcache_sync(sai->regmap);
+
+ ret = regcache_sync(sai->regmap);
+ if (ret)
+ goto disable_rx_clk;
+
+ return 0;
+
+disable_rx_clk:
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_CAPTURE))
+ clk_disable_unprepare(sai->mclk_clk[sai->mclk_id[0]]);
+disable_tx_clk:
+ if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_PLAYBACK))
+ clk_disable_unprepare(sai->mclk_clk[sai->mclk_id[1]]);
+disable_bus_clk:
+ clk_disable_unprepare(sai->bus_clk);
+
+ return ret;
}
#endif /* CONFIG_PM */
--
2.17.1
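
The error path added to fsl_sai_runtime_resume() in the patch above
releases clocks in the reverse order of acquisition. A minimal userspace
sketch of that unwind pattern follows. It is not the driver code:
clk_model, model_enable(), model_disable(), enabled_count() and the
fail_at fault-injection field are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the bus clock and the two mclks. */
enum { CLK_BUS, CLK_TX, CLK_RX, CLK_COUNT };

struct clk_model {
	bool enabled[CLK_COUNT];
	int fail_at;	/* assumption: index of the enable step that fails, -1 for none */
};

static int model_enable(struct clk_model *m, int clk)
{
	if (m->fail_at == clk)
		return -1;
	m->enabled[clk] = true;
	return 0;
}

static void model_disable(struct clk_model *m, int clk)
{
	m->enabled[clk] = false;
}

static int enabled_count(const struct clk_model *m)
{
	int i, n = 0;

	for (i = 0; i < CLK_COUNT; i++)
		n += m->enabled[i] ? 1 : 0;
	return n;
}

/* Same shape as the patched resume path: enable in order, unwind in reverse. */
static int model_resume(struct clk_model *m)
{
	int ret;

	ret = model_enable(m, CLK_BUS);
	if (ret)
		return ret;

	ret = model_enable(m, CLK_TX);
	if (ret)
		goto disable_bus_clk;

	ret = model_enable(m, CLK_RX);
	if (ret)
		goto disable_tx_clk;

	return 0;

disable_tx_clk:
	model_disable(m, CLK_TX);
disable_bus_clk:
	model_disable(m, CLK_BUS);
	return ret;
}

/* A failure at the last step must leave no clock enabled. */
static void check_unwind(void)
{
	struct clk_model bad = { .fail_at = CLK_RX };

	assert(model_resume(&bad) != 0);
	assert(enabled_count(&bad) == 0);
}
```

Calling check_unwind() exercises the failing path; with fail_at set to -1
the same model_resume() leaves all three model clocks enabled.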
^ permalink raw reply related
* [PATCH v3 2/3] ASoC: fsl_sai: Add support for runtime pm
From: Daniel Baluta @ 2019-04-22 19:02 UTC (permalink / raw)
To: broonie@kernel.org
Cc: Daniel Baluta, alsa-devel@alsa-project.org, timur@kernel.org,
Xiubo.Lee@gmail.com, linuxppc-dev@lists.ozlabs.org, S.j. Wang,
tiwai@suse.com, lgirdwood@gmail.com, perex@perex.cz, dl-linux-imx,
festevam@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <20190422190213.30726-1-daniel.baluta@nxp.com>
Basically the same actions as for system PM, so make use
of pm_runtime_force_suspend/pm_runtime_force_resume.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
---
sound/soc/fsl/fsl_sai.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index d9df98975cf8..8623b7f882b9 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -9,6 +9,7 @@
#include <linux/dmaengine.h>
#include <linux/module.h>
#include <linux/of_address.h>
+#include <linux/pm_runtime.h>
#include <linux/regmap.h>
#include <linux/slab.h>
#include <linux/time.h>
@@ -900,6 +901,8 @@ static int fsl_sai_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, sai);
+ pm_runtime_enable(&pdev->dev);
+
ret = devm_snd_soc_register_component(&pdev->dev, &fsl_component,
&fsl_sai_dai, 1);
if (ret)
@@ -911,6 +914,11 @@ static int fsl_sai_probe(struct platform_device *pdev)
return devm_snd_dmaengine_pcm_register(&pdev->dev, NULL, 0);
}
+static int fsl_sai_remove(struct platform_device *pdev)
+{
+ pm_runtime_disable(&pdev->dev);
+}
+
static const struct of_device_id fsl_sai_ids[] = {
{ .compatible = "fsl,vf610-sai", },
{ .compatible = "fsl,imx6sx-sai", },
@@ -919,8 +927,8 @@ static const struct of_device_id fsl_sai_ids[] = {
};
MODULE_DEVICE_TABLE(of, fsl_sai_ids);
-#ifdef CONFIG_PM_SLEEP
-static int fsl_sai_suspend(struct device *dev)
+#ifdef CONFIG_PM
+static int fsl_sai_runtime_suspend(struct device *dev)
{
struct fsl_sai *sai = dev_get_drvdata(dev);
@@ -930,7 +938,7 @@ static int fsl_sai_suspend(struct device *dev)
return 0;
}
-static int fsl_sai_resume(struct device *dev)
+static int fsl_sai_runtime_resume(struct device *dev)
{
struct fsl_sai *sai = dev_get_drvdata(dev);
@@ -942,14 +950,18 @@ static int fsl_sai_resume(struct device *dev)
regmap_write(sai->regmap, FSL_SAI_RCSR, 0);
return regcache_sync(sai->regmap);
}
-#endif /* CONFIG_PM_SLEEP */
+#endif /* CONFIG_PM */
static const struct dev_pm_ops fsl_sai_pm_ops = {
- SET_SYSTEM_SLEEP_PM_OPS(fsl_sai_suspend, fsl_sai_resume)
+ SET_RUNTIME_PM_OPS(fsl_sai_runtime_suspend,
+ fsl_sai_runtime_resume, NULL)
+ SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+ pm_runtime_force_resume)
};
static struct platform_driver fsl_sai_driver = {
.probe = fsl_sai_probe,
+ .remove = fsl_sai_remove,
.driver = {
.name = "fsl-sai",
.pm = &fsl_sai_pm_ops,
--
2.17.1
^ permalink raw reply related
* [PATCH v3 1/3] ASoC: fsl_sai: Update is_slave_mode with correct value
From: Daniel Baluta @ 2019-04-22 19:02 UTC (permalink / raw)
To: broonie@kernel.org
Cc: Daniel Baluta, alsa-devel@alsa-project.org, timur@kernel.org,
Xiubo.Lee@gmail.com, linuxppc-dev@lists.ozlabs.org, S.j. Wang,
tiwai@suse.com, lgirdwood@gmail.com, perex@perex.cz, dl-linux-imx,
festevam@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <20190422190213.30726-1-daniel.baluta@nxp.com>
is_slave_mode defaults to false because the sai structure
that contains it is kzalloc'ed.
However, if the configuration is changed from SAI slave to
SAI master, is_slave_mode remains true, although it should be
false once the SAI is master.
Fix this by updating is_slave_mode on each call of
fsl_sai_set_dai_fmt.
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
---
sound/soc/fsl/fsl_sai.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index b563004fb89f..d9df98975cf8 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -268,12 +268,14 @@ static int fsl_sai_set_dai_fmt_tr(struct snd_soc_dai *cpu_dai,
case SND_SOC_DAIFMT_CBS_CFS:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
+ sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFM:
sai->is_slave_mode = true;
break;
case SND_SOC_DAIFMT_CBS_CFM:
val_cr2 |= FSL_SAI_CR2_BCD_MSTR;
+ sai->is_slave_mode = false;
break;
case SND_SOC_DAIFMT_CBM_CFS:
val_cr4 |= FSL_SAI_CR4_FSD_MSTR;
--
2.17.1
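
The stale-flag problem described in the commit message above can be
reproduced with a small userspace model. This is only a sketch under
stated assumptions: sai_model, the dai_role enum and both set_fmt_*()
helpers are hypothetical, and only the two roles needed for the
slave -> master sequence are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SND_SOC_DAIFMT_CBS_CFS and SND_SOC_DAIFMT_CBM_CFM. */
enum dai_role { ROLE_SAI_MASTER, ROLE_SAI_SLAVE };

struct sai_model {
	bool is_slave_mode;	/* starts as false, like the kzalloc'ed struct */
};

/* Behaviour before the patch: the flag is only ever set, never cleared. */
static void set_fmt_old(struct sai_model *sai, enum dai_role role)
{
	if (role == ROLE_SAI_SLAVE)
		sai->is_slave_mode = true;
}

/* Behaviour after the patch: every call writes the flag for its role. */
static void set_fmt_new(struct sai_model *sai, enum dai_role role)
{
	switch (role) {
	case ROLE_SAI_MASTER:
		sai->is_slave_mode = false;
		break;
	case ROLE_SAI_SLAVE:
		sai->is_slave_mode = true;
		break;
	}
}

/* Replays the "SAI slave -> SAI master" sequence on both variants. */
static void replay_slave_then_master(void)
{
	struct sai_model before = { .is_slave_mode = false };
	struct sai_model after = { .is_slave_mode = false };

	set_fmt_old(&before, ROLE_SAI_SLAVE);
	set_fmt_old(&before, ROLE_SAI_MASTER);
	assert(before.is_slave_mode);	/* stale: still reports slave */

	set_fmt_new(&after, ROLE_SAI_SLAVE);
	set_fmt_new(&after, ROLE_SAI_MASTER);
	assert(!after.is_slave_mode);	/* updated on every call */
}
```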
^ permalink raw reply related
* [PATCH v3 0/3] Add runtime PM for SAI digital audio interface
From: Daniel Baluta @ 2019-04-22 19:02 UTC (permalink / raw)
To: broonie@kernel.org
Cc: Daniel Baluta, alsa-devel@alsa-project.org, timur@kernel.org,
Xiubo.Lee@gmail.com, linuxppc-dev@lists.ozlabs.org, S.j. Wang,
tiwai@suse.com, lgirdwood@gmail.com, perex@perex.cz, dl-linux-imx,
festevam@gmail.com, linux-kernel@vger.kernel.org
The first patch fixes a bug by correctly setting is_slave_mode, the
second patch adds support for runtime PM, and the third patch moves
clock handling from the startup/shutdown functions to the runtime PM handlers.
Changes since v2: (after Viorel's comments)
- no need to check for is_slave_mode when enabling/disabling the clocks
because sai->mclk_streams is only set when SAI is in master mode.
Changes since v1: (after Nicolin's comments)
- added patch 1
- added fsl_sai_remove in order to call pm_runtime_disable
- only disable/enable mclk when SAI in master mode.
Daniel Baluta (2):
ASoC: fsl_sai: Update is_slave_mode with correct value
ASoC: fsl_sai: Add support for runtime pm
Shengjiu Wang (1):
ASoC: fsl_sai: Move clock operation to PM runtime
sound/soc/fsl/fsl_sai.c | 78 +++++++++++++++++++++++++++++++++--------
1 file changed, 63 insertions(+), 15 deletions(-)
--
2.17.1
^ permalink raw reply
* Re: [PATCH 2/2] ASoC: fsl: Move clock operation to PM runtime
From: Nicolin Chen @ 2019-04-22 18:15 UTC (permalink / raw)
To: Viorel Suman
Cc: alsa-devel@alsa-project.org, lgirdwood@gmail.com,
timur@kernel.org, Xiubo.Lee@gmail.com, Daniel Baluta, S.j. Wang,
linux-kernel@vger.kernel.org, tiwai@suse.com, broonie@kernel.org,
dl-linux-imx, festevam@gmail.com, perex@perex.cz,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1555930942.31656.4.camel@nxp.com>
On Mon, Apr 22, 2019 at 11:02:22AM +0000, Viorel Suman wrote:
> Hi Nicolin,
>
> On Sat, 2019-04-20 at 22:54 -0700, Nicolin Chen wrote:
> > On Sat, Apr 20, 2019 at 03:59:05PM +0000, Daniel Baluta wrote:
> > >
> > > Turn off/on clocks when device enters suspend/resume. This
> > > helps saving power.
> > >
> > > @@ -934,6 +933,25 @@ static int fsl_sai_runtime_suspend(struct device *dev)
> > > static int fsl_sai_runtime_resume(struct device *dev)
> > > {
> > > struct fsl_sai *sai = dev_get_drvdata(dev);
> > > + int ret;
> > > +
> > > + ret = clk_prepare_enable(sai->bus_clk);
> > > + if (ret) {
> > > + dev_err(dev, "failed to enable bus clock: %d\n", ret);
> > > + return ret;
> > > + }
> > > +
> > > + if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_PLAYBACK)) {
> > > + ret = clk_prepare_enable(sai->mclk_clk[sai->mclk_id[1]]);
> > > + if (ret)
> > > + goto disable_bus_clk;
> > > + }
> > > +
> > > + if (sai->mclk_streams & BIT(SNDRV_PCM_STREAM_CAPTURE)) {
> > > + ret = clk_prepare_enable(sai->mclk_clk[sai->mclk_id[0]]);
> > > + if (ret)
> > > + goto disable_tx_clk;
> > > + }
> > The driver only enables mclk_clks for I2S master mode. But this
> > change enables them for I2S slave mode also. It doesn't sound a
> > right thing to me since we are supposed to save power?
>
> This change does not enable them for I2S slave mode, please check "fsl_sai_hw_params"
> and "fsl_sai_hw_free" functions: the field "sai->mclk_streams" is modified only for
> the case when "if (!sai->is_slave_mode)";
Thanks for the input. This should be fine then.
Nicolin
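
The point settled in this exchange is that the mclk_streams bitmask is
only ever modified when the SAI is not in slave mode, so gating the
runtime PM clock handling on that bitmask leaves the mclks alone in
slave mode. A rough userspace sketch of that reasoning follows. The
model_*() helpers and the mclk_enabled counter are hypothetical
simplifications; the real fsl_sai_hw_params() and fsl_sai_hw_free()
do more than is shown here.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1u << (n))

/* Same numeric values as SNDRV_PCM_STREAM_PLAYBACK and SNDRV_PCM_STREAM_CAPTURE. */
enum { STREAM_PLAYBACK = 0, STREAM_CAPTURE = 1 };

struct sai_model {
	bool is_slave_mode;
	unsigned int mclk_streams;	/* bitmask of streams that use an mclk */
	int mclk_enabled;		/* assumption: count of mclks currently on */
};

/* Only a master-mode SAI records the stream in mclk_streams. */
static void model_hw_params(struct sai_model *sai, int stream)
{
	if (!sai->is_slave_mode)
		sai->mclk_streams |= BIT(stream);
}

static void model_hw_free(struct sai_model *sai, int stream)
{
	if (!sai->is_slave_mode)
		sai->mclk_streams &= ~BIT(stream);
}

/* Runtime PM handlers look only at mclk_streams, never at is_slave_mode. */
static void model_runtime_resume(struct sai_model *sai)
{
	if (sai->mclk_streams & BIT(STREAM_PLAYBACK))
		sai->mclk_enabled++;
	if (sai->mclk_streams & BIT(STREAM_CAPTURE))
		sai->mclk_enabled++;
}

static void model_runtime_suspend(struct sai_model *sai)
{
	if (sai->mclk_streams & BIT(STREAM_CAPTURE))
		sai->mclk_enabled--;
	if (sai->mclk_streams & BIT(STREAM_PLAYBACK))
		sai->mclk_enabled--;
}

/* In slave mode the bitmask stays empty, so resume enables no mclk. */
static void check_slave_never_enables_mclk(void)
{
	struct sai_model sai = { .is_slave_mode = true };

	model_hw_params(&sai, STREAM_PLAYBACK);
	model_runtime_resume(&sai);
	assert(sai.mclk_streams == 0);
	assert(sai.mclk_enabled == 0);
}
```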
^ permalink raw reply
* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Daniel Jordan @ 2019-04-22 15:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Mark Rutland, Davidlohr Bueso, kvm, Alan Tull,
Alexey Kardashevskiy, linux-fpga, linux-kernel, kvm-ppc,
Daniel Jordan, linux-mm, Alex Williamson, Moritz Fischer,
Christoph Lameter, linuxppc-dev, Wu Hao
In-Reply-To: <20190416163351.5e4e075ddfad0677239fc23a@linux-foundation.org>
On Tue, Apr 16, 2019 at 04:33:51PM -0700, Andrew Morton wrote:
Sorry for the delay, I was on vacation all last week.
> What's the status of this patchset, btw?
>
> I have a note here that
> powerpc-mmu-drop-mmap_sem-now-that-locked_vm-is-atomic.patch is to be
> updated.
Yes, the series needs a few updates. v2 should appear in the next day or two.
^ permalink raw reply