Linux-mm Archive on lore.kernel.org
* Re: [PATCH 3/4] mm: drop unused argument of zap_page_range()
From: kbuild test robot @ 2016-12-16 17:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: kbuild-all, Michal Hocko, Peter Zijlstra, Rik van Riel,
	Andrew Morton, linux-mm, linux-kernel
In-Reply-To: <20161216141556.75130-3-kirill.shutemov@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 4269 bytes --]

Hi Kirill,

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.9 next-20161216]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Kirill-A-Shutemov/mm-drop-zap_details-ignore_dirty/20161216-231509
base:   git://git.cmpxchg.org/linux-mmotm.git master
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   lib/crc32.c:148: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:148: warning: Excess function parameter 'tab' description in 'crc32_le_generic'
   lib/crc32.c:293: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:293: warning: Excess function parameter 'tab' description in 'crc32_be_generic'
   lib/crc32.c:1: warning: no structured comments found
   lib/idr.c:223: warning: No description found for parameter 'start'
   lib/idr.c:223: warning: No description found for parameter 'id'
   lib/idr.c:223: warning: Excess function parameter 'starting_id' description in 'ida_get_new_above'
   lib/idr.c:223: warning: Excess function parameter 'p_id' description in 'ida_get_new_above'
   lib/idr.c:1: warning: no structured comments found
       Was looking for 'IDA description'.
   lib/idr.c:223: warning: No description found for parameter 'start'
   lib/idr.c:223: warning: No description found for parameter 'id'
   lib/idr.c:223: warning: Excess function parameter 'starting_id' description in 'ida_get_new_above'
   lib/idr.c:223: warning: Excess function parameter 'p_id' description in 'ida_get_new_above'
>> mm/memory.c:1379: warning: Excess function parameter 'details' description in 'zap_page_range'
   drivers/pci/msi.c:623: warning: No description found for parameter 'affd'
   drivers/pci/msi.c:623: warning: Excess function parameter 'affinity' description in 'msi_capability_init'

vim +1379 mm/memory.c

f5cc4eef9 Al Viro            2012-03-05  1363  	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
4f74d2c8e Linus Torvalds     2012-05-06  1364  		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
cddb8a5c1 Andrea Arcangeli   2008-07-28  1365  	mmu_notifier_invalidate_range_end(mm, start_addr, end_addr);
^1da177e4 Linus Torvalds     2005-04-16  1366  }
^1da177e4 Linus Torvalds     2005-04-16  1367  
^1da177e4 Linus Torvalds     2005-04-16  1368  /**
^1da177e4 Linus Torvalds     2005-04-16  1369   * zap_page_range - remove user pages in a given range
^1da177e4 Linus Torvalds     2005-04-16  1370   * @vma: vm_area_struct holding the applicable pages
eb4546bbb Randy Dunlap       2012-06-20  1371   * @start: starting address of pages to zap
^1da177e4 Linus Torvalds     2005-04-16  1372   * @size: number of bytes to zap
8a5f14a23 Kirill A. Shutemov 2015-02-10  1373   * @details: details of shared cache invalidation
f5cc4eef9 Al Viro            2012-03-05  1374   *
f5cc4eef9 Al Viro            2012-03-05  1375   * Caller must protect the VMA list
^1da177e4 Linus Torvalds     2005-04-16  1376   */
7e027b14d Linus Torvalds     2012-05-06  1377  void zap_page_range(struct vm_area_struct *vma, unsigned long start,
1ddef4086 Kirill A. Shutemov 2016-12-16  1378  		unsigned long size)
^1da177e4 Linus Torvalds     2005-04-16 @1379  {
^1da177e4 Linus Torvalds     2005-04-16  1380  	struct mm_struct *mm = vma->vm_mm;
d16dfc550 Peter Zijlstra     2011-05-24  1381  	struct mmu_gather tlb;
7e027b14d Linus Torvalds     2012-05-06  1382  	unsigned long end = start + size;
^1da177e4 Linus Torvalds     2005-04-16  1383  
^1da177e4 Linus Torvalds     2005-04-16  1384  	lru_add_drain();
2b047252d Linus Torvalds     2013-08-15  1385  	tlb_gather_mmu(&tlb, mm, start, end);
365e9c87a Hugh Dickins       2005-10-29  1386  	update_hiwater_rss(mm);
7e027b14d Linus Torvalds     2012-05-06  1387  	mmu_notifier_invalidate_range_start(mm, start, end);

:::::: The code at line 1379 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6474 bytes --]


* Re: [PATCH 2/2] arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA
From: Robert Richter @ 2016-12-16 17:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel@lists.infradead.org, Will Deacon,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, Catalin Marinas,
	Andrew Morton, Hanjun Guo, Yisheng Xie, James Morse
In-Reply-To: <CAKv+Gu8K+mokbjzM8EpTJoCp3XAKK1_Doq1Zx=A2CCWTT6FbYg@mail.gmail.com>

On 15.12.16 16:07:26, Ard Biesheuvel wrote:
> On 15 December 2016 at 15:39, Robert Richter <robert.richter@cavium.com> wrote:
> > I was going to do some measurements but my kernel crashes now with a
> > page fault in efi_rtc_probe():
> >
> > [   21.663393] Unable to handle kernel paging request at virtual address 20251000
> > [   21.663396] pgd = ffff000009090000
> > [   21.663401] [20251000] *pgd=0000010ffff90003
> > [   21.663402] , *pud=0000010ffff90003
> > [   21.663404] , *pmd=0000000fdc030003
> > [   21.663405] , *pte=00e8832000250707
> >
> > The sparsemem config requires the whole section to be initialized.
> > Your patches do not address this.
> >
> 
> 96000047 is a third level translation fault, and the PTE address has
> RES0 bits set. I don't see how this is related to sparsemem, could you
> explain?

When initializing the whole section it works. Maybe it uncovers
another bug. I have not started debugging this yet.

> 
> > On 14.12.16 09:11:47, Ard Biesheuvel wrote:
> >> +config HOLES_IN_ZONE
> >> +     def_bool y
> >> +     depends on NUMA
> >
> > This enables pfn_valid_within() for arm64 and causes the check for
> > each page of a section. The arm64 implementation of pfn_valid() is
> > already expensive (traversing memblock areas). Now, this is increased
> > by a factor of 2^18 for 4k page size (16384 for 64k). We need to
> > initialize the whole section to avoid that.
> >
> 
> I know that. But if you want something for -stable, we should have
> something that is correct first, and only then care about the
> performance hit (if there is one)

I would prefer to check for a performance penalty *before* we put it
into stable. There is no risk at all with the patch I am proposing.
See: https://lkml.org/lkml/2016/12/16/412
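
The factor of 2^18 quoted above is the sparsemem section size divided by
the page size. A minimal sketch of that arithmetic, assuming 1 GiB
sections (which is what the quoted figures imply); the macro and helper
names are hypothetical and exist only for this illustration:

```c
#include <assert.h>

/* Assumption: 1 GiB sparsemem sections, as implied by the figures quoted above. */
#define SECTION_SIZE_BITS_ASSUMED 30

/*
 * Hypothetical helper: number of pages in one section, which is also the
 * number of pfn_valid() calls needed when every pfn of a section is checked.
 */
static unsigned long pages_per_section(unsigned int page_shift)
{
	assert(page_shift <= SECTION_SIZE_BITS_ASSUMED);
	return 1UL << (SECTION_SIZE_BITS_ASSUMED - page_shift);
}
```

With a page shift of 12 (4k pages) this gives 2^18 checks per section,
and with a page shift of 16 (64k pages) it gives 16384.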

-Robert

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH 2/2] arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA
From: Robert Richter @ 2016-12-16 17:14 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Ard Biesheuvel, linux-arm-kernel, will.deacon, linux-kernel,
	linux-mm, catalin.marinas, akpm, xieyisheng1, james.morse
In-Reply-To: <125f3064-bbec-d923-ad9f-b2d152ee2c2d@linaro.org>

On 16.12.16 09:57:20, Hanjun Guo wrote:
> Hi Robert,
> 
> On 2016/12/15 23:39, Robert Richter wrote:
> >I was going to do some measurements but my kernel crashes now with a
> >page fault in efi_rtc_probe():
> >
> >[   21.663393] Unable to handle kernel paging request at virtual address 20251000
> >[   21.663396] pgd = ffff000009090000
> >[   21.663401] [20251000] *pgd=0000010ffff90003
> >[   21.663402] , *pud=0000010ffff90003
> >[   21.663404] , *pmd=0000000fdc030003
> >[   21.663405] , *pte=00e8832000250707
> >
> >The sparsemem config requires the whole section to be initialized.
> >Your patches do not address this.
> 
> This patch set is running properly on D05, both the boot and
> LTP MM stress test are ok, seems it's a different configuration
> of memory mappings in firmware, just a stupid question, which
> part is related to this problem, is it only the Reserved memory?

The problem is EFI reserved regions that are no longer reserved but
marked as nomap pages. Those are excluded from page initialization,
which leaves parts of a memory section uninitialized.

-Robert


* Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
From: Johannes Weiner @ 2016-12-16 17:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Nils Holland, linux-kernel, linux-mm, Chris Mason, David Sterba,
	linux-btrfs, Michal Hocko
In-Reply-To: <20161216155808.12809-3-mhocko@kernel.org>

On Fri, Dec 16, 2016 at 04:58:08PM +0100, Michal Hocko wrote:
> @@ -1013,7 +1013,7 @@ bool out_of_memory(struct oom_control *oc)
>  	 * make sure exclude 0 mask - all other users should have at least
>  	 * ___GFP_DIRECT_RECLAIM to get here.
>  	 */
> -	if (oc->gfp_mask && !(oc->gfp_mask & (__GFP_FS|__GFP_NOFAIL)))
> +	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
>  		return true;

This makes sense, we should go back to what we had here. Because it's
not that the reported OOMs are premature - there is genuinely no more
memory reclaimable from the allocating context - but that this class
of allocations should never invoke the OOM killer in the first place.

> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 */
>  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>  
> +		/*
> +		 * Help non-failing allocations by giving them access to memory
> +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
> +		 * could deplete whole memory reserves which would just make
> +		 * the situation worse
> +		 */
> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
> +		if (page)
> +			goto got_pg;
> +

But this should be a separate patch, IMO.

Do we observe GFP_NOFS lockups when we don't do this? Don't we risk
premature exhaustion of the memory reserves, and it's better to wait
for other reclaimers to make some progress instead? Should we give
reserve access to all GFP_NOFS allocations, or just the ones from a
reclaim/cleaning context? All that should go into the changelog of a
separate allocation booster patch, I think.
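
For reference, the effect of the out_of_memory() hunk can be sketched as
two predicates. This is only an illustration: the SKETCH_* bit values and
the function names are assumptions made for this example, the real flags
are defined in include/linux/gfp.h. A result of true means the request
returns early without invoking the OOM killer.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in include/linux/gfp.h. */
#define SKETCH_GFP_IO      0x40u
#define SKETCH_GFP_FS      0x80u
#define SKETCH_GFP_NOFAIL  0x800u

/* Check before the patch: __GFP_NOFAIL pushes a non-__GFP_FS request into the OOM killer. */
static bool skips_oom_killer_old(unsigned int gfp_mask)
{
	return gfp_mask && !(gfp_mask & (SKETCH_GFP_FS | SKETCH_GFP_NOFAIL));
}

/* Check after the patch: any non-zero mask without __GFP_FS skips the OOM killer. */
static bool skips_oom_killer_new(unsigned int gfp_mask)
{
	return gfp_mask && !(gfp_mask & SKETCH_GFP_FS);
}
```

A GFP_NOFS|__GFP_NOFAIL style mask (I/O allowed, no FS, no-fail) is the
only case where the two predicates differ: the old check lets it reach
the OOM killer, the new one does not.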


* Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX
From: Alexei Starovoitov @ 2016-12-16 18:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Cristopher Lameter, Andrew Morton, Michal Hocko,
	Alexei Starovoitov, netdev, Daniel Borkmann
In-Reply-To: <20161215164722.21586-2-mhocko@kernel.org>

On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> overflow") has added checks for the maximum allocateable size. It
> (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
> it is not very clean because we already have KMALLOC_MAX_SIZE for this
> very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.
> 
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Nack until the order of patches 1 and 2 is reversed.

The bug that patch 2 fixes was the reason we used KMALLOC_SHIFT_MAX - 1
here instead of KMALLOC_MAX_SIZE, so you have to fix the kmalloc vs
__alloc_pages_slowpath discrepancy first.
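
To make the difference between the two limits concrete, here is a small
sketch. It assumes KMALLOC_MAX_SIZE is 1UL << KMALLOC_SHIFT_MAX, as in
include/linux/slab.h; the shift value of 22 and the SKETCH_* and function
names are illustrative assumptions, since the real shift depends on the
allocator and config.

```c
#include <assert.h>

/* Assumption: illustrative shift; the real value depends on allocator and config. */
#define SKETCH_KMALLOC_SHIFT_MAX 22
#define SKETCH_KMALLOC_MAX_SIZE  (1UL << SKETCH_KMALLOC_SHIFT_MAX)

/* Existing bpf check: reject value_size >= 1 << (KMALLOC_SHIFT_MAX - 1). */
static int rejected_by_old_check(unsigned long value_size)
{
	return value_size >= (1UL << (SKETCH_KMALLOC_SHIFT_MAX - 1));
}

/* Check from the patch: reject only value_size > KMALLOC_MAX_SIZE. */
static int rejected_by_new_check(unsigned long value_size)
{
	return value_size > SKETCH_KMALLOC_MAX_SIZE;
}
```

The old limit is half of KMALLOC_MAX_SIZE, so sizes in the upper half of
the range are rejected today but would be allowed by the patch.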

> ---
>  kernel/bpf/arraymap.c | 2 +-
>  kernel/bpf/hashtab.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a2ac051c342f..229a5d5df977 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -56,7 +56,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>  	    attr->value_size == 0 || attr->map_flags)
>  		return ERR_PTR(-EINVAL);
>  
> -	if (attr->value_size >= 1 << (KMALLOC_SHIFT_MAX - 1))
> +	if (attr->value_size > KMALLOC_MAX_SIZE)
>  		/* if value_size is bigger, the user space won't be able to
>  		 * access the elements.
>  		 */
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index ad1bc67aff1b..c5ec7dc71c84 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -181,7 +181,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  		 */
>  		goto free_htab;
>  
> -	if (htab->map.value_size >= (1 << (KMALLOC_SHIFT_MAX - 1)) -
> +	if (htab->map.value_size >= KMALLOC_MAX_SIZE -
>  	    MAX_BPF_STACK - sizeof(struct htab_elem))
>  		/* if value_size is bigger, the user space won't be able to
>  		 * access the elements via bpf syscall. This check also makes
> -- 
> 2.10.2
> 


* Re: OOM: Better, but still there on 4.9
From: Chris Mason @ 2016-12-16 18:15 UTC (permalink / raw)
  To: Michal Hocko, Nils Holland
  Cc: linux-kernel, linux-mm, David Sterba, linux-btrfs
In-Reply-To: <20161216073941.GA26976@dhcp22.suse.cz>

On 12/16/2016 02:39 AM, Michal Hocko wrote:
> [CC linux-mm and btrfs guys]
>
> On Thu 15-12-16 23:57:04, Nils Holland wrote:
> [...]
>> Of course, none of these are workloads that are new / special in any
>> way - prior to 4.8, I never experienced any issues doing the exact
>> same things.
>>
>> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
>> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
>> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
>> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
>> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
>> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
>> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
>> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
>> Dec 15 19:02:18 teela kernel: Call Trace:
>> Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
>> Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
>> Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
>> Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
>> Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
>> Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
>> Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
>> Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
>> Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
>> Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
>> Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
>> Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
>> Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
>> Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
>> Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
>> Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
>> Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
>> Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
>> Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
>> Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
>> Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
>> Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
>> Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
>> Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
>> Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
>> Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
>> Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
>> Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
>> Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
>> Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
>> Dec 15 19:02:18 teela kernel: Mem-Info:
>> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
>>                                active_file:274324 inactive_file:281962 isolated_file:0
>
> OK, so there is still some anonymous memory that could be swapped out
> and quite a lot of page cache. This might be harder to reclaim because
> the allocation is a GFP_NOFS request which is limited in its reclaim
> capabilities. It might be possible that those pagecache pages are pinned
> in some way by the filesystem.
>
>>                                unevictable:0 dirty:649 writeback:0 unstable:0
>>                                slab_reclaimable:40662 slab_unreclaimable:17754
>>                                mapped:7382 shmem:202 pagetables:351 bounce:0
>>                                free:206736 free_pcp:332 free_cma:0
>> Dec 15 19:02:18 teela kernel: Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
>> Dec 15 19:02:18 teela kernel: DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
>> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 813 3474 3474
>> Dec 15 19:02:18 teela kernel: Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
>
> And this shows that there is no anonymous memory in the lowmem zone.
> Note that this request cannot use the highmem zone so no swap out would
> help. So if we are not able to reclaim those pages on the file LRU then
> we are out of luck
>
>> Dec 15 19:02:18 teela kernel: lowmem_reserve[]: 0 0 21292 21292
>> Dec 15 19:02:18 teela kernel: HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
>
> That being said, the OOM killer invocation is clearly pointless and
> premature. We normally do not invoke it for GFP_NOFS requests
> exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL which
> behaves differently. I am about to change that but my last attempt [1]
> has to be rethought.
>
> Now another thing is that the __GFP_NOFAIL which has this nasty side
> effect has been introduced by me d1b5c5671d01 ("btrfs: Prevent from
> early transaction abort") in 4.3 so I am quite surprised that this has
> shown up only in 4.8. Anyway there might be some other changes in the
> btrfs which could make it more subtle.
>
> I believe the right way to go around this is to pursue what I've started
> in [1]. I will try to prepare something for testing today for you. Stay
> tuned. But I would be really happy if somebody from the btrfs camp could
> check the NOFS aspect of this allocation. We have already seen
> allocation stalls from this path quite recently

Just double checking, are you asking why we're using GFP_NOFS to avoid 
going into btrfs from the btrfs writepages call, or are you asking why 
we aren't allowing highmem?

For why we're not using highmem, it goes back to 2011:

commit a65917156e345946dbde3d7effd28124c6d6a8c2
Btrfs: stop using highmem for extent_buffers

The short answer is that kmap + shared caching pointer between threads 
made it hugely complex.  I gave up and dropped the highmem part.

-chris


* [RFC PATCH 01/14] sparc64: placeholder for needed mmu shared context patching
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

MMU shared context patching will be supported on Sun4V platforms with
Niagara 2 or later processors.  There will be a need for kernel patching
based on these criteria.  This 'patch' simply adds a comment as a reminder
and placeholder to add that support.

For now, MMU shared context support will be determined as follows:
- sun4v patching will be used for shared context support.  This is too
  general as most but not all sun4v platforms contain the required
  processors.
- A new config option (CONFIG_SHARED_MMU_CTX) is added

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/kernel/setup_64.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 6b7331d..ffda69b 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -276,6 +276,17 @@ void sun_m7_patch_2insn_range(struct sun4v_2insn_patch_entry *start,
 	}
 }
 
+/*
+ * FIXME - TODO
+ *
+ * Shared MMU context support will only be provided on sun4v platforms
+ * with Niagara 2 or later processors.  A patching mechanism for this
+ * this type of support will need to be implemented.  For now, the code
+ * is making the too general assumption of supporting shared context on
+ * all sun4v platforms.  This is a placeholder to add correct support
+ * at a later time.
+ */
+
 static void __init sun4v_patch(void)
 {
 	extern void sun4v_hvapi_init(void);
-- 
2.7.4


* [RFC PATCH 00/14] sparc64 shared context/TLB support
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz

In Sparc mm code today, each address space is assigned a unique context
identifier.  This context ID is stored in context register 0 of the MMU.
This same context ID is stored in TLB entries.  When the MMU is searching
for a virtual address translation, the context ID as well as the virtual
address must match for a TLB hit.

Beginning with Sparc Niagara 2 processors, the MMU contains an additional
context register (register 1).  When searching the TLB, the MMU will find
a match if the virtual address matches and the ID contained in either
context register 0 -OR- context register 1 matches.
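
The two hit rules described above can be modelled roughly as follows.
This is a simplified software sketch, not a description of the hardware:
the struct layout, field widths and function names are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one TLB entry: a virtual address tag plus a context ID. */
struct tlb_entry_model {
	uint64_t vaddr_tag;
	uint16_t ctx_id;
};

/* Hit rule with a single context register: address and context 0 must both match. */
static bool tlb_hit_one_ctx(const struct tlb_entry_model *e,
			    uint64_t vaddr_tag, uint16_t ctx0)
{
	return e->vaddr_tag == vaddr_tag && e->ctx_id == ctx0;
}

/* Hit rule with two context registers: the entry may match either context. */
static bool tlb_hit_two_ctx(const struct tlb_entry_model *e,
			    uint64_t vaddr_tag, uint16_t ctx0, uint16_t ctx1)
{
	return e->vaddr_tag == vaddr_tag &&
	       (e->ctx_id == ctx0 || e->ctx_id == ctx1);
}
```

An entry tagged with a shared ID misses under the first rule for a task
whose private context differs, but hits under the second rule once that
task has loaded the shared ID into context register 1.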

In the Linux kernel today, only context register 0 is set and used by
the MMU.  Solaris has made use of the additional context register for shared
mappings.  If two tasks share an appropriate mapping, then both tasks set
context register 1 to the same value and associate that value with the
shared mapping.  In this way, both tasks can use the same TLB entries for
pages of the shared mapping.

This RFC adds support for the additional context register, and extends the
mmap and System V shared memory system calls so that an application can
request shared context mappings.  At a very high level, this works as follows:
- An application passes a new SHARED_CTX flag to mmap or shmat
- The vma associated with the mapping is marked with a SHARED_CTX flag
  - When a SHARED_CTX marked vma is first created, all other vma's mapping
    the same underlying object are searched looking for a match that:
	1) Is also marked SHARED_CTX 
	2) Is mapped at the same virtual address
  - If a match is found, the new vma shares a context ID with the existing vma.
  - If no match is found, a context ID is allocated for the new vma
- sparc specific code associates the context ID with pages in the shared
  mappings.
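
The matching steps above can be sketched in user-space C as follows. This
is a simplified model and not the code in this series: struct vma_model,
assign_shared_ctx() and the counter-based ID allocator are hypothetical,
and the caller is assumed to pass only vmas that map the same underlying
object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a vma that maps the shared object. */
struct vma_model {
	unsigned long start;	/* virtual start address of the mapping */
	bool shared_ctx_flag;	/* set when SHARED_CTX was requested */
	int ctx_id;		/* shared context ID, 0 if none */
};

/*
 * Assign a shared context ID to new_vma.  'others' holds the other vmas
 * mapping the same underlying object.  Reuse the ID of the first vma that
 * is also marked SHARED_CTX and mapped at the same virtual address;
 * otherwise take a fresh ID from *next_free_ctx.
 */
static int assign_shared_ctx(struct vma_model *new_vma,
			     const struct vma_model *others, size_t n,
			     int *next_free_ctx)
{
	for (size_t i = 0; i < n; i++) {
		if (others[i].shared_ctx_flag &&
		    others[i].start == new_vma->start) {
			new_vma->ctx_id = others[i].ctx_id;
			return new_vma->ctx_id;
		}
	}
	new_vma->ctx_id = (*next_free_ctx)++;
	return new_vma->ctx_id;
}
```

A vma at the same address as an unflagged vma does not match, so it gets
a newly allocated ID rather than sharing one.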

This RFC patch series limits a task to having only a single shared context
vma.  Shared context vmas in different processes must match exactly (start
and length) to be shared.  In addition, shared context support is only
provided for huge page (hugetlb) mappings.  These and other restrictions can
be relaxed as the code is further developed.

Most of the code in this patch series is sparc specific for management of
the new context ID and associated TSB entries.  However, there is arch
independent code which needs to enable the flagging of mappings which request
shared context.

This is early proof of concept code.  It is not polished, and there is need
for much more work.  There are even FIXME comments in the code.  My hope is
that it is sufficiently readable to start a discussion about the general
direction to enable such functionality.

It does function, and with perf you can see a reduction in TLB misses for
shared context mappings.  A simple test program which has two tasks touch
pages in a shared mapping has the following dTLB miss rates.

Testing    Normal Mapping      Shared Context Mapping
Rounds     dTLB-load-misses    dTLB-load-misses
1                       771                       834
10                    1,651                       881
100                  10,422                       874
1000                 97,992                       958
10000               975,910                       963
100000            9,719,193                     1,017
1000000          97,941,327                     4,148

Mike Kravetz (14):
  sparc64: placeholder for needed mmu shared context patching
  sparc64: add new fields to mmu context for shared context support
  sparc64: routines for basic mmu shared context structure management
  sparc64: load shared id into context register 1
  sparc64: Add PAGE_SHR_CTX flag
  sparc64: general shared context tsb creation and support
  sparc64: move COMPUTE_TAG_TARGET and COMPUTE_TSB_PTR to header file
  sparc64: shared context tsb handling at context switch time
  sparc64: TLB/TSB miss handling for shared context
  mm: add shared context to vm_area_struct
  sparc64: add routines to look for vmsa which can share context
  mm: add mmap and shmat arch hooks for shared context
  sparc64 mm: add shared context support to mmap() and shmat() APIs
  sparc64: add SHARED_MMU_CTX Kconfig option

 arch/powerpc/include/asm/mmu_context.h   |  12 ++
 arch/s390/include/asm/mmu_context.h      |  12 ++
 arch/sparc/Kconfig                       |   3 +
 arch/sparc/include/asm/hugetlb.h         |   4 +
 arch/sparc/include/asm/mman.h            |   6 +
 arch/sparc/include/asm/mmu_64.h          |  36 +++++-
 arch/sparc/include/asm/mmu_context_64.h  | 139 ++++++++++++++++++++++--
 arch/sparc/include/asm/page_64.h         |   1 +
 arch/sparc/include/asm/pgtable_64.h      |  13 +++
 arch/sparc/include/asm/spitfire.h        |   2 +
 arch/sparc/include/asm/tlb_64.h          |   3 +
 arch/sparc/include/asm/trap_block.h      |   3 +-
 arch/sparc/include/asm/tsb.h             |  40 +++++++
 arch/sparc/include/uapi/asm/mman.h       |   1 +
 arch/sparc/kernel/fpu_traps.S            |  63 +++++++++++
 arch/sparc/kernel/head_64.S              |   2 +-
 arch/sparc/kernel/rtrap_64.S             |  20 ++++
 arch/sparc/kernel/setup_64.c             |  11 ++
 arch/sparc/kernel/smp_64.c               |  22 ++++
 arch/sparc/kernel/sun4v_tlb_miss.S       |  37 ++-----
 arch/sparc/kernel/sys_sparc_64.c         |  17 +++
 arch/sparc/kernel/trampoline_64.S        |  20 ++++
 arch/sparc/kernel/tsb.S                  | 172 +++++++++++++++++++++++------
 arch/sparc/mm/fault_64.c                 |  10 ++
 arch/sparc/mm/hugetlbpage.c              |  94 +++++++++++++++-
 arch/sparc/mm/init_64.c                  | 181 ++++++++++++++++++++++++++++++-
 arch/sparc/mm/tsb.c                      |  95 +++++++++++++++-
 arch/unicore32/include/asm/mmu_context.h |  12 ++
 arch/x86/include/asm/mmu_context.h       |  12 ++
 include/asm-generic/mm_hooks.h           |  18 ++-
 include/linux/mm.h                       |   1 +
 include/linux/mm_types.h                 |  13 +++
 include/uapi/linux/shm.h                 |   1 +
 ipc/shm.c                                |  13 +++
 mm/hugetlb.c                             |   9 ++
 mm/mmap.c                                |  10 ++
 36 files changed, 1018 insertions(+), 90 deletions(-)

-- 
2.7.4


* [RFC PATCH 02/14] sparc64: add new fields to mmu context for shared context support
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Add new fields to the mm_context structure to support shared context.
Instead of a simple context ID, add a pointer to a structure with a
reference count.  This is needed as multiple tasks will share the
context ID.

Pages using the shared context ID will reside in a separate TSB.  So
changes are made to increase the number of TSBs as well.  Note that
context sharing is supported only for huge pages.  Therefore, there is
no shared context TSB for the base page size.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/mmu_64.h         | 36 +++++++++++++++++++++++++++++----
 arch/sparc/include/asm/mmu_context_64.h |  8 ++++----
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_64.h b/arch/sparc/include/asm/mmu_64.h
index f7de0db..edf8663 100644
--- a/arch/sparc/include/asm/mmu_64.h
+++ b/arch/sparc/include/asm/mmu_64.h
@@ -57,6 +57,13 @@
 	 (!(((__ctx.sparc64_ctx_val) ^ tlb_context_cache) & CTX_VERSION_MASK))
 #define CTX_HWBITS(__ctx)	((__ctx.sparc64_ctx_val) & CTX_HW_MASK)
 #define CTX_NRBITS(__ctx)	((__ctx.sparc64_ctx_val) & CTX_NR_MASK)
+#define	SHARED_CTX_VALID(__ctx)	(__ctx.shared_ctx && \
+	 (!(((__ctx.shared_ctx->shared_ctx_val) ^ tlb_context_cache) & \
+	   CTX_VERSION_MASK)))
+#define	SHARED_CTX_HWBITS(__ctx)	\
+	 ((__ctx.shared_ctx->shared_ctx_val) & CTX_HW_MASK)
+#define	SHARED_CTX_NRBITS(__ctx)	\
+	 ((__ctx.shared_ctx->shared_ctx_val) & CTX_NR_MASK)
 
 #ifndef __ASSEMBLY__
 
@@ -80,24 +87,45 @@ struct tsb_config {
 	unsigned long		tsb_map_pte;
 };
 
-#define MM_TSB_BASE	0
+#if defined(CONFIG_SHARED_MMU_CTX)
+struct shared_mmu_ctx {
+	atomic_t	refcount;
+	unsigned long	shared_ctx_val;
+};
+
+#define MM_TSB_HUGE_SHARED	0
+#define MM_TSB_BASE		1
+#define MM_TSB_HUGE		2
+#define MM_NUM_TSBS		3
+#else
 
+#define MM_TSB_BASE		0
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-#define MM_TSB_HUGE	1
-#define MM_NUM_TSBS	2
+#define MM_TSB_HUGE		1
+#define MM_TSB_HUGE_SHARED	1	/* Simplifies conditions in code */
+#define MM_NUM_TSBS		2
 #else
-#define MM_NUM_TSBS	1
+#define MM_NUM_TSBS		1
+#endif
 #endif
 
 typedef struct {
 	spinlock_t		lock;
 	unsigned long		sparc64_ctx_val;
+#if defined(CONFIG_SHARED_MMU_CTX)
+	struct shared_mmu_ctx	*shared_ctx;
+	unsigned long		shared_hugetlb_pte_count;
+#endif
 	unsigned long		hugetlb_pte_count;
 	unsigned long		thp_pte_count;
 	struct tsb_config	tsb_block[MM_NUM_TSBS];
 	struct hv_tsb_descr	tsb_descr[MM_NUM_TSBS];
 } mm_context_t;
 
+#define	mm_shared_ctx_val(mm)					\
+	((mm)->context.shared_ctx ?				\
+	 (mm)->context.shared_ctx->shared_ctx_val : 0UL)
+
 #endif /* !__ASSEMBLY__ */
 
 #define TSB_CONFIG_TSB		0x00
diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index b84be67..d031799 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -35,15 +35,15 @@ void __tsb_context_switch(unsigned long pgd_pa,
 static inline void tsb_context_switch(struct mm_struct *mm)
 {
 	__tsb_context_switch(__pa(mm->pgd),
-			     &mm->context.tsb_block[0],
+			     &mm->context.tsb_block[MM_TSB_BASE],
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-			     (mm->context.tsb_block[1].tsb ?
-			      &mm->context.tsb_block[1] :
+			     (mm->context.tsb_block[MM_TSB_HUGE].tsb ?
+			      &mm->context.tsb_block[MM_TSB_HUGE] :
 			      NULL)
 #else
 			     NULL
 #endif
-			     , __pa(&mm->context.tsb_descr[0]));
+			     , __pa(&mm->context.tsb_descr[MM_TSB_BASE]));
 }
 
 void tsb_grow(struct mm_struct *mm,
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 03/14] sparc64: routines for basic mmu shared context structure management
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Add routines for basic management of mmu shared context data structures.
These routines have to do with allocation/deallocation and get/put
of the structures.  The structures themselves will come from a new
kmem cache.
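
As a rough illustration of the get/put lifetime described above, here is
a minimal user-space sketch.  It is not the kernel code: the names
sketch_ctx_alloc(), sketch_ctx_get() and sketch_ctx_put() are
hypothetical, malloc()/free() stand in for the kmem cache, and C11
atomics stand in for atomic_t.  The initial count of 1 mirrors the value
set by init_once_shared_mmu_ctx() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct shared_mmu_ctx. */
struct shared_ctx_sketch {
	atomic_int	refcount;
	unsigned long	ctx_val;
};

/* Allocate a context that starts out holding one reference. */
static struct shared_ctx_sketch *sketch_ctx_alloc(unsigned long ctx_val)
{
	struct shared_ctx_sketch *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	atomic_init(&ctx->refcount, 1);
	ctx->ctx_val = ctx_val;
	return ctx;
}

/* Take an additional reference on behalf of another sharer. */
static void sketch_ctx_get(struct shared_ctx_sketch *ctx)
{
	assert(ctx);
	atomic_fetch_add(&ctx->refcount, 1);
}

/*
 * Drop one reference.  Returns 1 if this was the last reference and
 * the structure was freed, 0 otherwise.  The kernel version would also
 * flush the TLB and release the context ID before freeing.
 */
static int sketch_ctx_put(struct shared_ctx_sketch *ctx)
{
	assert(ctx);
	if (atomic_fetch_sub(&ctx->refcount, 1) == 1) {
		free(ctx);
		return 1;
	}
	return 0;
}
```

Only the caller that sees the count drop to zero frees the structure, so
every other sharer can drop its reference without further coordination.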

FIXMEs were added to the code where additional work is needed.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/mmu_context_64.h |  6 +++
 arch/sparc/include/asm/tlb_64.h         |  3 ++
 arch/sparc/include/asm/tsb.h            |  2 +
 arch/sparc/kernel/smp_64.c              | 22 +++++++++
 arch/sparc/mm/init_64.c                 | 84 +++++++++++++++++++++++++++++++--
 arch/sparc/mm/tsb.c                     | 54 +++++++++++++++++++++
 6 files changed, 168 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index d031799..acaea6d 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -18,6 +18,12 @@ extern unsigned long tlb_context_cache;
 extern unsigned long mmu_context_bmap[];
 
 void get_new_mmu_context(struct mm_struct *mm);
+#if defined(CONFIG_SHARED_MMU_CTX)
+void get_new_mmu_shared_context(struct mm_struct *mm);
+void put_shared_context(struct mm_struct *mm);
+void set_mm_shared_ctx(struct mm_struct *mm, struct shared_mmu_ctx *ctx);
+void destroy_shared_context(struct mm_struct *mm);
+#endif
 #ifdef CONFIG_SMP
 void smp_new_mmu_context_version(void);
 #else
diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h
index 4cb392f..e348a1b 100644
--- a/arch/sparc/include/asm/tlb_64.h
+++ b/arch/sparc/include/asm/tlb_64.h
@@ -14,6 +14,9 @@ void smp_flush_tlb_pending(struct mm_struct *,
 
 #ifdef CONFIG_SMP
 void smp_flush_tlb_mm(struct mm_struct *mm);
+#if defined(CONFIG_SHARED_MMU_CTX)
+void smp_flush_shared_tlb_mm(struct mm_struct *mm);
+#endif
 #define do_flush_tlb_mm(mm) smp_flush_tlb_mm(mm)
 #else
 #define do_flush_tlb_mm(mm) __flush_tlb_mm(CTX_HWBITS(mm->context), SECONDARY_CONTEXT)
diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 32258e0..311cd4e 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -72,6 +72,8 @@ struct tsb_phys_patch_entry {
 	unsigned int	insn;
 };
 extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
+
+extern struct kmem_cache *shared_mmu_ctx_cachep __read_mostly;
 #endif
 #define TSB_LOAD_QUAD(TSB, REG)	\
 661:	ldda		[TSB] ASI_NUCLEUS_QUAD_LDD, REG; \
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 8182f7c..c0f23ee 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1078,6 +1078,28 @@ void smp_flush_tlb_mm(struct mm_struct *mm)
 	put_cpu();
 }
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+/*
+ * Called when last reference to shared context is dropped.  Flush
+ * all TLB entries associated with the shared context ID.
+ *
+ * FIXME
+ * Future optimization would be to store cpumask in shared context
+ * structure and only make cross call to those cpus.
+ */
+void smp_flush_shared_tlb_mm(struct mm_struct *mm)
+{
+	u32 ctx = SHARED_CTX_HWBITS(mm->context);
+
+	(void)get_cpu();		/* prevent preemption */
+
+	smp_cross_call(&xcall_flush_tlb_mm, ctx, 0, 0);
+	__flush_tlb_mm(ctx, SECONDARY_CONTEXT);
+
+	put_cpu();
+}
+#endif
+
 struct tlb_pending_info {
 	unsigned long ctx;
 	unsigned long nr;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 37aa537..bb9a6ee 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -673,14 +673,24 @@ DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
  *
  * Always invoked with interrupts disabled.
  */
-void get_new_mmu_context(struct mm_struct *mm)
+static void __get_new_mmu_context_common(struct mm_struct *mm, bool shared)
 {
 	unsigned long ctx, new_ctx;
 	unsigned long orig_pgsz_bits;
 	int new_version;
 
 	spin_lock(&ctx_alloc_lock);
-	orig_pgsz_bits = (mm->context.sparc64_ctx_val & CTX_PGSZ_MASK);
+#if defined(CONFIG_SHARED_MMU_CTX)
+	if (shared)
+		/*
+		 * Note that we are only called from get_new_mmu_shared_context
+		 * which guarantees the existence of shared_ctx structure.
+		 */
+		orig_pgsz_bits = (mm->context.shared_ctx->shared_ctx_val &
+				  CTX_PGSZ_MASK);
+	else
+#endif
+		orig_pgsz_bits = (mm->context.sparc64_ctx_val & CTX_PGSZ_MASK);
 	ctx = (tlb_context_cache + 1) & CTX_NR_MASK;
 	new_ctx = find_next_zero_bit(mmu_context_bmap, 1 << CTX_NR_BITS, ctx);
 	new_version = 0;
@@ -714,13 +724,81 @@ void get_new_mmu_context(struct mm_struct *mm)
 	new_ctx |= (tlb_context_cache & CTX_VERSION_MASK);
 out:
 	tlb_context_cache = new_ctx;
-	mm->context.sparc64_ctx_val = new_ctx | orig_pgsz_bits;
+#if defined(CONFIG_SHARED_MMU_CTX)
+	if (shared)
+		mm->context.shared_ctx->shared_ctx_val =
+					new_ctx | orig_pgsz_bits;
+	else
+#endif
+		mm->context.sparc64_ctx_val = new_ctx | orig_pgsz_bits;
 	spin_unlock(&ctx_alloc_lock);
 
+	/*
+	 * FIXME
+	 * Not sure if the case where a shared context ID changed (not just
+	 * newly allocated) is handled properly.  May need to modify
+	 * smp_new_mmu_context_version to handle correctly.
+	 */
 	if (unlikely(new_version))
 		smp_new_mmu_context_version();
 }
 
+void get_new_mmu_context(struct mm_struct *mm)
+{
+	__get_new_mmu_context_common(mm, false);
+}
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+void get_new_mmu_shared_context(struct mm_struct *mm)
+{
+	/*
+	 * For now, we only support one shared context mapping per mm.  So,
+	 * if mm->context.shared_ctx  is already set, we have a bug
+	 *
+	 * Note that we are called from mmap with mmap_sem held.  Thus,
+	 * there can not be two threads racing to initialize.
+	 */
+	BUG_ON(mm->context.shared_ctx);
+
+	mm->context.shared_ctx = kmem_cache_alloc(shared_mmu_ctx_cachep,
+						GFP_NOWAIT);
+	if (!mm->context.shared_ctx)
+		return;
+
+	__get_new_mmu_context_common(mm, true);
+}
+
+void put_shared_context(struct mm_struct *mm)
+{
+	if (!mm->context.shared_ctx)
+		return;
+
+	if (atomic_dec_and_test(&mm->context.shared_ctx->refcount)) {
+		smp_flush_shared_tlb_mm(mm);
+		destroy_shared_context(mm);
+		kmem_cache_free(shared_mmu_ctx_cachep, mm->context.shared_ctx);
+	}
+
+	/*
+	 * For now we assume/expect only one shared context reference per mm
+	 */
+	mm->context.shared_ctx = NULL;
+}
+
+void set_mm_shared_ctx(struct mm_struct *mm, struct shared_mmu_ctx *ctx)
+{
+	BUG_ON(mm->context.shared_ctx || !ctx);
+
+	/*
+	 * Note that we are called with mmap_lock held on underlying
+	 * mapping.  Hence, the ctx structure pointed to by the matching
+	 * vma can not go away.
+	 */
+	atomic_inc(&ctx->refcount);
+	mm->context.shared_ctx = ctx;
+}
+#endif
+
 static int numa_enabled = 1;
 static int numa_debug;
 
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index e20fbba..8c2d148 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -277,6 +277,8 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
 	}
 }
 
+struct kmem_cache *shared_mmu_ctx_cachep __read_mostly;
+
 struct kmem_cache *pgtable_cache __read_mostly;
 
 static struct kmem_cache *tsb_caches[8] __read_mostly;
@@ -292,6 +294,27 @@ static const char *tsb_cache_names[8] = {
 	"tsb_1MB",
 };
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+static void init_once_shared_mmu_ctx(void *mem)
+{
+	struct shared_mmu_ctx *ctx = (struct shared_mmu_ctx *) mem;
+
+	ctx->shared_ctx_val = 0;
+	atomic_set(&ctx->refcount, 1);
+}
+
+static void __init sun4v_shared_mmu_ctx_init(void)
+{
+	shared_mmu_ctx_cachep = kmem_cache_create("shared_mmu_ctx_cache",
+					sizeof(struct shared_mmu_ctx),
+					0,
+					SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+					init_once_shared_mmu_ctx);
+}
+#else
+static void __init sun4v_shared_mmu_ctx_init(void) { }
+#endif
+
 void __init pgtable_cache_init(void)
 {
 	unsigned long i;
@@ -317,6 +340,13 @@ void __init pgtable_cache_init(void)
 			prom_halt();
 		}
 	}
+
+	if (tlb_type == hypervisor)
+		/*
+		 * FIXME - shared context enabled/supported on most
+		 * but not all sun4v processors
+		 */
+		sun4v_shared_mmu_ctx_init();
 }
 
 int sysctl_tsb_ratio = -2;
@@ -547,6 +577,30 @@ static void tsb_destroy_one(struct tsb_config *tp)
 	tp->tsb_reg_val = 0UL;
 }
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+void destroy_shared_context(struct mm_struct *mm)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx_alloc_lock, flags);
+
+	if (SHARED_CTX_VALID(mm->context)) {
+		unsigned long nr = SHARED_CTX_NRBITS(mm->context);
+
+		mmu_context_bmap[nr>>6] &= ~(1UL << (nr & 63));
+	}
+
+	spin_unlock_irqrestore(&ctx_alloc_lock, flags);
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+	/*
+	 * Any shared context should have been cleaned up by now
+	 */
+	BUG_ON(SHARED_CTX_VALID(mm->context));
+#endif
+}
+#endif
+
 void destroy_context(struct mm_struct *mm)
 {
 	unsigned long flags, i;
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 05/14] sparc64: Add PAGE_SHR_CTX flag
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

This new page flag is used to identify pages which are associated with
a shared context ID.  It is needed at page fault time when we only
have access to the PTE and need to determine whether the associated
TSB entry should be associated with the regular or shared context TSB.

A new helper routine is_sharedctx_pte() is also added.
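
For readers following along, a small stand-alone sketch of the flag test
may help.  The three constants are copied from the sun4v definitions in
the diff below; the SKETCH_ prefix and the function name
sketch_is_sharedctx_pte() are hypothetical, and a plain uint64_t stands
in for pte_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the sun4v PTE definitions in the patch. */
#define SKETCH_PAGE_SOFT_4V	UINT64_C(0x30)	/* software bits  */
#define SKETCH_PAGE_SHR_CTX_4V	UINT64_C(0x20)	/* shared context */
#define SKETCH_PAGE_PRESENT_4V	UINT64_C(0x10)	/* present        */

/* The new flag lies entirely within the software-defined bits. */
static_assert((SKETCH_PAGE_SOFT_4V & SKETCH_PAGE_SHR_CTX_4V) ==
	      SKETCH_PAGE_SHR_CTX_4V, "flag must be a software bit");

/* True if the PTE value carries the shared context flag. */
static bool sketch_is_sharedctx_pte(uint64_t pte_val)
{
	return (pte_val & SKETCH_PAGE_SHR_CTX_4V) != 0;
}
```

Note that the test has to mask with '&'; OR-ing the value with the flag
would be non-zero for every PTE.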

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/pgtable_64.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 1fb317f..f2fd088 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -166,6 +166,7 @@ bool kern_addr_valid(unsigned long addr);
 #define _PAGE_EXEC_4V	  _AC(0x0000000000000080,UL) /* Executable Page      */
 #define _PAGE_W_4V	  _AC(0x0000000000000040,UL) /* Writable             */
 #define _PAGE_SOFT_4V	  _AC(0x0000000000000030,UL) /* Software bits        */
+#define _PAGE_SHR_CTX_4V  _AC(0x0000000000000020,UL) /* Shared Context       */
 #define _PAGE_PRESENT_4V  _AC(0x0000000000000010,UL) /* Present              */
 #define _PAGE_RESV_4V	  _AC(0x0000000000000008,UL) /* Reserved             */
 #define _PAGE_SZ16GB_4V	  _AC(0x0000000000000007,UL) /* 16GB Page            */
@@ -426,6 +427,18 @@ static inline bool is_hugetlb_pte(pte_t pte)
 }
 #endif
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+static inline bool is_sharedctx_pte(pte_t pte)
+{
+	return !!(pte_val(pte) & _PAGE_SHR_CTX_4V);
+}
+#else
+static inline bool is_sharedctx_pte(pte_t pte)
+{
+	return false;
+}
+#endif
+
 static inline pte_t pte_mkdirty(pte_t pte)
 {
 	unsigned long val = pte_val(pte), tmp;
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 06/14] sparc64: general shared context tsb creation and support
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Take into account the shared context TSB when creating and updating
TSBs.  Existing routines are modified to key off the TSB index or
PTE flag (_PAGE_SHR_CTX_4V) to determine whether this is a shared
context operation.

With shared context support the sun4v TSB descriptor array could
contain a 'hole' if there is a shared context TSB and no huge page
TSB.  An array with a hole cannot be passed to the hypervisor, so
make sure no hole exists in the array.
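
One possible way to picture the "no hole" requirement is the packing
step sketched below.  This is only an illustration and not necessarily
how the patch does it: struct sketch_tsb_descr and sketch_pack_descr()
are hypothetical, and treating tsb_base == 0 as "slot unused" is an
assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down TSB descriptor; tsb_base == 0 is unused. */
struct sketch_tsb_descr {
	unsigned long	tsb_base;
	unsigned int	ctx_idx;
};

/*
 * Copy the in-use descriptors from src[] to dst[] in order, skipping
 * unused slots.  Returns the number of descriptors written, which is
 * the count that would be handed to the hypervisor.
 */
static size_t sketch_pack_descr(const struct sketch_tsb_descr *src, size_t n,
				struct sketch_tsb_descr *dst)
{
	size_t i, used = 0;

	assert(src && dst);
	for (i = 0; i < n; i++) {
		if (src[i].tsb_base)
			dst[used++] = src[i];
	}
	return used;
}
```

With three slots where only the first and last are populated, the packed
array holds two contiguous descriptors and the relative order is kept.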

For shared context TSBs, the context index in the hypervisor descriptor
structure is set to 1.  This indicates the context ID stored in context
register 1 should be used for TLB matching.

This commit does NOT load the shared context TSB into the hv MMU.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/mm/fault_64.c    | 10 ++++++++++
 arch/sparc/mm/hugetlbpage.c | 20 ++++++++++++++++----
 arch/sparc/mm/init_64.c     | 42 +++++++++++++++++++++++++++++++++++++++---
 arch/sparc/mm/tsb.c         | 41 ++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 643c149..2b82cdb 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -493,6 +493,16 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 			hugetlb_setup(regs);
 
 	}
+#if defined(CONFIG_SHARED_MMU_CTX)
+	mm_rss = mm->context.shared_hugetlb_pte_count * REAL_HPAGE_PER_HPAGE;
+	if (unlikely(mm_shared_ctx_val(mm) && mm_rss >
+		     mm->context.tsb_block[MM_TSB_HUGE_SHARED].tsb_rss_limit)) {
+		if (mm->context.tsb_block[MM_TSB_HUGE_SHARED].tsb)
+			tsb_grow(mm, MM_TSB_HUGE_SHARED, mm_rss);
+		else
+			hugetlb_shared_setup(regs);
+	}
+#endif
 #endif
 exit_exception:
 	exception_exit(prev_state);
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 988acc8b..2039d45 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -162,8 +162,14 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 {
 	pte_t orig;
 
-	if (!pte_present(*ptep) && pte_present(entry))
-		mm->context.hugetlb_pte_count++;
+	if (!pte_present(*ptep) && pte_present(entry)) {
+#if defined(CONFIG_SHARED_MMU_CTX)
+		if (pte_val(entry) & _PAGE_SHR_CTX_4V)
+			mm->context.shared_hugetlb_pte_count++;
+		else
+#endif
+			mm->context.hugetlb_pte_count++;
+	}
 
 	addr &= HPAGE_MASK;
 	orig = *ptep;
@@ -180,8 +186,14 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	pte_t entry;
 
 	entry = *ptep;
-	if (pte_present(entry))
-		mm->context.hugetlb_pte_count--;
+	if (pte_present(entry)) {
+#if defined(CONFIG_SHARED_MMU_CTX)
+		if (pte_val(entry) & _PAGE_SHR_CTX_4V)
+			mm->context.shared_hugetlb_pte_count--;
+		else
+#endif
+			mm->context.hugetlb_pte_count--;
+	}
 
 	addr &= HPAGE_MASK;
 	*ptep = __pte(0UL);
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index bb9a6ee..2b310e5 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -346,6 +346,21 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 	spin_lock_irqsave(&mm->context.lock, flags);
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#if defined(CONFIG_SHARED_MMU_CTX)
+	if ((mm->context.hugetlb_pte_count || mm->context.thp_pte_count ||
+	    mm->context.shared_hugetlb_pte_count) && is_hugetlb_pte(pte)) {
+		/* We are fabricating 8MB pages using 4MB real hw pages.  */
+		pte_val(pte) |= (address & (1UL << REAL_HPAGE_SHIFT));
+		if (is_sharedctx_pte(pte))
+			__update_mmu_tsb_insert(mm, MM_TSB_HUGE_SHARED,
+					REAL_HPAGE_SHIFT, address,
+					pte_val(pte));
+		else
+			__update_mmu_tsb_insert(mm, MM_TSB_HUGE,
+					REAL_HPAGE_SHIFT, address,
+					pte_val(pte));
+	} else
+#else
 	if ((mm->context.hugetlb_pte_count || mm->context.thp_pte_count) &&
 	    is_hugetlb_pte(pte)) {
 		/* We are fabricating 8MB pages using 4MB real hw pages.  */
@@ -354,6 +369,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 					address, pte_val(pte));
 	} else
 #endif
+#endif
 		__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
 					address, pte_val(pte));
 
@@ -2915,7 +2931,7 @@ static void context_reload(void *__data)
 		load_secondary_context(mm);
 }
 
-void hugetlb_setup(struct pt_regs *regs)
+static void __hugetlb_setup_common(struct pt_regs *regs, unsigned long tsb_idx)
 {
 	struct mm_struct *mm = current->mm;
 	struct tsb_config *tp;
@@ -2933,15 +2949,18 @@ void hugetlb_setup(struct pt_regs *regs)
 		die_if_kernel("HugeTSB in atomic", regs);
 	}
 
-	tp = &mm->context.tsb_block[MM_TSB_HUGE];
+	tp = &mm->context.tsb_block[tsb_idx];
 	if (likely(tp->tsb == NULL))
-		tsb_grow(mm, MM_TSB_HUGE, 0);
+		tsb_grow(mm, tsb_idx, 0);
 
 	tsb_context_switch(mm);
 	smp_tsb_sync(mm);
 
 	/* On UltraSPARC-III+ and later, configure the second half of
 	 * the Data-TLB for huge pages.
+	 *
+	 * Note that the following does not execute on platforms where
+	 * shared context is supported.
 	 */
 	if (tlb_type == cheetah_plus) {
 		bool need_context_reload = false;
@@ -2974,6 +2993,23 @@ void hugetlb_setup(struct pt_regs *regs)
 			on_each_cpu(context_reload, mm, 0);
 	}
 }
+
+void hugetlb_setup(struct pt_regs *regs)
+{
+	__hugetlb_setup_common(regs, MM_TSB_HUGE);
+}
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+void hugetlb_shared_setup(struct pt_regs *regs)
+{
+	__hugetlb_setup_common(regs, MM_TSB_HUGE_SHARED);
+}
+#else
+void hugetlb_shared_setup(struct pt_regs *regs)
+{
+	BUG();
+}
+#endif
 #endif
 
 static struct resource code_resource = {
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 8c2d148..0b684de 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -108,6 +108,12 @@ void flush_tsb_user(struct tlb_batch *tb)
 			base = __pa(base);
 		__flush_tsb_one(tb, REAL_HPAGE_SHIFT, base, nentries);
 	}
+
+	/*
+	 * FIXME
+	 * I don't "think" we want to flush shared context tsb entries here.
+	 * There should at least be a comment.
+	 */
 #endif
 	spin_unlock_irqrestore(&mm->context.lock, flags);
 }
@@ -133,6 +139,11 @@ void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr, bool huge)
 			base = __pa(base);
 		__flush_tsb_one_entry(base, vaddr, REAL_HPAGE_SHIFT, nentries);
 	}
+	/*
+	 * FIXME
+	 * Again, we should give more thought to the need for flushing
+	 * shared context pages.  At least a comment is needed.
+	 */
 #endif
 	spin_unlock_irqrestore(&mm->context.lock, flags);
 }
@@ -159,6 +170,7 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
 		break;
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 	case MM_TSB_HUGE:
+	case MM_TSB_HUGE_SHARED:
 		base = TSBMAP_4M_BASE;
 		break;
 #endif
@@ -251,6 +263,7 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
 			break;
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 		case MM_TSB_HUGE:
+		case MM_TSB_HUGE_SHARED:
 			hp->pgsz_idx = HV_PGSZ_IDX_HUGE;
 			break;
 #endif
@@ -260,12 +273,21 @@ static void setup_tsb_params(struct mm_struct *mm, unsigned long tsb_idx, unsign
 		hp->assoc = 1;
 		hp->num_ttes = tsb_bytes / 16;
 		hp->ctx_idx = 0;
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+		/*
+		 * For shared context TSBs, adjust the context register index
+		 */
+		if (mm->context.shared_ctx && tsb_idx == MM_TSB_HUGE_SHARED)
+			hp->ctx_idx = 1;
+#endif
 		switch (tsb_idx) {
 		case MM_TSB_BASE:
 			hp->pgsz_mask = HV_PGSZ_MASK_BASE;
 			break;
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 		case MM_TSB_HUGE:
+		case MM_TSB_HUGE_SHARED:
 			hp->pgsz_mask = HV_PGSZ_MASK_HUGE;
 			break;
 #endif
@@ -520,12 +542,18 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 	unsigned long saved_hugetlb_pte_count;
 	unsigned long saved_thp_pte_count;
+#if defined(CONFIG_SHARED_MMU_CTX)
+	unsigned long saved_shared_hugetlb_pte_count;
+#endif
 #endif
 	unsigned int i;
 
 	spin_lock_init(&mm->context.lock);
 
 	mm->context.sparc64_ctx_val = 0UL;
+#if defined(CONFIG_SHARED_MMU_CTX)
+	mm->context.shared_ctx = NULL;
+#endif
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 	/* We reset them to zero because the fork() page copying
@@ -536,6 +564,10 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	saved_thp_pte_count = mm->context.thp_pte_count;
 	mm->context.hugetlb_pte_count = 0;
 	mm->context.thp_pte_count = 0;
+#if defined(CONFIG_SHARED_MMU_CTX)
+	saved_shared_hugetlb_pte_count = mm->context.shared_hugetlb_pte_count;
+	mm->context.shared_hugetlb_pte_count = 0;
+#endif
 
 	mm_rss -= saved_thp_pte_count * (HPAGE_SIZE / PAGE_SIZE);
 #endif
@@ -544,8 +576,10 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	 * us, so we need to zero out the TSB pointer or else tsb_grow()
 	 * will be confused and think there is an older TSB to free up.
 	 */
-	for (i = 0; i < MM_NUM_TSBS; i++)
+	for (i = 0; i < MM_NUM_TSBS; i++) {
 		mm->context.tsb_block[i].tsb = NULL;
+		mm->context.tsb_descr[i].tsb_base = 0UL;
+	}
 
 	/* If this is fork, inherit the parent's TSB size.  We would
 	 * grow it to that size on the first page fault anyways.
@@ -557,6 +591,11 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 		tsb_grow(mm, MM_TSB_HUGE,
 			 (saved_hugetlb_pte_count + saved_thp_pte_count) *
 			 REAL_HPAGE_PER_HPAGE);
+#if defined(CONFIG_SHARED_MMU_CTX)
+	if (unlikely(saved_shared_hugetlb_pte_count))
+		tsb_grow(mm, MM_TSB_HUGE_SHARED,
+			saved_shared_hugetlb_pte_count * REAL_HPAGE_PER_HPAGE);
+#endif
 #endif
 
 	if (unlikely(!mm->context.tsb_block[MM_TSB_BASE].tsb))
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 07/14] sparc64: move COMPUTE_TAG_TARGET and COMPUTE_TSB_PTR to header file
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Move macros COMPUTE_TAG_TARGET and COMPUTE_TSB_PTR out of the .S file
to a header so that they can be used in other files.
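
For reference, the arithmetic the two macros perform can be written out
in C, following the pseudocode in the macro comments in the diff below.
This is a sketch only; the function names are hypothetical and the real
macros operate on registers in assembly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * C rendering of the COMPUTE_TSB_PTR pseudocode: the low three bits of
 * tsb_reg select the TSB size (512 entries << size), the remaining
 * bits are the base, and each TSB entry is 16 bytes.
 */
static uint64_t sketch_compute_tsb_ptr(uint64_t tsb_reg, uint64_t vaddr,
				       unsigned int hash_shift)
{
	uint64_t index_mask, tsb_base, tsb_index;

	assert(hash_shift < 64);
	index_mask = (UINT64_C(512) << (tsb_reg & 0x7)) - 1;
	tsb_base = tsb_reg & ~UINT64_C(0x7);
	tsb_index = (vaddr >> hash_shift) & index_mask;
	return tsb_base + tsb_index * 16;
}

/* C rendering of the value COMPUTE_TAG_TARGET leaves in DEST. */
static uint64_t sketch_compute_tag_target(uint64_t vaddr)
{
	return vaddr >> 22;
}
```

For example, with a size field of 0 the index mask is 511, so a virtual
address of 0x2000 hashed with a shift of 13 selects entry 1, which is 16
bytes past the TSB base.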

Also, add new macro IF_TLB_TYPE_NOT_HYPE, which branches to a given
label if the TLB type is not hypervisor.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/tsb.h       | 38 ++++++++++++++++++++++++++++++++++++++
 arch/sparc/kernel/sun4v_tlb_miss.S | 29 ++---------------------------
 2 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 311cd4e..bb7df61 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -75,6 +75,44 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
 
 extern struct kmem_cache *shared_mmu_ctx_cachep __read_mostly;
 #endif
+
+	/*
+	 * If tlb type is not hypervisor, branch to label
+	 */
+#define	IF_TLB_TYPE_NOT_HYPE(TMP, NOT_HYPE_LABEL)	\
+	sethi	%hi(tlb_type), TMP;			\
+	lduw	[TMP + %lo(tlb_type)], TMP;		\
+	cmp	TMP, 3;					\
+	bne,pn	%icc, NOT_HYPE_LABEL;			\
+	nop
+
+	/* DEST = (VADDR >> 22)
+	 *
+	 * Branch to ZERO_CTX_LABEL if context is zero.
+	 */
+#define	COMPUTE_TAG_TARGET(DEST, VADDR, CTX, ZERO_CTX_LABEL) \
+	srlx	VADDR, 22, DEST; \
+	brz,pn	CTX, ZERO_CTX_LABEL; \
+	 nop;
+
+	/* Create TSB pointer.  This is something like:
+	 *
+	 * index_mask = (512 << (tsb_reg & 0x7UL)) - 1UL;
+	 * tsb_base = tsb_reg & ~0x7UL;
+	 * tsb_index = ((vaddr >> HASH_SHIFT) & tsb_mask);
+	 * tsb_ptr = tsb_base + (tsb_index * 16);
+	 */
+#define COMPUTE_TSB_PTR(TSB_PTR, VADDR, HASH_SHIFT, TMP1, TMP2) \
+	and	TSB_PTR, 0x7, TMP1;			\
+	mov	512, TMP2;				\
+	andn	TSB_PTR, 0x7, TSB_PTR;			\
+	sllx	TMP2, TMP1, TMP2;			\
+	srlx	VADDR, HASH_SHIFT, TMP1;		\
+	sub	TMP2, 1, TMP2;				\
+	and	TMP1, TMP2, TMP1;			\
+	sllx	TMP1, 4, TMP1;				\
+	add	TSB_PTR, TMP1, TSB_PTR;
+
 #define TSB_LOAD_QUAD(TSB, REG)	\
 661:	ldda		[TSB] ASI_NUCLEUS_QUAD_LDD, REG; \
 	.section	.tsb_ldquad_phys_patch, "ax"; \
diff --git a/arch/sparc/kernel/sun4v_tlb_miss.S b/arch/sparc/kernel/sun4v_tlb_miss.S
index 6179e19..46fbc16 100644
--- a/arch/sparc/kernel/sun4v_tlb_miss.S
+++ b/arch/sparc/kernel/sun4v_tlb_miss.S
@@ -3,6 +3,8 @@
  * Copyright (C) 2006 <davem@davemloft.net>
  */
 
+#include <asm/tsb.h>
+
 	.text
 	.align	32
 
@@ -16,33 +18,6 @@
 	ldx	[BASE + HV_FAULT_D_ADDR_OFFSET], VADDR; \
 	ldx	[BASE + HV_FAULT_D_CTX_OFFSET], CTX;
 
-	/* DEST = (VADDR >> 22)
-	 *
-	 * Branch to ZERO_CTX_LABEL if context is zero.
-	 */
-#define	COMPUTE_TAG_TARGET(DEST, VADDR, CTX, ZERO_CTX_LABEL) \
-	srlx	VADDR, 22, DEST; \
-	brz,pn	CTX, ZERO_CTX_LABEL; \
-	 nop;
-
-	/* Create TSB pointer.  This is something like:
-	 *
-	 * index_mask = (512 << (tsb_reg & 0x7UL)) - 1UL;
-	 * tsb_base = tsb_reg & ~0x7UL;
-	 * tsb_index = ((vaddr >> HASH_SHIFT) & tsb_mask);
-	 * tsb_ptr = tsb_base + (tsb_index * 16);
-	 */
-#define COMPUTE_TSB_PTR(TSB_PTR, VADDR, HASH_SHIFT, TMP1, TMP2) \
-	and	TSB_PTR, 0x7, TMP1;			\
-	mov	512, TMP2;				\
-	andn	TSB_PTR, 0x7, TSB_PTR;			\
-	sllx	TMP2, TMP1, TMP2;			\
-	srlx	VADDR, HASH_SHIFT, TMP1;		\
-	sub	TMP2, 1, TMP2;				\
-	and	TMP1, TMP2, TMP1;			\
-	sllx	TMP1, 4, TMP1;				\
-	add	TSB_PTR, TMP1, TSB_PTR;
-
 sun4v_itlb_miss:
 	/* Load MMU Miss base into %g2.  */
 	ldxa	[%g0] ASI_SCRATCHPAD, %g2
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 04/14] sparc64: load shared id into context register 1
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

In current code, only context ID register 0 is set and used by the MMU.
On sun4v platforms that support MMU shared context, there is an additional
context ID register: specifically context register 1.  When searching
the TLB, the MMU will find a match if the virtual address matches and
the ID contained in context register 0 -OR- context register 1 matches.

Load the shared context ID into context ID register 1.  Care must be
taken to load register 1 after register 0, as loading register 0
overwrites both register 0 and 1.  Modify code loading register 0 to
also load register 1 if applicable.
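
The ordering constraint can be illustrated with a tiny software model of
the two registers.  Everything here is hypothetical: struct
sketch_ctx_regs and the helper names are made up for the sketch, and the
model simply encodes the behaviour described above (a store to register
0 also overwrites register 1) plus the assumption that a store to
register 1 leaves register 0 alone.

```c
#include <assert.h>

/* Hypothetical software model of the two secondary context registers. */
struct sketch_ctx_regs {
	unsigned long	r0;
	unsigned long	r1;
};

/* Per the description above, a store to register 0 also overwrites 1. */
static void sketch_write_r0(struct sketch_ctx_regs *regs, unsigned long id)
{
	assert(regs);
	regs->r0 = id;
	regs->r1 = id;
}

/* Assumed for this model: a store to register 1 leaves 0 untouched. */
static void sketch_write_r1(struct sketch_ctx_regs *regs, unsigned long id)
{
	assert(regs);
	regs->r1 = id;
}

/* Load the private ID first, then the shared ID if there is one. */
static void sketch_load_context(struct sketch_ctx_regs *regs,
				unsigned long private_id,
				const unsigned long *shared_id)
{
	sketch_write_r0(regs, private_id);
	if (shared_id)
		sketch_write_r1(regs, *shared_id);
}
```

Reversing the two stores in sketch_load_context() would lose the shared
ID, which is why register 1 is written last.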

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/mmu_context_64.h | 37 +++++++++++++++++--
 arch/sparc/include/asm/spitfire.h       |  2 ++
 arch/sparc/kernel/fpu_traps.S           | 63 +++++++++++++++++++++++++++++++++
 arch/sparc/kernel/rtrap_64.S            | 20 +++++++++++
 arch/sparc/kernel/trampoline_64.S       | 20 +++++++++++
 5 files changed, 140 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index acaea6d..84268df 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -61,8 +61,11 @@ void smp_tsb_sync(struct mm_struct *mm);
 #define smp_tsb_sync(__mm) do { } while (0)
 #endif
 
-/* Set MMU context in the actual hardware. */
-#define load_secondary_context(__mm) \
+/*
+ * Set MMU context in the actual hardware.  Secondary context register
+ * zero is loaded with task specific context.
+ */
+#define load_secondary_context_0(__mm) \
 	__asm__ __volatile__( \
 	"\n661:	stxa		%0, [%1] %2\n" \
 	"	.section	.sun4v_1insn_patch, \"ax\"\n" \
@@ -74,6 +77,36 @@ void smp_tsb_sync(struct mm_struct *mm);
 	: "r" (CTX_HWBITS((__mm)->context)), \
 	  "r" (SECONDARY_CONTEXT), "i" (ASI_DMMU), "i" (ASI_MMU))
 
+/*
+ * Secondary context register one is loaded with shared context if
+ * it exists for the task.
+ */
+#define load_secondary_context_1(__mm) \
+	__asm__ __volatile__( \
+	"\n661: stxa		%0, [%1] %2\n" \
+	"	.section	.sun4v_1insn_patch, \"ax\"\n" \
+	"	.word		661b\n" \
+	"	stxa		%0, [%1] %3\n" \
+	"	.previous\n" \
+	"	flush		%%g6\n" \
+	: /* No outputs */ \
+	: "r" (SHARED_CTX_HWBITS((__mm)->context)), \
+	  "r" (SECONDARY_CONTEXT_R1), "i" (ASI_DMMU), "i" (ASI_MMU))
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+#define load_secondary_context(__mm) \
+	do { \
+		load_secondary_context_0(__mm); \
+		if ((__mm)->context.shared_ctx) \
+			load_secondary_context_1(__mm); \
+	} while (0)
+#else
+#define load_secondary_context(__mm) \
+	do { \
+		load_secondary_context_0(__mm); \
+	} while (0)
+#endif
+
 void __flush_tlb_mm(unsigned long, unsigned long);
 
 /* Switch the current MM context. */
diff --git a/arch/sparc/include/asm/spitfire.h b/arch/sparc/include/asm/spitfire.h
index 1d8321c..1fa4594 100644
--- a/arch/sparc/include/asm/spitfire.h
+++ b/arch/sparc/include/asm/spitfire.h
@@ -33,6 +33,8 @@
 #define DMMU_SFAR		0x0000000000000020
 #define VIRT_WATCHPOINT		0x0000000000000038
 #define PHYS_WATCHPOINT		0x0000000000000040
+#define	PRIMARY_CONTEXT_R1	0x0000000000000108
+#define	SECONDARY_CONTEXT_R1	0x0000000000000110
 
 #define SPITFIRE_HIGHEST_LOCKED_TLBENT	(64 - 1)
 #define CHEETAH_HIGHEST_LOCKED_TLBENT	(16 - 1)
diff --git a/arch/sparc/kernel/fpu_traps.S b/arch/sparc/kernel/fpu_traps.S
index 336d275..f85a034 100644
--- a/arch/sparc/kernel/fpu_traps.S
+++ b/arch/sparc/kernel/fpu_traps.S
@@ -73,6 +73,16 @@ do_fpdis:
 	ldxa		[%g3] ASI_MMU, %g5
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g3
+	ldxa		[%g3] ASI_MMU, %g4
+	.previous
+	/* Unnecessary on sun4u and pre-Niagara 2 sun4v */
+	mov		SECONDARY_CONTEXT, %g3
+
 	sethi		%hi(sparc64_kern_sec_context), %g2
 	ldx		[%g2 + %lo(sparc64_kern_sec_context)], %g2
 
@@ -114,6 +124,16 @@ do_fpdis:
 	ldxa		[%g3] ASI_MMU, %g5
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g3
+	ldxa		[%g3] ASI_MMU, %g4
+	.previous
+	/* Unnecessary on sun4u and pre-Niagara 2 sun4v */
+	mov		SECONDARY_CONTEXT, %g3
+
 	add		%g6, TI_FPREGS, %g1
 	sethi		%hi(sparc64_kern_sec_context), %g2
 	ldx		[%g2 + %lo(sparc64_kern_sec_context)], %g2
@@ -155,6 +175,16 @@ do_fpdis:
 	ldxa		[%g3] ASI_MMU, %g5
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g3
+	ldxa		[%g3] ASI_MMU, %g4
+	.previous
+	/* Unnecessary on sun4u and pre-Niagara 2 sun4v */
+	mov		SECONDARY_CONTEXT, %g3
+
 	sethi		%hi(sparc64_kern_sec_context), %g2
 	ldx		[%g2 + %lo(sparc64_kern_sec_context)], %g2
 
@@ -181,11 +211,24 @@ fpdis_exit:
 	stxa		%g5, [%g3] ASI_MMU
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g3
+	stxa		%g4, [%g3] ASI_MMU
+	.previous
+
 	membar		#Sync
 fpdis_exit2:
 	wr		%g7, 0, %gsr
 	ldx		[%g6 + TI_XFSR], %fsr
 	rdpr		%tstate, %g3
+661:	nop
+	.section	.sun4v_1insn_patch, "ax"
+	.word		661b
+	sethi		%hi(TSTATE_PEF), %g4
+	.previous
 	or		%g3, %g4, %g3		! anal...
 	wrpr		%g3, %tstate
 	wr		%g0, FPRS_FEF, %fprs	! clean DU/DL bits
@@ -347,6 +390,16 @@ do_fptrap_after_fsr:
 	ldxa		[%g3] ASI_MMU, %g5
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g3
+	ldxa		[%g3] ASI_MMU, %g4
+	.previous
+	/* Unnecessary on sun4u and pre-Niagara 2 sun4v */
+	mov		SECONDARY_CONTEXT, %g3
+
 	sethi		%hi(sparc64_kern_sec_context), %g2
 	ldx		[%g2 + %lo(sparc64_kern_sec_context)], %g2
 
@@ -377,7 +430,17 @@ do_fptrap_after_fsr:
 	stxa		%g5, [%g1] ASI_MMU
 	.previous
 
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g1
+	stxa		%g4, [%g1] ASI_MMU
+	.previous
+
 	membar		#Sync
+	/* Unnecessary on sun4u and pre-Niagara 2 sun4v */
+	mov		SECONDARY_CONTEXT, %g1
 	ba,pt		%xcc, etrap
 	 wr		%g0, 0, %fprs
 	.size		do_fptrap,.-do_fptrap
diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S
index 216948c..d409d84 100644
--- a/arch/sparc/kernel/rtrap_64.S
+++ b/arch/sparc/kernel/rtrap_64.S
@@ -202,6 +202,7 @@ rt_continue:	ldx			[%sp + PTREGS_OFF + PT_V9_G1], %g1
 		brnz,pn			%l3, kern_rtt
 		 mov			PRIMARY_CONTEXT, %l7
 
+		/* Get value from SECONDARY_CONTEXT register */
 661:		ldxa			[%l7 + %l7] ASI_DMMU, %l0
 		.section		.sun4v_1insn_patch, "ax"
 		.word			661b
@@ -212,12 +213,31 @@ rt_continue:	ldx			[%sp + PTREGS_OFF + PT_V9_G1], %g1
 		ldx			[%l1 + %lo(sparc64_kern_pri_nuc_bits)], %l1
 		or			%l0, %l1, %l0
 
+		/* and, put into PRIMARY_CONTEXT register */
 661:		stxa			%l0, [%l7] ASI_DMMU
 		.section		.sun4v_1insn_patch, "ax"
 		.word			661b
 		stxa			%l0, [%l7] ASI_MMU
 		.previous
 
+		/* Get value from SECONDARY_CONTEXT_R1 register */
+661:		nop
+		nop
+		.section		.sun4v_2insn_patch, "ax"
+		.word			661b
+		mov			SECONDARY_CONTEXT_R1, %l7
+		ldxa			[%l7] ASI_MMU, %l0
+		.previous
+
+		/* and, put into PRIMARY_CONTEXT_R1 register */
+661:		nop
+		nop
+		.section		.sun4v_2insn_patch, "ax"
+		.word			661b
+		mov			PRIMARY_CONTEXT_R1, %l7
+		stxa			%l0, [%l7] ASI_MMU
+		.previous
+
 		sethi			%hi(KERNBASE), %l7
 		flush			%l7
 		rdpr			%wstate, %l1
diff --git a/arch/sparc/kernel/trampoline_64.S b/arch/sparc/kernel/trampoline_64.S
index 88ede1d..7c4ab3b 100644
--- a/arch/sparc/kernel/trampoline_64.S
+++ b/arch/sparc/kernel/trampoline_64.S
@@ -260,6 +260,16 @@ after_lock_tlb:
 	stxa		%g0, [%g7] ASI_MMU
 	.previous
 
+	/* Save SECONDARY_CONTEXT_R1, membar should be part of patch */
+	membar		#Sync
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g7
+	ldxa		[%g7] ASI_MMU, %g1
+	.previous
+
 	membar		#Sync
 	mov		SECONDARY_CONTEXT, %g7
 
@@ -269,6 +279,16 @@ after_lock_tlb:
 	stxa		%g0, [%g7] ASI_MMU
 	.previous
 
+	/* Restore SECONDARY_CONTEXT_R1, membar should be part of patch */
+	membar		#Sync
+661:	nop
+	nop
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	mov		SECONDARY_CONTEXT_R1, %g7
+	stxa		%g1, [%g7] ASI_MMU
+	.previous
+
 	membar		#Sync
 
 	/* Everything we do here, until we properly take over the
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [RFC PATCH 08/14] sparc64: shared context tsb handling at context switch time
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

At context switch time, load the shared context TSB into the MMU (if
applicable) and set up global state to include the TSB.

sun4v loads the address of base and huge page TSBs into scratchpad
registers.  There is not an extra register for shared context TSB.
So, use offset 0xd0 in the trap block.  This is TRAP_PER_CPU_TSB_HUGE,
and is only used on sun4u.  We can then use this area for the shared
context on sun4v.

With this commit, global state is set up for shared context TSB but
still not used.
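
For readers following the assembly in tsb.S, the descriptor counting done
before the HV_FAST_MMU_TSB_CTXNON0 call can be read roughly as the C below.
This is a sketch only: the function name and TSB_NONE are made up for
illustration, and -1 is assumed to be the "no TSB" marker as in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed marker for "no TSB configured", as compared against in tsb.S. */
#define TSB_NONE ((uint64_t)-1)

/*
 * Hypothetical helper: number of TSB descriptors handed to the
 * hypervisor.  The base TSB always counts; the huge and huge shared
 * TSBs count only when present.
 */
static unsigned int count_hv_tsb_descriptors(uint64_t huge_reg_val,
					     uint64_t huge_shared_reg_val)
{
	unsigned int count = 1;		/* always MM_TSB_BASE */

	if (huge_reg_val != TSB_NONE)
		count++;		/* MM_TSB_HUGE */
	if (huge_shared_reg_val != TSB_NONE)
		count++;		/* MM_TSB_HUGE_SHARED */

	return count;
}
```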

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/mmu_context_64.h | 27 ++++++++++++++----
 arch/sparc/include/asm/trap_block.h     |  3 +-
 arch/sparc/kernel/head_64.S             |  2 +-
 arch/sparc/kernel/tsb.S                 | 50 +++++++++++++++++++++------------
 4 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 84268df..0dc95cb5 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -36,21 +36,38 @@ void destroy_context(struct mm_struct *mm);
 void __tsb_context_switch(unsigned long pgd_pa,
 			  struct tsb_config *tsb_base,
 			  struct tsb_config *tsb_huge,
+			  struct tsb_config *tsb_huge_shared,
 			  unsigned long tsb_descr_pa);
 
+#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static inline void tsb_context_switch(struct mm_struct *mm)
 {
+	/*
+	 * The conditional for tsb_descr_pa handles shared context
+	 * case where tsb_block[0] may not be used.
+	 */
 	__tsb_context_switch(__pa(mm->pgd),
 			     &mm->context.tsb_block[MM_TSB_BASE],
-#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 			     (mm->context.tsb_block[MM_TSB_HUGE].tsb ?
 			      &mm->context.tsb_block[MM_TSB_HUGE] :
-			      NULL)
+			      NULL),
+			     (mm->context.tsb_block[MM_TSB_HUGE_SHARED].tsb ?
+			      &mm->context.tsb_block[MM_TSB_HUGE_SHARED] :
+			      NULL),
+			     (mm->context.tsb_block[0].tsb ?
+			      __pa(&mm->context.tsb_descr[0]) :
+			      __pa(&mm->context.tsb_descr[1])));
+}
 #else
-			     NULL
-#endif
-			     , __pa(&mm->context.tsb_descr[MM_TSB_BASE]));
+static inline void tsb_context_switch(struct mm_struct *mm)
+{
+	__tsb_context_switch(__pa(mm->pgd),
+			     &mm->context.tsb_block[MM_TSB_BASE],
+			     NULL,
+			     NULL,
+			     __pa(&mm->context.tsb_descr[MM_TSB_BASE]));
 }
+#endif
 
 void tsb_grow(struct mm_struct *mm,
 	      unsigned long tsb_index,
diff --git a/arch/sparc/include/asm/trap_block.h b/arch/sparc/include/asm/trap_block.h
index ec9c04d..e971785 100644
--- a/arch/sparc/include/asm/trap_block.h
+++ b/arch/sparc/include/asm/trap_block.h
@@ -96,7 +96,8 @@ extern struct sun4v_2insn_patch_entry __sun_m7_2insn_patch,
 #define TRAP_PER_CPU_FAULT_INFO		0x40
 #define TRAP_PER_CPU_CPU_MONDO_BLOCK_PA	0xc0
 #define TRAP_PER_CPU_CPU_LIST_PA	0xc8
-#define TRAP_PER_CPU_TSB_HUGE		0xd0
+#define TRAP_PER_CPU_TSB_HUGE		0xd0	/* sun4u only */
+#define TRAP_PER_CPU_TSB_HUGE_SHARED	0xd0	/* sun4v only */
 #define TRAP_PER_CPU_TSB_HUGE_TEMP	0xd8
 #define TRAP_PER_CPU_IRQ_WORKLIST_PA	0xe0
 #define TRAP_PER_CPU_CPU_MONDO_QMASK	0xe8
diff --git a/arch/sparc/kernel/head_64.S b/arch/sparc/kernel/head_64.S
index 6aa3da1..0bf1e1f 100644
--- a/arch/sparc/kernel/head_64.S
+++ b/arch/sparc/kernel/head_64.S
@@ -875,7 +875,6 @@ sparc64_boot_end:
 #include "sun4v_tlb_miss.S"
 #include "sun4v_ivec.S"
 #include "ktlb.S"
-#include "tsb.S"
 
 /*
  * The following skip makes sure the trap table in ttable.S is aligned
@@ -916,6 +915,7 @@ swapper_4m_tsb:
 
 ! 0x0000000000428000
 
+#include "tsb.S"
 #include "systbls_64.S"
 
 	.data
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index d568c82..3ed3e7c 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -374,7 +374,8 @@ tsb_flush:
 	 * %o0: page table physical address
 	 * %o1:	TSB base config pointer
 	 * %o2:	TSB huge config pointer, or NULL if none
-	 * %o3:	Hypervisor TSB descriptor physical address
+	 * %o3: TSB huge shared config pointer, or NULL if none
+	 * %o4: Hypervisor TSB descriptor physical address
 	 *
 	 * We have to run this whole thing with interrupts
 	 * disabled so that the current cpu doesn't change
@@ -387,6 +388,8 @@ __tsb_context_switch:
 	rdpr	%pstate, %g1
 	wrpr	%g1, PSTATE_IE, %pstate
 
+	mov	%o4, %g7
+
 	TRAP_LOAD_TRAP_BLOCK(%g2, %g3)
 
 	stx	%o0, [%g2 + TRAP_PER_CPU_PGD_PADDR]
@@ -397,13 +400,8 @@ __tsb_context_switch:
 
 	ldx	[%o2 + TSB_CONFIG_REG_VAL], %g3
 
-1:	stx	%g3, [%g2 + TRAP_PER_CPU_TSB_HUGE]
-
-	sethi	%hi(tlb_type), %g2
-	lduw	[%g2 + %lo(tlb_type)], %g2
-	cmp	%g2, 3
-	bne,pt	%icc, 50f
-	 nop
+1:	IF_TLB_TYPE_NOT_HYPE(%o5, 50f)
+	/* Only setup HV TSB descriptors on appropriate MMU */
 
 	/* Hypervisor TSB switch. */
 	mov	SCRATCHPAD_UTSBREG1, %o5
@@ -411,27 +409,43 @@ __tsb_context_switch:
 	mov	SCRATCHPAD_UTSBREG2, %o5
 	stxa	%g3, [%o5] ASI_SCRATCHPAD
 
-	mov	2, %o0
+	/* Start counting HV tsb descriptors. */
+	mov	1, %o0				/* Always MM_TSB_BASE */
+	cmp	%g3, -1				/* MM_TSB_HUGE ? */
+	beq	%xcc, 2f
+	 nop
+	add	%o0, 1, %o0
+2:
+	brz,pt	%o3, 3f				/* MM_TSB_HUGE_SHARED ? */
+	 mov	-1, %g3
+	ldx	[%o3 + TSB_CONFIG_REG_VAL], %g3
+3:
+	/* Put Huge Shared TSB in trap block */
+	stx	%g3, [%g2 + TRAP_PER_CPU_TSB_HUGE_SHARED]
 	cmp	%g3, -1
-	move	%xcc, 1, %o0
-
+	beq	%xcc, 4f
+	 nop
+	add	%o0, 1, %o0
+4:
 	mov	HV_FAST_MMU_TSB_CTXNON0, %o5
-	mov	%o3, %o1
+	mov	%g7, %o1
 	ta	HV_FAST_TRAP
 
 	/* Finish up.  */
-	ba,pt	%xcc, 9f
+	ba,pt	%xcc, 60f
 	 nop
 
 	/* SUN4U TSB switch.  */
-50:	mov	TSB_REG, %o5
+50:	stx	%g3, [%g2 + TRAP_PER_CPU_TSB_HUGE]
+
+	mov	TSB_REG, %o5
 	stxa	%o0, [%o5] ASI_DMMU
 	membar	#Sync
 	stxa	%o0, [%o5] ASI_IMMU
 	membar	#Sync
 
-2:	ldx	[%o1 + TSB_CONFIG_MAP_VADDR], %o4
-	brz	%o4, 9f
+	ldx	[%o1 + TSB_CONFIG_MAP_VADDR], %o4
+	brz	%o4, 60f
 	 ldx	[%o1 + TSB_CONFIG_MAP_PTE], %o5
 
 	sethi	%hi(sparc64_highest_unlocked_tlb_ent), %g2
@@ -443,7 +457,7 @@ __tsb_context_switch:
 	stxa	%o5, [%g2] ASI_DTLB_DATA_ACCESS
 	membar	#Sync
 
-	brz,pt	%o2, 9f
+	brz,pt	%o2, 60f
 	 nop
 
 	ldx	[%o2 + TSB_CONFIG_MAP_VADDR], %o4
@@ -455,7 +469,7 @@ __tsb_context_switch:
 	stxa	%o5, [%g2] ASI_DTLB_DATA_ACCESS
 	membar	#Sync
 
-9:
+60:
 	wrpr	%g1, %pstate
 
 	retl
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 09/14] sparc64: TLB/TSB miss handling for shared context
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Modifications to the fault handling code to take shared context TSB
into account.  For now, the shared context code mirrors the huge
page code.  The _PAGE_SHR_CTX_4V page flag is used to determine
which TSB should be used.

Note, TRAP_PER_CPU_TSB_HUGE_TEMP is used to stash away calculation
of a TTE address in the huge page TSB.  At present, there is no
similar mechanism for shared context TSB so the address must be
recalculated.
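
The recalculation is the TSB entry address computation.  A C rendering of the
open-coded instruction sequence that this patch replaces with
COMPUTE_TSB_PTR might look like the sketch below.  It assumes the macro does
the same arithmetic as the removed instructions; the function name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical C equivalent of the removed assembly sequence.
 * tsb_reg_val holds the TSB base address with a size code in the low
 * 3 bits; the TSB has (512 << size code) entries of 16 bytes each.
 */
static uint64_t compute_tsb_ptr(uint64_t tsb_reg_val, uint64_t vaddr,
				unsigned int page_shift)
{
	uint64_t size_code = tsb_reg_val & 0x7;
	uint64_t base = tsb_reg_val & ~(uint64_t)0x7;
	uint64_t nentries = (uint64_t)512 << size_code;
	uint64_t index = (vaddr >> page_shift) & (nentries - 1);

	return base + (index << 4);	/* 16 bytes per entry */
}
```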

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/kernel/sun4v_tlb_miss.S |   8 +++
 arch/sparc/kernel/tsb.S            | 122 ++++++++++++++++++++++++++++++++-----
 2 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/arch/sparc/kernel/sun4v_tlb_miss.S b/arch/sparc/kernel/sun4v_tlb_miss.S
index 46fbc16..c438ccc 100644
--- a/arch/sparc/kernel/sun4v_tlb_miss.S
+++ b/arch/sparc/kernel/sun4v_tlb_miss.S
@@ -152,6 +152,14 @@ sun4v_tsb_miss_common:
 	sub	%g2, TRAP_PER_CPU_FAULT_INFO, %g2
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+	/*
+	 * FIXME
+	 *
+	 * This just computes the possible huge page TSB entry.  It does
+	 * not consider the shared huge page TSB.  Also, care must be taken
+	 * so that TRAP_PER_CPU_TSB_HUGE_TEMP is only used for non-shared
+	 * huge TSB.
+	 */
 	mov	SCRATCHPAD_UTSBREG2, %g5
 	ldxa	[%g5] ASI_SCRATCHPAD, %g5
 	cmp	%g5, -1
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index 3ed3e7c..57ee5ad 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -55,6 +55,9 @@ tsb_miss_page_table_walk:
 	 */
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 
+	/*
+	 * First check the normal huge page TSB
+	 */
 661:	ldx		[%g7 + TRAP_PER_CPU_TSB_HUGE], %g5
 	nop
 	.section	.sun4v_2insn_patch, "ax"
@@ -64,7 +67,47 @@ tsb_miss_page_table_walk:
 	.previous
 
 	cmp		%g5, -1
-	be,pt		%xcc, 80f
+	be,pt		%xcc, chk_huge_page_shared
+	 nop
+
+	/* We need an aligned pair of registers containing 2 values
+	 * which can be easily rematerialized.  %g6 and %g7 foot the
+	 * bill just nicely.  We'll save %g6 away into %g2 for the
+	 * huge page TSB TAG comparison.
+	 *
+	 * Perform a huge page TSB lookup.
+	 */
+	mov		%g6, %g2
+
+	COMPUTE_TSB_PTR(%g5, %g4, REAL_HPAGE_SHIFT, %g6, %g7)
+
+	TSB_LOAD_QUAD(%g5, %g6)
+	cmp		%g6, %g2
+	be,a,pt		%xcc, tsb_tlb_reload
+	 mov		%g7, %g5
+
+	/*
+	 * No match, restore %g6 and %g7.
+	 * Store huge page TSB entry address
+	 *
+	 * FIXME - Look into use of TRAP_PER_CPU_TSB_HUGE_TEMP as it
+	 * is only used for regular, not shared huge pages.
+	 */
+	TRAP_LOAD_TRAP_BLOCK(%g7, %g6)
+	srlx		%g4, 22, %g6
+
+chk_huge_page_shared:
+	stx		%g5, [%g7 + TRAP_PER_CPU_TSB_HUGE_TEMP]
+
+	/*
+	 * For now (POC) only check shared context on hypervisor
+	 */
+	IF_TLB_TYPE_NOT_HYPE(%g2, huge_checks_done)
+
+	/* Check the shared huge page TSB */
+	ldx		[%g7 + TRAP_PER_CPU_TSB_HUGE_SHARED], %g5
+	cmp		%g5, -1
+	bne,pn		%xcc, huge_checks_done
 	 nop
 
 	/* We need an aligned pair of registers containing 2 values
@@ -75,15 +118,8 @@ tsb_miss_page_table_walk:
 	 * Perform a huge page TSB lookup.
 	 */
 	mov		%g6, %g2
-	and		%g5, 0x7, %g6
-	mov		512, %g7
-	andn		%g5, 0x7, %g5
-	sllx		%g7, %g6, %g7
-	srlx		%g4, REAL_HPAGE_SHIFT, %g6
-	sub		%g7, 1, %g7
-	and		%g6, %g7, %g6
-	sllx		%g6, 4, %g6
-	add		%g5, %g6, %g5
+
+	COMPUTE_TSB_PTR(%g5, %g4, REAL_HPAGE_SHIFT, %g6, %g7)
 
 	TSB_LOAD_QUAD(%g5, %g6)
 	cmp		%g6, %g2
@@ -91,25 +127,29 @@ tsb_miss_page_table_walk:
 	 mov		%g7, %g5
 
 	/* No match, remember the huge page TSB entry address,
-	 * and restore %g6 and %g7.
+	 * restore %g6 and %g7.
+	 *
+	 * NOT REALLY REMEMBERING -  See FIXME above
 	 */
 	TRAP_LOAD_TRAP_BLOCK(%g7, %g6)
 	srlx		%g4, 22, %g6
-80:	stx		%g5, [%g7 + TRAP_PER_CPU_TSB_HUGE_TEMP]
 
+huge_checks_done:
+	stx		%g5, [%g7 + TRAP_PER_CPU_TSB_HUGE_TEMP]
 #endif
 
 	ldx		[%g7 + TRAP_PER_CPU_PGD_PADDR], %g7
 
 	/* At this point we have:
-	 * %g1 --	TSB entry address
+	 * %g1 --	Base TSB entry address
 	 * %g3 --	FAULT_CODE_{D,I}TLB
 	 * %g4 --	missing virtual address
 	 * %g6 --	TAG TARGET (vaddr >> 22)
 	 * %g7 --	page table physical address
 	 *
 	 * We know that both the base PAGE_SIZE TSB and the HPAGE_SIZE
-	 * TSB both lack a matching entry.
+	 * TSB both lack a matching entry, as well as shared TSBs if
+	 * present.
 	 */
 tsb_miss_page_table_walk_sun4v_fastpath:
 	USER_PGTABLE_WALK_TL1(%g4, %g7, %g5, %g2, tsb_do_fault)
@@ -152,12 +192,42 @@ tsb_miss_page_table_walk_sun4v_fastpath:
 	 * thus handle it here.  This also makes sure that we can
 	 * allocate the TSB hash table on the correct NUMA node.
 	 */
+
+	/*
+	 * Check for shared context PTE, in this case we do not have
+	 * a saved TSB entry pointer and must compute now
+	 */
+	IF_TLB_TYPE_NOT_HYPE(%g2, no_shared_ctx_pte)
+
+	mov		_PAGE_SHR_CTX_4V, %g2
+	andcc		%g5, %g2, %g2
+	be,pn		%xcc, no_shared_ctx_pte
+
+	/*
+	 * If there was a shared context TSB, then we need to compute the
+	 * TSB entry address.  Previously, only the non-shared context
+	 * TSB entry address was calculated.
+	 *
+	 * FIXME
+	 */
+	TRAP_LOAD_TRAP_BLOCK(%g7, %g1)
+	ldx		[%g7 + TRAP_PER_CPU_TSB_HUGE_SHARED], %g1
+	cmp		%g1, -1
+	be,pn		%xcc, no_shared_hugetlb
+	 nop
+
+	COMPUTE_TSB_PTR(%g1, %g4, REAL_HPAGE_SHIFT, %g2, %g7)
+
+	ba,a,pt %xcc,tsb_reload
+
+no_shared_ctx_pte:
 	TRAP_LOAD_TRAP_BLOCK(%g7, %g2)
 	ldx		[%g7 + TRAP_PER_CPU_TSB_HUGE_TEMP], %g1
 	cmp		%g1, -1
 	bne,pt		%xcc, 60f
 	 nop
 
+no_hugetlb:
 661:	rdpr		%pstate, %g5
 	wrpr		%g5, PSTATE_AG | PSTATE_MG, %pstate
 	.section	.sun4v_2insn_patch, "ax"
@@ -177,6 +247,30 @@ tsb_miss_page_table_walk_sun4v_fastpath:
 	ba,pt	%xcc, rtrap
 	 nop
 
+	/*
+	 * This is the same as above call to hugetlb_setup.
+	 * FIXME
+	 */
+no_shared_hugetlb:
+661:	rdpr		%pstate, %g5
+	wrpr		%g5, PSTATE_AG | PSTATE_MG, %pstate
+	.section	.sun4v_2insn_patch, "ax"
+	.word		661b
+	SET_GL(1)
+	nop
+	.previous
+
+	rdpr	%tl, %g7
+	cmp	%g7, 1
+	bne,pn	%xcc, winfix_trampoline
+	 mov	%g3, %g4
+	ba,pt	%xcc, etrap
+	 rd	%pc, %g7
+	call	hugetlb_shared_setup
+	 add	%sp, PTREGS_OFF, %o0
+	ba,pt	%xcc, rtrap
+	 nop
+
 60:
 #endif
 
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 10/14] mm: add shared context to vm_area_struct
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Shared context usage is reflected in a vm area (vma).  To handle this,
a new flag (VM_SHARED_CTX) is added along with a pointer to a shared
context structure (vm_shared_mmu_ctx).

This commit does not contain the method by which a vma is marked for
shared context.
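
To show how the flag and the pointer relate, here is a stand-alone sketch
that mirrors the vma_shared_ctx_val() macro from this patch.  struct
vma_sketch is a hypothetical cut-down stand-in for vm_area_struct, and
shared_mmu_ctx is reduced to the one field the macro reads.

```c
#include <assert.h>
#include <stddef.h>

#define VM_SHARED_CTX	0x00800000UL	/* value taken from this patch */

/* Reduced to the single field used here; the real struct has more. */
struct shared_mmu_ctx {
	unsigned long shared_ctx_val;
};

struct vm_shared_mmu_ctx {
	struct shared_mmu_ctx *ctx;
};

/* Hypothetical stand-in for vm_area_struct. */
struct vma_sketch {
	unsigned long vm_flags;
	struct vm_shared_mmu_ctx vm_shared_mmu_ctx;
};

/* Same logic as vma_shared_ctx_val(): 0 means no context attached yet. */
static unsigned long vma_sketch_shared_ctx_val(const struct vma_sketch *vma)
{
	return vma->vm_shared_mmu_ctx.ctx ?
	       vma->vm_shared_mmu_ctx.ctx->shared_ctx_val : 0UL;
}
```

Note that a vma can have VM_SHARED_CTX set while still returning 0 here,
which is the "requested but not yet associated" state.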

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/mm.h       |  1 +
 include/linux/mm_types.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a92c8d7..9d82028 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -182,6 +182,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
 #define VM_NORESERVE	0x00200000	/* should the VM suppress accounting */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
+#define VM_SHARED_CTX	0x00800000	/* Shared TLB context */
 #define VM_ARCH_1	0x01000000	/* Architecture-specific flag */
 #define VM_ARCH_2	0x02000000
 #define VM_DONTDUMP	0x04000000	/* Do not include in the core dump */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8aced..0c30d43 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -291,6 +291,18 @@ struct vm_userfaultfd_ctx {
 struct vm_userfaultfd_ctx {};
 #endif /* CONFIG_USERFAULTFD */
 
+#ifdef CONFIG_SHARED_MMU_CTX
+#define NULL_VM_SHARED_MMU_CTX ((struct vm_shared_mmu_ctx) { NULL, })
+struct vm_shared_mmu_ctx {
+	struct shared_mmu_ctx *ctx;
+};
+#define vma_shared_ctx_val(vma)					\
+	((vma)->vm_shared_mmu_ctx.ctx ?				\
+	 (vma)->vm_shared_mmu_ctx.ctx->shared_ctx_val : 0UL)
+#else /* CONFIG_SHARED_MMU_CTX */
+struct vm_shared_mmu_ctx {};
+#endif /* CONFIG_SHARED_MMU_CTX */
+
 /*
  * This struct defines a memory VMM memory area. There is one of these
  * per VM-area/task.  A VM area is any part of the process virtual memory
@@ -358,6 +370,7 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+	struct vm_shared_mmu_ctx vm_shared_mmu_ctx;
 };
 
 struct core_thread {
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 11/14] sparc64: add routines to look for vmas which can share context
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

When a shared context mapping is requested, the other vmas mapping the
same object are searched.  For simplicity, vmas can only share context
if all of the following are true:
- They both request shared context mapping
- They are at the same virtual address
- They are of the same size
In addition, a task is only allowed to have a single vma with shared
context.

Some of these constraints can be relaxed at a later date.  They
make the code simpler for now.
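
The matching rules above can be sketched as a small predicate.  This is a
simplified model, not the code in the patch: struct vma_sketch and its
fields are hypothetical, and the real huge_vma_can_share_ctx() compares
masked vm_flags rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down vma with only the fields the rules need. */
struct vma_sketch {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
	unsigned long shared_ctx_val;	/* 0 if no context attached yet */
	bool wants_shared_ctx;		/* stands in for VM_SHARED_CTX */
};

/* Can the new mapping "vma" reuse the context already held by "tvma"? */
static bool can_share_ctx(const struct vma_sketch *vma,
			  const struct vma_sketch *tvma)
{
	/* Both must request shared context mapping. */
	if (!vma->wants_shared_ctx || !tvma->wants_shared_ctx)
		return false;

	/* The candidate must already have a real context value. */
	if (!tvma->shared_ctx_val)
		return false;

	/* Same virtual address, same size, same offset into the object. */
	return vma->vm_start == tvma->vm_start &&
	       vma->vm_end == tvma->vm_end &&
	       vma->vm_pgoff == tvma->vm_pgoff;
}
```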

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/mmu_context_64.h |  1 +
 arch/sparc/include/asm/page_64.h        |  1 +
 arch/sparc/mm/hugetlbpage.c             | 78 ++++++++++++++++++++++++++++++++-
 arch/sparc/mm/init_64.c                 | 19 ++++++++
 mm/hugetlb.c                            |  9 ++++
 5 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 0dc95cb5..46c2c7e 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -23,6 +23,7 @@ void get_new_mmu_shared_context(struct mm_struct *mm);
 void put_shared_context(struct mm_struct *mm);
 void set_mm_shared_ctx(struct mm_struct *mm, struct shared_mmu_ctx *ctx);
 void destroy_shared_context(struct mm_struct *mm);
+void set_vma_shared_ctx(struct vm_area_struct *vma);
 #endif
 #ifdef CONFIG_SMP
 void smp_new_mmu_context_version(void);
diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index c1263fc..ccceb76 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -33,6 +33,7 @@
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 struct pt_regs;
 void hugetlb_setup(struct pt_regs *regs);
+void hugetlb_shared_setup(struct pt_regs *regs);
 #endif
 
 #define WANT_PAGE_VIRTUAL
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 2039d45..5681df6 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -127,6 +127,80 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 				pgoff, flags);
 }
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+static bool huge_vma_can_share_ctx(struct vm_area_struct *vma,
+					struct vm_area_struct *tvma)
+{
+	/*
+	 * Do not match unless there is an actual context value.  It
+	 * could be the case that tvma is a new mapping with VM_SHARED_CTX
+	 * set, but still not associated with a shared context ID.
+	 */
+	if (!vma_shared_ctx_val(tvma))
+		return false;
+
+	/*
+	 * For simple functionality now, vmas must be exactly the same
+	 */
+	if ((vma->vm_flags & VM_LOCKED_CLEAR_MASK) ==
+	    (tvma->vm_flags & VM_LOCKED_CLEAR_MASK) &&
+	    vma->vm_pgoff == tvma->vm_pgoff &&
+	    vma->vm_start == tvma->vm_start &&
+	    vma->vm_end == tvma->vm_end)
+		return true;
+
+	return false;
+}
+
+/*
+ * If vma is marked as desiring shared context, search for a context to
+ * share.  If no context found, assign one.
+ */
+void huge_get_shared_ctx(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma = find_vma(mm, addr);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
+			vma->vm_pgoff;
+	struct vm_area_struct *tvma;
+
+	/*
+	 * FIXME
+	 *
+	 * For now limit a task to a single shared context mapping
+	 */
+	if (!(vma->vm_flags & VM_SHARED_CTX) || vma_shared_ctx_val(vma) ||
+	    mm_shared_ctx_val(mm))
+		return;
+
+	i_mmap_lock_write(mapping);
+	vma_interval_tree_foreach(tvma, &mapping->i_mmap, idx, idx) {
+		if (tvma == vma)
+			continue;
+
+		if (huge_vma_can_share_ctx(vma, tvma)) {
+			set_mm_shared_ctx(mm, tvma->vm_shared_mmu_ctx.ctx);
+			set_vma_shared_ctx(vma);
+			if (likely(mm_shared_ctx_val(mm))) {
+				load_secondary_context(mm);
+				/*
+				 * What about multiple matches ?
+				 */
+				break;
+			}
+		}
+	}
+
+	if (!mm_shared_ctx_val(mm)) {
+		get_new_mmu_shared_context(mm);
+		set_vma_shared_ctx(vma);
+		load_secondary_context(mm);
+	}
+
+	i_mmap_unlock_write(mapping);
+}
+#endif
+
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 			unsigned long addr, unsigned long sz)
 {
@@ -164,7 +238,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	if (!pte_present(*ptep) && pte_present(entry)) {
 #if defined(CONFIG_SHARED_MMU_CTX)
-		if (pte_val(entry) | _PAGE_SHR_CTX_4V)
+		if (is_sharedctx_pte(entry))
 			mm->context.shared_hugetlb_pte_count++;
 		else
 #endif
@@ -188,7 +262,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	entry = *ptep;
 	if (pte_present(entry)) {
 #if defined(CONFIG_SHARED_MMU_CTX)
-		if (pte_val(entry) | _PAGE_SHR_CTX_4V)
+		if (is_sharedctx_pte(entry))
 			mm->context.shared_hugetlb_pte_count--;
 		else
 #endif
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2b310e5..25ad5bd 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -813,6 +813,25 @@ void set_mm_shared_ctx(struct mm_struct *mm, struct shared_mmu_ctx *ctx)
 	atomic_inc(&ctx->refcount);
 	mm->context.shared_ctx = ctx;
 }
+
+/*
+ * Set the shared context value in the vma to that in the mm.
+ *
+ *
+ * Note that we are called from mmap with mmap_sem held.
+ */
+void set_vma_shared_ctx(struct vm_area_struct *vma)
+{
+	struct mm_struct *mm = vma->vm_mm;
+
+	BUG_ON(vma->vm_shared_mmu_ctx.ctx);
+
+	if (!mm_shared_ctx_val(mm))
+		return;
+
+	atomic_inc(&mm->context.shared_ctx->refcount);
+	vma->vm_shared_mmu_ctx.ctx = mm->context.shared_ctx;
+}
 #endif
 
 static int numa_enabled = 1;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 418bf01..3733ba1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3150,6 +3150,15 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	entry = pte_mkhuge(entry);
 	entry = arch_make_huge_pte(entry, vma, page, writable);
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+	/*
+	 * FIXME
+	 * needs arch independent way of setting - perhaps arch_make_huge_pte
+	 */
+	if (vma->vm_flags & VM_SHARED_CTX)
+		pte_val(entry) |= _PAGE_SHR_CTX_4V;
+#endif
+
 	return entry;
 }
 
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 12/14] mm: add mmap and shmat arch hooks for shared context
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Shared context will require some additional checking and processing
when mappings are created.  To facilitate this, add new mmap hooks
arch_pre_mmap_flags and arch_post_mmap to generic mm_hooks.  For
shmat, a new hook arch_shmat_check is added.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/powerpc/include/asm/mmu_context.h   | 12 ++++++++++++
 arch/s390/include/asm/mmu_context.h      | 12 ++++++++++++
 arch/unicore32/include/asm/mmu_context.h | 12 ++++++++++++
 arch/x86/include/asm/mmu_context.h       | 12 ++++++++++++
 include/asm-generic/mm_hooks.h           | 18 +++++++++++++++---
 ipc/shm.c                                | 13 +++++++++++++
 mm/mmap.c                                | 10 ++++++++++
 7 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 5c45114..d5ce33a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -133,6 +133,18 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
 #endif
 }
 
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return 0;	/* no errors */
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags)
+{
+}
+
 static inline void arch_dup_mmap(struct mm_struct *oldmm,
 				 struct mm_struct *mm)
 {
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 515fea5..0a2322d 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -129,6 +129,18 @@ static inline void activate_mm(struct mm_struct *prev,
 	set_user_asce(next);
 }
 
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return 0;	/* no errors */
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags)
+{
+}
+
 static inline void arch_dup_mmap(struct mm_struct *oldmm,
 				 struct mm_struct *mm)
 {
diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h
index 62dfc64..8b57b9d 100644
--- a/arch/unicore32/include/asm/mmu_context.h
+++ b/arch/unicore32/include/asm/mmu_context.h
@@ -81,6 +81,18 @@ do { \
 	} \
 } while (0)
 
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return 0;	/* no errors */
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags)
+{
+}
+
 static inline void arch_dup_mmap(struct mm_struct *oldmm,
 				 struct mm_struct *mm)
 {
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 8e0a9fe..fe60309 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -151,6 +151,18 @@ do {						\
 } while (0)
 #endif
 
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return 0;	/* no errors */
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags)
+{
+}
+
 static inline void arch_dup_mmap(struct mm_struct *oldmm,
 				 struct mm_struct *mm)
 {
diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h
index cc5d9a1..c742e52 100644
--- a/include/asm-generic/mm_hooks.h
+++ b/include/asm-generic/mm_hooks.h
@@ -1,11 +1,23 @@
 /*
- * Define generic no-op hooks for arch_dup_mmap, arch_exit_mmap
- * and arch_unmap to be included in asm-FOO/mmu_context.h for any
- * arch FOO which doesn't need to hook these.
+ * Define generic no-op hooks for mmap and protection related routines
+ * to be included in asm-FOO/mmu_context.h for any arch FOO which doesn't
+ * need to hook these.
  */
 #ifndef _ASM_GENERIC_MM_HOOKS_H
 #define _ASM_GENERIC_MM_HOOKS_H
 
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return 0;	/* no errors */
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags)
+{
+}
+
 static inline void arch_dup_mmap(struct mm_struct *oldmm,
 				 struct mm_struct *mm)
 {
diff --git a/ipc/shm.c b/ipc/shm.c
index dbac886..dab6cd1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -72,6 +72,14 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp);
 static int sysvipc_shm_proc_show(struct seq_file *s, void *it);
 #endif
 
+#ifndef arch_shmat_check
+#define arch_shmat_check(file, shmflg, flags) (0)
+#endif
+
+#ifndef arch_shmat_check
+#define arch_shmat_check(file, shmflg, flags) (0)
+#endif
+
 void shm_init_ns(struct ipc_namespace *ns)
 {
 	ns->shm_ctlmax = SHMMAX;
@@ -1149,6 +1157,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 		goto out_unlock;
 	}
 
+	/* arch specific check and possible flag modification */
+	err = arch_shmat_check(shp->shm_file, shmflg, &flags);
+	if (err)
+		goto out_unlock;
+
 	err = -EACCES;
 	if (ipcperms(ns, &shp->shm_perm, acc_mode))
 		goto out_unlock;
diff --git a/mm/mmap.c b/mm/mmap.c
index 1af87c1..7fc946b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1307,6 +1307,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long pgoff, unsigned long *populate)
 {
 	struct mm_struct *mm = current->mm;
+	unsigned long ret;
 	int pkey = 0;
 
 	*populate = 0;
@@ -1314,6 +1315,11 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (!len)
 		return -EINVAL;
 
+	/* arch specific check and possible modification of vm_flags */
+	ret = arch_pre_mmap_flags(file, flags, &vm_flags);
+	if (ret)
+		return ret;
+
 	/*
 	 * Does the application expect PROT_READ to imply PROT_EXEC?
 	 *
@@ -1452,6 +1458,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	    ((vm_flags & VM_LOCKED) ||
 	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
 		*populate = len;
+
+	if (!IS_ERR_VALUE(addr))
+		arch_post_mmap(mm, addr, vm_flags);
+
 	return addr;
 }
 
-- 
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [RFC PATCH 13/14] sparc64 mm: add shared context support to mmap() and shmat() APIs
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Add new mmap(MAP_SHAREDCTX) and shmat(SHM_SHAREDCTX) flags to request
shared context mappings.  This only works on HUGETLB mappings.  In
addition, the mappings must be SHARED and at a FIXED address; otherwise
EINVAL will be returned.

Also, populate the sparc specific hooks to mmap and shmat that perform
shared context processing.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/include/asm/hugetlb.h        |  4 +++
 arch/sparc/include/asm/mman.h           |  6 ++++
 arch/sparc/include/asm/mmu_context_64.h | 62 ++++++++++++++++++++++++++++++++-
 arch/sparc/include/uapi/asm/mman.h      |  1 +
 arch/sparc/kernel/sys_sparc_64.c        | 17 +++++++++
 arch/sparc/mm/init_64.c                 | 36 +++++++++++++++++++
 include/uapi/linux/shm.h                |  1 +
 7 files changed, 126 insertions(+), 1 deletion(-)
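
Reading aid, not part of the patch: the commit message requires both
MAP_SHARED and MAP_FIXED for a shared context request.  A check for "all
of these bits" has to compare the masked value against the full mask, as
sparc_shmat_check() does; testing the masked value for non-zero only
shows that at least one bit is set.  The userspace sketch below contrasts
the two forms.  All demo_* helpers and DEMO_* values are hypothetical;
the real flag values come from the kernel headers.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ones come from <asm/mman.h>. */
#define DEMO_MAP_SHARED     0x01UL
#define DEMO_MAP_FIXED      0x10UL
#define DEMO_MAP_SHAREDCTX  0x80000UL

/* True only if every bit in 'required' is set in 'flags'. */
static bool demo_has_all(unsigned long flags, unsigned long required)
{
	return (flags & required) == required;
}

/* True if at least one bit in 'mask' is set in 'flags'. */
static bool demo_has_any(unsigned long flags, unsigned long mask)
{
	return (flags & mask) != 0;
}

/*
 * Validate a shared context request: 0 on success, -EINVAL otherwise.
 * 'is_hugepage' stands in for the MAP_HUGETLB / is_file_hugepages() test.
 */
static int demo_check_sharedctx(unsigned long flags, bool is_hugepage)
{
	if (!(flags & DEMO_MAP_SHAREDCTX))
		return 0;	/* nothing requested, nothing to check */
	if (!demo_has_all(flags, DEMO_MAP_SHARED | DEMO_MAP_FIXED))
		return -EINVAL;
	if (!is_hugepage)
		return -EINVAL;
	return 0;
}
```

With only DEMO_MAP_SHARED set, demo_has_any() is true while
demo_has_all() is false, so demo_check_sharedctx() rejects the request.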

diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index dcbf985..13157b3 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -78,4 +78,8 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 			    unsigned long end, unsigned long floor,
 			    unsigned long ceiling);
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+void huge_get_shared_ctx(struct mm_struct *mm, unsigned long addr);
+#endif
+
 #endif /* _ASM_SPARC64_HUGETLB_H */
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index 59bb593..cbe384e 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -6,5 +6,11 @@
 #ifndef __ASSEMBLY__
 #define arch_mmap_check(addr,len,flags)	sparc_mmap_check(addr,len)
 int sparc_mmap_check(unsigned long addr, unsigned long len);
+
+#if defined(CONFIG_SHARED_MMU_CTX)
+#define arch_shmat_check(file, shmflg, flags) \
+				sparc_shmat_check(file, shmflg, flags)
+int sparc_shmat_check(struct file *file, int shmflg, unsigned long *flags);
+#endif
 #endif
 #endif /* __SPARC_MMAN_H__ */
diff --git a/arch/sparc/include/asm/mmu_context_64.h b/arch/sparc/include/asm/mmu_context_64.h
index 46c2c7e..8ab05f2 100644
--- a/arch/sparc/include/asm/mmu_context_64.h
+++ b/arch/sparc/include/asm/mmu_context_64.h
@@ -7,7 +7,6 @@
 
 #include <linux/spinlock.h>
 #include <asm/spitfire.h>
-#include <asm-generic/mm_hooks.h>
 
 static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 {
@@ -24,6 +23,13 @@ void put_shared_context(struct mm_struct *mm);
 void set_mm_shared_ctx(struct mm_struct *mm, struct shared_mmu_ctx *ctx);
 void destroy_shared_context(struct mm_struct *mm);
 void set_vma_shared_ctx(struct vm_area_struct *vma);
+void sparc64_exit_mmap(struct mm_struct *mm);
+void sparc64_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long start, unsigned long end);
+unsigned long sparc64_pre_mmap_flags(struct file *file, unsigned long flags,
+					vm_flags_t *vm_flags);
+void sparc64_post_mmap(struct mm_struct *mm, unsigned long addr,
+					vm_flags_t vm_flags);
 #endif
 #ifdef CONFIG_SMP
 void smp_new_mmu_context_version(void);
@@ -208,6 +214,60 @@ static inline void activate_mm(struct mm_struct *active_mm, struct mm_struct *mm
 	spin_unlock_irqrestore(&mm->context.lock, flags);
 }
 
+#if defined(CONFIG_SHARED_MMU_CTX)
+/*
+ * mm_hooks only needed for CONFIG_SHARED_MMU_CTX
+ */
+static inline unsigned long arch_pre_mmap_flags(struct file *file,
+						unsigned long flags,
+						vm_flags_t *vm_flags)
+{
+	return sparc64_pre_mmap_flags(file, flags, vm_flags);
+}
+
+static inline void arch_post_mmap(struct mm_struct *mm, unsigned long addr,
+							vm_flags_t vm_flags)
+{
+	sparc64_post_mmap(mm, addr, vm_flags);
+}
+
+static inline void arch_dup_mmap(struct mm_struct *oldmm,
+				 struct mm_struct *mm)
+{
+}
+
+static inline void arch_exit_mmap(struct mm_struct *mm)
+{
+	sparc64_exit_mmap(mm);
+}
+
+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+	sparc64_unmap(mm, vma, start, end);
+}
+
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+				     struct vm_area_struct *vma)
+{
+}
+
+static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
+		bool write, bool execute, bool foreign)
+{
+	/* by default, allow everything */
+	return true;
+}
+
+static inline bool arch_pte_access_permitted(pte_t pte, bool write)
+{
+	/* by default, allow everything */
+	return true;
+}
+#else
+#include <asm-generic/mm_hooks.h>
+#endif
 #endif /* !(__ASSEMBLY__) */
 
 #endif /* !(__SPARC64_MMU_CONTEXT_H) */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 9765896..a52c6fe 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -23,6 +23,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define	MAP_SHAREDCTX	0x80000		/* request shared cxt mapping */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index fe8b8ee..23fa538 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -25,6 +25,7 @@
 #include <linux/random.h>
 #include <linux/export.h>
 #include <linux/context_tracking.h>
+#include <linux/hugetlb.h>
 
 #include <asm/uaccess.h>
 #include <asm/utrap.h>
@@ -444,6 +445,22 @@ int sparc_mmap_check(unsigned long addr, unsigned long len)
 	return 0;
 }
 
+int sparc_shmat_check(struct file *file, int shmflg, unsigned long *flags)
+{
+	if (shmflg & SHM_SHAREDCTX) {
+		if ((*flags & (MAP_SHARED | MAP_FIXED)) !=
+		    (unsigned long)(MAP_SHARED | MAP_FIXED))
+			return -EINVAL;
+
+		if (!is_file_hugepages(file))
+			return -EINVAL;
+
+		*flags |= MAP_SHAREDCTX;
+	}
+
+	return 0;
+}
+
 /* Linux version of mmap */
 SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 		unsigned long, prot, unsigned long, flags, unsigned long, fd,
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 25ad5bd..0637762 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -27,6 +27,7 @@
 #include <linux/memblock.h>
 #include <linux/mmzone.h>
 #include <linux/gfp.h>
+#include <linux/mman.h>
 
 #include <asm/head.h>
 #include <asm/page.h>
@@ -832,6 +833,41 @@ void set_vma_shared_ctx(struct vm_area_struct *vma)
 	atomic_inc(&mm->context.shared_ctx->refcount);
 	vma->vm_shared_mmu_ctx.ctx = mm->context.shared_ctx;
 }
+
+unsigned long sparc64_pre_mmap_flags(struct file *file, unsigned long flags,
+					vm_flags_t *vm_flags)
+{
+	if (flags & MAP_SHAREDCTX) {
+		/* Must be a shared huge page mapping */
+		if ((flags & (MAP_SHARED | MAP_FIXED)) != (MAP_SHARED | MAP_FIXED))
+			return -EINVAL;
+		if (!(flags & MAP_HUGETLB)  &&
+		    !(file && is_file_hugepages(file)))
+			return -EINVAL;
+
+		*vm_flags |= VM_SHARED_CTX;
+	}
+
+	return 0;
+}
+
+void sparc64_post_mmap(struct mm_struct *mm, unsigned long addr,
+							vm_flags_t vm_flags)
+{
+	if (vm_flags & VM_SHARED_CTX)
+		huge_get_shared_ctx(mm, addr);
+}
+
+void sparc64_exit_mmap(struct mm_struct *mm)
+{
+	put_shared_context(mm);
+}
+
+void sparc64_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+	put_shared_context(mm);
+}
 #endif
 
 static int numa_enabled = 1;
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 1fbf24e..3373567 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -49,6 +49,7 @@ struct shmid_ds {
 #define	SHM_RND		020000	/* round attach address to SHMLBA boundary */
 #define	SHM_REMAP	040000	/* take-over region on attach */
 #define	SHM_EXEC	0100000	/* execution access */
+#define	SHM_SHAREDCTX	0200000	/* share context (TLB entries) if possible */
 
 /* super user shmctl commands */
 #define SHM_LOCK 	11
-- 
2.7.4

* [RFC PATCH 14/14] sparc64: add SHARED_MMU_CTX Kconfig option
From: Mike Kravetz @ 2016-12-16 18:35 UTC (permalink / raw)
  To: sparclinux, linux-mm, linux-kernel
  Cc: David S . Miller, Bob Picco, Nitin Gupta, Vijay Kumar,
	Julian Calaby, Adam Buchbinder, Kirill A . Shutemov, Michal Hocko,
	Andrew Morton, Mike Kravetz
In-Reply-To: <1481913337-9331-1-git-send-email-mike.kravetz@oracle.com>

Add the SHARED_MMU_CTX config symbol.  It is enabled automatically
(def_bool y) when both SPARC64 and HUGETLB_PAGE are set.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/sparc/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 165ecdd..f39dcdf 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -155,6 +155,9 @@ config PGTABLE_LEVELS
 	default 4 if 64BIT
 	default 3
 
+config SHARED_MMU_CTX
+	def_bool y if SPARC64 && HUGETLB_PAGE
+
 source "init/Kconfig"
 
 source "kernel/Kconfig.freezer"
-- 
2.7.4

* Re: OOM: Better, but still there on
From: Nils Holland @ 2016-12-16 18:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, Chris Mason, David Sterba, linux-btrfs
In-Reply-To: <20161216155808.12809-1-mhocko@kernel.org>

On Fri, Dec 16, 2016 at 04:58:06PM +0100, Michal Hocko wrote:
> On Fri 16-12-16 08:39:41, Michal Hocko wrote:
> [...]
> > That being said, the OOM killer invocation is clearly pointless and
> > premature. We do not normally invoke it for GFP_NOFS requests
> > exactly for these reasons. But this is GFP_NOFS|__GFP_NOFAIL which
> > behaves differently. I am about to change that but my last attempt [1]
> > has to be rethought.
> > 
> > Now another thing is that the __GFP_NOFAIL which has this nasty side
> > effect has been introduced by me in d1b5c5671d01 ("btrfs: Prevent from
> > early transaction abort") in 4.3 so I am quite surprised that this has
> > shown up only in 4.8. Anyway there might be some other changes in the
> > btrfs which could make it more subtle.
> > 
> > I believe the right way to go around this is to pursue what I've started
> > in [1]. I will try to prepare something for testing today for you. Stay
> > tuned. But I would be really happy if somebody from the btrfs camp could
> > check the NOFS aspect of this allocation. We have already seen
> > allocation stalls from this path quite recently
> 
> Could you try to run with the two following patches?

I tried the two patches you sent, and ... well, things are different
now, but probably still a bit problematic. ;-)

Once again, I freshly booted both of my machines and told Gentoo's
portage to unpack and build the firefox sources. The first machine,
the one from which yesterday's OOM report came, became unresponsive
during the tarball unpack phase and had to be power cycled.
Unfortunately, there's nothing concerning its OOMs in the logs. :-(

The second machine actually finished the unpack phase successfully and
started the build process (which, every now and then, had also worked
with previous problematic kernels). However, after it had been
building for a while and I decided to increase the stress level by
starting X, firefox as well as a terminal and unpacking a kernel source
tarball in it, it also started OOMing, this time once more with a
genuine kernel panic. Luckily, this machine also caught something in
the logs, which I'm including below.

Despite the fact that I'm no expert, I can see that there's no more
GFP_NOFS being logged, which seems to be what the patches tried to
achieve. What the still-present OOMs mean remains up for
interpretation by the experts; all I can say is that in the (pre-4.8?)
past, doing all of the things I just did would probably slow down my
machine quite a bit, but I can't remember ever having seen it OOM or
even crash completely.

Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages freed, 10219 pages still pinned.
Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=0, order=1, oom_score_adj=0
Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0
Dec 16 18:56:29 boerne.fritz.box kernel: CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.9.0-gentoo #3
Dec 16 18:56:29 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105d6c c1433406 f4105e9c c6611280 f4105d9c c1170011 f4105df0 00200296
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105d9c c1438fff f4105da0 edc1bc80 ee32ce00 c6611280 c1ad1899 f4105e9c
Dec 16 18:56:29 boerne.fritz.box kernel:  f4105de0 c1114407 c10513a5 f4105dcc c11140a1 00000001 00000000 00000000
Dec 16 18:56:29 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10464f5>] copy_process.part.52+0xd5/0x1410
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1080779>] ? pick_next_task_fair+0x479/0x510
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1062ba0>] ? __kthread_parkme+0x60/0x60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10479d7>] _do_fork+0xc7/0x360
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1062ba0>] ? __kthread_parkme+0x60/0x60
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c1047ca0>] kernel_thread+0x30/0x40
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10637c6>] kthreadd+0x106/0x150
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c10636c0>] ? kthread_park+0x50/0x50
Dec 16 18:56:29 boerne.fritz.box kernel:  [<c19422b7>] ret_from_fork+0x1b/0x28
Dec 16 18:56:29 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:29 boerne.fritz.box kernel: active_anon:132176 inactive_anon:11640 isolated_anon:0
                                          active_file:295257 inactive_file:389350 isolated_file:20
                                          unevictable:0 dirty:3956 writeback:0 unstable:0
                                          slab_reclaimable:54632 slab_unreclaimable:21963
                                          mapped:36724 shmem:11853 pagetables:914 bounce:0
                                          free:77600 free_pcp:327 free_cma:0
Dec 16 18:56:29 boerne.fritz.box kernel: Node 0 active_anon:528704kB inactive_anon:46560kB active_file:1181028kB inactive_file:1557400kB unevictable:0kB isolated(anon):0kB isolated(file):80kB mapped:146896kB dirty:15824kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 172032kB anon_thp: 47412kB writeback_tmp:0kB unstable:0kB pages_scanned:15066965 all_unreclaimable? yes
Dec 16 18:56:29 boerne.fritz.box kernel: DMA free:3976kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4788kB inactive_file:0kB unevictable:0kB writepending:160kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1616kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:470556kB inactive_file:148kB unevictable:0kB writepending:1616kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:29 boerne.fritz.box kernel: HighMem free:265416kB min:512kB low:39184kB high:77856kB active_anon:528704kB inactive_anon:46560kB active_file:705684kB inactive_file:1557292kB unevictable:0kB writepending:14048kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:776kB local_pcp:660kB free_cma:0kB
Dec 16 18:56:29 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:29 boerne.fritz.box kernel: DMA: 2*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 3976kB
Dec 16 18:56:29 boerne.fritz.box kernel: Normal: 32*4kB (ME) 28*8kB (UM) 15*16kB (UM) 141*32kB (UME) 141*64kB (UM) 80*128kB (UM) 19*256kB (UME) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41008kB
Dec 16 18:56:29 boerne.fritz.box kernel: HighMem: 340*4kB (UME) 339*8kB (UME) 258*16kB (UME) 192*32kB (UME) 69*64kB (UME) 15*128kB (UME) 6*256kB (ME) 5*512kB (UME) 7*1024kB (UME) 4*2048kB (UE) 55*4096kB (UM) = 265416kB
Dec 16 18:56:29 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:29 boerne.fritz.box kernel: 696480 total pagecache pages
Dec 16 18:56:29 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:29 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:29 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:29 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:29 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:29 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:29 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:29 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:29 boerne.fritz.box kernel: [ 1874]     0  1874     6166      987       9       3        0             0 systemd-journal
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2585]   108  2585     1269      883       6       3        0          -900 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:29 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5648]  1000  5648    22873     7477      43       3        0             0 X
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:29 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:29 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:29 boerne.fritz.box kernel: [12950]  1000 12950   197232    73307     252       3        0             0 firefox
Dec 16 18:56:29 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:29 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:29 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:29 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:29 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:29 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13494]   250 13494    15191    14639      34       3        0             0 python
Dec 16 18:56:29 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:29 boerne.fritz.box kernel: [13593]  1000 13593     1533      624       7       3        0             0 tar
Dec 16 18:56:29 boerne.fritz.box kernel: [13594]  1000 13594    17834    16906      38       3        0             0 xz
Dec 16 18:56:29 boerne.fritz.box kernel: [13604]   250 13604    12439    11843      27       3        0             0 python
Dec 16 18:56:29 boerne.fritz.box kernel: [13651]   250 13651      253        5       1       3        0             0 sh
Dec 16 18:56:29 boerne.fritz.box kernel: Out of memory: Kill process 12950 (firefox) score 38 or sacrifice child
Dec 16 18:56:29 boerne.fritz.box kernel: Killed process 12950 (firefox) total-vm:788928kB, anon-rss:192656kB, file-rss:100548kB, shmem-rss:24kB
Dec 16 18:56:29 boerne.fritz.box kernel: oom_reaper: reaped process 12950 (firefox), now anon-rss:0kB, file-rss:96kB, shmem-rss:24kB
Dec 16 18:56:31 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 16 18:56:31 boerne.fritz.box kernel: xfce4-terminal cpuset=/ mems_allowed=0
Dec 16 18:56:31 boerne.fritz.box kernel: CPU: 0 PID: 5825 Comm: xfce4-terminal Not tainted 4.9.0-gentoo #3
Dec 16 18:56:31 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c18 c1433406 c6941d48 c5972500 c6941c48 c1170011 c6941c9c 00200286
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c48 c1438fff c6941c4c edc1a940 ee32d400 c5972500 c1ad1899 c6941d48
Dec 16 18:56:31 boerne.fritz.box kernel:  c6941c8c c1114407 c10513a5 c6941c78 c11140a1 00000006 00000000 00000000
Dec 16 18:56:31 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1186120>] ? poll_select_copy_remaining+0x120/0x120
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10ae675>] ? clockevents_program_event+0x95/0x190
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10a074a>] ? __hrtimer_run_queues+0x20a/0x280
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 16 18:56:31 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 16 18:56:31 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:31 boerne.fritz.box kernel: active_anon:72795 inactive_anon:7267 isolated_anon:0
                                          active_file:297627 inactive_file:387672 isolated_file:0
                                          unevictable:0 dirty:77 writeback:18 unstable:0
                                          slab_reclaimable:54648 slab_unreclaimable:21983
                                          mapped:17819 shmem:8215 pagetables:662 bounce:8
                                          free:141692 free_pcp:107 free_cma:0
Dec 16 18:56:31 boerne.fritz.box kernel: Node 0 active_anon:291180kB inactive_anon:29068kB active_file:1190508kB inactive_file:1550688kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:71276kB dirty:308kB writeback:72kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 122880kB anon_thp: 32860kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 16 18:56:31 boerne.fritz.box kernel: DMA free:4020kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4804kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1572kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:41028kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472164kB inactive_file:108kB unevictable:0kB writepending:112kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2564kB bounce:32kB free_pcp:180kB local_pcp:24kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem free:521720kB min:512kB low:39184kB high:77856kB active_anon:291180kB inactive_anon:29068kB active_file:713448kB inactive_file:1550556kB unevictable:0kB writepending:76kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:248kB local_pcp:156kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:32 boerne.fritz.box kernel: DMA: 13*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 4020kB
Dec 16 18:56:32 boerne.fritz.box kernel: Normal: 37*4kB (UME) 24*8kB (ME) 17*16kB (UME) 137*32kB (UME) 143*64kB (UME) 82*128kB (UM) 18*256kB (UM) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41028kB
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem: 3230*4kB (ME) 1616*8kB (M) 680*16kB (UM) 398*32kB (UME) 145*64kB (UM) 59*128kB (UM) 25*256kB (ME) 19*512kB (UME) 9*1024kB (UME) 36*2048kB (UME) 87*4096kB (UME) = 521720kB
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:32 boerne.fritz.box kernel: 693537 total pagecache pages
Dec 16 18:56:32 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:32 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:32 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:32 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:32 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:32 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:32 boerne.fritz.box kernel: [ 1874]     0  1874     6166     1007       9       3        0             0 systemd-journal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2585]   108  2585     1301      885       6       3        0          -900 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5648]  1000  5648    22392     7078      41       3        0             0 X
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:32 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:32 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:32 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13494]   250 13494    15321    14729      34       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13593]  1000 13593     1533      624       7       3        0             0 tar
Dec 16 18:56:32 boerne.fritz.box kernel: [13594]  1000 13594    17834    16906      38       3        0             0 xz
Dec 16 18:56:32 boerne.fritz.box kernel: [13604]   250 13604    12599    12029      28       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13658]   250 13658     1549     1104       6       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: Out of memory: Kill process 13594 (xz) score 8 or sacrifice child
Dec 16 18:56:32 boerne.fritz.box kernel: Killed process 13594 (xz) total-vm:71336kB, anon-rss:65668kB, file-rss:1956kB, shmem-rss:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: xfce4-terminal invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0
Dec 16 18:56:32 boerne.fritz.box kernel: xfce4-terminal cpuset=/ mems_allowed=0
Dec 16 18:56:32 boerne.fritz.box kernel: CPU: 1 PID: 5825 Comm: xfce4-terminal Not tainted 4.9.0-gentoo #3
Dec 16 18:56:32 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c18 c1433406 c6941d48 ef25ef00 c6941c48 c1170011 c6941c9c 00200286
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c48 c1438fff c6941c4c ef267c80 ef233a00 ef25ef00 c1ad1899 c6941d48
Dec 16 18:56:32 boerne.fritz.box kernel:  c6941c8c c1114407 c10513a5 c6941c78 c11140a1 00000006 00000000 00000000
Dec 16 18:56:32 boerne.fritz.box kernel: Call Trace:
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1433406>] dump_stack+0x47/0x61
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1170011>] dump_header+0x5f/0x175
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1438fff>] ? ___ratelimit+0x7f/0xe0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1114407>] oom_kill_process+0x207/0x3c0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10513a5>] ? has_capability_noaudit+0x15/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c11140a1>] ? oom_badness.part.13+0xb1/0x120
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c11148c4>] out_of_memory+0xd4/0x270
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1118615>] __alloc_pages_nodemask+0xcf5/0xd60
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1758900>] ? skb_queue_purge+0x30/0x30
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c175dcde>] alloc_skb_with_frags+0xee/0x1a0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1753dba>] sock_alloc_send_pskb+0x19a/0x1c0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1186120>] ? poll_select_copy_remaining+0x120/0x120
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1825880>] ? wait_for_unix_gc+0x20/0x90
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1823fc0>] unix_stream_sendmsg+0x2a0/0x350
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750b3d>] sock_sendmsg+0x2d/0x40
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750bb7>] sock_write_iter+0x67/0xc0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1172c42>] do_readv_writev+0x1e2/0x380
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1750b50>] ? sock_sendmsg+0x40/0x40
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1033763>] ? lapic_next_event+0x13/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10ae675>] ? clockevents_program_event+0x95/0x190
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10a074a>] ? __hrtimer_run_queues+0x20a/0x280
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173d16>] vfs_writev+0x36/0x60
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173d85>] do_writev+0x45/0xc0
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c1173efb>] SyS_writev+0x1b/0x20
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c10018ec>] do_fast_syscall_32+0x7c/0x130
Dec 16 18:56:32 boerne.fritz.box kernel:  [<c194232b>] sysenter_past_esp+0x40/0x6a
Dec 16 18:56:32 boerne.fritz.box kernel: Mem-Info:
Dec 16 18:56:32 boerne.fritz.box kernel: active_anon:56747 inactive_anon:7267 isolated_anon:0
                                          active_file:297677 inactive_file:387697 isolated_file:0
                                          unevictable:0 dirty:151 writeback:18 unstable:0
                                          slab_reclaimable:54648 slab_unreclaimable:21983
                                          mapped:17769 shmem:8215 pagetables:637 bounce:8
                                          free:157498 free_pcp:299 free_cma:0
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 active_anon:226988kB inactive_anon:29068kB active_file:1190708kB inactive_file:1550788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:71076kB dirty:604kB writeback:72kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 47104kB anon_thp: 32860kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Dec 16 18:56:32 boerne.fritz.box kernel: DMA free:4020kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:4804kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:5356kB slab_unreclaimable:1572kB kernel_stack:32kB pagetables:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 808 3849 3849
Dec 16 18:56:32 boerne.fritz.box kernel: Normal free:40988kB min:41100kB low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB active_file:472436kB inactive_file:144kB unevictable:0kB writepending:312kB present:897016kB managed:831480kB mlocked:0kB slab_reclaimable:213236kB slab_unreclaimable:86360kB kernel_stack:1584kB pagetables:2464kB bounce:32kB free_pcp:116kB local_pcp:0kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 24330 24330
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem free:584984kB min:512kB low:39184kB high:77856kB active_anon:226988kB inactive_anon:29068kB active_file:713448kB inactive_file:1550556kB unevictable:0kB writepending:224kB present:3114256kB managed:3114256kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1080kB local_pcp:400kB free_cma:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: lowmem_reserve[]: 0 0 0 0
Dec 16 18:56:32 boerne.fritz.box kernel: DMA: 13*4kB (UE) 2*8kB (U) 1*16kB (E) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (E) 1*512kB (E) 1*1024kB (U) 1*2048kB (M) 0*4096kB = 4020kB
Dec 16 18:56:32 boerne.fritz.box kernel: Normal: 36*4kB (ME) 24*8kB (ME) 16*16kB (ME) 138*32kB (UME) 143*64kB (UME) 82*128kB (UM) 18*256kB (UM) 3*512kB (UME) 2*1024kB (ME) 2*2048kB (ME) 1*4096kB (M) = 41040kB
Dec 16 18:56:32 boerne.fritz.box kernel: HighMem: 3430*4kB (UME) 1795*8kB (UME) 750*16kB (UM) 401*32kB (UM) 148*64kB (UME) 56*128kB (UM) 28*256kB (UME) 19*512kB (UME) 9*1024kB (UME) 55*2048kB (UME) 92*4096kB (UME) = 585136kB
Dec 16 18:56:32 boerne.fritz.box kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 16 18:56:32 boerne.fritz.box kernel: 693648 total pagecache pages
Dec 16 18:56:32 boerne.fritz.box kernel: 0 pages in swap cache
Dec 16 18:56:32 boerne.fritz.box kernel: Swap cache stats: add 0, delete 0, find 0/0
Dec 16 18:56:32 boerne.fritz.box kernel: Free swap  = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: Total swap = 3781628kB
Dec 16 18:56:32 boerne.fritz.box kernel: 1006816 pages RAM
Dec 16 18:56:32 boerne.fritz.box kernel: 778564 pages HighMem/MovableOnly
Dec 16 18:56:32 boerne.fritz.box kernel: 16403 pages reserved
Dec 16 18:56:32 boerne.fritz.box kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 16 18:56:32 boerne.fritz.box kernel: [ 1874]     0  1874     6166     1011       9       3        0             0 systemd-journal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2497]     0  2497     2965      911       8       3        0         -1000 systemd-udevd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2582]   107  2582     3874      958       8       3        0             0 systemd-timesyn
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2585]   108  2585     1301      885       6       3        0          -900 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2586]     0  2586    22054     3277      20       3        0             0 NetworkManager
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2587]     0  2587     1521      972       7       3        0             0 systemd-logind
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2589]    88  2589     1158      627       6       3        0             0 nullmailer-send
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2612]     0  2612     1510      460       5       3        0             0 fcron
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2665]     0  2665      768      580       5       3        0             0 dhcpcd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2668]     0  2668      639      408       5       3        0             0 vnstatd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2669]     0  2669     1460     1063       6       3        0         -1000 sshd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2670]     0  2670     1235      838       6       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2672]     0  2672     1972     1267       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2700]     0  2700     2279      586       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2733]     0  2733     1836      890       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2753]   109  2753    16724     3089      19       3        0             0 polkitd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2776]     0  2776     2153     1349       7       3        0             0 wpa_supplicant
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2941]     0  2941    16268    15095      36       3        0             0 emerge
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2942]     0  2942     1235      833       5       3        0             0 login
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2949]  1000  2949     2033     1378       7       3        0             0 systemd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2973]  1000  2973     2279      589       7       3        0             0 (sd-pam)
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2989]  1000  2989     1836      907       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 2997]  1000  2997    25339     2169      17       3        0             0 pulseaudio
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3000]   111  3000     5763      655       9       3        0             0 rtkit-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 3019]  1000  3019     3575     1403      11       3        0             0 gconf-helper
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5626]  1000  5626     1743      709       8       3        0             0 startx
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5647]  1000  5647     1001      579       6       3        0             0 xinit
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5648]  1000  5648    22392     7078      41       3        0             0 X
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5674]  1000  5674    10584     4543      21       3        0             0 awesome
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5718]  1000  5718     1571      610       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5720]  1000  5720     1238      645       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5725]  1000  5725     1571      634       7       3        0             0 dbus-launch
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5726]  1000  5726     1238      649       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5823]  1000  5823    35683     8366      42       3        0             0 nm-applet
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5825]  1000  5825    21454     7358      31       3        0             0 xfce4-terminal
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5827]  1000  5827    11257     1911      14       3        0             0 at-spi-bus-laun
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5832]  1000  5832     1238      831       6       3        0             0 dbus-daemon
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5838]  1000  5838     7480     2110      12       3        0             0 at-spi2-registr
Dec 16 18:56:32 boerne.fritz.box kernel: [ 5840]  1000  5840    10179     1459      13       3        0             0 gvfsd
Dec 16 18:56:32 boerne.fritz.box kernel: [ 6181]  1000  6181     1836      883       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [ 7874]  1000  7874     2246     1185       8       3        0             0 ssh
Dec 16 18:56:32 boerne.fritz.box kernel: [13020]   250 13020      549      377       4       3        0             0 sandbox
Dec 16 18:56:32 boerne.fritz.box kernel: [13022]   250 13022     2629     1567       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13040]  1000 13040     1836      933       7       3        0             0 bash
Dec 16 18:56:32 boerne.fritz.box kernel: [13048]   250 13048     3002     1718       8       3        0             0 ebuild.sh
Dec 16 18:56:32 boerne.fritz.box kernel: [13052]   250 13052     1122      732       5       3        0             0 emake
Dec 16 18:56:32 boerne.fritz.box kernel: [13054]   250 13054      921      697       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13118]   250 13118     1048      783       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13181]   250 13181     1043      789       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13208]   250 13208     1095      855       6       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13255]   250 13255      772      555       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13299]   250 13299      913      689       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13493]   250 13493      876      619       5       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13494]   250 13494    15321    14775      34       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13532]   250 13532      808      594       4       3        0             0 make
Dec 16 18:56:32 boerne.fritz.box kernel: [13593]  1000 13593     1533      643       7       3        0             0 tar
Dec 16 18:56:32 boerne.fritz.box kernel: [13604]   250 13604    12760    12198      28       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: [13658]   250 13658     1687     1280       6       3        0             0 python
Dec 16 18:56:32 boerne.fritz.box kernel: Out of memory: Kill process 13494 (python) score 7 or sacrifice child
Dec 16 18:56:32 boerne.fritz.box kernel: Killed process 13494 (python) total-vm:61284kB, anon-rss:54128kB, file-rss:4972kB, shmem-rss:0kB
Dec 16 18:56:32 boerne.fritz.box kernel: oom_reaper: reaped process 13494 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
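
As a reading aid for the per-zone lines in the reports above, here is a
minimal sketch that checks which zones sit below their min watermark.
The zone strings are copied (abridged) from the last report; the helper
name zones_below_min and the regular expression are my own and only
cover the "free:" and "min:" fields. As far as I understand, a
GFP_KERNEL_ACCOUNT allocation cannot be served from HighMem, so the
large amount of free HighMem would not help it.

```python
import re

# Zone lines copied (abridged) from the last OOM report above.
ZONE_LINES = [
    "DMA free:4020kB min:788kB low:984kB high:1180kB",
    "Normal free:40988kB min:41100kB low:51372kB high:61644kB",
    "HighMem free:584984kB min:512kB low:39184kB high:77856kB",
]

# Hypothetical pattern: zone name, then the free and min values in kB.
ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(lines):
    """Return the names of zones whose free memory is below 'min'."""
    below = []
    for line in lines:
        match = ZONE_RE.match(line)
        if match and int(match.group(2)) < int(match.group(3)):
            below.append(match.group(1))
    return below


print(zones_below_min(ZONE_LINES))
```

With the values above, only the Normal zone is reported: 40988kB free
against a min of 41100kB.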

Greetings
Nils

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: OOM: Better, but still there on 4.9
From: Chris Mason @ 2016-12-16 19:50 UTC (permalink / raw)
  To: Michal Hocko, Nils Holland
  Cc: linux-kernel, linux-mm, David Sterba, linux-btrfs
In-Reply-To: <20161216073941.GA26976@dhcp22.suse.cz>

On 12/16/2016 02:39 AM, Michal Hocko wrote:
> [CC linux-mm and btrfs guys]
>
> On Thu 15-12-16 23:57:04, Nils Holland wrote:
> [...]
>> Of course, none of these are workloads that are new / special in any
>> way - prior to 4.8, I never experienced any issues doing the exact
>> same things.
>>
>> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
>> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
>> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
>> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 Notebook PC/21F7, BIOS F.22 08/06/2014
>> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
>> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734 00000000 eff0b634 c1163332 00000000 00000292
>> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 e7fa2900 c1b58785 eff0b734
>> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 00000007 00000000 00000000
>> Dec 15 19:02:18 teela kernel: Call Trace:
>> Dec 15 19:02:18 teela kernel:  [<c142bcce>] dump_stack+0x47/0x69
>> Dec 15 19:02:18 teela kernel:  [<c1163332>] dump_header+0x60/0x178
>> Dec 15 19:02:18 teela kernel:  [<c1431876>] ? ___ratelimit+0x86/0xe0
>> Dec 15 19:02:18 teela kernel:  [<c110795f>] oom_kill_process+0x20f/0x3d0
>> Dec 15 19:02:18 teela kernel:  [<c1043895>] ? has_capability_noaudit+0x15/0x20
>> Dec 15 19:02:18 teela kernel:  [<c11075c7>] ? oom_badness.part.13+0xb7/0x130
>> Dec 15 19:02:18 teela kernel:  [<c1107df9>] out_of_memory+0xd9/0x260
>> Dec 15 19:02:18 teela kernel:  [<c110ba0b>] __alloc_pages_nodemask+0xbfb/0xc80
>> Dec 15 19:02:18 teela kernel:  [<c110414d>] pagecache_get_page+0xad/0x270
>> Dec 15 19:02:18 teela kernel:  [<c13664a6>] alloc_extent_buffer+0x116/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1334a2e>] btrfs_find_create_tree_block+0xe/0x10
>> Dec 15 19:02:18 teela kernel:  [<c132a57f>] btrfs_alloc_tree_block+0x1ef/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130f7c3>] __btrfs_cow_block+0x143/0x5f0
>> Dec 15 19:02:18 teela kernel:  [<c130fe1a>] btrfs_cow_block+0x13a/0x220
>> Dec 15 19:02:18 teela kernel:  [<c13132f1>] btrfs_search_slot+0x1d1/0x870
>> Dec 15 19:02:18 teela kernel:  [<c132fcdd>] btrfs_lookup_file_extent+0x4d/0x60
>> Dec 15 19:02:18 teela kernel:  [<c1354fe6>] __btrfs_drop_extents+0x176/0x1070
>> Dec 15 19:02:18 teela kernel:  [<c1150377>] ? kmem_cache_alloc+0xb7/0x190
>> Dec 15 19:02:18 teela kernel:  [<c133dbb5>] ? start_transaction+0x65/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1150597>] ? __kmalloc+0x147/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1345005>] cow_file_range_inline+0x215/0x6b0
>> Dec 15 19:02:18 teela kernel:  [<c13459fc>] cow_file_range.isra.49+0x55c/0x6d0
>> Dec 15 19:02:18 teela kernel:  [<c1361795>] ? lock_extent_bits+0x75/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1346d51>] run_delalloc_range+0x441/0x470
>> Dec 15 19:02:18 teela kernel:  [<c13626e4>] writepage_delalloc.isra.47+0x144/0x1e0
>> Dec 15 19:02:18 teela kernel:  [<c1364548>] __extent_writepage+0xd8/0x2b0
>> Dec 15 19:02:18 teela kernel:  [<c1365c4c>] extent_writepages+0x25c/0x380
>> Dec 15 19:02:18 teela kernel:  [<c1342cd0>] ? btrfs_real_readdir+0x610/0x610
>> Dec 15 19:02:18 teela kernel:  [<c133ff0f>] btrfs_writepages+0x1f/0x30
>> Dec 15 19:02:18 teela kernel:  [<c110ff85>] do_writepages+0x15/0x40
>> Dec 15 19:02:18 teela kernel:  [<c1190a95>] __writeback_single_inode+0x35/0x2f0
>> Dec 15 19:02:18 teela kernel:  [<c119112e>] writeback_sb_inodes+0x16e/0x340
>> Dec 15 19:02:18 teela kernel:  [<c119145a>] wb_writeback+0xaa/0x280
>> Dec 15 19:02:18 teela kernel:  [<c1191de8>] wb_workfn+0xd8/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c104fd34>] process_one_work+0x114/0x3e0
>> Dec 15 19:02:18 teela kernel:  [<c1050b4f>] worker_thread+0x2f/0x4b0
>> Dec 15 19:02:18 teela kernel:  [<c1050b20>] ? create_worker+0x180/0x180
>> Dec 15 19:02:18 teela kernel:  [<c10552e7>] kthread+0x97/0xb0
>> Dec 15 19:02:18 teela kernel:  [<c1055250>] ? __kthread_parkme+0x60/0x60
>> Dec 15 19:02:18 teela kernel:  [<c19b5cb7>] ret_from_fork+0x1b/0x28
>> Dec 15 19:02:18 teela kernel: Mem-Info:
>> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 isolated_anon:0
>>                                active_file:274324 inactive_file:281962 isolated_file:0
>
> OK, so there is still some anonymous memory that could be swapped out
> and quite a lot of page cache. This might be harder to reclaim because
> the allocation is a GFP_NOFS request which is limited in its reclaim
> capabilities. It might be possible that those pagecache pages are pinned
> in some way by the filesystem.

Reading harder, it's possible those pagecache pages are all from the
btree inode.  They shouldn't be pinned by btrfs; kswapd should be able
to wander in and free a good chunk.  What btrfs wants to happen is for
this allocation to sit and wait for kswapd to make progress.

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 5/9 v2] xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*
From: Michal Hocko @ 2016-12-16 22:00 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Dave Chinner,
	Theodore Ts'o, Chris Mason, David Sterba, Jan Kara,
	ceph-devel, cluster-devel, linux-nfs, logfs, linux-xfs,
	linux-ext4, linux-btrfs, linux-mtd, reiserfs-devel,
	linux-ntfs-dev, linux-f2fs-devel, linux-afs, LKML
In-Reply-To: <20161216163811.GG8447@bfoster.bfoster>

On Fri 16-12-16 11:38:11, Brian Foster wrote:
> On Thu, Dec 15, 2016 at 03:07:11PM +0100, Michal Hocko wrote:
[...]
> > @@ -459,7 +459,7 @@ _xfs_buf_map_pages(
> >  				break;
> >  			vm_unmap_aliases();
> >  		} while (retried++ <= 1);
> > -		memalloc_noio_restore(noio_flag);
> > +		memalloc_noio_restore(nofs_flag);
> 
> memalloc_nofs_restore() ?

Oops, you are right of course. Fixed.
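
For readers following the thread, the reason the save and restore calls
must come from the same family can be shown with a small userspace model.
This is only a sketch: task_flags, TASK_NOFS and every model_* name are
hypothetical stand-ins, not the kernel's definitions. It assumes the scope
API works by setting a per-task flag bit on save, returning the previous
state of that bit, and putting exactly that state back on restore.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the kernel's per-task flag has its own value. */
#define TASK_NOFS 0x1u

/* Stand-in for the per-task flags word (current->flags in the kernel). */
unsigned int task_flags;

/* Enter a NOFS scope: set the bit and return its previous state. */
unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & TASK_NOFS;

	task_flags |= TASK_NOFS;
	return old;
}

/* Leave a NOFS scope: put the bit back to what the matching save saw. */
void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~TASK_NOFS) | old;
}

/* Would an allocation in this context be allowed to recurse into the fs? */
bool model_fs_reclaim_allowed(void)
{
	return !(task_flags & TASK_NOFS);
}

/*
 * Nested scopes: after the inner restore the outer scope must still be
 * in force. Returns true if the restriction survived the inner restore.
 */
bool model_nested_scope_still_restricted(void)
{
	unsigned int outer = model_nofs_save();
	unsigned int inner = model_nofs_save();
	bool restricted;

	model_nofs_restore(inner);
	restricted = !model_fs_reclaim_allowed();
	model_nofs_restore(outer);
	assert(restricted);
	return restricted;
}
```

A restore helper from a different family would operate on a different
bit, so the value saved by the NOFS helper would not be put back
correctly, which is what the review comment above caught.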
---

^ permalink raw reply

* Re: [PATCH 2/9 v2] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives
From: Michal Hocko @ 2016-12-16 22:01 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Dave Chinner,
	Theodore Ts'o, Chris Mason, David Sterba, Jan Kara,
	ceph-devel, cluster-devel, linux-nfs, logfs, linux-xfs,
	linux-ext4, linux-btrfs, linux-mtd, reiserfs-devel,
	linux-ntfs-dev, linux-f2fs-devel, linux-afs, LKML
In-Reply-To: <20161216163749.GE8447@bfoster.bfoster>

On Fri 16-12-16 11:37:50, Brian Foster wrote:
> On Fri, Dec 16, 2016 at 04:40:41PM +0100, Michal Hocko wrote:
> > Updated patch after Mike noticed a BUG_ON when KM_NOLOCKDEP is used.
> > ---
> > From 1497e713e11639157aef21cae29052cb3dc7ab44 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Thu, 15 Dec 2016 13:06:43 +0100
> > Subject: [PATCH] xfs: introduce and use KM_NOLOCKDEP to silence reclaim
> >  lockdep false positives
> > 
> > Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
> > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
> > also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
> > KM_NOFS tags to keep lockdep happy") and use the new flag for them
> > instead. There is really no reason to make these allocation contexts
> > weaker just because of lockdep, which might not even be enabled
> > in most cases.
> > 
> 
> Hi Michal,
> 
> I haven't gone back to fully grok b17cb364dbbb ("xfs: fix missing
> KM_NOFS tags to keep lockdep happy"), so I'm not really familiar with
> the original problem. FWIW, there was another KM_NOFS instance added by
> that commit in xlog_cil_prepare_log_vecs() that is now in
> xlog_cil_alloc_shadow_bufs(). Perhaps Dave can confirm whether the
> original issue still applies..?

Yes, I've noticed that but the reworked code looked sufficiently
different that I didn't dare to simply convert it.
-- 
Michal Hocko
SUSE Labs
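
As an illustration of the difference the quoted changelog describes, here
is a minimal userspace sketch of translating allocator-wrapper flags into
page-allocator flags. All MODEL_* names and bit values are hypothetical
and chosen only for this example; they are not the values used by xfs or
the page allocator. The assumption is that a NOFS-style flag removes
filesystem reclaim from the request, while a NOLOCKDEP-style flag only
adds an annotation and leaves the reclaim capabilities untouched.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define MODEL_GFP_FS        0x01u	/* allocation may recurse into the fs */
#define MODEL_GFP_NOLOCKDEP 0x02u	/* skip lockdep reclaim tracking */

#define MODEL_KM_NOFS       0x01u
#define MODEL_KM_NOLOCKDEP  0x02u

/* Translate wrapper flags into allocator flags. */
unsigned int model_km_to_gfp(unsigned int km_flags)
{
	unsigned int gfp = MODEL_GFP_FS;	/* default: full reclaim */

	if (km_flags & MODEL_KM_NOFS)
		gfp &= ~MODEL_GFP_FS;		/* weaker allocation context */
	if (km_flags & MODEL_KM_NOLOCKDEP)
		gfp |= MODEL_GFP_NOLOCKDEP;	/* annotation only */
	return gfp;
}

/* Self-check: the annotation must not weaken the allocation context. */
void model_km_selfcheck(void)
{
	assert(model_km_to_gfp(MODEL_KM_NOLOCKDEP) & MODEL_GFP_FS);
	assert(!(model_km_to_gfp(MODEL_KM_NOFS) & MODEL_GFP_FS));
}
```

In this model a caller that only wanted to quiet a false positive keeps
MODEL_GFP_FS set, whereas the NOFS-style flag gives it up.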


^ permalink raw reply

