From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f41.google.com (mail-pa0-f41.google.com [209.85.220.41]) by kanga.kvack.org (Postfix) with ESMTP id C2109800CA for ; Thu, 6 Nov 2014 23:27:56 -0500 (EST) Received: by mail-pa0-f41.google.com with SMTP id rd3so2747480pab.0 for ; Thu, 06 Nov 2014 20:27:56 -0800 (PST) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id nz9si7889316pbb.86.2014.11.06.20.27.54 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 06 Nov 2014 20:27:55 -0800 (PST) Message-ID: <545C4A36.9050702@oracle.com> Date: Thu, 06 Nov 2014 23:27:34 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: mm: shmem: freeing mlocked page Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins , Andrew Morton Cc: "linux-mm@kvack.org" , LKML , Dave Jones Hi all, While fuzzing with trinity inside a KVM tools guest running the latest -next kernel, I've stumbled on the following spew: [ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index: 0x49 [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked) [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 1441.564493] bad because of flags: [ 1441.564498] flags: 0x200000(mlocked) [ 1441.564503] Modules linked in: [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447 [ 1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868 [ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898 [ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000 [ 1441.564534] Call Trace: [ 1441.568496] dump_stack (lib/dump_stack.c:52) [ 1441.568516] bad_page (mm/page_alloc.c:338) [ 1441.568523] free_pages_prepare 
(mm/page_alloc.c:649 mm/page_alloc.c:755) [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436) [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3)) [ 1441.568555] release_pages (mm/swap.c:961) [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978) [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451) [ 1441.568591] shmem_truncate_range (mm/shmem.c:546) [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092) [ 1441.568612] ? __sb_start_write (fs/super.c:1208) [ 1441.568622] ? __sb_start_write (fs/super.c:1208) [ 1441.568633] do_fallocate (fs/open.c:297) [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462) [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486) [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED... Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f173.google.com (mail-ob0-f173.google.com [209.85.214.173]) by kanga.kvack.org (Postfix) with ESMTP id CE0E26B00CE for ; Fri, 14 Nov 2014 09:49:29 -0500 (EST) Received: by mail-ob0-f173.google.com with SMTP id uy5so146163obc.4 for ; Fri, 14 Nov 2014 06:49:29 -0800 (PST) Received: from aserp1040.oracle.com (aserp1040.oracle.com. 
[141.146.126.69]) by mx.google.com with ESMTPS id l10si32559435oep.3.2014.11.14.06.49.27 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Fri, 14 Nov 2014 06:49:28 -0800 (PST) Message-ID: <5466142C.60100@oracle.com> Date: Fri, 14 Nov 2014 09:39:40 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> In-Reply-To: <545C4A36.9050702@oracle.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins , Andrew Morton Cc: "linux-mm@kvack.org" , LKML , Dave Jones On 11/06/2014 11:27 PM, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel, I've stumbled on the following spew: > > [ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a > [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index: > 0x49 > [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked) > [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > [ 1441.564493] bad because of flags: > [ 1441.564498] flags: 0x200000(mlocked) > [ 1441.564503] Modules linked in: > [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447 > [ 1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868 > [ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898 > [ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000 > [ 1441.564534] Call Trace: > [ 1441.568496] dump_stack (lib/dump_stack.c:52) > [ 1441.568516] bad_page (mm/page_alloc.c:338) > [ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755) > [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436) > [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3)) > [ 1441.568555] release_pages (mm/swap.c:961) 
> [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978) > [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451) > [ 1441.568591] shmem_truncate_range (mm/shmem.c:546) > [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092) > [ 1441.568612] ? __sb_start_write (fs/super.c:1208) > [ 1441.568622] ? __sb_start_write (fs/super.c:1208) > [ 1441.568633] do_fallocate (fs/open.c:297) > [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462) > [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486) > [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > > I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED... So I got this as well: [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 1026.993479] bad because of flags: [ 1026.994125] flags: 0x200000(mlocked) [ 1026.994816] Modules linked in: [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 [ 1026.996123] FAULT_INJECTION: forcing a failure. 
[ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 [ 1027.003359] Call Trace: [ 1027.003831] dump_stack (lib/dump_stack.c:52) [ 1027.004725] bad_page (mm/page_alloc.c:338) [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) [ 1027.007772] ? __page_cache_release (mm/swap.c:66) [ 1027.008815] put_page (mm/swap.c:270) [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) [ 1027.012856] ? pipe_lock (fs/pipe.c:69) [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) Which makes me suspect I blamed shmem for nothing. Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f176.google.com (mail-ie0-f176.google.com [209.85.223.176]) by kanga.kvack.org (Postfix) with ESMTP id B297C6B006E for ; Tue, 18 Nov 2014 16:58:46 -0500 (EST) Received: by mail-ie0-f176.google.com with SMTP id ar1so2912752iec.7 for ; Tue, 18 Nov 2014 13:58:46 -0800 (PST) Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. 
[140.211.169.12]) by mx.google.com with ESMTPS id p194si60295267ioe.16.2014.11.18.13.58.45 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Nov 2014 13:58:45 -0800 (PST) Date: Tue, 18 Nov 2014 13:58:43 -0800 From: Andrew Morton Subject: Re: mm: shmem: freeing mlocked page Message-Id: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <5466142C.60100@oracle.com> References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Sasha Levin Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > > [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 > [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b > [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) > [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > [ 1026.993479] bad because of flags: > [ 1026.994125] flags: 0x200000(mlocked) Gee that new page dumping code is nice! > [ 1026.994816] Modules linked in: > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 > [ 1026.996123] FAULT_INJECTION: forcing a failure. 
> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 > [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 > [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 > [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 > [ 1027.003359] Call Trace: > [ 1027.003831] dump_stack (lib/dump_stack.c:52) > [ 1027.004725] bad_page (mm/page_alloc.c:338) > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) > [ 1027.007772] ? __page_cache_release (mm/swap.c:66) > [ 1027.008815] put_page (mm/swap.c:270) > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) > [ 1027.012856] ? pipe_lock (fs/pipe.c:69) > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) > [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > So what happened here? Userspace fed some mlocked memory into splice() and then, while splice() was running, userspace dropped its reference to the memory, leaving splice() with the last reference. Yet somehow, that page was still marked as being mlocked. I wouldn't expect the kernel to permit userspace to drop its reference to the memory without first clearing the mlocked state. Is it possible to work out from trinity sources what the exact sequence was? Which syscalls are being used, for example? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f174.google.com (mail-yk0-f174.google.com [209.85.160.174]) by kanga.kvack.org (Postfix) with ESMTP id D974E6B0038 for ; Tue, 18 Nov 2014 22:50:49 -0500 (EST) Received: by mail-yk0-f174.google.com with SMTP id 10so2825517ykt.19 for ; Tue, 18 Nov 2014 19:50:49 -0800 (PST) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id n45si343350yho.55.2014.11.18.19.50.48 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 18 Nov 2014 19:50:48 -0800 (PST) Message-ID: <546C1202.1020502@oracle.com> Date: Tue, 18 Nov 2014 22:44:02 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe On 11/18/2014 04:58 PM, Andrew Morton wrote: > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > >> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >> [ 1026.993479] bad because of flags: >> [ 1026.994125] flags: 0x200000(mlocked) > > Gee that new page dumping code is nice! > >> [ 1026.994816] Modules linked in: >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> [ 1026.996123] FAULT_INJECTION: forcing a failure. 
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> [ 1027.003359] Call Trace: >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> [ 1027.004725] bad_page (mm/page_alloc.c:338) >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> [ 1027.008815] put_page (mm/swap.c:270) >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. I wouldn't expect the > kernel to permit userspace to drop its reference to the memory without > first clearing the mlocked state. > > Is it possible to work out from trinity sources what the exact sequence > was? Which syscalls are being used, for example? 
Trinity can't really log anything because attempts to log syscalls slow everything
down to a crawl to the point nothing reproduces.

I've just looked at that trace above, and got a bit more confused. I didn't think
that you can mlock page cache. How would a user do that exactly?

Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ig0-f180.google.com (mail-ig0-f180.google.com [209.85.213.180]) by kanga.kvack.org (Postfix) with ESMTP id 2BA036B0038 for ; Tue, 18 Nov 2014 22:56:32 -0500 (EST)
Received: by mail-ig0-f180.google.com with SMTP id h15so302370igd.7 for ; Tue, 18 Nov 2014 19:56:31 -0800 (PST)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org. [140.211.169.12]) by mx.google.com with ESMTPS id z7si298966igl.22.2014.11.18.19.56.30 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Nov 2014 19:56:31 -0800 (PST)
Date: Tue, 18 Nov 2014 19:56:56 -0800
From: Andrew Morton
Subject: Re: mm: shmem: freeing mlocked page
Message-Id: <20141118195656.f80ff650.akpm@linux-foundation.org>
In-Reply-To: <546C1202.1020502@oracle.com>
References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Sasha Levin
Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe

On Tue, 18 Nov 2014 22:44:02 -0500 Sasha Levin wrote:

> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote:
> >
> >>
> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
> >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:
(null) index:0x5b > >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) > >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > >> [ 1026.993479] bad because of flags: > >> [ 1026.994125] flags: 0x200000(mlocked) > > > > Gee that new page dumping code is nice! > > > >> [ 1026.994816] Modules linked in: > >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 > >> [ 1026.996123] FAULT_INJECTION: forcing a failure. > >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 > >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 > >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 > >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 > >> [ 1027.003359] Call Trace: > >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) > >> [ 1027.004725] bad_page (mm/page_alloc.c:338) > >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) > >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) > >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) > >> [ 1027.008815] put_page (mm/swap.c:270) > >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) > >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) > >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) > >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) > >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) > >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) > >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) > >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) > >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) > >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > >> > > > > So what happened here? 
Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

Ah. I was thinking that it could be worked out by looking at the
trinity source around where it calls splice(). But I suspect that
doesn't make sense if trinity just creates a zillion threads each of
which sprays semi-random syscalls at the kernel(?).

> I've just looked at that trace above, and got a bit more confused. I didn't think
> that you can mlock page cache. How would a user do that exactly?

mmap it then mlock it! The kernel will fault everything in for you
then pin it down.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f179.google.com (mail-ob0-f179.google.com [209.85.214.179]) by kanga.kvack.org (Postfix) with ESMTP id AA6BD6B0038 for ; Tue, 18 Nov 2014 23:19:30 -0500 (EST)
Received: by mail-ob0-f179.google.com with SMTP id va2so3562368obc.24 for ; Tue, 18 Nov 2014 20:19:30 -0800 (PST)
Received: from aserp1040.oracle.com (aserp1040.oracle.com.
[141.146.126.69]) by mx.google.com with ESMTPS id bq7si377971obb.79.2014.11.18.20.19.28 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 18 Nov 2014 20:19:29 -0800 (PST)
Message-ID: <546C18C5.5090508@oracle.com>
Date: Tue, 18 Nov 2014 23:12:53 -0500
From: Sasha Levin
MIME-Version: 1.0
Subject: Re: mm: shmem: freeing mlocked page
References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com> <20141118195656.f80ff650.akpm@linux-foundation.org>
In-Reply-To: <20141118195656.f80ff650.akpm@linux-foundation.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe

On 11/18/2014 10:56 PM, Andrew Morton wrote:
>> Trinity can't really log anything because attempts to log syscalls slow everything
>> > down to a crawl to the point nothing reproduces.
> Ah. I was thinking that it could be worked out by looking at the
> trinity source around where it calls splice(). But I suspect that
> doesn't make sense if trinity just creates a zillion threads each of
> which sprays semi-random syscalls at the kernel(?).

I think Dave would agree here that this is a rather accurate description
of Trinity :)

>> > I've just looked at that trace above, and got a bit more confused. I didn't think
>> > that you can mlock page cache. How would a user do that exactly?
> mmap it then mlock it! The kernel will fault everything in for you
> then pin it down.

But that's a pipe buffer, I didn't think userspace can mmap pipes? I have some
reading to do.

Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qc0-f170.google.com (mail-qc0-f170.google.com [209.85.216.170]) by kanga.kvack.org (Postfix) with ESMTP id D608B6B0038 for ; Tue, 18 Nov 2014 23:33:17 -0500 (EST) Received: by mail-qc0-f170.google.com with SMTP id x3so5181832qcv.15 for ; Tue, 18 Nov 2014 20:33:17 -0800 (PST) Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28]) by mx.google.com with ESMTPS id o105si877129qgd.39.2014.11.18.20.33.16 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Nov 2014 20:33:16 -0800 (PST) Date: Tue, 18 Nov 2014 22:56:10 -0500 From: Dave Jones Subject: Re: mm: shmem: freeing mlocked page Message-ID: <20141119035610.GA14468@redhat.com> References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <546C1202.1020502@oracle.com> Sender: owner-linux-mm@kvack.org List-ID: To: Sasha Levin Cc: Andrew Morton , Hugh Dickins , "linux-mm@kvack.org" , LKML , Jens Axboe On Tue, Nov 18, 2014 at 10:44:02PM -0500, Sasha Levin wrote: > On 11/18/2014 04:58 PM, Andrew Morton wrote: > >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) > >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) > >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) > >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) > >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) > >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) > >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > > > > So what happened here? Userspace fed some mlocked memory into splice() > > and then, while splice() was running, userspace dropped its reference > > to the memory, leaving splice() with the last reference. 
Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

If the machine is still alive after /proc/sys/kernel/tainted changes,
trinity will dump a trinity-post-mortem.log somewhere[*] that should
contain the last two syscalls each process did. (Even if logging is disabled).

It's not perfect however, and knowing that we passed a pointer to a syscall
isn't always useful unless we also dump the data that pointer pointed at.

It's a work in progress. I don't know if I'm going to get time to improve it
any time soon though.

Dave

[*] wherever cwd happened to be when the main process exited.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wg0-f41.google.com (mail-wg0-f41.google.com [74.125.82.41]) by kanga.kvack.org (Postfix) with ESMTP id 104DB6B0038 for ; Wed, 19 Nov 2014 08:38:26 -0500 (EST)
Received: by mail-wg0-f41.google.com with SMTP id y19so888610wgg.0 for ; Wed, 19 Nov 2014 05:38:25 -0800 (PST)
Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id pc1si2714950wjb.23.2014.11.19.05.38.23 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 19 Nov 2014 05:38:23 -0800 (PST) Message-ID: <546C9D4D.9090201@suse.cz> Date: Wed, 19 Nov 2014 14:38:21 +0100 From: Vlastimil Babka MIME-Version: 1.0 Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton , Sasha Levin Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe On 11/18/2014 10:58 PM, Andrew Morton wrote: > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > >> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >> [ 1026.993479] bad because of flags: >> [ 1026.994125] flags: 0x200000(mlocked) > > Gee that new page dumping code is nice! > >> [ 1026.994816] Modules linked in: >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> [ 1026.996123] FAULT_INJECTION: forcing a failure. 
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> [ 1027.003359] Call Trace: >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> [ 1027.004725] bad_page (mm/page_alloc.c:338) >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> [ 1027.008815] put_page (mm/swap.c:270) >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. I wouldn't expect the > kernel to permit userspace to drop its reference to the memory without > first clearing the mlocked state. I did check a bit and something caught my eye. Both page_remove_rmap() and page_remove_file_rmap() contain this: if (unlikely(PageMlocked(page))) clear_page_mlock(page); So could maybe something mlock the page between the check and clear? 
I find lru_cache_add_active_or_unevictable somewhat suspicious. But checking
if these two could race will take some time.

> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f174.google.com (mail-ob0-f174.google.com [209.85.214.174]) by kanga.kvack.org (Postfix) with ESMTP id 98FC56B0038 for ; Tue, 9 Dec 2014 21:20:37 -0500 (EST)
Received: by mail-ob0-f174.google.com with SMTP id nt9so1650753obb.19 for ; Tue, 09 Dec 2014 18:20:37 -0800 (PST)
Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id mu8si2003319obc.39.2014.12.09.18.20.35 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 09 Dec 2014 18:20:36 -0800 (PST)
Message-ID: <5487ACC5.1010002@oracle.com>
Date: Tue, 09 Dec 2014 21:15:33 -0500
From: Sasha Levin
MIME-Version: 1.0
Subject: Re: mm: shmem: freeing mlocked page
References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org>
In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton
Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe , Davidlohr Bueso , "Kirill A.
Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman On 11/18/2014 04:58 PM, Andrew Morton wrote: >> [ 1026.994816] Modules linked in: >> > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> > [ 1026.996123] FAULT_INJECTION: forcing a failure. >> > [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> > [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> > [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> > [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> > [ 1027.003359] Call Trace: >> > [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> > [ 1027.004725] bad_page (mm/page_alloc.c:338) >> > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> > [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> > [ 1027.008815] put_page (mm/swap.c:270) >> > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> > [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> > [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >> > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. 
I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
>
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?

Phew, this took a long while but I've bisected it (with good confidence) down
to:

commit a38246260912ba4a0f8b563704a965a7a97cf3c3
Author: Davidlohr Bueso
Date: Wed Dec 3 18:54:27 2014 +1100

    mm/memory.c: share the i_mmap_rwsem

    The unmap_mapping_range family of functions do the unmapping of user pages
    (ultimately via zap_page_range_single) without touching the actual
    interval tree, thus share the lock.

    Signed-off-by: Davidlohr Bueso
    Cc: "Kirill A. Shutemov"
    Acked-by: Hugh Dickins
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra (Intel)
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton


Thanks,
Sasha

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f54.google.com (mail-oi0-f54.google.com [209.85.218.54])
	by kanga.kvack.org (Postfix) with ESMTP id 668846B0038
	for ; Tue, 9 Dec 2014 21:30:56 -0500 (EST)
Received: by mail-oi0-f54.google.com with SMTP id u20so1374185oif.27
	for ; Tue, 09 Dec 2014 18:30:56 -0800 (PST)
Received: from userp1040.oracle.com (userp1040.oracle.com.
[156.151.31.81]) by mx.google.com with ESMTPS id ts6si2015116obb.38.2014.12.09.18.30.52 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 09 Dec 2014 18:30:55 -0800 (PST) Message-ID: <5487AE8C.7000302@oracle.com> Date: Tue, 09 Dec 2014 21:23:08 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <5487ACC5.1010002@oracle.com> In-Reply-To: <5487ACC5.1010002@oracle.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe , Peter Zijlstra , Rik van Riel , Mel Gorman , dbueso@suse.de, kirill@shutemov.name (Apologies for spam, I've Cc'ed a few outdated emails in the previous mail) On 12/09/2014 09:15 PM, Sasha Levin wrote: > On 11/18/2014 04:58 PM, Andrew Morton wrote: >>> [ 1026.994816] Modules linked in: >>>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >>>> [ 1026.996123] FAULT_INJECTION: forcing a failure. >>>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >>>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >>>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >>>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >>>> [ 1027.003359] Call Trace: >>>> [ 1027.003831] dump_stack (lib/dump_stack.c:52) >>>> [ 1027.004725] bad_page (mm/page_alloc.c:338) >>>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >>>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >>>> [ 1027.007772] ? 
__page_cache_release (mm/swap.c:66) >>>> [ 1027.008815] put_page (mm/swap.c:270) >>>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >>>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >>>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >>>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >>>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >>>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >>>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >>>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >>>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >>>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >>>> >> So what happened here? Userspace fed some mlocked memory into splice() >> and then, while splice() was running, userspace dropped its reference >> to the memory, leaving splice() with the last reference. Yet somehow, >> that page was still marked as being mlocked. I wouldn't expect the >> kernel to permit userspace to drop its reference to the memory without >> first clearing the mlocked state. >> >> Is it possible to work out from trinity sources what the exact sequence >> was? Which syscalls are being used, for example? > > Phew, this took a long while but I've bisected it (with good confidence) down > to: > > commit a38246260912ba4a0f8b563704a965a7a97cf3c3 > Author: Davidlohr Bueso > Date: Wed Dec 3 18:54:27 2014 +1100 > > mm/memory.c: share the i_mmap_rwsem > > The unmap_mapping_range family of functions do the unmapping of user pages > (ultimately via zap_page_range_single) without touching the actual > interval tree, thus share the lock. > > Signed-off-by: Davidlohr Bueso > Cc: "Kirill A. 
Shutemov" > Acked-by: Hugh Dickins > Cc: Oleg Nesterov > Cc: Peter Zijlstra (Intel) > Cc: Rik van Riel > Cc: Srikar Dronamraju > Acked-by: Mel Gorman > Signed-off-by: Andrew Morton > > > Thanks, > Sasha > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751662AbaKGE1p (ORCPT ); Thu, 6 Nov 2014 23:27:45 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:46584 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751348AbaKGE1m (ORCPT ); Thu, 6 Nov 2014 23:27:42 -0500 Message-ID: <545C4A36.9050702@oracle.com> Date: Thu, 06 Nov 2014 23:27:34 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Hugh Dickins , Andrew Morton CC: "linux-mm@kvack.org" , LKML , Dave Jones Subject: mm: shmem: freeing mlocked page Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Source-IP: ucsinet22.oracle.com [156.151.31.94] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi all, While fuzzing with trinity inside a KVM tools guest running the latest -next kernel, I've stumbled on the following spew: [ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index: 0x49 [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked) [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 1441.564493] bad because of flags: [ 1441.564498] flags: 0x200000(mlocked) [ 1441.564503] Modules linked in: [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447 [ 
1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868 [ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898 [ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000 [ 1441.564534] Call Trace: [ 1441.568496] dump_stack (lib/dump_stack.c:52) [ 1441.568516] bad_page (mm/page_alloc.c:338) [ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755) [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436) [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3)) [ 1441.568555] release_pages (mm/swap.c:961) [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978) [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451) [ 1441.568591] shmem_truncate_range (mm/shmem.c:546) [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092) [ 1441.568612] ? __sb_start_write (fs/super.c:1208) [ 1441.568622] ? __sb_start_write (fs/super.c:1208) [ 1441.568633] do_fallocate (fs/open.c:297) [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462) [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486) [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED... 
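
A small userspace sketch of how the flags word in the dump above breaks down, and why "not LOCKED but still MLOCKED" is what the free-time check trips on. The bit positions are assumptions read off this particular dump (0x200000 is printed as mlocked; the others are inferred from the order of the names and from the usual low-bit layout), and they depend on kernel version and config. The helper names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bit positions inferred from the dump in this thread; they vary with
 * kernel version and config, so treat them as assumptions. */
struct flag_name {
    unsigned long long mask;
    const char *name;
};

static const struct flag_name flag_names[] = {
    { 1ULL << 0,  "locked"     },
    { 1ULL << 2,  "referenced" },
    { 1ULL << 3,  "uptodate"   },
    { 1ULL << 19, "swapbacked" },
    { 1ULL << 21, "mlocked"    },
};

/* Write the names of the known flags set in `flags` to `out`, separated
 * by '|'. Returns the number of known flags found. Bits not in the table
 * (such as the zone/node fields in the upper part of the word) are ignored. */
int decode_page_flags(unsigned long long flags, char *out, size_t outlen)
{
    int found = 0;
    size_t used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
        if (!(flags & flag_names[i].mask))
            continue;
        int n = snprintf(out + used, outlen - used, "%s%s",
                         found ? "|" : "", flag_names[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break;                      /* output buffer too small: stop */
        used += (size_t)n;
        found++;
    }
    return found;
}

/* True when mlocked is set while locked is clear, as in the report. */
int mlocked_but_not_locked(unsigned long long flags)
{
    return (flags & (1ULL << 21)) != 0 && (flags & (1ULL << 0)) == 0;
}
```

Calling decode_page_flags() on the value 0xafffff8028000c from the first report should give back the same four names that the kernel printed.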
Thanks, Sasha From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965348AbaKNOjt (ORCPT ); Fri, 14 Nov 2014 09:39:49 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:39127 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965136AbaKNOjs (ORCPT ); Fri, 14 Nov 2014 09:39:48 -0500 Message-ID: <5466142C.60100@oracle.com> Date: Fri, 14 Nov 2014 09:39:40 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Hugh Dickins , Andrew Morton CC: "linux-mm@kvack.org" , LKML , Dave Jones Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> In-Reply-To: <545C4A36.9050702@oracle.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Source-IP: ucsinet22.oracle.com [156.151.31.94] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/06/2014 11:27 PM, Sasha Levin wrote: > Hi all, > > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel, I've stumbled on the following spew: > > [ 1441.564471] BUG: Bad page state in process trinity-c612 pfn:12593a > [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping: (null) index: > 0x49 > [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked) > [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > [ 1441.564493] bad because of flags: > [ 1441.564498] flags: 0x200000(mlocked) > [ 1441.564503] Modules linked in: > [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447 > [ 1441.564519] 0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868 > [ 1441.564526] ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898 > [ 1441.564532] ffffffff925fd0a1 ffffea0006e17628 
dfffe90000000000 0000000000000000 > [ 1441.564534] Call Trace: > [ 1441.568496] dump_stack (lib/dump_stack.c:52) > [ 1441.568516] bad_page (mm/page_alloc.c:338) > [ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755) > [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436) > [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3)) > [ 1441.568555] release_pages (mm/swap.c:961) > [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978) > [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451) > [ 1441.568591] shmem_truncate_range (mm/shmem.c:546) > [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092) > [ 1441.568612] ? __sb_start_write (fs/super.c:1208) > [ 1441.568622] ? __sb_start_write (fs/super.c:1208) > [ 1441.568633] do_fallocate (fs/open.c:297) > [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462) > [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486) > [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > > I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED... So I got this as well: [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 1026.993479] bad because of flags: [ 1026.994125] flags: 0x200000(mlocked) [ 1026.994816] Modules linked in: [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 [ 1026.996123] FAULT_INJECTION: forcing a failure. 
[ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 [ 1027.003359] Call Trace: [ 1027.003831] dump_stack (lib/dump_stack.c:52) [ 1027.004725] bad_page (mm/page_alloc.c:338) [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) [ 1027.007772] ? __page_cache_release (mm/swap.c:66) [ 1027.008815] put_page (mm/swap.c:270) [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) [ 1027.012856] ? pipe_lock (fs/pipe.c:69) [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) Which makes me suspect I blamed shmem for nothing. 
Thanks, Sasha From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754270AbaKRV6q (ORCPT ); Tue, 18 Nov 2014 16:58:46 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:36912 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753300AbaKRV6p (ORCPT ); Tue, 18 Nov 2014 16:58:45 -0500 Date: Tue, 18 Nov 2014 13:58:43 -0800 From: Andrew Morton To: Sasha Levin Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe Subject: Re: mm: shmem: freeing mlocked page Message-Id: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <5466142C.60100@oracle.com> References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> X-Mailer: Sylpheed 3.4.0beta7 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > > [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 > [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b > [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) > [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > [ 1026.993479] bad because of flags: > [ 1026.994125] flags: 0x200000(mlocked) Gee that new page dumping code is nice! > [ 1026.994816] Modules linked in: > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 > [ 1026.996123] FAULT_INJECTION: forcing a failure. 
> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> [ 1027.003359] Call Trace:
> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> [ 1027.008815] put_page (mm/swap.c:270)
> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>

So what happened here? Userspace fed some mlocked memory into splice()
and then, while splice() was running, userspace dropped its reference
to the memory, leaving splice() with the last reference. Yet somehow,
that page was still marked as being mlocked. I wouldn't expect the
kernel to permit userspace to drop its reference to the memory without
first clearing the mlocked state.

Is it possible to work out from trinity sources what the exact sequence
was? Which syscalls are being used, for example?
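
The expectation in the message above can be written down as a toy model. This is only a sketch under loud assumptions: struct toy_page, its fields and every function here are made up for illustration and are not the kernel's data structures or code paths. It replays "page mapped and mlocked, pipe takes a reference, userspace unmaps, pipe drops the last reference" and shows that the free-time check only passes if the unmap step clears the mlocked flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page; not the kernel's struct page. */
struct toy_page {
    int refcount;   /* references held (mappings, pipe buffers, ...) */
    int mapcount;   /* userspace mappings */
    bool mlocked;
};

enum free_result { STILL_IN_USE, FREED_OK, BAD_PAGE_STATE };

/* Drop one reference; on the last one, apply the free-time flag check. */
enum free_result toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount > 0)
        return STILL_IN_USE;
    return p->mlocked ? BAD_PAGE_STATE : FREED_OK;
}

/* Remove one mapping. `clear_mlock` models whether the unmap path
 * clears the mlocked flag when the last mapping goes away. */
enum free_result toy_unmap(struct toy_page *p, bool clear_mlock)
{
    assert(p->mapcount > 0);
    if (--p->mapcount == 0 && clear_mlock)
        p->mlocked = false;
    return toy_put_page(p);             /* the mapping's own reference */
}

/* Replay the sequence described in the message above. */
enum free_result toy_replay(bool clear_mlock)
{
    struct toy_page p = { .refcount = 1, .mapcount = 1, .mlocked = true };

    p.refcount++;                       /* pipe buffer pins the page */
    if (toy_unmap(&p, clear_mlock) != STILL_IN_USE)
        return BAD_PAGE_STATE;          /* not expected in this replay */
    return toy_put_page(&p);            /* pipe drops the last reference */
}
```

With clear_mlock set to false the model ends in the same state as the report: count 0, mapcount 0, mlocked still set.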
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754704AbaKSDoM (ORCPT ); Tue, 18 Nov 2014 22:44:12 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:39485 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754124AbaKSDoK (ORCPT ); Tue, 18 Nov 2014 22:44:10 -0500 Message-ID: <546C1202.1020502@oracle.com> Date: Tue, 18 Nov 2014 22:44:02 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Andrew Morton CC: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Source-IP: ucsinet22.oracle.com [156.151.31.94] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/18/2014 04:58 PM, Andrew Morton wrote: > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > >> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >> [ 1026.993479] bad because of flags: >> [ 1026.994125] flags: 0x200000(mlocked) > > Gee that new page dumping code is nice! > >> [ 1026.994816] Modules linked in: >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> [ 1026.996123] FAULT_INJECTION: forcing a failure. 
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> [ 1027.003359] Call Trace: >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> [ 1027.004725] bad_page (mm/page_alloc.c:338) >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> [ 1027.008815] put_page (mm/swap.c:270) >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. I wouldn't expect the > kernel to permit userspace to drop its reference to the memory without > first clearing the mlocked state. > > Is it possible to work out from trinity sources what the exact sequence > was? Which syscalls are being used, for example? 
Trinity can't really log anything because attempts to log syscalls slow everything
down to a crawl to the point nothing reproduces.

I've just looked at that trace above, and got a bit more confused. I didn't think
that you can mlock page cache. How would a user do that exactly?


Thanks,
Sasha

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754685AbaKSD41 (ORCPT ); Tue, 18 Nov 2014 22:56:27 -0500
Received: from mx1.redhat.com ([209.132.183.28]:51319 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754023AbaKSD4Z (ORCPT ); Tue, 18 Nov 2014 22:56:25 -0500
Date: Tue, 18 Nov 2014 22:56:10 -0500
From: Dave Jones
To: Sasha Levin
Cc: Andrew Morton , Hugh Dickins , "linux-mm@kvack.org" , LKML , Jens Axboe
Subject: Re: mm: shmem: freeing mlocked page
Message-ID: <20141119035610.GA14468@redhat.com>
Mail-Followup-To: Dave Jones , Sasha Levin , Andrew Morton , Hugh Dickins , "linux-mm@kvack.org" , LKML , Jens Axboe
References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <546C1202.1020502@oracle.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 18, 2014 at 10:44:02PM -0500, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >
> > So what happened here? Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

If the machine is still alive after /proc/sys/kernel/tainted changes,
trinity will dump a trinity-post-mortem.log somewhere[*] that should
contain the last two syscalls each process did. (Even if logging is disabled).

It's not perfect however, and knowing that we passed a pointer to a syscall
isn't always useful unless we also dump the data that pointer pointed at.
It's a work in progress. I don't know if I'm going to get time to improve
it any time soon though.

	Dave

[*] wherever cwd happened to be when the main process exited.
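
For readers who want to see what "after /proc/sys/kernel/tainted changes" can look like from userspace, here is a minimal sketch. It is not trinity's code: the function names are hypothetical, and the only facts assumed are that the file holds a decimal bitmask and that bit 5 (value 32, the 'B' flag) is the one set when a bad page state is reported.

```c
#include <assert.h>
#include <stdio.h>

#define TAINT_BAD_PAGE_BIT 5    /* 'B': bad page state was reported */

/* Read the taint mask from `path` (normally /proc/sys/kernel/tainted).
 * Returns 0 on success, -1 if the file is unreadable or unparsable. */
int read_taint(const char *path, unsigned long *mask)
{
    FILE *f;
    int parsed;

    assert(path != NULL && mask != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    parsed = (fscanf(f, "%lu", mask) == 1);
    fclose(f);
    return parsed ? 0 : -1;
}

/* A watcher would sample the mask before and after a batch of syscalls. */
int taint_changed(unsigned long before, unsigned long after)
{
    return before != after;
}

/* True if the mask records a bad page state. */
int taint_has_bad_page(unsigned long mask)
{
    return (int)((mask >> TAINT_BAD_PAGE_BIT) & 1UL);
}
```

A watcher built on this would call read_taint() once at startup, call it again after each batch of syscalls, and treat taint_changed() returning nonzero as the signal to write out whatever recent state it has kept.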
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754799AbaKSD4d (ORCPT ); Tue, 18 Nov 2014 22:56:33 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:40130 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754023AbaKSD4b (ORCPT ); Tue, 18 Nov 2014 22:56:31 -0500 Date: Tue, 18 Nov 2014 19:56:56 -0800 From: Andrew Morton To: Sasha Levin Cc: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe Subject: Re: mm: shmem: freeing mlocked page Message-Id: <20141118195656.f80ff650.akpm@linux-foundation.org> In-Reply-To: <546C1202.1020502@oracle.com> References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com> X-Mailer: Sylpheed 2.7.1 (GTK+ 2.18.9; x86_64-redhat-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 18 Nov 2014 22:44:02 -0500 Sasha Levin wrote: > On 11/18/2014 04:58 PM, Andrew Morton wrote: > > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > > > >> > >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70 > >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b > >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) > >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set > >> [ 1026.993479] bad because of flags: > >> [ 1026.994125] flags: 0x200000(mlocked) > > > > Gee that new page dumping code is nice! > > > >> [ 1026.994816] Modules linked in: > >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 > >> [ 1026.996123] FAULT_INJECTION: forcing a failure. 
> >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 > >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 > >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 > >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 > >> [ 1027.003359] Call Trace: > >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) > >> [ 1027.004725] bad_page (mm/page_alloc.c:338) > >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) > >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) > >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) > >> [ 1027.008815] put_page (mm/swap.c:270) > >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) > >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) > >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) > >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) > >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) > >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) > >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) > >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) > >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) > >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) > >> > > > > So what happened here? Userspace fed some mlocked memory into splice() > > and then, while splice() was running, userspace dropped its reference > > to the memory, leaving splice() with the last reference. Yet somehow, > > that page was still marked as being mlocked. I wouldn't expect the > > kernel to permit userspace to drop its reference to the memory without > > first clearing the mlocked state. > > > > Is it possible to work out from trinity sources what the exact sequence > > was? Which syscalls are being used, for example? 
> > Trinity can't really log anything because attempts to log syscalls slow everything > down to a crawl to the point nothing reproduces. Ah. I was thinking that it could be worked out by looking at the trinity source around where it calls splice(). But I suspect that doesn't make sense if trinity just creates a zillion threads each of which sprays semi-random syscalls at the kernel(?). > I've just looked at that trace above, and got a bit more confused. I didn't think > that you can mlock page cache. How would a user do that exactly? mmap it then mlock it! The kernel will fault everything in for you then pin it down. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754902AbaKSENJ (ORCPT ); Tue, 18 Nov 2014 23:13:09 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:41520 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753851AbaKSENH (ORCPT ); Tue, 18 Nov 2014 23:13:07 -0500 Message-ID: <546C18C5.5090508@oracle.com> Date: Tue, 18 Nov 2014 23:12:53 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Andrew Morton CC: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <546C1202.1020502@oracle.com> <20141118195656.f80ff650.akpm@linux-foundation.org> In-Reply-To: <20141118195656.f80ff650.akpm@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: ucsinet21.oracle.com [156.151.31.93] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/18/2014 10:56 PM, Andrew Morton wrote: >> Trinity can't really log anything because attempts to log syscalls slow everything
>> down to a crawl to the point nothing reproduces. > Ah. I was thinking that it could be worked out by looking at the > trinity source around where it calls splice(). But I suspect that > doesn't make sense if trinity just creates a zillion threads each of > which sprays semi-random syscalls at the kernel(?). I think Dave would agree here that this is a rather accurate description of Trinity :) >> I've just looked at that trace above, and got a bit more confused. I didn't think >> that you can mlock page cache. How would a user do that exactly? > mmap it then mlock it! The kernel will fault everything in for you > then pin it down. But that's a pipe buffer, I didn't think userspace can mmap pipes? I have some reading to do. Thanks, Sasha From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754986AbaKSNi0 (ORCPT ); Wed, 19 Nov 2014 08:38:26 -0500 Received: from cantor2.suse.de ([195.135.220.15]:55979 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751968AbaKSNiZ (ORCPT ); Wed, 19 Nov 2014 08:38:25 -0500 Message-ID: <546C9D4D.9090201@suse.cz> Date: Wed, 19 Nov 2014 14:38:21 +0100 From: Vlastimil Babka User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Andrew Morton , Sasha Levin CC: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/18/2014 10:58 PM, Andrew Morton wrote: > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin wrote: > >> >> [ 1026.988043] BUG: Bad
page state in process trinity-c374 pfn:23f70 >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked) >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >> [ 1026.993479] bad because of flags: >> [ 1026.994125] flags: 0x200000(mlocked) > > Gee that new page dumping code is nice! > >> [ 1026.994816] Modules linked in: >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> [ 1026.996123] FAULT_INJECTION: forcing a failure. >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> [ 1027.003359] Call Trace: >> [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> [ 1027.004725] bad_page (mm/page_alloc.c:338) >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> [ 1027.008815] put_page (mm/swap.c:270) >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169) >> [ 1027.016757] ? 
__fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. I wouldn't expect the > kernel to permit userspace to drop its reference to the memory without > first clearing the mlocked state. I did check a bit and something caught my eye. Both page_remove_rmap() and page_remove_file_rmap() contain this: if (unlikely(PageMlocked(page))) clear_page_mlock(page); So could maybe something mlock the page between the check and clear? I find lru_cache_add_active_or_unevictable somewhat suspicious. But checking if these two could race will take some time. > Is it possible to work out from trinity sources what the exact sequence > was? Which syscalls are being used, for example? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755499AbaLJCQM (ORCPT ); Tue, 9 Dec 2014 21:16:12 -0500 Received: from aserp1040.oracle.com ([141.146.126.69]:49441 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753271AbaKSCQJ (ORCPT ); Tue, 9 Dec 2014 21:16:09 -0500 Message-ID: <5487ACC5.1010002@oracle.com> Date: Tue, 09 Dec 2014 21:15:33 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Andrew Morton CC: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe , Davidlohr Bueso , "Kirill A.
Shutemov" , Peter Zijlstra , Rik van Riel , Mel Gorman Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> In-Reply-To: <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/18/2014 04:58 PM, Andrew Morton wrote: >> [ 1026.994816] Modules linked in: >> > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455 >> > [ 1026.996123] FAULT_INJECTION: forcing a failure. >> > [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1 >> > [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8 >> > [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08 >> > [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000 >> > [ 1027.003359] Call Trace: >> > [ 1027.003831] dump_stack (lib/dump_stack.c:52) >> > [ 1027.004725] bad_page (mm/page_alloc.c:338) >> > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763) >> > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438) >> > [ 1027.007772] ? __page_cache_release (mm/swap.c:66) >> > [ 1027.008815] put_page (mm/swap.c:270) >> > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93) >> > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886) >> > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734) >> > [ 1027.012856] ? pipe_lock (fs/pipe.c:69) >> > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534) >> > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574) >> > [ 1027.015725] ? 
rcu_read_lock_held (kernel/rcu/update.c:169) >> > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684) >> > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639) >> > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529) >> > > So what happened here? Userspace fed some mlocked memory into splice() > and then, while splice() was running, userspace dropped its reference > to the memory, leaving splice() with the last reference. Yet somehow, > that page was still marked as being mlocked. I wouldn't expect the > kernel to permit userspace to drop its reference to the memory without > first clearing the mlocked state. > > Is it possible to work out from trinity sources what the exact sequence > was? Which syscalls are being used, for example? Phew, this took a long while but I've bisected it (with good confidence) down to: commit a38246260912ba4a0f8b563704a965a7a97cf3c3 Author: Davidlohr Bueso Date: Wed Dec 3 18:54:27 2014 +1100 mm/memory.c: share the i_mmap_rwsem The unmap_mapping_range family of functions do the unmapping of user pages (ultimately via zap_page_range_single) without touching the actual interval tree, thus share the lock. Signed-off-by: Davidlohr Bueso Cc: "Kirill A. 
Shutemov" Acked-by: Hugh Dickins Cc: Oleg Nesterov Cc: Peter Zijlstra (Intel) Cc: Rik van Riel Cc: Srikar Dronamraju Acked-by: Mel Gorman Signed-off-by: Andrew Morton Thanks, Sasha From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754222AbaLJCXs (ORCPT ); Tue, 9 Dec 2014 21:23:48 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:21854 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752651AbaLJCXr (ORCPT ); Tue, 9 Dec 2014 21:23:47 -0500 Message-ID: <5487AE8C.7000302@oracle.com> Date: Tue, 09 Dec 2014 21:23:08 -0500 From: Sasha Levin User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Andrew Morton CC: Hugh Dickins , "linux-mm@kvack.org" , LKML , Dave Jones , Jens Axboe , Peter Zijlstra , Rik van Riel , Mel Gorman , dbueso@suse.de, kirill@shutemov.name Subject: Re: mm: shmem: freeing mlocked page References: <545C4A36.9050702@oracle.com> <5466142C.60100@oracle.com> <20141118135843.bd711e95d3977c74cf51d803@linux-foundation.org> <5487ACC5.1010002@oracle.com> In-Reply-To: <5487ACC5.1010002@oracle.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (Apologies for spam, I've Cc'ed a few outdated emails in the previous mail)
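For readers following the "mmap it then mlock it" exchange earlier in the thread, here is a minimal userspace sketch of just that step: map one page of a file with MAP_SHARED, touch it so the kernel faults it into the page cache, then mlock() it. This is not a reproducer for the bad-page report, and the thread never establishes which syscall sequence trinity actually issued. The helper name mlock_page_cache_page() and the /tmp path are hypothetical, chosen only for illustration; whether the backing page is shmem depends on how /tmp is mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from trinity or the kernel tree).
 * Maps one page of a temporary file, faults it in, and tries to mlock() it.
 * Returns  0 if the page was mapped and locked,
 *          1 if the page was mapped but mlock() was refused
 *            (for example because of RLIMIT_MEMLOCK),
 *         -1 if the file or the mapping could not be set up.
 */
int mlock_page_cache_page(void)
{
    char path[] = "/tmp/mlock-demo-XXXXXX";   /* assumed writable location */
    long page = sysconf(_SC_PAGESIZE);
    int rc = -1;

    assert(page > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* file lives until close() */

    if (ftruncate(fd, (off_t)page) == 0) {
        char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'x';                       /* fault the page in */
            if (mlock(p, (size_t)page) == 0) {
                rc = 0;                       /* page is now mlocked */
                munlock(p, (size_t)page);
            } else {
                rc = 1;                       /* mapped, but not locked */
            }
            munmap(p, (size_t)page);
        }
    }
    close(fd);
    return rc;
}
```

The sketch unlocks and unmaps in the usual order. The report above is about a page that reached the free path with the mlocked flag still set, which a well-behaved sequence like this one should never produce.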