From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by kanga.kvack.org (Postfix) with ESMTP id 6A1616B0032 for ; Thu, 15 Jan 2015 16:00:13 -0500 (EST) Received: by mail-pa0-f42.google.com with SMTP id et14so19801098pad.1 for ; Thu, 15 Jan 2015 13:00:13 -0800 (PST) Received: from mga14.intel.com (mga14.intel.com. [192.55.52.115]) by mx.google.com with ESMTP id nr8si3229866pdb.5.2015.01.15.13.00.10 for ; Thu, 15 Jan 2015 13:00:11 -0800 (PST) Message-ID: <54B82A57.9060000@intel.com> Date: Thu, 15 Jan 2015 13:00:07 -0800 From: Dave Hansen MIME-Version: 1.0 Subject: [LSF/MM TOPIC] Reclaim in the face of really fast I/O Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Matthew Wilcox , lsf-pc@lists.linux-foundation.org, "Reddy, Dheeraj" Cc: "Kleen, Andi" , "Chen, Tim C" , Linux-MM I/O devices are only getting faster. In fact, they're getting closer and closer to memory in latency and bandwidth. But the VM is still designed to do very orderly and costly procedures to reclaim memory, and the existing algorithms don't parallelize particularly well. They hit contention on mmap_sem or the lru locks well before all of the CPU horsepower that we have can be brought to bear on reclaim. Once the latency to bring pages in and out of storage becomes low enough, reclaiming the _right_ pages becomes much less important than doing something useful with the CPU horsepower that we have. We need to talk about ways to do reclaim with lower CPU overhead and to parallelize more effectively. There has been some research in this area by some folks at Intel and we could quickly summarize what has been learned so far to help kick off a discussion. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f178.google.com (mail-ob0-f178.google.com [209.85.214.178]) by kanga.kvack.org (Postfix) with ESMTP id 13EDD6B0032 for ; Thu, 15 Jan 2015 20:21:54 -0500 (EST) Received: by mail-ob0-f178.google.com with SMTP id gq1so16702356obb.9 for ; Thu, 15 Jan 2015 17:21:53 -0800 (PST) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id y4si897400obm.66.2015.01.15.17.21.52 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 15 Jan 2015 17:21:53 -0800 (PST) Message-ID: <54B867A8.6050900@oracle.com> Date: Thu, 15 Jan 2015 20:21:44 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Reclaim in the face of really fast I/O References: <54B82A57.9060000@intel.com> In-Reply-To: <54B82A57.9060000@intel.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen , Matthew Wilcox , lsf-pc@lists.linux-foundation.org, "Reddy, Dheeraj" Cc: "Kleen, Andi" , Linux-MM , "Chen, Tim C" On 01/15/2015 04:00 PM, Dave Hansen wrote: > I/O devices are only getting faster. In fact, they're getting closer > and closer to memory in latency and bandwidth. But the VM is still > designed to do very orderly and costly procedures to reclaim memory, and > the existing algorithms don't parallelize particularly well. They hit > contention on mmap_sem or the lru locks well before all of the CPU > horsepower that we have can be brought to bear on reclaim. > > Once the latency to bring pages in and out of storage becomes low > enough, reclaiming the _right_ pages becomes much less important than > doing something useful with the CPU horsepower that we have. > > We need to talk about ways to do reclaim with lower CPU overhead and to > parallelize more effectively. > > There has been some research in this area by some folks at Intel and we > could quickly summarize what has been learned so far to help kick off a > discussion. I was actually planning to bring that up. Trinity can cause enough stress to a system that the hang watchdog triggers (with a 10 minute timeout!) inside reclaim code. Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f51.google.com (mail-oi0-f51.google.com [209.85.218.51]) by kanga.kvack.org (Postfix) with ESMTP id 4F9476B0032 for ; Thu, 15 Jan 2015 20:32:37 -0500 (EST) Received: by mail-oi0-f51.google.com with SMTP id h136so15279253oig.10 for ; Thu, 15 Jan 2015 17:32:37 -0800 (PST) Received: from userp1040.oracle.com (userp1040.oracle.com. [156.151.31.81]) by mx.google.com with ESMTPS id k7si849790oef.90.2015.01.15.17.32.35 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 15 Jan 2015 17:32:36 -0800 (PST) Message-ID: <54B86A2D.50709@oracle.com> Date: Thu, 15 Jan 2015 20:32:29 -0500 From: Sasha Levin MIME-Version: 1.0 Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Reclaim in the face of really fast I/O References: <54B82A57.9060000@intel.com> <54B867A8.6050900@oracle.com> In-Reply-To: <54B867A8.6050900@oracle.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen , Matthew Wilcox , lsf-pc@lists.linux-foundation.org, "Reddy, Dheeraj" Cc: "Kleen, Andi" , Linux-MM , "Chen, Tim C" On 01/15/2015 08:21 PM, Sasha Levin wrote: > On 01/15/2015 04:00 PM, Dave Hansen wrote: >> I/O devices are only getting faster. In fact, they're getting closer >> and closer to memory in latency and bandwidth. But the VM is still >> designed to do very orderly and costly procedures to reclaim memory, and >> the existing algorithms don't parallelize particularly well. They hit >> contention on mmap_sem or the lru locks well before all of the CPU >> horsepower that we have can be brought to bear on reclaim. >> >> Once the latency to bring pages in and out of storage becomes low >> enough, reclaiming the _right_ pages becomes much less important than >> doing something useful with the CPU horsepower that we have. >> >> We need to talk about ways to do reclaim with lower CPU overhead and to >> parallelize more effectively. >> >> There has been some research in this area by some folks at Intel and we >> could quickly summarize what has been learned so far to help kick off a >> discussion. > > I was actually planning to bring that up. Trinity can cause enough stress > to a system that the hang watchdog triggers (with a 10 minute timeout!) > inside reclaim code. Something like this: [ 5412.398971] trinity-c23 R running task 10768 29063 30295 0x10000006 [ 5412.400111] ffff88046127b178 0000000000000ec3 0000000000000000 0000000000000000 [ 5412.401404] ffffe8fff263cd6e 0000000000000000 0000000000000000 0000000000000000 [ 5412.402695] 0000000000000000 ffff88076bfff4c8 0000000000001099 0000000000000000 [ 5412.404084] Call Trace: [ 5412.404495] [] ? _raw_spin_unlock_irqrestore+0xa2/0xf0 [ 5412.405611] [] schedule+0x55/0x280 [ 5412.406435] [] throttle_direct_reclaim+0x432/0x660 [ 5412.407530] [] ? __init_waitqueue_head+0xe0/0xe0 [ 5412.408565] [] try_to_free_pages+0xf7/0x460 [ 5412.409528] [] __alloc_pages_nodemask+0xc01/0x1940 [ 5412.410589] [] alloc_pages_vma+0x216/0x6f0 [ 5412.411572] [] ? shmem_alloc_page+0x9a/0x170 [ 5412.412556] [] shmem_alloc_page+0x9a/0x170 [ 5412.413507] [] ? find_get_entry+0x191/0x2c0 [ 5412.414475] [] ? find_get_entry+0x5/0x2c0 [ 5412.415513] [] ? find_lock_entry+0x2b/0x140 [ 5412.416474] [] shmem_getpage_gfp+0xde6/0x1710 [ 5412.417461] [] shmem_fault+0x1a1/0x7d0 [ 5412.418353] [] __do_fault+0xad/0x2a0 [ 5412.419285] [] handle_mm_fault+0x1331/0x5440 [ 5412.420121] [] __do_page_fault+0x2d3/0xfb0 [ 5412.420877] [] ? mark_held_locks+0x117/0x2b0 [ 5412.421688] [] ? trace_hardirqs_off+0xd/0x10 [ 5412.422491] [] trace_do_page_fault+0xc8/0x420 [ 5412.423348] [] do_async_page_fault+0x83/0x120 [ 5412.424175] [] async_page_fault+0x28/0x30 [ 5412.424920] [] ? iov_iter_fault_in_readable+0x17e/0x280 [ 5412.425851] [] ? ___might_sleep+0x2a5/0x420 [ 5412.426645] [] generic_perform_write+0x179/0x5b0 [ 5412.427573] [] ? __mnt_drop_write+0x57/0xa0 [ 5412.428365] [] __generic_file_write_iter+0x59c/0x13e0 [ 5412.429292] [] ? rw_copy_check_uvector+0x5c/0x470 [ 5412.430153] [] ? kasan_poison_shadow+0x34/0x40 [ 5412.431062] [] generic_file_write_iter+0xd9/0x510 [ 5412.431892] [] ? new_sync_read+0x220/0x220 [ 5412.432675] [] do_iter_readv_writev+0x9d/0x190 [ 5412.433504] [] do_readv_writev+0x239/0xe10 [ 5412.434294] [] ? __generic_file_write_iter+0x13e0/0x13e0 [ 5412.435300] [] ? acct_account_cputime+0x6e/0xa0 [ 5412.435942] [] ? __generic_file_write_iter+0x13e0/0x13e0 [ 5412.436678] [] ? context_tracking_user_exit+0xc7/0x330 [ 5412.437395] [] ? trace_hardirqs_on_caller+0x521/0x850 [ 5412.438097] [] vfs_writev+0x93/0x100 [ 5412.438632] [] SyS_pwritev+0x11a/0x200 Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org