From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f71.google.com (mail-pa0-f71.google.com [209.85.220.71])
    by kanga.kvack.org (Postfix) with ESMTP id EAABC6B0253
    for ; Mon, 24 Oct 2016 16:34:54 -0400 (EDT)
Received: by mail-pa0-f71.google.com with SMTP id fl2so4082525pad.7
    for ; Mon, 24 Oct 2016 13:34:54 -0700 (PDT)
Received: from mga01.intel.com (mga01.intel.com. [192.55.52.88])
    by mx.google.com with ESMTPS id al3si14276713pad.337.2016.10.24.13.34.53
    for (version=TLS1 cipher=AES128-SHA bits=128/128);
    Mon, 24 Oct 2016 13:34:53 -0700 (PDT)
Subject: Re: [PATCH] shmem: avoid huge pages for small files
References: <20161017145539.GA26930@node.shutemov.name>
    <20161018142007.GL12092@dhcp22.suse.cz>
    <20161018143207.GA5833@node.shutemov.name>
    <20161018183023.GC27792@dhcp22.suse.cz>
    <20161020103946.GA3881@node.shutemov.name>
    <20161020224630.GO23194@dastard>
    <20161021020116.GD1075@tassilo.jf.intel.com>
    <20161021050118.GR23194@dastard>
    <20161021150007.GA13597@node.shutemov.name>
    <20161021225013.GS14023@dastard>
From: Dave Hansen
Message-ID: <580E706D.6030905@intel.com>
Date: Mon, 24 Oct 2016 13:34:53 -0700
MIME-Version: 1.0
In-Reply-To: <20161021225013.GS14023@dastard>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dave Chinner, "Kirill A. Shutemov"
Cc: Andi Kleen, Hugh Dickins, Michal Hocko, "Kirill A. Shutemov",
    Andrea Arcangeli, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

On 10/21/2016 03:50 PM, Dave Chinner wrote:
> On Fri, Oct 21, 2016 at 06:00:07PM +0300, Kirill A. Shutemov wrote:
>> On Fri, Oct 21, 2016 at 04:01:18PM +1100, Dave Chinner wrote:
>> To me, most of things you're talking about is highly dependent on access
>> pattern generated by userspace:
>>
>> - we may want to allocate huge pages from byte 1 if we know that file
>> will grow;
>
> delayed allocation takes care of that.
> We use a growing speculative
> delalloc size that kicks in at specific sizes and can be used
> directly to determine if a large page should be allocated. This code
> is aware of sparse files, sparse writes, etc.

OK, so somebody does a write() of 1 byte. We can delay the underlying
block allocation for a long time, but we can *not* delay the memory
allocation. We've got to decide before the write() returns.

How does delayed allocation help with that decision?

I guess we could (always?) allocate small pages up front, and then only
bother promoting them once the FS delayed-allocation code kicks in and
is *also* giving us underlying large allocations. That punts the logic
to the filesystem, which is a bit counterintuitive, but it seems
relatively sane.

>>> As such, there is no way we should be considering different
>>> interfaces and methods for configuring the /same functionality/ just
>>> because DAX is enabled or not. It's the /same decision/ that needs
>>> to be made, and the filesystem knows an awful lot more about whether
>>> huge pages can be used efficiently at the time of access than just
>>> about any other actor you can name....
>>
>> I'm not convinced that filesystem is in better position to see access
>> patterns than mm for page cache. It's not all about on-disk layout.
>
> Spoken like a true mm developer. IO performance is all about IO
> patterns, and the primary contributor to bad IO patterns is bad
> filesystem allocation patterns.... :P

For writes, I think you have a good point. Managing a horribly
fragmented file with larger pages and eating the associated write
magnification that comes along with it seems like a recipe for disaster.

But, isn't some level of disconnection between the page cache and the
underlying IO patterns a *good* thing? Once we've gone to the trouble
of bringing some (potentially very fragmented) data into the page cache,
why _not_ manage it in a lower-overhead way if we can?
For read-only data it seems like a no-brainer that we'd want things in
as large of a management unit as we can get.

IOW, why let the underlying block allocation layout hamstring how the
memory is managed?
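
To make the "small pages up front, promote later" idea from earlier in
this mail concrete, here is a rough userspace toy model. It is only a
sketch of the policy, not kernel code: ToyPageCache, write() and
fs_extent_allocated() are hypothetical names, and the 4 KiB / 2 MiB page
sizes are assumptions.

```python
SMALL_PAGE = 4096                # assumed base page size
HUGE_PAGE = 2 * 1024 * 1024      # assumed huge page size


class ToyPageCache:
    """Toy model: small pages at write() time, promotion driven by the FS."""

    def __init__(self):
        self.small = set()   # indices of cached small pages
        self.huge = set()    # indices of huge-page ranges already promoted

    def write(self, offset, length):
        # The memory decision cannot be delayed past write(), so always
        # take small pages here unless the range was already promoted.
        first = offset // SMALL_PAGE
        last = (offset + length - 1) // SMALL_PAGE
        for idx in range(first, last + 1):
            if (idx * SMALL_PAGE) // HUGE_PAGE not in self.huge:
                self.small.add(idx)

    def fs_extent_allocated(self, start, length):
        # Hypothetical hook: the filesystem reports that delayed allocation
        # produced a real extent. Promote only huge-aligned ranges that the
        # extent covers completely.
        first_huge = -(-start // HUGE_PAGE)        # round start up
        end_huge = (start + length) // HUGE_PAGE   # round end down (exclusive)
        per_huge = HUGE_PAGE // SMALL_PAGE
        for h in range(first_huge, end_huge):
            base = h * per_huge
            self.small -= set(range(base, base + per_huge))
            self.huge.add(h)
```

In this model a 1-byte write() only ever costs one small page, and a
range becomes huge only after the (hypothetical) filesystem hook reports
an extent that covers the whole aligned 2 MiB region.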