From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932285AbdJWOA5 (ORCPT ); Mon, 23 Oct 2017 10:00:57 -0400
Received: from mout.gmx.net ([212.227.15.18]:51980 "EHLO mout.gmx.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932191AbdJWOA4 (ORCPT ); Mon, 23 Oct 2017 10:00:56 -0400
Subject: Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL
To: Michal Hocko
Cc: Mike Kravetz , linux-mm@kvack.org, linux-kernel , Andrea Arcangeli ,
	"Kirill A. Shutemov" , Vlastimil Babka
References: <93684e4b-9e60-ef3a-ba62-5719fdf7cff9@gmx.de>
	<6b639da5-ad9a-158c-ad4a-7a4e44bd98fc@gmx.de>
	<5fb8955d-23af-ec85-a19f-3a5b26cc04d1@oracle.com>
	<20171023114210.j7ip75ewoy2tiqs4@dhcp22.suse.cz>
	<20171023124122.tjmrbcwo2btzk3li@dhcp22.suse.cz>
From: "C.Wehrmeyer"
Message-ID:
Date: Mon, 23 Oct 2017 16:00:13 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <20171023124122.tjmrbcwo2btzk3li@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-10-23 14:41, Michal Hocko wrote:
> On Mon 23-10-17 14:22:30, C.Wehrmeyer wrote:
>> On 2017-10-23 13:42, Michal Hocko wrote:
>>> I do not remember any such a request either. I can see some merit in the
>>> described use case. It is not specific on why hugetlb pages are used for
>>> the allocator memory because that comes with it own issues.
>>
>> That is yet for the user to specify. As of now hugepages still require a
>> special setup that not all people might have as of now - to my knowledge a
>> kernel being compiled with CONFIG_TRANSPARENT_HUGEPAGE=y and a number of
>> such pages being allocated either through the kernel boot line or through
>
> CONFIG_TRANSPARENT_HUGEPAGE has nothing to do with hugetlb pages. These
> are THP which do not need any special configuration and mremap works on
> them.

I was not aware of the fact that HP != THP, so thank you for clarifying
that.

> This is no longer true. GB pages can be allocated during runtime as
> well.

Didn't know that as well. I just knew the last time I tested this it was
not possible.

>> 2-MiB pages, on the other hand,
>> shouldn't have those limitations anymore. User-space programs should be
>> capable of allocating such pages without the need for the user to fiddle
>> with nr_hugepages beforehand.
>
> And that is what we have THP for...

Then I might have been using it incorrectly?
I've been digging through Documentation/vm/transhuge.txt after you
initially pointed it out, and verified that the kernel uses THPs pretty
much always, without the usage of madvise:

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

And just to be very sure I've added:

if (madvise(buf1,ALLOC_SIZE_1,MADV_HUGEPAGE)) {
        errno_tmp = errno;
        fprintf(stderr,"madvise: %u\n",errno_tmp);
        goto out;
}

/*Make sure the mapping is actually used*/
memset(buf1,'!',ALLOC_SIZE_1);

/*Give me time for monitoring*/
sleep(2000);

right after the mmap call. I've also made sure that nothing is being
optimised away by the compiler. With a 2MiB mapping being requested this
should be a good opportunity for the kernel, and yet when I try to
figure out how many THPs my process uses:

$ cat /proc/21986/smaps | grep 'AnonHuge'

I just end up with lots of:

AnonHugePages: 0 kB

And cat /proc/meminfo | grep 'Huge' doesn't change significantly either.
Am I just doing something wrong here, or shouldn't I trust the THP
mechanisms to actually allocate hugepages for me?

> General purpose allocator playing with hugetlb
> pages is rather tricky and I would be really cautious there. I would
> rather play with THP to reduce the TLB footprint.

May one ask why you'd recommend being cautious here? I understand that
actual huge pages can slow down certain things - swapping comes to mind
immediately, which is probably the reason why Linux (used to?) lock such
pages in memory as well.

I once again want to emphasise that this is my first time writing to the
mailing list. It might be redundant, but I'm not yet used to any
conventions or technical details you're familiar with.