From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL
To: Michal Hocko, Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel, Andrea Arcangeli, "Kirill A. Shutemov", Vlastimil Babka
References: <93684e4b-9e60-ef3a-ba62-5719fdf7cff9@gmx.de> <6b639da5-ad9a-158c-ad4a-7a4e44bd98fc@gmx.de> <5fb8955d-23af-ec85-a19f-3a5b26cc04d1@oracle.com> <20171023114210.j7ip75ewoy2tiqs4@dhcp22.suse.cz>
From: "C.Wehrmeyer"
Date: Mon, 23 Oct 2017 14:22:30 +0200
In-Reply-To: <20171023114210.j7ip75ewoy2tiqs4@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-10-23 13:42, Michal Hocko wrote:
> I do not remember any such request either. I can see some merit in the
> described use case. It is not specific on why hugetlb pages are used for
> the allocator memory because that comes with its own issues.

That is for the user to specify. Hugepages still require a special setup
that not everyone has: to my knowledge, a kernel compiled with
CONFIG_TRANSPARENT_HUGEPAGE=y and a number of such pages allocated either
on the kernel boot line or through
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.

I'm deliberately ignoring 1-GiB pages here because those can only be
allocated during boot, when no processes have been spawned and memory is
not yet fragmented. My point is that I can see people not being too eager
to support 1-GiB pages right now except for very specific use cases.
2-MiB pages, on the other hand, shouldn't have those limitations anymore.
User-space programs should be able to allocate such pages without the
user having to fiddle with nr_hugepages beforehand.
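
To make the "special setup" concrete, here is a minimal sketch of how a
program might check the 2-MiB pool before trying to use it. The helper
name read_sysfs_count() is hypothetical; the only thing taken from the
text above is the nr_hugepages path, and the sketch assumes the file
holds a single decimal number.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one decimal counter from a sysfs-style file.
 * Returns the value, or -1 if the file is missing or cannot be parsed
 * (for example on a kernel built without hugetlb support).
 */
long read_sysfs_count(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling it with
"/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" gives the
configured pool size; a result of 0 or -1 means the user still has to
set the pool up by hand, which is exactly the step I'd like to see go
away.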

Some time ago I wrote some code to detect the TLB capabilities of my
current testing CPU. These are the results:

[TLB] Instruction TLB: 2M/4M pages, fully associative, 8 entries
[TLB] Data TLB: 4 KByte pages, 4-way set associative, 64 entries
[TLB] Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
[TLB] Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
[STLB] Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries

Given that allocations in the mebibyte range are not uncommon at all
nowadays, and that one 2-MiB page replaces 512 4-KiB pages, we really
should move towards treating 2-MiB pages just as casually as regular
4-KiB pages. Allocators can still query whether the kernel supports the
specified page size, and specifying MAP_HUGETLB | MAP_HUGE_2MB would
still be required so as not to break older programs, but from my
perspective there is a lot to gain here.
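
For illustration, this is roughly what an allocator requesting 2-MiB
pages explicitly could look like. It is only a sketch under a few
assumptions: the names round_up_2mib() and alloc_prefer_2mib() are made
up for this mail, the fallback #defines are only there for libc headers
that don't provide MAP_HUGE_2MB, and falling back to a normal anonymous
mapping is one possible policy, not something the kernel does for you.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Fallback definitions for older libc headers (values from the Linux uapi). */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_2MB
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT) /* log2(2 MiB) == 21 */
#endif

#define HUGE_2MIB ((size_t)2 << 20)

/* Round len up to a whole number of 2-MiB pages. */
size_t round_up_2mib(size_t len)
{
    return (len + HUGE_2MIB - 1) & ~(HUGE_2MIB - 1);
}

/*
 * Hypothetical allocator entry point: try an explicit 2-MiB hugetlb
 * mapping first and fall back to a normal anonymous mapping if the
 * kernel refuses (for example because the hugepage pool is empty).
 * On success, *mapped_len holds the length to pass to munmap() and
 * *is_huge tells whether the hugetlb mapping succeeded.
 * Returns NULL if both attempts fail.
 */
void *alloc_prefer_2mib(size_t len, size_t *mapped_len, int *is_huge)
{
    size_t rounded = round_up_2mib(len);
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                   -1, 0);

    *is_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED) {
        p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
    }
    *mapped_len = rounded;
    return p;
}
```

A caller would do something like alloc_prefer_2mib(3u << 20, &len, &huge)
and later munmap(p, len). The length is rounded up because a hugetlb
mapping has to be unmapped in multiples of the huge page size.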