From: Paul Gofman
To: Michał Mirosław, Muhammad Usama Anjum
Cc: "Kirill A. Shutemov", Michał Mirosław, Andrei Vagin,
    Danylo Mocherniuk, Alex Sierra, Alexander Viro, Andrew Morton,
    Axel Rasmussen, Christian Brauner, Cyrill Gorcunov, Dan Williams,
    David Hildenbrand, Greg KH, "Gustavo A . R . Silva",
    "Liam R . Howlett", Matthew Wilcox, Mike Rapoport, Nadav Amit,
    Pasha Tatashin, Peter Xu, Shuah Khan, Suren Baghdasaryan,
    Vlastimil Babka, Yang Shi, Yun Zhou,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
    kernel@collabora.com
Subject: Re: [v3] fs/proc/task_mmu: Implement IOCTL for efficient page
    table scanning
Date: Wed, 26 Jul 2023 17:06:02 -0600
Message-ID: <94c6b665-bbc2-5030-f9b1-d933791008b8@codeweavers.com>
References: <20230713101415.108875-6-usama.anjum@collabora.com>
    <7eedf953-7cf6-c342-8fa8-b7626d69ab63@collabora.com>
    <382f4435-2088-08ce-20e9-bc1a15050861@collabora.com>
    <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com>
    <1afedab8-5929-61e5-b0da-9c70dc01c254@collabora.com>

Hello Michał,

    I was looking into this from the Wine point of view and did a bit of
testing, so I will try to answer the question cited below.

    Without Windows large pages, I guess the only way to make this work
correctly is to disable THP with madvise(MADV_NOHUGEPAGE) on the memory
ranges allocated with MEM_WRITE_WATCH, because memory changes should not
only be reported but also tracked with 4k page granularity, as Windows
applications expect.

    Currently we don't implement MEM_LARGE_PAGES flag support in Wine
(although of course we might want to do that in the future). On Windows,
using this flag requires special permissions and implies more than just
using huge pages under the hood; in particular, it also locks the pages
in memory. I'd expect that support to be extended in Windows in some way
in the future, though.
WRT write watches, the range is watched with large page granularity. The
GetWriteWatch lpdwGranularity output parameter returns the "large page
minimum" value (as returned by GetLargePageMinimum), and the returned
addresses correspond to those large pages. I suppose that to implement
this on top of Linux huge pages we'd first need a way to control huge
page allocation, i.e., a way to enforce the specified huge page size for
the memory range being mapped. Without that, I am afraid the only way to
implement this correctly is to still disable THP on the range and only
adjust our API output so that it matches what is expected.

    Not related to the question, and leaving Wine and the Windows API
aside: the current way of dealing with THP in the API design does not
look quite straightforward to me, in the sense that transparent huge
pages will appear not so transparent when it comes to dirty page
tracking. If I understand correctly, an application which allocated a
reasonably big memory area and didn't use madvise(MADV_NOHUGEPAGE) might
end up with the whole range being a single page and getting dirtied as a
whole, which will likely void the app's optimization based on changed
memory tracking. Not that I know an ideal way out of this; maybe it is a
matter of having THP disabled by default on watched ranges, or of
clearly warning about this caveat in the documentation?

Regards,
    Paul.

On 7/26/23 15:10, Michał Mirosław wrote:
>
>>> 3. BTW, One of the uses is the GetWriteWatch and I wonder how it
>>> behaves on HugeTLB (MEM_LARGE_PAGES allocation)? Shouldn't it return a
>>> list of huge pages and write *lpdwGranularity = HPAGE_SIZE?
>> Wine/Proton doesn't used hugetlb by default. Hugetlb isn't enabled by
>> default on Debian as well. For GetWriteWatch() we don't care about the
>> hugetlb at all. We have added hugetlb's implementation to complete the
>> feature and leave out something.
> How is GetWriteWatch() working when passed a VirtualAlloc(...,
> MEM_LARGE_PAGES|MEM_WRITE_WATCH...)-allocated range? Does it still
> report 4K pages?
> This is only a problem when using max_pages: a hugetlb range might
> need counting and reporting huge pages and not 4K parts.
>
> Best Regards
> Michał Mirosław
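
To illustrate the fallback mentioned in the reply above (keep THP disabled on the range, track writes at 4K granularity, and only adjust the API output to large page granularity), here is a minimal C sketch. It is not Wine code: the helper name coalesce_to_large_pages and its signature are hypothetical, and it assumes the 4K-granular dirty addresses arrive sorted in ascending order and that the large page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not part of Wine or the kernel).
 *
 * Takes addresses of dirty pages tracked at 4K granularity and writes the
 * base address of each large page that contains at least one of them.
 *
 * Assumptions:
 *   - dirty[] is sorted in ascending order;
 *   - large_page_size is a non-zero power of two.
 *
 * Returns the number of entries written to out[] (at most out_capacity).
 */
size_t coalesce_to_large_pages(const uintptr_t *dirty, size_t count,
                               uintptr_t large_page_size,
                               uintptr_t *out, size_t out_capacity)
{
    size_t n = 0;

    /* The mask arithmetic below is only valid for powers of two. */
    assert(large_page_size != 0 &&
           (large_page_size & (large_page_size - 1)) == 0);

    for (size_t i = 0; i < count && n < out_capacity; i++) {
        /* Round the 4K page address down to its large page base. */
        uintptr_t base = dirty[i] & ~(large_page_size - 1);

        /* Input is sorted, so duplicates are always adjacent. */
        if (n == 0 || out[n - 1] != base)
            out[n++] = base;
    }
    return n;
}
```

With a large page size of 2 MiB (0x200000, used here only as an example value), dirty 4K pages at 0x200000, 0x201000 and 0x3ff000 all collapse into the single large page at 0x200000. The caller would then report the large page size as the granularity instead of 4K.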