Date: Fri, 27 Jan 2023 10:56:00 +0100
From: Michal Hocko
To: Mike Kravetz
Cc: linux-mm@kvack.org, Naoya Horiguchi, David Rientjes, Matthew Wilcox, David Hildenbrand, Peter Xu, James Houghton, Muchun Song
Subject: Re: A mapcount riddle

On Thu 26-01-23 09:51:12, Mike Kravetz
wrote:
> On 01/26/23 10:16, Michal Hocko wrote:
> > On Wed 25-01-23 09:59:15, Mike Kravetz wrote:
> > > On 01/25/23 09:24, Michal Hocko wrote:
> > > > On Tue 24-01-23 12:56:24, Mike Kravetz wrote:
> > > > > At first thought this seems bad. However, I believe this has been the
> > > > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > > > unaware of any reported issues. I did an audit of code looking at
> > > > > mapcount. In addition to the above issue with smaps, there appears
> > > > > to be an issue with 'migrate_pages' where shared pages could be migrated
> > > > > without appropriate privilege.
> > > > >
> > > > > 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > > > 	if (flags & (MPOL_MF_MOVE_ALL) ||
> > > > > 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > > > 		if (isolate_hugetlb(page, qp->pagelist) &&
> > > > > 			(flags & MPOL_MF_STRICT))
> > > > > 			/*
> > > > > 			 * Failed to isolate page but allow migrating pages
> > > > > 			 * which have been queued.
> > > > > 			 */
> > > > > 			ret = 1;
> > > > > 	}
> > > >
> > > > Could you elaborate what is problematic about that? The whole pmd
> > > > sharing is a cooperative thing. So if some of the processes decide to
> > > > migrate the page, why should that be a problem for others sharing
> > > > that page via page table? Am I missing something obvious?
> > >
> > > Nothing obvious. It is just that the semantics seem to be that you can
> > > only move shared pages if you have CAP_SYS_NICE.
> >
> > Correct.
> >
> > > Certainly cooperation
> > > is implied for shared PMDs, but I would guess that most applications are
> > > not even aware they are sharing PMDs.
> >
> > How come? They have to explicitly map those hugetlb pages to the same
> > address. Or is it common that the mapping just lands there by accident?
>
> Mapping to the same address is not required for PMD sharing. What is
> required is that PUD_SIZE-aligned offsets within the mapped object
> (file) are mapped to PUD_SIZE-aligned virtual addresses. That may not be
> clear as it is difficult to describe. Bottom line is that addresses do not
> need to match.

Hmm, my bad then. I thought that was a strict requirement. But looking at
the code, page_table_shareable talks about pmd_index indeed. I must have
misremembered. I do agree that it is much easier to hit page table sharing
unintentionally for large mappings, especially if they are GB aligned,
which is not really that unexpected.

-- 
Michal Hocko
SUSE Labs