Message-ID: <3eea2e6e-1646-546a-d9ef-d30052c00c7d@redhat.com>
Date: Tue, 14 Jun 2022 17:22:29 +0200
From: David Hildenbrand
Organization: Red Hat
To: Nadav Amit, Peter Xu
Cc: linux-mm@kvack.org, Nadav Amit, Mike Kravetz, Hugh Dickins,
 Andrew Morton, Axel Rasmussen, Mike Rapoport
Subject: Re: [PATCH RFC] userfaultfd: introduce UFFDIO_COPY_MODE_YOUNG
In-Reply-To: <20220613204043.98432-1-namit@vmware.com>
References: <20220613204043.98432-1-namit@vmware.com>

On 13.06.22 22:40, Nadav Amit wrote:
> From: Nadav Amit
>
> As we know, using a PTE on x86 with cleared access-bit (aka young-bit)
> takes ~600 cycles more than when the access-bit is set. At the same
> time, setting the access-bit for memory that is not used (e.g.,
> prefetched) can introduce greater overheads, as the prefetched memory is
> reclaimed later than it should be.
>
> Userfaultfd currently does not set the access-bit (excluding the
> huge-pages case). Arguably, it is best to let the uffd monitor control
> whether the access-bit should be set or not. The expected use is for the
> monitor to request userfaultfd to set the access-bit when the copy
> operation is done to resolve a page-fault, and not to set the young-bit
> when the memory is prefetched.

Thinking out loud about existing users: postcopy live migration in QEMU
has two use cases for placement of pages:

a) Resolving a fault. E.g., a VCPU might be waiting for resolution to
   make progress.

b) Background migration to converge without faults on all relevant pages.

I guess in a) we'd want UFFDIO_COPY_MODE_YOUNG; in b) we don't want it.
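
To make the a)/b) split concrete, here is a rough sketch of how a monitor might pick the mode bits per placement. This is not QEMU code: the enum, the helper name, and in particular the bit value of the flag are assumptions made up for illustration, since the flag only exists as a proposal in this RFC.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: stand-in for the flag proposed in this RFC. The name and
 * bit position are hypothetical and not taken from any kernel header.
 */
#define SKETCH_COPY_MODE_YOUNG ((uint64_t)1 << 2)

/* Hypothetical: why the monitor is placing this page. */
enum placement_reason {
    PLACE_RESOLVE_FAULT, /* a) someone (e.g., a VCPU) is blocked on it */
    PLACE_BACKGROUND,    /* b) background migration, nobody is waiting */
};

/* Return the mode bits the monitor would pass along with the copy. */
static uint64_t copy_mode_for(enum placement_reason why)
{
    assert(why == PLACE_RESOLVE_FAULT || why == PLACE_BACKGROUND);

    /* Only request the young bit when the page is about to be touched. */
    return why == PLACE_RESOLVE_FAULT ? SKETCH_COPY_MODE_YOUNG : 0;
}
```

The returned value would be ORed into the mode field of struct uffdio_copy before issuing UFFDIO_COPY; everything else about the call stays as it is today.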

I wonder, however, whether, instead of calling this "young" (which
implies what the OS should or shouldn't do), we should define this as a
hint that the placed page is very likely to be accessed next.

I'm bad at naming, but UFFDIO_COPY_MODE_ACCESS_LIKELY would express what
I have in mind.

>
> Introduce UFFDIO_COPY_MODE_YOUNG to enable userspace to request the
> young bit to be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE set the bit
> unconditionally since the former is only used to resolve page-faults and
> the latter would not benefit from not setting the access-bit.
>
> Cc: Mike Kravetz
> Cc: Hugh Dickins
> Cc: Andrew Morton
> Cc: Axel Rasmussen
> Cc: Peter Xu
> Cc: David Hildenbrand
> Cc: Mike Rapoport
> Signed-off-by: Nadav Amit
>
> ---
>
> There are 2 possible enhancements:
>
> 1. Use the flag to decide on whether to mark the PTE as dirty (for
> writable PTEs). I guess that setting the dirty-bit is as expensive as
> setting the access-bit, and setting it introduces similar tradeoffs,
> as mentioned above.
>
> 2. Introduce a similar mode for write-protect and use this information
> for setting both the young and dirty bits. Makes one wonder whether
> mprotect() should also set the bit in certain cases...

I wonder if UFFDIO_COPY_MODE_READ_ACCESS_LIKELY vs.
UFFDIO_COPY_WRITE_ACCESS_LIKELY could even make sense. I feel like it
could.

For example, QEMU knows whether a page fault it's resolving was due to a
read or a write fault and could use that information accordingly. Of
course, if we currently have a read fault, we don't know for sure
whether we'll get a write fault immediately after.

Especially in the context of UFFDIO_ZEROPAGE,
UFFDIO_ZEROPAGE_WRITE_ACCESS_LIKELY could ... not place the zeropage but
instead populate an actual page and mark it accessed+dirty. I even have
a use case for that ;)

The kernel could decide how to treat these hints: for example, if it
doesn't want user space to mess with access/dirty bits, it could just
mostly ignore the hints.

-- 
Thanks,

David / dhildenb
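
P.S. A rough sketch of the read/write hint idea, in case it helps the discussion. The hint names and bit values below are hypothetical and exist nowhere; only the write bit of the fault flags mirrors UFFD_PAGEFAULT_FLAG_WRITE from <linux/userfaultfd.h>. The second helper is one possible reading of "the kernel could mostly ignore the hints", not a proposal for the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors UFFD_PAGEFAULT_FLAG_WRITE, repeated so the sketch builds anywhere. */
#define SKETCH_PAGEFAULT_FLAG_WRITE ((uint64_t)1 << 0)

/* ASSUMPTION: hypothetical hint bits, invented for this sketch only. */
#define SKETCH_HINT_READ_ACCESS_LIKELY  ((uint64_t)1 << 2)
#define SKETCH_HINT_WRITE_ACCESS_LIKELY ((uint64_t)1 << 3)
#define SKETCH_HINT_ALL \
    (SKETCH_HINT_READ_ACCESS_LIKELY | SKETCH_HINT_WRITE_ACCESS_LIKELY)

/* Monitor side: derive a hint from the flags of the fault being resolved. */
static uint64_t access_hint_for_fault(uint64_t fault_flags)
{
    if (fault_flags & SKETCH_PAGEFAULT_FLAG_WRITE)
        return SKETCH_HINT_WRITE_ACCESS_LIKELY;
    /* A read fault tells us nothing certain about a later write. */
    return SKETCH_HINT_READ_ACCESS_LIKELY;
}

/* Kernel side: hints are advisory, so policy may drop all of them. */
static uint64_t effective_hints(uint64_t hints, bool honor_hints)
{
    assert((hints & ~SKETCH_HINT_ALL) == 0);
    return honor_hints ? hints : 0;
}
```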