From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <930e111a-b13b-43d9-93f8-5bf28f343074@gmail.com>
Date: Mon, 5 Aug 2024 00:04:22 +0100
Subject: Re: [PATCH 0/6] mm: split underutilized THPs
From: Usama Arif <usamaarif642@gmail.com>
To: David Hildenbrand, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com
In-Reply-To: <58025293-c70f-4377-b8be-39994136af83@redhat.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com> <3cd1b07d-7b02-4d37-918a-5759b23291fb@gmail.com> <73b97a03-3742-472f-9a36-26ba9009d715@gmail.com> <95ed1631-ff62-4627-8dc6-332096e673b4@redhat.com> <01899bc3-1920-4ff2-a470-decd1c282e38@gmail.com> <4b9a9546-e97b-4210-979b-262d8cf37ba0@redhat.com> <64c3746a-7b44-4dd6-a51b-e5b90557a30a@gmail.com> <204af83b-57ec-40d0-98c0-038bfeb393a3@gmail.com> <58025293-c70f-4377-b8be-39994136af83@redhat.com>
Content-Type: text/plain; charset=UTF-8

On 01/08/2024 07:36, David Hildenbrand wrote:
>>> I just added a bunch of quick printfs to QEMU and ran a precopy+postcopy live migration. Looks like my assumption was right:
>>>
>>> On the destination:
>>>
>>> Writing received pages during precopy # ram_load_precopy()
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Writing received pages during precopy
>>> Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
>>> Discarding pages # loadvm_postcopy_ram_handle_discard()
>>> Discarding pages
>>> Discarding pages
>>> Discarding pages
>>> Discarding pages
>>> Discarding pages
>>> Discarding pages
>>> Registering UFFD # postcopy_ram_incoming_setup()
>>>
>>
>> Thanks for this, yes it makes sense after you
>> mentioned postcopy_ram_incoming_setup. postcopy_ram_incoming_setup happens in the Listen phase, which comes after the discard phase, so I was able to follow in the QEMU code the same sequence of events that the above prints show.
>
> I just added another printf to postcopy_ram_supported_by_host(), where we temporarily do a UFFDIO_REGISTER on some test area.
>
> Sensing UFFD support # postcopy_ram_supported_by_host()
> Sensing UFFD support
> Writing received pages during precopy # ram_load_precopy()
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Writing received pages during precopy
> Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
> Discarding pages # loadvm_postcopy_ram_handle_discard()
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Discarding pages
> Registering UFFD # postcopy_ram_incoming_setup()
>
> We could think about using this "ever used uffd" to avoid the shared zeropage in most processes.
>
> Of course, there might be other applications where that wouldn't work, but I think this behavior (writing to an area before enabling uffd) might be fairly QEMU-specific already.
>
> Avoiding the shared zeropage has the benefit that a later write fault won't have to do a TLB flush and can simply install a fresh anon page.
>

I checked CRIU, and it also does a check at the start before attempting to use uffd: https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/kerndat.c#L1349

If writing to an area before enabling uffd is indeed likely to be QEMU-specific, then you make a good point that clearing the PTE, instead of installing the shared zeropage, avoids the TLB flush whenever uffd has ever been used. I think "ever used uffd" would need to be tracked in mm_struct. This also won't cause an issue if the check is done in a parent process and the actual use is in a forked process, as copy_mm should take care of it. The possibilities would then be:

1) Add a new bit in mm->flags, set it in new_userfaultfd and test it in try_to_unmap_unused, but unfortunately all the bits in mm->flags are taken.

2) Use mm->def_flags, as it looks like there is an unused bit (0x800) just before VM_UFFD_WP. But that makes the code confusing, as def_flags is used to initialize the default flags for VMAs and is not supposed to be used as an "mm flag".

3) Introduce mm->flags2 and set/test it as in 1. This would introduce an 8-byte overhead for all mm_structs.

I am not sure either 2 or 3 is acceptable upstream, unless there is a need for more flags in the near future and the 8-byte overhead starts to make sense. Maybe we go with the shared zeropage?