Date: Tue, 16 Nov 2021 09:58:24 +0100
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Zi Yan, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Michael Ellerman, Christoph Hellwig,
 Marek Szyprowski, Robin Murphy, linuxppc-dev@lists.ozlabs.org,
 virtualization@lists.linux-foundation.org, iommu@lists.linux-foundation.org
Subject: Re: [RFC PATCH 0/3] Use pageblock_order for cma and
 alloc_contig_range alignment.
References: <20211115193725.737539-1-zi.yan@sent.com>
In-Reply-To: <20211115193725.737539-1-zi.yan@sent.com>

On 15.11.21 20:37, Zi Yan wrote:
> From: Zi Yan
>
> Hi David,

Hi,

thanks for looking into this.

>
> You suggested to make alloc_contig_range() deal with pageblock_order
> instead of MAX_ORDER - 1 and get rid of the MAX_ORDER - 1 dependency in
> virtio_mem [1]. This patchset is my attempt to achieve that. Please take
> a look and let me know if I am doing it correctly or not.
>
> From my understanding, cma required an alignment of
> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
> introduced, __free_one_page() did not prevent merging two different
> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
> __free_one_page() implementation does prevent that. It should be OK to
> just align cma to pageblock_order. alloc_contig_range() relies on
> MIGRATE_CMA to get free pages, so it can use pageblock_order as its
> alignment too.

I wonder if that's sufficient. Especially the outer_start logic in
alloc_contig_range() might be problematic. There are some ugly corner
cases with free pages/allocations spanning multiple pageblocks where we
only isolated a single pageblock.
Regarding CMA, we have to keep the following cases working:

a) Different pageblock types (MIGRATE_CMA and !MIGRATE_CMA) in a
   MAX_ORDER - 1 page:

   [         MAX_ORDER - 1      ]
   [ pageblock 0 | pageblock 1 ]

   Assume either pageblock 0 is MIGRATE_CMA or pageblock 1 is
   MIGRATE_CMA, but not both. We have to make sure alloc_contig_range()
   keeps working correctly. This should be the case even with your
   change, as we won't merge pages across differing migratetypes.

b) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:

   [         MAX_ORDER - 1      ]
   [ pageblock 0 | pageblock 1 ]

   Assume both are MIGRATE_CMA. Assume we want to allocate from either
   pageblock 0 or pageblock 1. Especially, assume we want to allocate
   from pageblock 1: while we would isolate pageblock 1, we wouldn't
   isolate pageblock 0.

   What happens if we either have a free page spanning the MAX_ORDER - 1
   range already, OR if we have to migrate a MAX_ORDER - 1 page,
   resulting in a free MAX_ORDER - 1 page of which only the second
   pageblock is isolated? We would end up essentially freeing a page
   that has mixed pageblocks, essentially placing it on
   !MIGRATE_ISOLATE free lists ... I might be wrong, but I have the
   feeling that this would be problematic.

c) Concurrent allocations:

   [         MAX_ORDER - 1      ]
   [ pageblock 0 | pageblock 1 ]

   Assume b), but we have two concurrent CMA allocations to pageblock 0
   and pageblock 1, which would now be possible as
   start_isolate_page_range() would succeed on both.


Regarding virtio-mem, we care about the following cases:

a) Allocating parts of a completely movable MAX_ORDER - 1 page:

   [         MAX_ORDER - 1      ]
   [ pageblock 0 | pageblock 1 ]

   Assume pageblock 0 and pageblock 1 are either free or contain only
   movable pages. Assume we allocated pageblock 0: we have to make sure
   we can allocate pageblock 1. The other way around, assume we
   allocated pageblock 1: we have to make sure we can allocate
   pageblock 0. Free pages spanning both pageblocks might be
   problematic.
b) Allocating parts of a partially movable MAX_ORDER - 1 page:

   [         MAX_ORDER - 1      ]
   [ pageblock 0 | pageblock 1 ]

   Assume pageblock 0 contains unmovable data but pageblock 1 does not:
   we have to make sure we can allocate pageblock 1. Similarly, assume
   pageblock 1 contains unmovable data but pageblock 0 does not: we
   have to make sure we can allocate pageblock 0. has_unmovable_pages()
   might allow for that.

   But we want to fail early in case we want to allocate a single
   pageblock and it contains unmovable data, either directly or
   indirectly: if we have an unmovable (compound) MAX_ORDER - 1 page
   and we'd try isolating pageblock 1, has_unmovable_pages() would
   always return "false", because we'd simply be skipping over any tail
   pages and not detect the un-movability.

c) Migrating/freeing a MAX_ORDER - 1 page while partially isolated:

   Same concern as for CMA b).

So the biggest concern I have is dealing with migrating/freeing pages
of order > pageblock_order while only having isolated a single
pageblock.

>
> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them
> to reduce the guest memory size. As the result of the
> alloc_contig_range() alignment change, virtio_mem should be able to
> just align PFNs to pageblock_order.

For virtio-mem it will most probably be desirable to first try
allocating the MAX_ORDER - 1 range if possible and then fall back to
pageblock_order. But that's an additional change on top in virtio-mem
code.

My take to teach alloc_contig_range() to properly handle this would be
the following:

a) Convert MIGRATE_ISOLATE into a separate pageblock flag

We would want to convert MIGRATE_ISOLATE into a separate pageblock
flag, such that when we isolate a pageblock we preserve the original
migratetype. While start_isolate_page_range() would set that bit,
undo_isolate_page_range() would simply clear that bit.
The buddy would use a single MIGRATE_ISOLATE queue as is: the original
migratetype would only be used for restoring the correct migratetype.
This would allow for restoring e.g., MIGRATE_UNMOVABLE after isolating
an unmovable pageblock (see b) below) instead of simply setting all
such pageblocks to MIGRATE_MOVABLE when un-isolating.

Ideally, we'd get rid of the "migratetype" parameter for
alloc_contig_range(). However, even with the above change we have to
make sure that memory offlining and ordinary alloc_contig_range() users
will fail on MIGRATE_CMA pageblocks (has_unmovable_pages() checks that
as of today). We could achieve that differently, though (e.g., a bool
cma_alloc parameter instead).

b) Allow isolating pageblocks that contain unmovable pages

We'd pass the actual range of interest to start_isolate_page_range()
and rework the code to check has_unmovable_pages() only on the range
of interest, while considering overlapping larger allocations. E.g.,
if we stumble over a compound page, look up the head and test whether
that page is movable/unmovable.

c) Change alloc_contig_range() to not "extend" the range of interest
   to include pageblocks of a different type

Assume we're isolating a MIGRATE_CMA pageblock: only isolate a
neighboring MIGRATE_CMA pageblock, not other pageblocks.

So we'd keep isolating complete MAX_ORDER - 1 pages unless c) prevents
it. We'd allow isolating even pageblocks that contain unmovable pages
in ZONE_NORMAL, and check via has_unmovable_pages() only whether the
range of interest contains unmovable pages, not the whole MAX_ORDER - 1
range or even the whole pageblock. And we'd not silently overwrite the
pageblock type when restoring, but instead restore the old migratetype.

-- 
Thanks,

David / dhildenb