Date: Thu, 14 Sep 2023 10:47:49 -0400
From: Johannes Weiner
To: Vlastimil Babka
Cc: Andrew Morton, Mel Gorman, Miaohe Lin, Kefeng Wang, Zi Yan,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] mm: page_alloc: fix freelist movement during block conversion
Message-ID: <20230914144749.GF48476@cmpxchg.org>
References: <20230911195023.247694-1-hannes@cmpxchg.org>
	<20230911195023.247694-6-hannes@cmpxchg.org>
	<5911bf29-b2a0-9016-7071-68334e7d680d@suse.cz>
In-Reply-To: <5911bf29-b2a0-9016-7071-68334e7d680d@suse.cz>

On Wed, Sep 13, 2023 at 09:52:17PM +0200, Vlastimil Babka wrote:
> On 9/11/23 21:41, Johannes Weiner wrote:
> > @@ -1638,26 +1629,62 @@ static int move_freepages(struct zone *zone,
> >  	return pages_moved;
> >  }
> >
> > -int move_freepages_block(struct zone *zone, struct page *page,
> > -				int migratetype, int *num_movable)
> > +static bool prep_move_freepages_block(struct zone *zone, struct page *page,
> > +				      unsigned long *start_pfn,
> > +				      unsigned long *end_pfn,
> > +				      int *num_free, int *num_movable)
> >  {
> > -	unsigned long start_pfn, end_pfn, pfn;
> > -
> > -	if (num_movable)
> > -		*num_movable = 0;
> > +	unsigned long pfn, start, end;
> >
> >  	pfn = page_to_pfn(page);
> > -	start_pfn = pageblock_start_pfn(pfn);
> > -	end_pfn = pageblock_end_pfn(pfn) - 1;
> > +	start = pageblock_start_pfn(pfn);
> > +	end = pageblock_end_pfn(pfn) - 1;
> >
> >  	/* Do not cross zone boundaries */
> > -	if (!zone_spans_pfn(zone, start_pfn))
> > -		start_pfn = zone->zone_start_pfn;
> > -	if (!zone_spans_pfn(zone, end_pfn))
> > -		return 0;
> > +	if (!zone_spans_pfn(zone, start))
> > +		start = zone->zone_start_pfn;
> > +	if (!zone_spans_pfn(zone, end))
> > +		return false;
>
> This brings me back to my previous suggestion - if we update the end, won't
> the whole "block straddles >1 zones" situation to check for go away?
>
> Hm or is it actually done because we have a problem by representing
> pageblock migratetype with multiple zones, since there's a single
> pageblock_bitmap entry per the respective pageblock range of pfn's, so one
> zone's migratetype could mess with other's? And now it matters if we want
> 100% match of freelist vs pageblock migratetype?

Yes, it's not safe to change a shared bitmap entry with only one of
the two zones locked.

So I think my range adjustment isn't a complete fix. It's okay for the
case I was directly encountering, where DMA starts with pfn 1 and pfn 0
belongs to nobody. But if the block straddles two genuine zones, a race
is possible.

> (I think even before this series it could have mattered for
> MIGRATETYPE_ISOLATE, is it broken in those corner cases?)

Yes, I think this is buggy indeed. start_isolate_page_range() calls
isolate_single_pageblock() on block boundaries.
It actually does round up to the zone start if the pfn is below it, since
b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock
granularity") from Zi last year. But it will still set the migratetype
on a straddling block. And I don't see any handling for the end of the
block being in another zone. It won't move free pages due to the above,
but it appears to set the isolate migratetype in an unlocked zone.

Since nobody has complained about this, I wonder if blocks truly
straddling two different zones aren't just rare but actually
non-existent. The DMA and DMA32 boundaries should naturally align to
multiples of the pageblock size, but there might be exceptions with
ZONE_MOVABLE. Maybe somebody remembers situations where this occurs?

> But in that case we might not be detecting the situation properly for the
> later of the two zones in a pageblock, because if start_pfn is not spanned
> we adjust it and continue? Hmm...

I think what needs to happen is to return false in both cases and
reject operation on blocks whose pages are in two different zones. None
of the callers expect such blocks, and none of them hold both zone
locks, which would be necessary to safely move pages and adjust the
migratetype.

This would fix the isolate race, as well as the freelist race that this
series is trying to eliminate.

It would mean that a straddling block can still be stolen from during
fallback, but cannot be claimed entirely and will stay MOVABLE. It's
not perfect, but certainly sounds a lot more reasonable than a double
zone locking scheme for all callers.
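
To make the "return false in both cases" idea concrete, here is a rough
userspace sketch of the check. It is not the patch itself: the struct
zone layout is cut down to two fields, the pageblock order of 9 is an
assumed value, and block_in_single_zone() is a hypothetical name chosen
for illustration. The only point is that neither end of the range gets
clamped; a block with either end outside the zone is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: names mirror the kernel, values are assumed. */
#define PAGEBLOCK_ORDER		9UL	/* assumed: 512 pages per block */
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)

struct zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
	return zone->zone_start_pfn <= pfn &&
	       pfn < zone->zone_start_pfn + zone->spanned_pages;
}

static unsigned long pageblock_start_pfn(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

static unsigned long pageblock_end_pfn(unsigned long pfn)
{
	return pageblock_start_pfn(pfn) + PAGEBLOCK_NR_PAGES;
}

/*
 * Hypothetical helper: succeed only if the whole pageblock around @pfn
 * lies inside @zone. Neither end is clamped; a straddling block is
 * rejected outright, so the caller never touches a shared bitmap entry.
 */
static bool block_in_single_zone(const struct zone *zone, unsigned long pfn,
				 unsigned long *start_pfn,
				 unsigned long *end_pfn)
{
	unsigned long start, end;

	/* The caller is expected to pass a pfn that belongs to @zone. */
	assert(zone_spans_pfn(zone, pfn));

	start = pageblock_start_pfn(pfn);
	end = pageblock_end_pfn(pfn) - 1;

	if (!zone_spans_pfn(zone, start) || !zone_spans_pfn(zone, end))
		return false;

	*start_pfn = start;
	*end_pfn = end;
	return true;
}
```

Note that in this model a zone starting at pfn 1 also has its first
block rejected, because pfn 0 is not spanned. That is the cost of not
adjusting the start of the range.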