From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07ED1C43461 for ; Wed, 9 Sep 2020 15:16:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B19602223D for ; Wed, 9 Sep 2020 15:16:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="Jt6eZsRd" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727804AbgIIPQQ (ORCPT ); Wed, 9 Sep 2020 11:16:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42622 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729865AbgIIL5w (ORCPT ); Wed, 9 Sep 2020 07:57:52 -0400 Received: from mail-lj1-x244.google.com (mail-lj1-x244.google.com [IPv6:2a00:1450:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2DFBC06179F for ; Wed, 9 Sep 2020 04:57:43 -0700 (PDT) Received: by mail-lj1-x244.google.com with SMTP id v23so3186809ljd.1 for ; Wed, 09 Sep 2020 04:57:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=fZrqJCuzFIeGDFGe2HFXkbJZzJaJVwqjIS4lH3Qm+G8=; b=Jt6eZsRdBFyPKxzyVDuTg6Ppr6gCpbwhIOK358Bj4Rx9mVj6oG1GI0FhHSFsMHteIf YKmXt3zdcMgly/fXkgvyQaC49zW/BD4erA2QGTIT/qBQF9X2BDoeqFTbvivBbYnR/1NW U8sl07vi++uvCZdrveauCjCfnrvyurkzguNZVkkYIURUf4mGr24UgtzxD9x9va4n1g4T NUUpyMc2BwL8lbV02efX3PI6vRzKoZGYfmZpy8fumpIcz/Z/B8K3Ab5Kj7Z5B5Ky+XNg 218YBI1uAQW+s0ZUPhKM5MksbSFIPcNMpztf3Ou2vTyszGIQMWJjQJMYykkNy7IOMwqj PYNA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=fZrqJCuzFIeGDFGe2HFXkbJZzJaJVwqjIS4lH3Qm+G8=; b=Wkp9hHmO4ekRPxcRzBaxW7k4A8tZ26B4iJB2ku1EX7OjTcOwuxtKbxHRQoCooP1AF8 FnunknzN++QrB2DgUOOg6/5CdVo9wDeZgjyXYYM1qV865GQS5hD0QGHHhUph/Oprvf7K 6azRs6APzBdHLFtXaYvzSVqnG3rd9RwtnMgS/VE5ecrSjj/PXucHSAZk37UJNhZdTpX+ QHxi9c8L8gEoJTKLklEui27EUjMitaNPVcsR9R3yFBs6jW1/zl8u8K0/Ul7SqLQEOTgO RTVP0U1sbLniU8bDFNhtlKD91SzFzNFmj5LK1/99zs4ioFi/sgnzPbCe1ZMSSaC+26FZ VyIA== X-Gm-Message-State: AOAM5300t7uf7AgiomVTW0kCFq8FyPAVY3VI3jFmDASDan62edSSDhyH QwtKtyUFB7qTsXzZHfhazeeodQ== X-Google-Smtp-Source: ABdhPJxJzexKjswpA3xRhq/Krsalubx7gZ3mg4368Y5XZat/QN1QRVQxFGy0Y4nYke9hkXms2rcGhQ== X-Received: by 2002:a05:651c:cb:: with SMTP id 11mr1779252ljr.2.1599652662324; Wed, 09 Sep 2020 04:57:42 -0700 (PDT) Received: from box.localdomain ([86.57.175.117]) by smtp.gmail.com with ESMTPSA id e8sm685782lja.93.2020.09.09.04.57.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Sep 2020 04:57:41 -0700 (PDT) Received: by box.localdomain (Postfix, from userid 1000) id A7E0E1036AE; Wed, 9 Sep 2020 14:57:46 +0300 (+03) Date: Wed, 9 Sep 2020 14:57:46 +0300 From: "Kirill A. Shutemov" To: Matthew Wilcox Cc: Christoph Hellwig , darrick.wong@oracle.com, david@fromorbit.com, yukuai3@huawei.com, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: Splitting an iomap_page Message-ID: <20200909115746.6xhfdizp3nnopcfd@box> References: <20200821144021.GV17456@casper.infradead.org> <20200904033724.GH14765@casper.infradead.org> <20200907113324.2uixo4u5elveoysf@box> <20200908114328.GE27537@casper.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200908114328.GE27537@casper.infradead.org> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Tue, Sep 08, 2020 at 12:43:28PM +0100, Matthew Wilcox wrote: > On Mon, Sep 07, 2020 at 02:33:24PM +0300, Kirill A. Shutemov wrote: > > On Fri, Sep 04, 2020 at 04:37:24AM +0100, Matthew Wilcox wrote: > > > Kirill, do I have the handling of split_huge_page() failure correct? > > > It seems reasonable to me -- unlock the page and drop the reference, > > > hoping that somebody else will not have a reference to the page by the > > > next time we try to split it. Or they will split it for us. There's a > > > livelock opportunity here, but I'm not sure it's worse than the one in > > > a holepunch scenario. > > > > The worst case scenario is when the page is referenced (directly or > > indirectly) by the caller. It this case we would end up with endless loop. > > I'm not sure how we can guarantee that this will never happen. > > I don't see a way for that to happen at the moment. We're pretty > careful not to take references on multiple pages at once in these paths. > I've fixed the truncate paths to only take one reference per THP too. > > I was thinking that if livelock becomes a problem, we could (ab)use the > THP destructor mechanism somewhat like this: > > Add > [TRANSHUGE_PAGE_SPLIT] = split_transhuge_page, > to the compound_page_dtors array. > > New function split_huge_page_wait() which, if !can_split_huge_page() > first checks if the dtor is already set to TRANSHUGE_PAGE_SPLIT. If so, > it returns to its caller, reporting failure (so that it will put its > reference to the page). Then it sets the dtor to TRANSHUGE_PAGE_SPLIT > and sets page refcount to 1. It goes to sleep on the page. The refcount has to be reduced by one, not set to one. Otherwise the page will get split while somebody holds a pin. But willnot work if two callsites use split_huge_page_wait(). Emm?.. > split_transhuge_page() first has to check if the refcount went to 0 > due to mapcount being reduced. If so, it resets the refcount to 1 and > returns to the caller. If not, it freezes the page and wakes the task > above which is sleeping in split_huge_page_wait(). What happens if there's still several GUP references to the page. We must not split the page in this case. Maybe I don't follow your idea. I donno. > It's complicated and I don't love it. But it might solve livelock, should > we need to do it. It wouldn't prevent us from an indefinite wait if the > caller of split_huge_page_wait() has more than one reference to this page. > That's better than a livelock though. -- Kirill A. Shutemov