Date: Wed, 22 Jul 2020 20:48:41 +0200
From: Andrea Righi
To: Matthew Wilcox
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: swap: do not wait for lock_page() in unuse_pte_range()
Message-ID: <20200722184841.GC841369@xps-13>
References: <20200722174436.GB841369@xps-13> <20200722180425.GP15516@casper.infradead.org>
In-Reply-To: <20200722180425.GP15516@casper.infradead.org>

On Wed, Jul 22, 2020 at 07:04:25PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 22, 2020 at 07:44:36PM +0200, Andrea Righi wrote:
> > Waiting for lock_page() with mm->mmap_sem held in unuse_pte_range() can
> > lead to stalls while running swapoff (i.e., not being able to ssh into
> > the system, inability to execute simple commands like 'ps', etc.).
> >
> > Replace lock_page() with trylock_page() and release mm->mmap_sem if we
> > fail to lock it, giving other tasks a chance to continue and prevent
> > the stall.
>
> I think you've removed the warning at the expense of turning a stall
> into a potential livelock.
>
> > @@ -1977,7 +1977,11 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >  			return -ENOMEM;
> >  		}
> >
> > -		lock_page(page);
> > +		if (!trylock_page(page)) {
> > +			ret = -EAGAIN;
> > +			put_page(page);
> > +			goto out;
> > +		}
>
> If you look at the patterns we have elsewhere in the MM for doing
> this kind of thing (eg truncate_inode_pages_range()), we iterate over the
> entire range, take care of the easy cases, then go back and deal with the
> hard cases later.
>
> So that would argue for skipping any page that we can't trylock, but
> continue over at least the VMA, and quite possibly the entire MM until
> we're convinced that we have unused all of the required pages.
>
> Another thing we could do is drop the MM semaphore _here_, sleep on this
> page until it's unlocked, then go around again.
>
> 		if (!trylock_page(page)) {
> 			mmap_read_unlock(mm);
> 			lock_page(page);
> 			unlock_page(page);
> 			put_page(page);
> 			ret = -EAGAIN;
> 			goto out;
> 		}
>
> (I haven't checked the call paths; maybe you can't do this because
> sometimes it's called with the mmap sem held for write)
>
> Also, if we're trying to scale this better, there are some fun
> workloads where readers block writers who block subsequent readers
> and we shouldn't wait for I/O in swapin_readahead().  See patches like
> 6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11 for more on this kind of thing.

Thanks for the review, Matthew. I'll see if I can find a better solution
following your useful hints!

-Andrea
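
For reference, the "easy cases first, hard cases later" pattern suggested
above can be sketched in userspace. This is only an illustration, not
kernel code: pthread mutexes stand in for the page lock, and struct item,
process_easy() and process_hard() are hypothetical names that do not
appear in the patch or in mm/.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page: a lock plus a "work finished" flag. */
struct item {
	pthread_mutex_t lock;
	bool done;
};

/*
 * Pass 1: walk the whole range and handle every item whose lock can be
 * taken without waiting. Contended items are skipped, not waited on.
 * Returns the number of items that were skipped.
 */
static size_t process_easy(struct item *items, size_t n)
{
	size_t skipped = 0;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (items[i].done)
			continue;
		if (pthread_mutex_trylock(&items[i].lock) != 0) {
			skipped++;	/* busy: leave it for the second pass */
			continue;
		}
		items[i].done = true;	/* the real work would go here */
		pthread_mutex_unlock(&items[i].lock);
	}
	return skipped;
}

/*
 * Pass 2: go back over the range and block on whatever is left.
 * By now most of the range is already done, so little time is spent
 * sleeping on any lock.
 */
static void process_hard(struct item *items, size_t n)
{
	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (items[i].done)
			continue;
		pthread_mutex_lock(&items[i].lock);
		items[i].done = true;
		pthread_mutex_unlock(&items[i].lock);
	}
}
```

A caller would run process_easy() over the range and fall back to
process_hard() only if it reports skipped items. In the swapoff case the
second pass would also have to avoid sleeping with the mmap lock held,
which this sketch does not model.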