Date: Fri, 13 Mar 2020 14:14:16 -0700
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik, Johannes Weiner
Subject: Re: [PATCH v2] mm: fix long time stall from mm_populate
Message-ID: <20200313211416.GC78185@google.com>
References: <20200303002638.206421-1-minchan@kernel.org>
In-Reply-To: <20200303002638.206421-1-minchan@kernel.org>

Hi Andrew,

Any chance to take a look?

On Mon, Mar 02, 2020 at 04:26:38PM -0800, Minchan Kim wrote:
> Basically, the fault handler releases mmap_sem before requesting readahead
> and is then supposed to retry the page cache lookup with
> FAULT_FLAG_TRIED so that it avoids the livelock of infinite retries.
>
> However, what happens if the fault handler finds a page in the page
> cache that has the readahead marker but is under writeback? Add one
> more condition: it happens under mm_populate, which repeats faulting
> unless it encounters an error. So let's assemble the conditions below.
>
>        CPU 1                                     CPU 2
>
> - first loop
>   mm_populate
>     for ()
>       ..
>       ret = populate_vma_page_range
>         __get_user_pages
>           faultin_page
>             handle_mm_fault
>               filemap_fault
>                 do_async_mmap_readahead
>                   if (PageReadahead(pageA))
>                     maybe_unlock_mmap_for_io
>                       up_read(mmap_sem)
>                                                 shrink_page_list
>                                                   pageout
>                                                     SetPageReclaim(=SetPageReadahead)(pageA)
>                                                     writepage
>                                                       SetPageWriteback(pageA)
>
>                     page_cache_async_readahead()
>                       ClearPageReadahead(pageA)
>                 do_async_mmap_readahead
>                 lock_page_maybe_drop_mmap
>                   goto out_retry
>
>                                                 the pageA is reclaimed
>                                                 and new pageB is populated to the file offset
>                                                 and finally has become PG_readahead
>
> - second loop
>
>         __get_user_pages
>           faultin_page
>             handle_mm_fault
>               filemap_fault
>                 do_async_mmap_readahead
>                   if (PageReadahead(pageB))
>                     maybe_unlock_mmap_for_io
>                       up_read(mmap_sem)
>                                                 shrink_page_list
>                                                   pageout
>                                                     SetPageReclaim(=SetPageReadahead)(pageB)
>                                                     writepage
>                                                       SetPageWriteback(pageB)
>
>                     page_cache_async_readahead()
>                       ClearPageReadahead(pageB)
>                 do_async_mmap_readahead
>                 lock_page_maybe_drop_mmap
>                   goto out_retry
>
> This can repeat forever, so it is a livelock. Even without reclaim,
> it could happen if ra_pages becomes zero, either via fadvise or when
> other threads share the same fd, one doing random access while the
> other is sequential. That is because page_cache_async_readahead has
> the following check, similar to the PageWriteback one, and ra_pages is
> never synchronized with fadvise and shrink_readahead_size_eio from
> other threads.
>
> page_cache_async_readahead(struct address_space *mapping,
>                            unsigned long req_size)
> {
>         /* no read-ahead */
>         if (!ra->ra_pages)
>                 return;
>
> Thus, we need to limit fault retries from mm_populate, as the page
> fault handler does.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Reviewed-by: Jan Kara
> Signed-off-by: Minchan Kim
> ---
>  mm/gup.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 1b521e0ac1de..6f6548c63ad5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>   *
>   * This takes care of mlocking the pages too if VM_LOCKED is set.
>   *
> - * return 0 on success, negative error code on error.
> + * return number of pages pinned on success, negative error code on error.
>   *
>   * vma->vm_mm->mmap_sem must be held.
>   *
> @@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>          struct vm_area_struct *vma = NULL;
>          int locked = 0;
>          long ret = 0;
> +        bool tried = false;
>
>          end = start + len;
>
> @@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>                   * double checks the vma flags, so that it won't mlock pages
>                   * if the vma was already munlocked.
>                   */
> -                ret = populate_vma_page_range(vma, nstart, nend, &locked);
> +                ret = populate_vma_page_range(vma, nstart, nend,
> +                                                tried ? NULL : &locked);
>                  if (ret < 0) {
>                          if (ignore_errors) {
>                                  ret = 0;
>                                  continue;        /* continue at next VMA */
>                          }
>                          break;
> -                }
> +                } else if (ret == 0)
> +                        tried = true;
> +                else
> +                        tried = false;
>                  nend = nstart + ret * PAGE_SIZE;
>                  ret = 0;
>          }
> --
> 2.25.0.265.gbab2e86ba0-goog
>
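
To illustrate the bounded-retry idea in the quoted patch, here is a rough user-space model of the loop. It is only a sketch under stated assumptions: fake_populate() and populate_loop() are hypothetical names, not kernel functions, and fake_populate() assumes the worst case, where every call that is allowed to drop the lock makes no progress and asks for a retry. Passing NULL stands in for "do not allow the lock to be dropped", which mirrors the `tried ? NULL : &locked` argument in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for populate_vma_page_range(). When the caller
 * allows the lock to be dropped (locked != NULL), this model always
 * requests a retry and makes no progress, which is the livelock
 * condition. With locked == NULL it waits and reports one page.
 */
static long fake_populate(int *locked)
{
	if (locked) {
		*locked = 0;	/* lock was dropped; caller must retake it */
		return 0;	/* no progress: retry requested */
	}
	return 1;		/* blocked until the page was ready */
}

/*
 * Populate nr_pages pages. With bound_retry set, a zero return makes
 * the next call pass NULL, as in the patch. max_calls is only a safety
 * cap so that the unbounded variant terminates in this demo.
 * Returns the number of calls made, or -1 if the cap was hit.
 */
static long populate_loop(long nr_pages, bool bound_retry, long max_calls)
{
	long done = 0, calls = 0;
	int locked = 0;
	bool tried = false;

	assert(nr_pages > 0 && max_calls > 0);

	while (done < nr_pages) {
		long ret;

		if (calls == max_calls)
			return -1;
		if (!locked)
			locked = 1;	/* models down_read(mmap_sem) */

		ret = fake_populate((bound_retry && tried) ? NULL : &locked);
		calls++;

		tried = (ret == 0);	/* retry at most once per page */
		done += ret;
	}
	return calls;
}
```

In this model, populate_loop(3, true, 100) finishes after two calls per page, while populate_loop(3, false, 100) never makes progress and returns -1 once the safety cap is reached.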