From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Yang Shi,
 "Kirill A. Shutemov", Michal Hocko, Hugh Dickins, Andrea Arcangeli,
 Arnd Bergmann, Andrew Morton, Linus Torvalds, Sasha Levin
Subject: [PATCH 4.14 115/183] mm: thp: use down_read_trylock() in khugepaged to avoid long block
Date: Wed, 25 Apr 2018 12:35:35 +0200
Message-Id: <20180425103247.063788351@linuxfoundation.org>
In-Reply-To: <20180425103242.532713678@linuxfoundation.org>
References: <20180425103242.532713678@linuxfoundation.org>
X-stable: review
X-Mailing-List: linux-kernel@vger.kernel.org

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yang Shi

[ Upstream commit 3b454ad35043dfbd3b5d2bb92b0991d6342afb44 ]

In the current design, khugepaged needs to acquire mmap_sem before
scanning an mm.  But in some corner cases, khugepaged may scan a process
which is modifying its memory mapping, so khugepaged blocks in
uninterruptible state.  But the process might hold the mmap_sem for a
long time when modifying a huge memory space and it may trigger the
below khugepaged hung issue:

  INFO: task khugepaged:270 blocked for more than 120 seconds.
  Tainted: G E 4.9.65-006.ali3000.alios7.x86_64 #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  khugepaged D 0 270 2 0x00000000
   ffff883f3deae4c0 0000000000000000 ffff883f610596c0 ffff883f7d359440
   ffff883f63818000 ffffc90019adfc78 ffffffff817079a5 d67e5aa8c1860a64
   0000000000000246 ffff883f7d359440 ffffc90019adfc88 ffff883f610596c0
  Call Trace:
    schedule+0x36/0x80
    rwsem_down_read_failed+0xf0/0x150
    call_rwsem_down_read_failed+0x18/0x30
    down_read+0x20/0x40
    khugepaged+0x476/0x11d0
    kthread+0xe6/0x100
    ret_from_fork+0x25/0x30

So it sounds pointless to just block khugepaged waiting for the
semaphore so replace down_read() with down_read_trylock() to move to
scan the next mm quickly instead of just blocking on the semaphore so
that other processes can get more chances to install THP.  Then
khugepaged can come back to scan the skipped mm when it has finished the
current round full_scan.

And it appears that the change can improve khugepaged efficiency a
little bit.

Below is the test result when running LTP on a 24 cores 4GB memory 2
nodes NUMA VM:

                                pristine    w/ trylock
  full_scan                     197         187
  pages_collapsed               21          26
  thp_fault_alloc               40818       44466
  thp_fault_fallback            18413       16679
  thp_collapse_alloc            21          150
  thp_collapse_alloc_failed     14          16
  thp_file_alloc                369         369

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: tweak comment]
[arnd@arndb.de: avoid uninitialized variable use]
  Link: http://lkml.kernel.org/r/20171215125129.2948634-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1513281203-54878-1-git-send-email-yang.s@alibaba-inc.com
Signed-off-by: Yang Shi
Acked-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Cc: Hugh Dickins
Cc: Andrea Arcangeli
Signed-off-by: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 mm/khugepaged.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1679,10 +1679,14 @@ static unsigned int khugepaged_scan_mm_s
 	spin_unlock(&khugepaged_mm_lock);
 
 	mm = mm_slot->mm;
-	down_read(&mm->mmap_sem);
-	if (unlikely(khugepaged_test_exit(mm)))
-		vma = NULL;
-	else
+	/*
+	 * Don't wait for semaphore (to avoid long wait times). Just move to
+	 * the next mm on the list.
+	 */
+	vma = NULL;
+	if (unlikely(!down_read_trylock(&mm->mmap_sem)))
+		goto breakouterloop_mmap_sem;
+	if (likely(!khugepaged_test_exit(mm)))
 		vma = find_vma(mm, khugepaged_scan.address);
 
 	progress++;