From: Sven Schnelle
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Muchun Song,
    Miaohe Lin, David Hildenbrand, Michal Hocko, Peter Xu,
    Naoya Horiguchi, "Aneesh Kumar K . V", Andrea Arcangeli,
    "Kirill A . Shutemov", Davidlohr Bueso, Prakash Sangappa,
    James Houghton, Mina Almasry, Pasha Tatashin, Axel Rasmussen,
    Ray Fucillo, Andrew Morton, linux-s390@vger.kernel.org,
    hca@linux.ibm.com, gor@linux.ibm.com, Alexander Gordeev
Subject: Re: [PATCH 4/8] hugetlb: handle truncate racing with page faults
References: <20220824175757.20590-1-mike.kravetz@oracle.com>
    <20220824175757.20590-5-mike.kravetz@oracle.com>
Date: Tue, 06 Sep 2022 15:57:25 +0200
In-Reply-To: <20220824175757.20590-5-mike.kravetz@oracle.com>
    (Mike Kravetz's message of "Wed, 24 Aug 2022 10:57:53 -0700")

Hi Mike,

Mike Kravetz writes:

> When page fault code needs to allocate and instantiate a new hugetlb
> page (hugetlb_no_page), it checks early to determine if the fault is
> beyond i_size.  When discovered early, it is easy to abort the fault and
> return an error.  However, it becomes much more difficult to handle when
> discovered later after allocating the page and consuming reservations
> and adding to the page cache.  Backing out changes in such instances
> becomes difficult and error prone.
>
> Instead of trying to catch and back out all such races, use the hugetlb
> fault mutex to handle truncate racing with page faults.
> The most significant change is modification of the routine
> remove_inode_hugepages such that it will take the fault mutex for
> EVERY index in the truncated range (or hole in the case of hole
> punch).  Since remove_inode_hugepages is called in the truncate path
> after updating i_size, we can experience races as follows.
> - truncate code updates i_size and takes fault mutex before a racing
>   fault.  After fault code takes mutex, it will notice fault beyond
>   i_size and abort early.
> - fault code obtains mutex, and truncate updates i_size after early
>   checks in fault code.  fault code will add page beyond i_size.
>   When truncate code takes mutex for page/index, it will remove the
>   page.
> - truncate updates i_size, but fault code obtains mutex first.  If
>   fault code sees updated i_size it will abort early.  If fault code
>   does not see updated i_size, it will add page beyond i_size and
>   truncate code will remove page when it obtains fault mutex.
>
> Note, for performance reasons remove_inode_hugepages will still use
> filemap_get_folios for bulk folio lookups.  For indices not returned in
> the bulk lookup, it will need to look up individual folios to check for
> races with page fault.
>
> Signed-off-by: Mike Kravetz
> ---
>  fs/hugetlbfs/inode.c | 184 +++++++++++++++++++++++++++++++------------
>  mm/hugetlb.c         |  41 +++++-----
>  2 files changed, 152 insertions(+), 73 deletions(-)

With linux-next starting from next-20220831 I see hangs with this patch
applied while running the glibc test suite. The patch doesn't revert
cleanly on top, so I checked out one commit before that one, and with
that revision everything works.

It looks like the malloc test suite in glibc triggers this. I cannot
identify a single test causing it; it seems to take a combination of
multiple tests. Running the test suite on a single CPU works. Given the
subject of the patch, that's likely not a surprise.
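
For reference, the second interleaving listed in the quoted commit
message (fault passes its i_size check, truncate then shrinks the file,
and truncate later removes the stale page under the fault mutex) can be
modelled in a few lines of Python. This is only a rough, single-threaded
sketch and not the kernel code: the Inode class, NUM_FAULT_MUTEXES, the
modulo "hash" and the after_size_check hook are all made up for
illustration, and the hook exists purely to force the interleaving
deterministically.

```python
import threading

NUM_FAULT_MUTEXES = 8  # hypothetical table size, chosen for the sketch


class Inode:
    """Toy stand-in for a hugetlbfs inode; every name here is illustrative."""

    def __init__(self, size):
        self.i_size = size      # file size, counted in huge pages
        self.page_cache = {}    # index -> page
        self.fault_mutexes = [threading.Lock() for _ in range(NUM_FAULT_MUTEXES)]

    def fault_mutex(self, index):
        # Assumption: a plain modulo stands in for the kernel's hash function.
        return self.fault_mutexes[index % NUM_FAULT_MUTEXES]


def fault(inode, index, after_size_check=None):
    """Model of a page fault: take the mutex, check i_size, add the page."""
    with inode.fault_mutex(index):
        if index >= inode.i_size:
            return False                  # fault beyond i_size: abort early
        if after_size_check is not None:
            after_size_check()            # test hook: inject a racing i_size update
        inode.page_cache[index] = "page"
        return True


def remove_pages(inode, start, end):
    """Model of the removal pass: take the fault mutex for EVERY index."""
    for index in range(start, end):
        with inode.fault_mutex(index):
            inode.page_cache.pop(index, None)


def truncate(inode, new_size):
    """Model of truncate: update i_size first, then remove pages beyond it."""
    old_size = inode.i_size
    inode.i_size = new_size
    remove_pages(inode, new_size, old_size)


def racing_truncate_demo():
    """Fault passes the size check, then truncate shrinks the file under it."""
    inode = Inode(size=8)

    def shrink():
        inode.i_size = 4                  # first half of a racing truncate

    added = fault(inode, 5, after_size_check=shrink)
    stale = 5 in inode.page_cache         # page now sits beyond i_size
    remove_pages(inode, 4, 8)             # second half of the racing truncate
    return added, stale, 5 in inode.page_cache
```

In this model racing_truncate_demo() reports that the page was added,
that it was briefly present beyond i_size, and that the removal pass
cleaned it up. Note that remove_pages does one lock/unlock per index in
the range it is given, which mirrors the "EVERY index" wording of the
commit message.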

This is on s390, and the warning I get from RCU is:

[ 1951.906997] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1951.907009] rcu: 60-....: (6000 ticks this GP) idle=968c/1/0x4000000000000000 softirq=43971/43972 fqs=2765
[ 1951.907018] (t=6000 jiffies g=116125 q=1008072 ncpus=64)
[ 1951.907024] CPU: 60 PID: 1236661 Comm: ld64.so.1 Not tainted 6.0.0-rc3-next-20220901 #340
[ 1951.907027] Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
[ 1951.907029] Krnl PSW : 0704e00180000000 00000000003d9042 (hugetlb_fault_mutex_hash+0x2a/0xd8)
[ 1951.907044] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 1951.907095] Call Trace:
[ 1951.907098] [<00000000003d9042>] hugetlb_fault_mutex_hash+0x2a/0xd8
[ 1951.907101] ([<00000000005845a6>] fault_lock_inode_indicies+0x8e/0x128)
[ 1951.907107] [<0000000000584876>] remove_inode_hugepages+0x236/0x280
[ 1951.907109] [<0000000000584a7c>] hugetlbfs_evict_inode+0x3c/0x60
[ 1951.907111] [<000000000044fe96>] evict+0xe6/0x1c0
[ 1951.907116] [<000000000044a608>] __dentry_kill+0x108/0x1e0
[ 1951.907119] [<000000000044ac64>] dentry_kill+0x6c/0x290
[ 1951.907121] [<000000000044afec>] dput+0x164/0x1c0
[ 1951.907123] [<000000000042a4d6>] __fput+0xee/0x290
[ 1951.907127] [<00000000001794a8>] task_work_run+0x88/0xe0
[ 1951.907133] [<00000000001f77a0>] exit_to_user_mode_prepare+0x1a0/0x1a8
[ 1951.907137] [<0000000000d0e42e>] __do_syscall+0x11e/0x200
[ 1951.907142] [<0000000000d1d392>] system_call+0x82/0xb0
[ 1951.907145] Last Breaking-Event-Address:
[ 1951.907146] [<0000038001d839c0>] 0x38001d839c0

One of the hanging test cases is usually
malloc/tst-malloc-too-large-malloc-hugetlb2.

Any thoughts?

Thanks,
Sven