From: Mateusz Guzik
Date: Mon, 15 Jul 2024 08:48:13 +0200
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
To: Bharata B Rao
Cc: Yu Zhao, david@fromorbit.com, kent.overstreet@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, nikunj@amd.com, "Upadhyay, Neeraj", Andrew Morton, David Hildenbrand, willy@infradead.org, vbabka@suse.cz, kinseyho@google.com, Mel Gorman, linux-fsdevel@vger.kernel.org
References: <4307e984-a593-4495-b4cc-8ef509ddda03@amd.com> <56865e57-c250-44da-9713-cf1404595bcc@amd.com>
In-Reply-To: <56865e57-c250-44da-9713-cf1404595bcc@amd.com>

On Mon, Jul 15, 2024 at 7:22 AM Bharata B Rao wrote:
>
> On 10-Jul-24 6:34 PM, Mateusz Guzik wrote:
> >>> However the contention now has shifted to inode_hash_lock.
> >>> Around 55 softlockups in ilookup() were observed:
> >>>
> >>> # tracer: preemptirqsoff
> >>> #
> >>> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> >>> # --------------------------------------------------------------------
> >>> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:512)
> >>> #    -----------------
> >>> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> >>> #    -----------------
> >>> #  => started at: ilookup
> >>> #  => ended at:   ilookup
> >>> #
> >>> #
> >>> #                    _------=> CPU#
> >>> #                   / _-----=> irqs-off/BH-disabled
> >>> #                  | / _----=> need-resched
> >>> #                  || / _---=> hardirq/softirq
> >>> #                  ||| / _--=> preempt-depth
> >>> #                  |||| / _-=> migrate-disable
> >>> #                  ||||| /     delay
> >>> #  cmd     pid     |||||| time  |   caller
> >>> #     \   /        ||||||  \    |    /
> >>>      fio-3244715 260...1.        0us$: _raw_spin_lock <-ilookup
> >>>      fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
> >>>      fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
> >>>      fio-3244715 260.N.1. 10620440us : <stack trace>
> >>> => _raw_spin_unlock
> >>> => ilookup
> >>> => blkdev_get_no_open
> >>> => blkdev_open
> >>> => do_dentry_open
> >>> => vfs_open
> >>> => path_openat
> >>> => do_filp_open
> >>> => do_sys_openat2
> >>> => __x64_sys_openat
> >>> => x64_sys_call
> >>> => do_syscall_64
> >>> => entry_SYSCALL_64_after_hwframe
> >>>
> >>> It appears that scalability issues with inode_hash_lock have been brought
> >>> up multiple times in the past and there were patches to address the same.
> >>>
> >>> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> >>> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
> >>>
> >>> CC'ing FS folks/list for awareness/comments.
> >>
> >> Note my patch does not enable RCU usage in ilookup, but this can be
> >> trivially added.
> >>
> >> I can't even compile-test at the moment, but the diff below should do it.
> >> Also note the patches are present here
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
> >> , not yet integrated anywhere.
> >>
> >> That said, if with fio you are operating on the same target inode every
> >> time then this is merely going to shift contention to the inode
> >> spinlock usage in find_inode_fast.
> >>
> >> diff --git a/fs/inode.c b/fs/inode.c
> >> index ad7844ca92f9..70b0e6383341 100644
> >> --- a/fs/inode.c
> >> +++ b/fs/inode.c
> >> @@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
> >>  {
> >>  	struct hlist_head *head = inode_hashtable + hash(sb, ino);
> >>  	struct inode *inode;
> >> +
> >>  again:
> >> -	spin_lock(&inode_hash_lock);
> >> -	inode = find_inode_fast(sb, head, ino, true);
> >> -	spin_unlock(&inode_hash_lock);
> >> +	inode = find_inode_fast(sb, head, ino, false);
> >> +	if (IS_ERR_OR_NULL(inode)) {
> >> +		spin_lock(&inode_hash_lock);
> >> +		inode = find_inode_fast(sb, head, ino, true);
> >> +		spin_unlock(&inode_hash_lock);
> >> +	}
> >>
> >>  	if (inode) {
> >>  		if (IS_ERR(inode))
> >>
> >
> > I think I expressed myself poorly, so here is take two:
> > 1. inode hash soft lockup should get resolved if you apply
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.inode.rcu&id=7180f8d91fcbf252de572d9ffacc945effed0060
> > and the above pasted fix (not compile tested tho, but it should be
> > obvious what the intended fix looks like)
> > 2. find_inode_fast spinlocks the target inode. if your bench only
> > operates on one, then contention is going to shift there and you may
> > still be getting soft lockups. not taking the spinlock in this
> > codepath is hackable, but I don't want to do it without a good
> > justification.
>
> Thanks Mateusz for the fix. With this patch applied, the above mentioned
> contention in ilookup() has not been observed for a test run during the
> weekend.
>

Ok, I'll do some clean ups and send a proper patch to the vfs folks later
today.

Thanks for testing.
-- 
Mateusz Guzik