Subject: Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first
From: Holger Hoffstätte
Organization: Applied Asynchrony, Inc.
Date: Thu, 6 Jul 2023 00:37:36 +0200
To: Suren Baghdasaryan
Cc: Jiri Slaby, akpm@linux-foundation.org, michel@lespinasse.org,
 jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
 dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
 kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com,
 peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com,
 axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
 rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com,
 edumazet@google.com, gthelen@google.com, linux-mm
Message-ID: <04e701fc-2fd8-c4db-73d9-c86d4103641b@applied-asynchrony.com>
References: <20230227173632.3292573-1-surenb@google.com>
 <20230227173632.3292573-30-surenb@google.com>
 <9a8d788c-b8ba-1b8a-fd79-0e25b1b60bed@kernel.org>
 <2f150512-e460-a9ae-65db-39dc54fe99d6@kernel.org>

On 2023-07-06 00:15, Suren Baghdasaryan wrote:
> On Mon, Jul 3, 2023 at 6:52 AM Holger Hoffstätte
> wrote:
>>
>> On 2023-07-03 12:47, Jiri Slaby wrote:
>>> Cc Jacob Young (from kernel bugzilla)
>>>
>>> On 30. 06. 23, 19:40, Suren Baghdasaryan wrote:
>>>> On Fri, Jun 30, 2023 at 1:43 AM Jiri Slaby wrote:
>>>>>
>>>>> On 30. 06. 23, 10:28, Jiri Slaby wrote:
>>>>>> > 2348
>>>>>> clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
>>>>>> > 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
>>>>>> > 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
>>>>>> > 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
>>>>>> > 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
>>>>>> > 2355 <... clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
>>>>>> > 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE,
>>>>>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0
>>>>>> > 2370 <... mmap resumed>) = 0x7fca68249000
>>>>>> > 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
>>>>>> > 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
>>>>>> > 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
>>>>>> > 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
>>>>>> > 2395 write(2, "runtime: marked free object in s"..., 36
>>>>> ...>
>>>>>>
>>>>>> I.e. IIUC, all are threads (CLONE_VM) and thread 2370 mapped ANON
>>>>>> 0x7fca68249000 - 0x7fca6827ffff and go in thread 2395 thinks for some
>>>>>> reason 0x7fca6824bec8 in that region is "bad".
>>>>
>>>> Thanks for the analysis Jiri.
>>>> Is it possible from these logs to identify whether 2370 finished the
>>>> mmap operation before 2395 tried to access 0x7fca6824bec8? That access
>>>> has to happen only after mmap finishes mapping the region.
>>>
>>> Hi,
>>>
>>> it's hard to tell, but I assume so.
>>>
>>> For now, forget about this go's overly complicated, hard to reproduce case and concentrate on the very nice reduced testcase in:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217624
>>> ;)
>>>
>>> FWIW, I can reproduce using the test case too.
>>>
>>> thanks,
>>
>> As another (admittedly correlation-only) data point, I noticed at least hourly crashes
>> of Firefox-114 after upgrading to 6.4.1, which had never happened before with 6.3.x.
>> After reverting 0bff0aaea03e2a3ed6 - with a bit of context fixup due to follow-up
>> commits in 6.4.1 - it has been rock stable again, for several hours now.
>
> Jiri, Holger, would you be able to try
> https://lore.kernel.org/all/20230705171213.2843068-2-surenb@google.com/
> and see if your issues still exist?

Just in time! Not 2 minutes ago I finished rebuilding 6.4.2 + the last version
of your patches on a second machine (old Intel Sandy Bridge workstation) to be
my crash test dummy.

I removed the BROKEN dependency in mm/Kconfig, manually set PER_VMA_LOCK=y
and ... it seems to work?! Boots fine, Firefox seems to work (but no
exhaustive tests yet). I will also rerun a few reboot laps, just to exercise
this a bit harder and see if something comes up.

Tomorrow I'll also try again on my Zen2 Thinkpad and will report back.

Fingers crossed!

cheers
Holger