References: <20220407031525.2368067-1-yuzhao@google.com> <20220407031525.2368067-9-yuzhao@google.com> <20220411191621.0378467ad99ebc822d5ad005@linux-foundation.org> <20220414185654.e7150bcbe859e0dd4b9c61af@linux-foundation.org> <20220415121521.764a88dda55ae8c676ad26b0@linux-foundation.org> <20220415143220.cc37b0b0a368ed2bf2a821f8@linux-foundation.org>
From: Yu Zhao
Date: Tue, 19 Apr 2022 16:32:26 -0600
Subject: Re: [PATCH v10 08/14] mm: multi-gen LRU: support page table walks
To: Justin Forbes
Cc: Andrew Morton, Stephen Rothwell, Linux-MM, Andi Kleen, Aneesh Kumar, Barry Song <21cnbao@gmail.com>, Catalin Marinas, Dave Hansen, Hillf Danton, Jens Axboe, Jesse Barnes, Johannes Weiner, Jonathan Corbet, Linus Torvalds, Matthew Wilcox, Mel Gorman, Michael Larabel, Michal Hocko, Mike Rapoport, Rik van Riel, Vlastimil Babka, Will Deacon, Ying Huang, Linux ARM, "open list:DOCUMENTATION", linux-kernel, Kernel Page Reclaim v2, "the arch/x86 maintainers", Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain

On Sat, Apr 16, 2022 at 10:32 AM Justin Forbes wrote:
>
> On Fri, Apr 15, 2022 at 4:33 PM Andrew Morton wrote:
> >
> > On Fri, 15 Apr 2022 14:11:32 -0600 Yu Zhao wrote:
> >
> > > >
> > > > I grabbed
> > > > https://kojipkgs.fedoraproject.org//packages/kernel/5.18.0/0.rc2.23.fc37/src/kernel-5.18.0-0.rc2.23.fc37.src.rpm
> > > > and
> > >
> > > Yes, Fedora/RHEL is one concrete example of the model I mentioned
> > > above (experimental/stable). I added Justin, the Fedora kernel
> > > maintainer, and he can further clarify.
>
> We almost split into 3 scenarios. In rawhide we run a standard Fedora
> config for rcX releases and .0, but git snapshots are built with debug
> configs only. The trade off is that we can't turn on certain options
> which kill performance, but we do get more users running these kernels
> which expose real bugs. The rawhide kernel follows Linus' tree and is
> rebuilt most weekdays.
> Stable Fedora is not a full debug config, but
> in cases where we can keep a debug feature on without it getting much
> in the way of performance, as is the case with CONFIG_DEBUG_VM, I
> think there is value in keeping those on, until there is not. And of
> course RHEL is a much more conservative config, and a much more
> conservative rebase/backport codebase.
>
> > > If we don't want more VM_BUG_ONs, I'll remove them. But (let me
> > > reiterate) it seems to me that just defeats the purpose of having
> > > CONFIG_DEBUG_VM.
> > >
> >
> > Well, I feel your pain. It was never expected that VM_BUG_ON() would
> > get subverted in this fashion.
>
> Fedora is not trying to subvert anything. If keeping the option on
> becomes problematic, we can simply turn it off. Fedora certainly has
> a more diverse installed base than typical enterprise distributions,
> and much more diverse than most QA pools. Both in the array of
> hardware, and in the use patterns, so things do get uncovered that
> would not be seen otherwise.
>
> > We could create a new MM-developer-only assertion. Might even call it
> > MM_BUG_ON(). With compile-time enablement but perhaps not a runtime
> > switch.
> >
> > With nice simple semantics, please. Like "it returns void" and "if you
> > pass an expression with side-effects then you lose". And "if you send
> > a patch which produces warnings when CONFIG_MM_BUG_ON=n then you get to
> > switch to windows95 for a month".
> >
> > Let's leave the mglru assertions in place for now and let's think about
> > creating something more suitable, with a view to switching mglru over
> > to that at a later time.
> >
> > But really, none of this addresses the core problem: *_BUG_ON() often
> > kills the kernel. So guess what we just did? We killed the user's
> > kernel at the exact time when we least wished to do so: when they have
> > a bug to report to us. So the thing is self-defeating.
> >
> > It's much much better to WARN and to attempt to continue.
> > This makes it much more likely that we'll get to hear about the kernel flaw.
>
> I agree very much with this. We hear about warnings from users, they
> don't go unnoticed, and several of these users are willing to spend
> time to help get to the bottom of an issue. They may not know the
> code, but plenty are willing to test various patches or scenarios.

Thanks, Justin. Glad to hear warnings are collected from the field.

Based on all the feedback, my action item is to replace all VM_BUG_ONs
with VM_WARN_ON_ONCEs.
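Andrew proposes three semantics for a developer-only assertion:

- It is enabled at compile time.
- It has no usable return value.
- Side effects in the expression are silently lost when the check is compiled out.

Combined with the warn-and-continue behaviour that he and Justin argue for, this can be sketched in userspace C. This is an illustration under stated assumptions, not kernel code:

- CONFIG_MM_BUG_ON, MM_WARN_ON_ONCE(), warn_count and check_nr_pages() are hypothetical names made up for this example.
- The compiled-out branch only imitates, in spirit, the kernel's approach of type-checking a disabled assertion with sizeof.
- The sketch does not reproduce the real VM_WARN_ON_ONCE() that Yu plans to switch to.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical compile-time switch (not a real kernel option).
 * Enabled by default here only so the example shows output; a real
 * developer-only option would presumably default to off.
 * Build with -DCONFIG_MM_BUG_ON=0 to compile the checks out.
 */
#ifndef CONFIG_MM_BUG_ON
#define CONFIG_MM_BUG_ON 1
#endif

/* Demo-only counter of reported warnings (no kernel equivalent). */
int warn_count;

#if CONFIG_MM_BUG_ON
/*
 * Enabled: cond is evaluated on every call, but only the first failure
 * at this call site is reported. Execution then continues (warn, don't
 * crash). A do/while(0) statement has no value, so the macro cannot be
 * used inside an expression ("it returns void").
 */
#define MM_WARN_ON_ONCE(cond)                                   \
	do {                                                    \
		static int warned_;                             \
		if ((cond) && !warned_) {                       \
			warned_ = 1;                            \
			warn_count++;                           \
			fprintf(stderr, "WARNING: %s at %s:%d\n", \
				#cond, __FILE__, __LINE__);     \
		}                                               \
	} while (0)
#else
/*
 * Disabled: sizeof type-checks cond but never evaluates it, so any
 * side effects in cond are silently dropped ("then you lose").
 */
#define MM_WARN_ON_ONCE(cond) ((void)sizeof(!!(cond)))
#endif

/* Hypothetical caller: flags a negative count but carries on. */
long check_nr_pages(long nr_pages)
{
	MM_WARN_ON_ONCE(nr_pages < 0);
	return nr_pages;
}
```

With the default setting, calling check_nr_pages(-1) and then check_nr_pages(-2) prints a single WARNING line, and both calls return normally. Building with -DCONFIG_MM_BUG_ON=0 removes the check entirely, so the comparison is never executed.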