From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37203C48BE8 for ; Mon, 14 Jun 2021 03:52:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B7B4A613AD for ; Mon, 14 Jun 2021 03:52:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B7B4A613AD Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 270446B006C; Sun, 13 Jun 2021 23:52:42 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 220B96B006E; Sun, 13 Jun 2021 23:52:42 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 09AC66B0070; Sun, 13 Jun 2021 23:52:42 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0118.hostedemail.com [216.40.44.118]) by kanga.kvack.org (Postfix) with ESMTP id CAF646B006C for ; Sun, 13 Jun 2021 23:52:41 -0400 (EDT) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5BFB8180AD806 for ; Mon, 14 Jun 2021 03:52:41 +0000 (UTC) X-FDA: 78250956714.01.097BBB3 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf08.hostedemail.com (Postfix) with ESMTP id 0011780192FE for ; Mon, 14 Jun 2021 03:52:32 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id B78556120E; Mon, 14 Jun 2021 03:52:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1623642760; bh=Msfsid2dMP0z1xO8rgeqV4lboVmj2pyzEHCvjexUV90=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=l+fuc2vwhawWXWs1SCV0yi4B1tW+hB2XQRLfENdg5Ys2RbOYZ5+3vWLDgJeV2dvUN 3lkkNi3tFXCP2VSW6kSwEjxkaEwY1fksZTTmpJ0dDgb0Q0jWsRLoyXm36tpqP37Rhn b/vCObmFAsixC1JKlr1ynfE+WsRE3Ee0DW8ovYtCQ5lns86k5gK1kr2Id3u+XEeN8R Od1RpSAs/I95p4Xhh0lp3O0JFLgCA8a6RizOqM+8R6XRteWt99s81a/+nv04H/eK1L ZytXE2rBjzDCRpUPIXlhNKYkNVDsDjbiC/CLlwHQ1Qr+jqcB6+a5CyKKEf87VHktjq tSDgapbeoEAxg== Subject: Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable To: Nicholas Piggin , Andrew Morton Cc: Anton Blanchard , linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, Randy Dunlap , Linus Torvalds References: <20210605014216.446867-1-npiggin@gmail.com> <20210605014216.446867-3-npiggin@gmail.com> <8ac1d420-b861-f586-bacf-8c3949e9b5c4@kernel.org> <1623629185.fxzl5xdab6.astroid@bobo.none> From: Andy Lutomirski Message-ID: <02e16a2f-2f58-b4f2-d335-065e007bcea2@kernel.org> Date: Sun, 13 Jun 2021 20:52:38 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <1623629185.fxzl5xdab6.astroid@bobo.none> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=l+fuc2vw; dmarc=pass (policy=none) header.from=kernel.org; spf=pass (imf08.hostedemail.com: domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org X-Stat-Signature: hi15auz9gedeeirjd6t4f1e8in5gbk14 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 0011780192FE X-HE-Tag: 1623642752-342306 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 6/13/21 5:45 PM, Nicholas Piggin wrote: > Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am: >> On 6/4/21 6:42 PM, Nicholas Piggin wrote: >>> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm >>> when it is context switched. This can be disabled by architectures that >>> don't require this refcounting if they clean up lazy tlb mms when the >>> last refcount is dropped. Currently this is always enabled, which is >>> what existing code does, so the patch is effectively a no-op. >>> >>> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is. >> >> I am in favor of this approach, but I would be a lot more comfortable >> with the resulting code if task->active_mm were at least better >> documented and possibly even guarded by ifdefs. > > active_mm is fairly well documented in Documentation/active_mm.rst IMO. > I don't think anything has changed in 20 years, I don't know what more > is needed, but if you can add to documentation that would be nice. Maybe > moving a bit of that into .c and .h files? > Quoting from that file: - however, we obviously need to keep track of which address space we "stole" for such an anonymous user. For that, we have "tsk->active_mm", which shows what the currently active address space is. This isn't even true right now on x86. With your patch applied: To support all that, the "struct mm_struct" now has two counters: a "mm_users" counter that is how many "real address space users" there are, and a "mm_count" counter that is the number of "lazy" users (ie anonymous users) plus one if there are any real users. isn't even true any more. >> x86 bare metal currently does not need the core lazy mm refcounting, and >> x86 bare metal *also* does not need ->active_mm. Under the x86 scheme, >> if lazy mm refcounting were configured out, ->active_mm could become a >> dangling pointer, and this makes me extremely uncomfortable. >> >> So I tend to think that, depending on config, the core code should >> either keep ->active_mm [1] alive or get rid of it entirely. > > I don't actually know what you mean. > > core code needs the concept of an "active_mm". This is the mm that your > kernel threads are using, even in the unmerged CONFIG_LAZY_TLB=n patch, > active_mm still points to init_mm for kernel threads. Core code does *not* need this concept. First, it's wrong on x86 since at least 4.15. Any core code that actually assumes that ->active_mm is "active" for any sensible definition of the word active is wrong. Fortunately there is no such code. I looked through all active_mm references in core code. We have: kernel/sched/core.c: it's all refcounting, although it's a bit tangled with membarrier. kernel/kthread.c: same. refcounting and membarrier stuff. kernel/exit.c: exit_mm() a BUG_ON(). kernel/fork.c: initialization code and a warning. kernel/cpu.c: cpu offline stuff. wouldn't be needed if active_mm went away. fs/exec.c: nothing of interest I didn't go through drivers, but I maintain my point. active_mm is there for refcounting. So please don't just make it even more confusing -- do your performance improvement, but improve the code at the same time: get rid of active_mm, at least on architectures that opt out of the refcounting. > > We could hide that idea behind an active_mm() function that would always > return &init_mm if mm==NULL, but you still have the concept of an active > mm and a pointer that callers must not access after free (because some > cases will be CONFIG_LAZY_TLB=y). > >> [1] I don't really think it belongs in task_struct at all. It's not a >> property of the task. It's the *per-cpu* mm that the core code is >> keeping alive for lazy purposes. How about consolidating it with the >> copy in rq? > > I agree it's conceptually a per-cpu property. I don't know why it was > done this way, maybe it was just convenient and works well for mm and > active_mm to be adjacent. Linus might have a better insight. > >> I guess the short summary of my opinion is that I like making this >> configurable, but I do not like the state of the code. > > I don't think I'd object to moving active_mm to rq and converting all > usages to active_mm() while we're there, it would make things a bit > more configurable. But I don't see it making core code fundamentally > less complex... if you're referring to the x86 mm switching monstrosity, > then that's understandable, but I admit I haven't spent enough time > looking at it to make a useful comment. A patch would be enlightening, > I have the leftover CONFIG_LAZY_TLB=n patch if you were thinking of > building on that I can send it to you. > > Thanks, > Nick >