From: CGEL
To: David Hildenbrand
Cc: Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, vbabka@suse.cz, minchan@kernel.org, oleksandr@redhat.com, xu xin, Jann Horn, Andrew Morton
Subject: Re: [PATCH linux-next] mm/madvise: allow KSM hints for process_madvise
Date: Mon, 4 Jul 2022 08:13:09 +0000
Message-ID: <62c2a117.1c69fb81.3a929.dda9@mx.google.com>
In-Reply-To: <203548a6-cf70-30ce-6756-f6c909e7ef21@redhat.com>
References: <20220701084323.1261361-1-xu.xin16@zte.com.cn> <93e1e19a-deff-2dad-0b3c-ef411309ec58@redhat.com> <203548a6-cf70-30ce-6756-f6c909e7ef21@redhat.com>

On Fri, Jul 01, 2022 at 02:09:24PM +0200, David Hildenbrand wrote:
> On 01.07.22 14:02, Michal Hocko wrote:
> >
> > On Fri 01-07-22 12:50:59, David Hildenbrand wrote:
> >> On 01.07.22 12:32, David Hildenbrand wrote:
> >>> On 01.07.22 11:11, Michal Hocko wrote:
> >>>> [Cc Jann]
> >>>>
> >>>> On Fri 01-07-22 08:43:23, cgel.zte@gmail.com wrote:
> >>>>> From: xu xin
> >>>>>
> >>>>> The benefits of doing this are obvious because using madvise in user code
> >>>>> is the only current way to enable KSM, which is inconvenient for those
> >>>>> compiled app without marking MERGEABLE wanting to enable KSM.
> >>>>
> >>>> I would rephrase:
> >>>> "
> >>>> KSM functionality is currently available only to processes which are
> >>>> using MADV_MERGEABLE directly. This is limiting because there are
> >>>> usecases which will benefit from enabling KSM on a remote process. One
> >>>> example would be an application which cannot be modified (e.g. because
> >>>> it is only distributed as a binary). MORE EXAMPLES WOULD BE REALLY
> >>>> BENEFICIAL.
> >>>> "
> >>>>
> >>>>> Since we already have the syscall of process_madvise(), then reusing the
> >>>>> interface to allow external KSM hints is more acceptable [1].
> >>>>>
> >>>>> Although this patch was released by Oleksandr Natalenko, it was
> >>>>> unfortunately terminated without any conclusions because there was debate
> >>>>> on whether it should use signal_pending() to check the target task besides
> >>>>> the task of current() when calling unmerge_ksm_pages of other task [2].
> >>>>
> >>>> I am not sure this is particularly interesting. I do not remember
> >>>> details of that discussion but checking signal_pending on a different
> >>>> task is rarely the right thing to do. In this case the check is meant to
> >>>> allow bailing out from the operation so that the caller could be
> >>>> terminated for example.
> >>>>
> >>>>> I think it's unneeded to check the target task. For example, when we set
> >>>>> the knob /sys/kernel/mm/ksm/run from 1 to 2,
> >>>>> unmerge_and_remove_all_rmap_items() doesn't use signal_pending() to check
> >>>>> all other target tasks either.
> >>>>>
> >>>>> I hope this patch can get attention again.
> >>>>
> >>>> One thing that the changelog is missing and it is quite important IMHO
> >>>> is the permission model. As we have discussed in previous incarnations
> >>>> of the remote KSM functionality, KSM has some security implications.
> >>>> It would be really great to refer to that in the changelog for the
> >>>> future reference (http://lkml.kernel.org/r/CAG48ez0riS60zcA9CC9rUDV=kLS0326Rr23OKv1_RHaTkOOj7A@mail.gmail.com)
> >>>>
> >>>> So this implementation requires PTRACE_MODE_READ_FSCREDS and
> >>>> CAP_SYS_NICE so the remote process would need to be allowed to
> >>>> introspect the address space. This is the same constraint applied to the
> >>>> remote memory reclaim. Is this sufficient?
> >>>>
> >>>> I would say yes because to some degree KSM merging can have very
> >>>> similar effect to memory reclaim from the side channel POV. But it
> >>>> should be really documented in the changelog so that it is clear that
> >>>> this has been a deliberate decision and thought through.
> >>>>
> >>>> Other than that this looks like the most reasonable approach to me.
> >>>>
> >>>>> [1] https://lore.kernel.org/lkml/YoOrdh85+AqJH8w1@dhcp22.suse.cz/
> >>>>> [2] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz/
> >>>>>
> >>>
> >>> I have various concerns, but the biggest concern is that this modifies
> >>> VMA flags and can possibly break applications.
> >>>
> >>> process_madvise must not modify remote process state.
> >>>
> >>> That's why we only allow a very limited selection that are merely hints.
> >>>
> >>> So nack from my side.
> >>>
> >>
> >> [I'm quite busy, but I think some more explanation might be of value]
> >>
> >> One COW example where I think force-enabling KSM for processes is
> >> *currently* not a good idea (besides the side channel discussions, which
> >> is also why Windows stopped enabling KSM system wide a while ago):
> >>
> >> App:
> >>
> >> a) memset(page, 0);
> >> b) trigger R/O long-term pin on page (e.g., vfio)
> >>
> >> If between a) and b) KSM replaces the page by the shared zeropage you'll
> >> get an unreliable pin because we don't yet break COW when taking a R/O
> >> pin on the shared zeropage. And in the traditional sense, the app did
> >> everything right to guarantee that the pin will stay reliable.
> >
> > Isn't this a bug in the existing implementation of the CoW?
>
> On the one hand yes (pinning the shared zeropage is questionable), on
> the other hand no (user space did modify that memory ahead of time and
> filled it with something reasonable, that's why it always worked
> correctly in the absence of KSM).
>

Thanks for the information. So does it need to be fixed? And if so, are
you planning to fix it?

> >
> >> Further, if an app explicitly decides to disable KSM on some region, we
> >> should not overwrite that.
> >
> > Well, the interface is rather spartan. You cannot really tell "disable
> > KSM on some region". You can only tell "KSM can be applied to this
> > region" and later change your mind. Maybe this is what you had in
> > mind though.
>
> That's what I meant. The hugepage interface has different semantics and
> you get three possible states:
>
> 1: yes please: MADV_HUGEPAGE
> 2: don't care -- don't set anything
> 3: please no: MADV_NOHUGEPAGE
>
> Currently for KSM we only have 1 and 2 internally I think (single
> flag), because it didn't matter in the past because there was no
> force-enablement. One could convert it into all 3 states, changing the
> semantics of MADV_UNMERGEABLE slightly from
>
> 1: yes please: MADV_MERGEABLE
> 2: don't care: MADV_UNMERGEABLE
>
> to
>
> 1: yes please: MADV_MERGEABLE
> 2: don't care -- don't set anything
> 3: please no: MADV_UNMERGEABLE
>
> --
> Thanks,
>
> David / dhildenb
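
To check my understanding of the three-state proposal quoted above, here is
a small userspace model of it. This is only a sketch and not kernel code:
the names KsmPolicy, apply_local_hint and apply_remote_enable are
hypothetical and exist only for this illustration. The assumption is that a
remote force-enable may only upgrade regions the owner left undecided, and
must never override an explicit "please no".

```python
from enum import Enum


class KsmPolicy(Enum):
    """Hypothetical per-region KSM state; not an existing kernel type."""
    DONT_CARE = "don't care"    # the application set nothing
    MERGEABLE = "yes please"    # the application used MADV_MERGEABLE
    UNMERGEABLE = "please no"   # the application used MADV_UNMERGEABLE


def apply_local_hint(hint: KsmPolicy) -> KsmPolicy:
    """The owning process may always choose any state for its own region."""
    return hint


def apply_remote_enable(current: KsmPolicy) -> KsmPolicy:
    """A remote force-enable only upgrades regions the owner left undecided."""
    if current is KsmPolicy.DONT_CARE:
        return KsmPolicy.MERGEABLE
    # An explicit MERGEABLE or UNMERGEABLE choice by the owner is kept as-is.
    return current


if __name__ == "__main__":
    for state in KsmPolicy:
        print(f"{state.name:12} -> {apply_remote_enable(state).name}")
```

With the current single flag, "don't care" and "please no" cannot be told
apart, so a remote enable has no way of honouring an application's opt-out.
The third state is what would make that distinction possible.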