Message-ID: <203548a6-cf70-30ce-6756-f6c909e7ef21@redhat.com>
Date: Fri, 1 Jul 2022 14:09:24 +0200
To: Michal Hocko
Cc: cgel.zte@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, vbabka@suse.cz, minchan@kernel.org, oleksandr@redhat.com, xu xin, Jann Horn, Andrew Morton
References: <20220701084323.1261361-1-xu.xin16@zte.com.cn> <93e1e19a-deff-2dad-0b3c-ef411309ec58@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH linux-next] mm/madvise: allow KSM hints for process_madvise

On 01.07.22 14:02, Michal Hocko wrote:
> On Fri 01-07-22 12:50:59, David Hildenbrand wrote:
>> On 01.07.22 12:32, David Hildenbrand wrote:
>>> On 01.07.22 11:11, Michal Hocko wrote:
>>>> [Cc Jann]
>>>>
>>>> On Fri 01-07-22 08:43:23, cgel.zte@gmail.com wrote:
>>>>> From: xu xin
>>>>>
>>>>> The benefits of doing this are obvious because using madvise in user code
>>>>> is the only current way to enable KSM, which is inconvenient for those
>>>>> compiled app without marking MERGEABLE wanting to enable KSM.
>>>>
>>>> I would rephrase:
>>>> "
>>>> KSM functionality is currently available only to processes which are
>>>> using MADV_MERGEABLE directly. This is limiting because there are
>>>> usecases which will benefit from enabling KSM on a remote process. One
>>>> example would be an application which cannot be modified (e.g. because
>>>> it is only distributed as a binary). MORE EXAMPLES WOULD BE REALLY
>>>> BENEFICIAL.
>>>> "
>>>>
>>>>> Since we already have the syscall of process_madvise(), then reusing the
>>>>> interface to allow external KSM hints is more acceptable [1].
>>>>>
>>>>> Although this patch was released by Oleksandr Natalenko, but it was
>>>>> unfortunately terminated without any conclusions because there was debate
>>>>> on whether it should use signal_pending() to check the target task besides
>>>>> the task of current() when calling unmerge_ksm_pages of other task [2].
>>>>
>>>> I am not sure this is particularly interesting. I do not remember
>>>> details of that discussion but checking signal_pending on a different
>>>> task is rarely the right thing to do. In this case the check is meant to
>>>> allow bailing out from the operation so that the caller could be
>>>> terminated for example.
>>>>
>>>>> I think it's unneeded to check the target task. For example, when we set
>>>>> the knob /sys/kernel/mm/ksm/run from 1 to 2,
>>>>> unmerge_and_remove_all_rmap_items() doesn't use signal_pending() to check
>>>>> all other target tasks either.
>>>>>
>>>>> I hope this patch can get attention again.
>>>>
>>>> One thing that the changelog is missing and it is quite important IMHO
>>>> is the permission model. As we have discussed in previous incarnations
>>>> of the remote KSM functionality, KSM has some security implications.
>>>> It would be really great to refer to that in the changelog for the
>>>> future reference (http://lkml.kernel.org/r/CAG48ez0riS60zcA9CC9rUDV=kLS0326Rr23OKv1_RHaTkOOj7A@mail.gmail.com)
>>>>
>>>> So this implementation requires PTRACE_MODE_READ_FSCREDS and
>>>> CAP_SYS_NICE so the remote process would need to be allowed to
>>>> introspect the address space. This is the same constraint applied to the
>>>> remote memory reclaim. Is this sufficient?
>>>>
>>>> I would say yes because to some degree KSM merging can have very
>>>> similar effect to memory reclaim from the side channel POV. But it
>>>> should be really documented in the changelog so that it is clear that
>>>> this has been a deliberate decision and thought through.
>>>>
>>>> Other than that this looks like the most reasonable approach to me.
>>>>
>>>>> [1] https://lore.kernel.org/lkml/YoOrdh85+AqJH8w1@dhcp22.suse.cz/
>>>>> [2] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz/
>>>>>
>>>
>>> I have various concerns, but the biggest concern is that this modifies
>>> VMA flags and can possibly break applications.
>>>
>>> process_madvise must not modify remote process state.
>>>
>>> That's why we only allow a very limited selection that are merely hints.
>>>
>>> So nack from my side.
>>>
>>
>> [I'm quite busy, but I think some more explanation might be of value]
>>
>> One COW example where I think force-enabling KSM for processes is
>> *currently* not a good idea (besides the side channel discussions, which
>> is also why Windows stopped enabling KSM system wide a while ago):
>>
>> App:
>>
>> a) memset(page, 0);
>> b) trigger R/O long-term pin on page (e.g., vfio)
>>
>> If between a) and b) KSM replaces the page by the shared zeropage you'll
>> get an unreliable pin because we don't yet break COW when taking a R/O
>> pin on the shared zeropage. And in the traditional sense, the app did
>> everything right to guarantee that the pin will stay reliable.
>
> Isn't this a bug in the existing implementation of the CoW?

On the one hand yes (pinning the shared zeropage is questionable), on
the other hand no (user space did modify that memory ahead of time and
filled it with something reasonable, that's why it always worked
correctly in the absence of KSM).

>
>> Further, if an app explicitly decides to disable KSM on some region, we
>> should not overwrite that.
>
> Well, the interface is rather spartan. You cannot really tell "disable
> KSM on some region". You can only tell "KSM can be applied to this
> region" and later change your mind. Maybe this is what you had in
> mind though.

That's what I meant.

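For context, the two hints an application has today look roughly like
this from user space. This is only a minimal sketch, assuming Linux; the
function name ksm_hint_demo() and its return convention are made up for
illustration, and a kernel built without CONFIG_KSM is expected to reject
both hints.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous region, opt it in to KSM, then
 * change our mind again. Returns 0 if both hints were accepted, 1 if the
 * kernel rejected a hint (e.g. EINVAL without CONFIG_KSM), and -1 if the
 * mapping itself failed.
 */
int ksm_hint_demo(size_t len)
{
    int rc = 0;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    /* "KSM can be applied to this region" */
    if (madvise(p, len, MADV_MERGEABLE) != 0)
        rc = 1;

    /* "... and later change your mind"; this also unmerges merged pages */
    if (madvise(p, len, MADV_UNMERGEABLE) != 0)
        rc = 1;

    munmap(p, len);
    return rc;
}
```

Note that nothing here lets the application say "never merge this
region"; MADV_UNMERGEABLE only takes back an earlier MADV_MERGEABLE.
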
The hugepage interface has different semantics and you get three
possible states:

1: yes please: MADV_HUGEPAGE
2: don't care -- don't set anything
3: please no: MADV_NOHUGEPAGE

Currently for KSM we only have 1 and 2 internally I think (single flag),
which didn't matter in the past because there was no force-enablement.

One could convert it into all 3 states, changing the semantics of
MADV_UNMERGEABLE slightly from

1: yes please: MADV_MERGEABLE
2: don't care: MADV_UNMERGEABLE

to

1: yes please: MADV_MERGEABLE
2: don't care -- don't set anything
3: please no: MADV_UNMERGEABLE

-- 
Thanks,

David / dhildenb
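
As a rough illustration of the three-state idea, here is a plain C model
(not kernel code). The enum, its values, and the helper are hypothetical
names invented for this sketch; as far as I can tell the kernel currently
tracks only a single mergeable flag per VMA.

```c
#include <stdbool.h>

/* Hypothetical per-region KSM policy with three states instead of two. */
enum ksm_policy {
    KSM_DONT_CARE,  /* nothing set: the default */
    KSM_YES_PLEASE, /* the app itself called MADV_MERGEABLE */
    KSM_PLEASE_NO,  /* the app itself called MADV_UNMERGEABLE */
};

/*
 * Hypothetical decision helper: may KSM merge pages of this region?
 * A remote force-enable only takes effect where the app did not care;
 * an explicit "please no" from the app always wins.
 */
static bool ksm_merging_allowed(enum ksm_policy policy, bool remote_enable)
{
    switch (policy) {
    case KSM_YES_PLEASE:
        return true;
    case KSM_PLEASE_NO:
        return false;
    case KSM_DONT_CARE:
    default:
        return remote_enable;
    }
}
```

With such a model, force-enabling KSM from outside would never override
an application that explicitly opted out of merging for a region.
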