Message-ID: <96622b10-1d95-425d-278a-1cf21ee92604@redhat.com>
Date: Mon, 4 Jul 2022 11:30:03 +0200
Subject: Re: [PATCH linux-next] mm/madvise: allow KSM hints for process_madvise
To: CGEL
Cc: Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, vbabka@suse.cz, minchan@kernel.org, oleksandr@redhat.com, xu xin, Jann Horn, Andrew Morton
References: <20220701084323.1261361-1-xu.xin16@zte.com.cn> <93e1e19a-deff-2dad-0b3c-ef411309ec58@redhat.com> <203548a6-cf70-30ce-6756-f6c909e7ef21@redhat.com> <62c2a117.1c69fb81.3a929.dda9@mx.google.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <62c2a117.1c69fb81.3a929.dda9@mx.google.com>

On 04.07.22 10:13, CGEL wrote:
> On Fri, Jul 01, 2022 at 02:09:24PM +0200, David Hildenbrand wrote:
>> On 01.07.22 14:02, Michal Hocko wrote:
>>> On Fri 01-07-22 12:50:59, David Hildenbrand wrote:
>>>> On 01.07.22 12:32, David Hildenbrand wrote:
>>>>> On 01.07.22 11:11, Michal Hocko wrote:
>>>>>> [Cc Jann]
>>>>>>
>>>>>> On Fri 01-07-22 08:43:23, cgel.zte@gmail.com wrote:
>>>>>>> From: xu xin
>>>>>>>
>>>>>>> The benefits of doing this are obvious because using madvise in user code
>>>>>>> is the only current way to enable KSM, which is inconvenient for those
>>>>>>> compiled app without marking MERGEABLE wanting to enable KSM.
>>>>>>
>>>>>> I would rephrase:
>>>>>> "
>>>>>> KSM functionality is currently available only to processes which are
>>>>>> using MADV_MERGEABLE directly. This is limiting because there are
>>>>>> usecases which will benefit from enabling KSM on a remote process. One
>>>>>> example would be an application which cannot be modified (e.g. because
>>>>>> it is only distributed as a binary).
>>>>>> MORE EXAMPLES WOULD BE REALLY BENEFICIAL.
>>>>>> "
>>>>>>
>>>>>>> Since we already have the syscall of process_madvise(), then reusing the
>>>>>>> interface to allow external KSM hints is more acceptable [1].
>>>>>>>
>>>>>>> Although this patch was released by Oleksandr Natalenko, but it was
>>>>>>> unfortunately terminated without any conclusions because there was debate
>>>>>>> on whether it should use signal_pending() to check the target task besides
>>>>>>> the task of current() when calling unmerge_ksm_pages of other task [2].
>>>>>>
>>>>>> I am not sure this is particularly interesting. I do not remember
>>>>>> details of that discussion but checking signal_pending on a different
>>>>>> task is rarely the right thing to do. In this case the check is meant to
>>>>>> allow bailing out from the operation so that the caller could be
>>>>>> terminated for example.
>>>>>>
>>>>>>> I think it's unneeded to check the target task. For example, when we set
>>>>>>> the knob /sys/kernel/mm/ksm/run from 1 to 2,
>>>>>>> unmerge_and_remove_all_rmap_items() doesn't use signal_pending() to check
>>>>>>> all other target tasks either.
>>>>>>>
>>>>>>> I hope this patch can get attention again.
>>>>>>
>>>>>> One thing that the changelog is missing and it is quite important IMHO
>>>>>> is the permission model. As we have discussed in previous incarnations
>>>>>> of the remote KSM functionality that KSM has some security implications.
>>>>>> It would be really great to refer to that in the changelog for the
>>>>>> future reference (http://lkml.kernel.org/r/CAG48ez0riS60zcA9CC9rUDV=kLS0326Rr23OKv1_RHaTkOOj7A@mail.gmail.com)
>>>>>>
>>>>>> So this implementation requires PTRACE_MODE_READ_FSCREDS and
>>>>>> CAP_SYS_NICE so the remote process would need to be allowed to
>>>>>> introspect the address space. This is the same constraint applied to the
>>>>>> remote memory reclaim. Is this sufficient?
>>>>>>
>>>>>> I would say yes because to some degree KSM merging can have very
>>>>>> similar effect to memory reclaim from the side channel POV. But it
>>>>>> should be really documented in the changelog so that it is clear that
>>>>>> this has been a deliberate decision and thought through.
>>>>>>
>>>>>> Other than that this looks like the most reasonable approach to me.
>>>>>>
>>>>>>> [1] https://lore.kernel.org/lkml/YoOrdh85+AqJH8w1@dhcp22.suse.cz/
>>>>>>> [2] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz/
>>>>>>>
>>>>>
>>>>> I have various concerns, but the biggest concern is that this modifies
>>>>> VMA flags and can possibly break applications.
>>>>>
>>>>> process_madvise must not modify remote process state.
>>>>>
>>>>> That's why we only allow a very limited selection that are merely hints.
>>>>>
>>>>> So nack from my side.
>>>>>
>>>>
>>>> [I'm quite busy, but I think some more explanation might be of value]
>>>>
>>>> One COW example where I think force-enabling KSM for processes is
>>>> *currently* not a good idea (besides the side channel discussions, which
>>>> is also why Windows stopped enabling KSM system wide a while ago):
>>>>
>>>> App:
>>>>
>>>> a) memset(page, 0);
>>>> b) trigger R/O long-term pin on page (e.g., vfio)
>>>>
>>>> If between a) and b) KSM replaces the page by the shared zeropage you'll
>>>> get an unreliable pin because we don't yet break COW when taking a R/O
>>>> pin on the shared zeropage. And in the traditional sense, the app did
>>>> everything right to guarantee that the pin will stay reliable.
>>>
>>> Isn't this a bug in the existing implementation of the CoW?
>>
>> On the one hand yes (pinning the shared zeropage is questionable), on
>> the other hand no (user space did modify that memory ahead of time and
>> filled it with something reasonable, that's why it always worked
>> correctly in the absence of KSM).
>>
>
> Thanks for your information.
>
> So does it need to be fixed?
> And if yes, are you planning to fix it?

Very high on my todo list.

So yes, I think it really needs fixing, especially with KSM in mind.

-- 
Thanks,

David / dhildenb
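
For readers less familiar with the status quo this thread starts from (a process has to opt its own memory in to KSM via madvise with MADV_MERGEABLE), here is a minimal sketch in Python. It is not taken from the patch under discussion: the helper name `mark_mergeable`, the fill byte, and the page count are illustrative assumptions, and it assumes Linux with Python 3.8 or later. It only covers the local opt-in case, not the remote process_madvise() variant that the thread debates.

```python
import errno
import mmap

# Use the module constant when this Python build exposes it; the fallback
# value 12 is the usual Linux definition of MADV_MERGEABLE (assumption).
MADV_MERGEABLE = getattr(mmap, "MADV_MERGEABLE", 12)


def mark_mergeable(pages: int = 16) -> str:
    """Map private anonymous memory, fill it, and opt it in to KSM.

    Hypothetical helper for illustration. Returns "ok" if the kernel
    accepted the hint, otherwise "unsupported: <errno name>".
    """
    length = pages * mmap.PAGESIZE
    # KSM only considers private anonymous pages, so ask for MAP_PRIVATE
    # explicitly (Python's default for mmap is MAP_SHARED).
    region = mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        # Identical page contents are what makes pages merge candidates.
        region[:] = b"\x2a" * length
        try:
            # The hint applies to the calling process's own mapping only.
            region.madvise(MADV_MERGEABLE)
        except OSError as exc:
            # For example EINVAL when the kernel was built without CONFIG_KSM.
            return "unsupported: " + errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
    finally:
        region.close()


if __name__ == "__main__":
    print(mark_mergeable())
```

Even when the call succeeds, pages are merged only while the KSM daemon is running, which is controlled through the /sys/kernel/mm/ksm/run knob mentioned earlier in the thread.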