From: David Hildenbrand <david@redhat.com>
To: CGEL <cgel.zte@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org, vbabka@suse.cz,
minchan@kernel.org, oleksandr@redhat.com,
xu xin <xu.xin16@zte.com.cn>, Jann Horn <jannh@google.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH linux-next] mm/madvise: allow KSM hints for process_madvise
Date: Mon, 4 Jul 2022 11:30:03 +0200
Message-ID: <96622b10-1d95-425d-278a-1cf21ee92604@redhat.com>
In-Reply-To: <62c2a117.1c69fb81.3a929.dda9@mx.google.com>
On 04.07.22 10:13, CGEL wrote:
> On Fri, Jul 01, 2022 at 02:09:24PM +0200, David Hildenbrand wrote:
>> On 01.07.22 14:02, Michal Hocko wrote:
>>> On Fri 01-07-22 12:50:59, David Hildenbrand wrote:
>>>> On 01.07.22 12:32, David Hildenbrand wrote:
>>>>> On 01.07.22 11:11, Michal Hocko wrote:
>>>>>> [Cc Jann]
>>>>>>
>>>>>> On Fri 01-07-22 08:43:23, cgel.zte@gmail.com wrote:
>>>>>>> From: xu xin <xu.xin16@zte.com.cn>
>>>>>>>
>>>>>>> The benefits of doing this are obvious because using madvise in user code
>>>>>>> is currently the only way to enable KSM, which is inconvenient for
>>>>>>> already-compiled applications that do not mark their memory MERGEABLE
>>>>>>> but whose users want to enable KSM.
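For reference, the in-process opt-in the changelog describes looks roughly like this. This is a minimal user-space sketch, assuming Linux with glibc; map_mergeable() is a hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous private region and ask KSM to
 * consider it for merging. The application itself has to make this
 * madvise(MADV_MERGEABLE) call; nothing outside the process can.
 * Returns the mapping (even if KSM is unavailable) or NULL if mmap()
 * fails. *ksm_err receives 0, or the madvise() errno (for example
 * EINVAL on kernels built without CONFIG_KSM).
 */
static void *map_mergeable(size_t len, int *ksm_err)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *ksm_err = madvise(p, len, MADV_MERGEABLE) == 0 ? 0 : errno;
    return p;
}
```

An unmodified binary never makes this call, which is the gap the patch tries to close from the outside.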
>>>>>>
>>>>>> I would rephrase:
>>>>>> "
>>>>>> KSM functionality is currently available only to processes which are
>>>>>> using MADV_MERGEABLE directly. This is limiting because there are
>>>>>> usecases which will benefit from enabling KSM on a remote process. One
>>>>>> example would be an application which cannot be modified (e.g. because
>>>>>> it is only distributed as a binary). MORE EXAMPLES WOULD BE REALLY
>>>>>> BENEFICIAL.
>>>>>> "
>>>>>>
>>>>>>> Since the process_madvise() syscall already exists, reusing that
>>>>>>> interface to allow external KSM hints is more acceptable [1].
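A minimal sketch of how a remote hint might be issued through the existing process_madvise() interface, assuming kernel headers that define SYS_pidfd_open and SYS_process_madvise. remote_ksm_hint() is a hypothetical helper. On kernels without this patch, process_madvise() rejects MADV_MERGEABLE with EINVAL, so the sketch only illustrates the intended calling convention.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to treat [addr, addr + len) in
 * process 'pid' as MADV_MERGEABLE. Returns 0 on success or a negative
 * errno. Unpatched kernels only accept a few hint-like advice values
 * here and return -EINVAL for MADV_MERGEABLE.
 */
static int remote_ksm_hint(pid_t pid, void *addr, size_t len)
{
#if defined(SYS_pidfd_open) && defined(SYS_process_madvise)
    struct iovec iov = { .iov_base = addr, .iov_len = len };
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -errno;
    /* On success process_madvise() returns the number of bytes advised. */
    long ret = syscall(SYS_process_madvise, pidfd, &iov, 1UL,
                       MADV_MERGEABLE, 0U);
    int err = ret < 0 ? -errno : 0;   /* save errno before close() */
    close(pidfd);
    return err;
#else
    (void)pid; (void)addr; (void)len;
    return -ENOSYS;
#endif
}
```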
>>>>>>>
>>>>>>> This patch was previously posted by Oleksandr Natalenko, but the
>>>>>>> discussion unfortunately ended without a conclusion because of a debate
>>>>>>> on whether it should use signal_pending() to check the target task, in
>>>>>>> addition to current, when calling unmerge_ksm_pages() on another task [2].
>>>>>>
>>>>>> I am not sure this is particularly interesting. I do not remember
>>>>>> details of that discussion but checking signal_pending on a different
>>>>>> task is rarely the right thing to do. In this case the check is meant to
>>>>>> allow bailing out from the operation so that the caller could be
>>>>>> terminated for example.
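The pattern described here, checking whether the *caller* has a pending signal so that a long operation can be abandoned, can be illustrated with a user-space analogue. This is not kernel code: a flag set by a SIGINT handler stands in for signal_pending(current), and long_operation() with its optional per-item callback is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Set asynchronously by the caller's own signal handler. */
static volatile sig_atomic_t caller_interrupted;

static void on_sigint(int sig)
{
    (void)sig;
    caller_interrupted = 1;
}

/*
 * Hypothetical long-running loop (think: unmerging one page per item).
 * Before each step it checks whether the caller itself was signalled,
 * the analogue of signal_pending(current), and bails out if so, so
 * that the caller can be interrupted or killed promptly.
 */
static int long_operation(size_t nitems, void (*work_one)(size_t))
{
    for (size_t i = 0; i < nitems; i++) {
        if (caller_interrupted)
            return -EINTR;
        if (work_one)
            work_one(i);
    }
    return 0;
}
```

The check concerns the task doing the work; signals pending on the target task are irrelevant to whether the caller should stop.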
>>>>>>
>>>>>>> I think checking the target task is unnecessary. For example, when we set
>>>>>>> the knob /sys/kernel/mm/ksm/run from 1 to 2,
>>>>>>> unmerge_and_remove_all_rmap_items() doesn't use signal_pending() to check
>>>>>>> all the other target tasks either.
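The knob mentioned above can be inspected from user space as sketched below, assuming CONFIG_KSM so that /sys/kernel/mm/ksm/run exists. The value meanings follow the kernel's KSM admin guide; the helper names are hypothetical. Writing 2 to the file (as root) stops ksmd and unmerges all currently merged pages, the global operation the changelog compares against.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read /sys/kernel/mm/ksm/run.
 * Returns the current value, or -1 if the file is missing or unreadable.
 */
static int ksm_run_mode(void)
{
    FILE *f = fopen("/sys/kernel/mm/ksm/run", "r");
    int mode = -1;
    if (!f)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Human-readable meaning of the documented values 0, 1 and 2. */
static const char *ksm_run_mode_name(int mode)
{
    switch (mode) {
    case 0: return "stopped, merged pages kept";
    case 1: return "running";
    case 2: return "stopped, all merged pages unmerged";
    default: return "unavailable or unknown";
    }
}
```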
>>>>>>>
>>>>>>> I hope this patch can get attention again.
>>>>>>
>>>>>> One thing that the changelog is missing, and it is quite important IMHO,
>>>>>> is the permission model. As we have discussed in previous incarnations
>>>>>> of the remote KSM functionality, KSM has some security implications.
>>>>>> It would be really great to refer to that in the changelog for
>>>>>> future reference (http://lkml.kernel.org/r/CAG48ez0riS60zcA9CC9rUDV=kLS0326Rr23OKv1_RHaTkOOj7A@mail.gmail.com)
>>>>>>
>>>>>> So this implementation requires PTRACE_MODE_READ_FSCREDS and
>>>>>> CAP_SYS_NICE, so the caller would need to be allowed to introspect
>>>>>> the remote process's address space. This is the same constraint applied
>>>>>> to remote memory reclaim. Is this sufficient?
>>>>>>
>>>>>> I would say yes because, to some degree, KSM merging can have a very
>>>>>> similar effect to memory reclaim from the side-channel POV. But it
>>>>>> should really be documented in the changelog so that it is clear that
>>>>>> this was a deliberate, thought-through decision.
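As a user-space pre-flight check, a caller could test the CAP_SYS_NICE half of this permission model before calling process_madvise(). The sketch assumes CAP_SYS_NICE is bit 23 (its value in <linux/capability.h>) and that /proc/self/status is readable; the helper and macro names are hypothetical. The PTRACE_MODE_READ_FSCREDS half is checked by the kernel against each specific target, so this is only a convenience and a real caller must still handle permission errors from the syscall.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_NICE_BIT 23  /* assumed: value of CAP_SYS_NICE */

/* Return 1 if bit 'cap' is set in a hex capability mask such as CapEff. */
static int capmask_has(const char *hex, int cap)
{
    unsigned long long mask = strtoull(hex, NULL, 16); /* skips leading tab */
    return (int)((mask >> cap) & 1ULL);
}

/*
 * Hypothetical helper: does the current process have CAP_SYS_NICE in
 * its effective set? Returns 1 or 0, or -1 if CapEff cannot be read.
 */
static int have_cap_sys_nice(void)
{
    char line[256];
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            found = capmask_has(line + 7, CAP_SYS_NICE_BIT);
            break;
        }
    }
    fclose(f);
    return found;
}
```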
>>>>>>
>>>>>> Other than that this looks like the most reasonable approach to me.
>>>>>>
>>>>>>> [1] https://lore.kernel.org/lkml/YoOrdh85+AqJH8w1@dhcp22.suse.cz/
>>>>>>> [2] https://lore.kernel.org/lkml/2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz/
>>>>>>>
>>>>>
>>>>> I have various concerns, but the biggest concern is that this modifies
>>>>> VMA flags and can possibly break applications.
>>>>>
>>>>> process_madvise must not modify remote process state.
>>>>>
>>>>> That's why we only allow a very limited selection of advice values that are merely hints.
>>>>>
>>>>> So nack from my side.
>>>>>
>>>>
>>>> [I'm quite busy, but I think some more explanation might be of value]
>>>>
>>>> One COW example where I think force-enabling KSM for processes is
>>>> *currently* not a good idea (besides the side-channel discussions, which
>>>> are also why Windows stopped enabling KSM system-wide a while ago):
>>>>
>>>> App:
>>>>
>>>> a) memset(page, 0);
>>>> b) trigger R/O long-term pin on page (e.g., vfio)
>>>>
>>>> If KSM replaces the page with the shared zeropage between a) and b), you'll
>>>> get an unreliable pin, because we don't yet break COW when taking an R/O
>>>> pin on the shared zeropage. And in the traditional sense, the app did
>>>> everything right to guarantee that the pin would stay reliable.
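The sequence above can be sketched in C to show where the window lies. pin_ro_fn and pin_noop() are hypothetical placeholders for whatever takes the R/O long-term pin (for example, a vfio DMA mapping); they are not a real API. The sketch cannot reproduce the race by itself; it only marks where KSM could intervene.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for step b): take an R/O long-term pin. */
typedef int (*pin_ro_fn)(void *addr, size_t len);

/* Placeholder pin that does nothing; NOT a real pinning mechanism. */
static int pin_noop(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return 0;
}

/*
 * Steps a) and b). If the VMA is VM_MERGEABLE (for example because KSM
 * was force-enabled remotely), KSM may replace the freshly zeroed page
 * with the shared zeropage between memset() and pin(). The R/O pin then
 * references the shared zeropage, while later writes by the app fault
 * in a new COW copy that the pinned page no longer matches.
 */
static void *zero_then_pin(size_t len, pin_ro_fn pin)
{
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memset(page, 0, len);           /* a) populate the page */
    /* race window: KSM may scan and replace the page here */
    if (pin(page, len) != 0) {      /* b) R/O long-term pin */
        munmap(page, len);
        return NULL;
    }
    return page;
}
```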
>>>
>>> Isn't this a bug in the existing implementation of the CoW?
>>
>> On the one hand yes (pinning the shared zeropage is questionable), on
>> the other hand no (user space did modify that memory ahead of time and
>> filled it with something reasonable; that's why it always worked
>> correctly in the absence of KSM).
>>
>
> Thanks for your information.
>
> So does it need to be fixed? And if so, are you planning to fix it?
Very high on my todo list. So yes, I think it really needs fixing,
especially with KSM in mind.
--
Thanks,
David / dhildenb