From: Stefan Roesch <shr@devkernel.io>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: kernel-team@fb.com, linux-mm@kvack.org, riel@surriel.com,
mhocko@suse.com, david@redhat.com,
linux-kselftest@vger.kernel.org, linux-doc@vger.kernel.org,
akpm@linux-foundation.org
Subject: Re: [PATCH v3 1/3] mm: add new api to enable ksm per process
Date: Wed, 08 Mar 2023 14:16:36 -0800
Message-ID: <qvqwbkl2zxui.fsf@dev0134.prn3.facebook.com>
In-Reply-To: <20230308164746.GA473363@cmpxchg.org>
Johannes Weiner <hannes@cmpxchg.org> writes:
> On Thu, Feb 23, 2023 at 08:39:58PM -0800, Stefan Roesch wrote:
>> This adds a new prctl API to enable and disable KSM on a per-process
>> basis, instead of only on a per-VMA basis (with madvise).
>>
>> 1) Introduce new MMF_VM_MERGE_ANY flag
>>
>> This introduces the new MMF_VM_MERGE_ANY flag. When this flag is
>> set, kernel samepage merging (KSM) gets enabled for all VMAs of a
>> process.
>>
>> 2) add flag to __ksm_enter
>>
>> This change adds a flag parameter to __ksm_enter(). This allows
>> distinguishing whether KSM was enabled via prctl or madvise.
>>
>> 3) add flag to __ksm_exit call
>>
>> This adds a flag parameter to the __ksm_exit() call. This allows
>> distinguishing whether the call is for a prctl or madvise invocation.
>>
>> 4) invoke madvise for all vmas in scan_get_next_rmap_item
>>
>> If the new MMF_VM_MERGE_ANY flag has been set for a process, iterate
>> over all of its VMAs and enable KSM where possible. For the VMAs that
>> can be KSM-enabled, this is only done once.
>>
>> 5) support disabling of ksm for a process
>>
>> This adds the ability to disable KSM for a process if KSM has
>> previously been enabled for it.
>>
>> 6) add new prctl option to get and set ksm for a process
>>
>> This adds two new options to the prctl system call:
>> - enable KSM for all VMAs of a process (if the VMAs support it),
>> - query whether KSM has been enabled for a process.
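(For reference, the intended use from userspace looks roughly like the
sketch below. The PR_SET_MEMORY_MERGE/PR_GET_MEMORY_MERGE values are
the ones proposed in this series' uapi header change; the fallback
defines are only there so the sketch compiles against older headers,
so treat it as illustrative rather than final API documentation.)

#include <stdio.h>
#include <sys/prctl.h>

/* fallback defines, matching the uapi addition in this series */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE	67
#define PR_GET_MEMORY_MERGE	68
#endif

int main(void)
{
	/* opt the whole process into KSM; requires CAP_SYS_RESOURCE */
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		perror("PR_SET_MEMORY_MERGE");

	/* 1 if MMF_VM_MERGE_ANY is set, 0 if not, -1 on error */
	printf("process-wide KSM: %d\n",
	       prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));

	return 0;
}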
>>
>> Signed-off-by: Stefan Roesch <shr@devkernel.io>
>
> Hey Stefan, thanks for merging the patches into one. I found it much
> easier to review.
>
> Overall this looks straight-forward to me. A few comments below:
>
>> @@ -2659,6 +2660,34 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>> case PR_SET_VMA:
>> error = prctl_set_vma(arg2, arg3, arg4, arg5);
>> break;
>> +#ifdef CONFIG_KSM
>> + case PR_SET_MEMORY_MERGE:
>> + if (!capable(CAP_SYS_RESOURCE))
>> + return -EPERM;
>> +
>> + if (arg2) {
>> + if (mmap_write_lock_killable(me->mm))
>> + return -EINTR;
>> +
>> + if (test_bit(MMF_VM_MERGEABLE, &me->mm->flags))
>> + error = -EINVAL;
>
> So if the workload has already madvised specific VMAs, the per-process
> enablement will fail. Why is that? Shouldn't it be possible to
> override a local decision from an outside context that has more
> perspective on both sharing opportunities and security aspects?
>
> If there is a good reason for it, the -EINVAL should be addressed in
> the manpage. And maybe add a comment here as well.
>
This makes sense; I'll remove the check above.
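Roughly, I expect the v4 version of this branch to look like the
untested sketch below (same as above, just with the MMF_VM_MERGEABLE
check dropped):

#ifdef CONFIG_KSM
	case PR_SET_MEMORY_MERGE:
		if (!capable(CAP_SYS_RESOURCE))
			return -EPERM;

		if (arg2) {
			if (mmap_write_lock_killable(me->mm))
				return -EINTR;

			/*
			 * VMAs that were already madvised MERGEABLE are
			 * fine; the prctl only widens merging to the whole
			 * mm, so no MMF_VM_MERGEABLE check is needed here.
			 */
			if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
				error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
			mmap_write_unlock(me->mm);
		} else {
			__ksm_exit(me->mm, MMF_VM_MERGE_ANY);
		}
		break;
#endif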
>> + else if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
>> + error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
>> + mmap_write_unlock(me->mm);
>> + } else {
>> + __ksm_exit(me->mm, MMF_VM_MERGE_ANY);
>> + }
>> + break;
>> + case PR_GET_MEMORY_MERGE:
>> + if (!capable(CAP_SYS_RESOURCE))
>> + return -EPERM;
>> +
>> + if (arg2 || arg3 || arg4 || arg5)
>> + return -EINVAL;
>> +
>> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
>> + break;
>> +#endif
>> default:
>> error = -EINVAL;
>> break;
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 56808e3bfd19..23d6944f78ad 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1063,6 +1063,7 @@ static int unmerge_and_remove_all_rmap_items(void)
>>
>> mm_slot_free(mm_slot_cache, mm_slot);
>> clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
>> mmdrop(mm);
>> } else
>> spin_unlock(&ksm_mmlist_lock);
>> @@ -2329,6 +2330,17 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>> return rmap_item;
>> }
>>
>> +static bool vma_ksm_mergeable(struct vm_area_struct *vma)
>> +{
>> + if (vma->vm_flags & VM_MERGEABLE)
>> + return true;
>> +
>> + if (test_bit(MMF_VM_MERGE_ANY, &vma->vm_mm->flags))
>> + return true;
>> +
>> + return false;
>> +}
>> +
>> static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>> {
>> struct mm_struct *mm;
>> @@ -2405,8 +2417,20 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>> goto no_vmas;
>>
>> for_each_vma(vmi, vma) {
>> - if (!(vma->vm_flags & VM_MERGEABLE))
>> + if (!vma_ksm_mergeable(vma))
>> continue;
>> + if (!(vma->vm_flags & VM_MERGEABLE)) {
>
> IMO, the helper obscures the interaction between the vma flag and the
> per-process flag here. How about:
>
> 	if (!(vma->vm_flags & VM_MERGEABLE)) {
> 		if (!test_bit(MMF_VM_MERGE_ANY, &vma->vm_mm->flags))
> 			continue;
>
> 		/*
> 		 * With per-process merging enabled, have the MM scan
> 		 * enroll any existing and new VMAs on the fly.
> 		 */
> 		ksm_madvise();
> 	}
>
>> + unsigned long flags = vma->vm_flags;
>> +
>> + /* madvise failed, use next vma */
>> + if (ksm_madvise(vma, vma->vm_start, vma->vm_end, MADV_MERGEABLE, &flags))
>> + continue;
>> + /* vma could not be made mergeable, use next vma */
>> + if (!(flags & VM_MERGEABLE))
>> + continue;
>> +
>> + vm_flags_set(vma, VM_MERGEABLE);
>
> I don't understand the local flags. Can't it pass &vma->vm_flags to
> ksm_madvise()? It'll set VM_MERGEABLE on success. And you know it
> wasn't set before because the whole thing is inside the !set
> branch. The return value doesn't seem super useful, it's only the flag
> setting that matters:
>
> 	ksm_madvise(vma, vma->vm_start, vma->vm_end, MADV_MERGEABLE, &vma->vm_flags);
> 	/* madvise can fail, and will skip special vmas (pfnmaps and such) */
> 	if (!(vma->vm_flags & VM_MERGEABLE))
> 		continue;
>
vm_flags is defined as const, so I cannot pass &vma->vm_flags into
ksm_madvise() directly; that is why I'm using a local variable for it.
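For reference, vm_flags is now declared roughly like this in struct
vm_area_struct (which is why all writes have to go through the
vm_flags_set()/vm_flags_clear() helpers); I'm quoting from memory, so
treat the excerpt as approximate:

	/* include/linux/mm_types.h (approximate excerpt) */
	union {
		const vm_flags_t vm_flags;
		vm_flags_t __private __vm_flags;
	};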
>> + }
>> if (ksm_scan.address < vma->vm_start)
>> ksm_scan.address = vma->vm_start;
>> if (!vma->anon_vma)
>> @@ -2491,6 +2515,7 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
>>
>> mm_slot_free(mm_slot_cache, mm_slot);
>> clear_bit(MMF_VM_MERGEABLE, &mm->flags);
>> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
>> mmap_read_unlock(mm);
>> mmdrop(mm);
>> } else {
>
>> @@ -2664,12 +2690,39 @@ int __ksm_enter(struct mm_struct *mm)
>> return 0;
>> }
>>
>> -void __ksm_exit(struct mm_struct *mm)
>> +static void unmerge_vmas(struct mm_struct *mm)
>> +{
>> + struct vm_area_struct *vma;
>> + struct vma_iterator vmi;
>> +
>> + vma_iter_init(&vmi, mm, 0);
>> +
>> + mmap_read_lock(mm);
>> + for_each_vma(vmi, vma) {
>> + if (vma->vm_flags & VM_MERGEABLE) {
>> + unsigned long flags = vma->vm_flags;
>> +
>> + if (ksm_madvise(vma, vma->vm_start, vma->vm_end, MADV_UNMERGEABLE, &flags))
>> + continue;
>> +
>> + vm_flags_clear(vma, VM_MERGEABLE);
>
> ksm_madvise() tests and clears VM_MERGEABLE, so AFAICS
>
> 	for_each_vma(vmi, vma)
> 		ksm_madvise();
>
> should do it...
>
This is the same problem: vma->vm_flags is defined as const, so
ksm_madvise() has to work on a local copy here as well.

The check

	+ if (vma->vm_flags & VM_MERGEABLE) {

will be removed; ksm_madvise() already ignores MADV_UNMERGEABLE for
VMAs that don't have VM_MERGEABLE set, so it is redundant.
>> + }
>> + }
>> + mmap_read_unlock(mm);
>> +}
>> +
>> +void __ksm_exit(struct mm_struct *mm, int flag)
>> {
>> struct ksm_mm_slot *mm_slot;
>> struct mm_slot *slot;
>> int easy_to_free = 0;
>>
>> + if (!(current->flags & PF_EXITING) && flag == MMF_VM_MERGE_ANY &&
>> + test_bit(MMF_VM_MERGE_ANY, &mm->flags)) {
>> + clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
>> + unmerge_vmas(mm);
>
> ...and then it's short enough to just open-code it here and drop the
> unmerge_vmas() helper.
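Open-coded it would then look roughly like the untested sketch below.
Note that I switched to the mmap write lock, since vm_flags_clear()
asserts mmap_assert_write_locked(); that change is my own assumption
and not part of the patch as posted:

	if (!(current->flags & PF_EXITING) && flag == MMF_VM_MERGE_ANY &&
	    test_bit(MMF_VM_MERGE_ANY, &mm->flags)) {
		struct vm_area_struct *vma;
		VMA_ITERATOR(vmi, mm, 0);

		clear_bit(MMF_VM_MERGE_ANY, &mm->flags);

		mmap_write_lock(mm);
		for_each_vma(vmi, vma) {
			unsigned long flags = vma->vm_flags;

			/* a no-op for VMAs that never had VM_MERGEABLE set */
			if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
					MADV_UNMERGEABLE, &flags))
				continue;

			vm_flags_clear(vma, VM_MERGEABLE);
		}
		mmap_write_unlock(mm);
	}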