From: Minchan Kim <minchan@kernel.org>
To: John Stultz <john.stultz@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Michael Kerrisk <mtk.manpages@gmail.com>,
Arun Sharma <asharma@fb.com>,
sanjay@google.com, Paul Turner <pjt@google.com>,
David Rientjes <rientjes@google.com>,
Christoph Lameter <cl@linux.com>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC 1/8] Introduce new system call mvolatile
Date: Fri, 18 Jan 2013 14:30:06 +0900 [thread overview]
Message-ID: <20130118053006.GB31368@blaptop> (raw)
In-Reply-To: <50F75875.30909@linaro.org>
On Wed, Jan 16, 2013 at 05:48:37PM -0800, John Stultz wrote:
> On 01/02/2013 08:27 PM, Minchan Kim wrote:
> >This patch adds new system call m[no]volatile.
> >If someone asks is_volatile system call, it could be added, too.
>
> So some nits below from my initial playing around with this patchset.
>
> >+/*
> >+ * Return -EINVAL if range doesn't include a right vma at all.
> >+ * Return -ENOMEM with interrupting range opeartion if memory is not enough to
> >+ * merge/split vmas.
> >+ * Return 0 if range consists of only proper vmas.
> >+ * Return 1 if part of range includes inavlid area(ex, hole/huge/ksm/mlock/
> >+ * special area)
> >+ */
> >+SYSCALL_DEFINE2(mvolatile, unsigned long, start, size_t, len)
> >+{
> >+ unsigned long end, tmp;
> >+ struct vm_area_struct *vma, *prev;
> >+ bool invalid = false;
> >+ int error = -EINVAL;
> >+
> >+ down_write(¤t->mm->mmap_sem);
> >+ if (start & ~PAGE_MASK)
> >+ goto out;
> >+
> >+ len &= PAGE_MASK;
> >+ if (!len)
> >+ goto out;
> >+
> >+ end = start + len;
> >+ if (end < start)
> >+ goto out;
> >+
> >+ vma = find_vma_prev(current->mm, start, &prev);
> >+ if (!vma)
> >+ goto out;
> >+
> >+ if (start > vma->vm_start)
> >+ prev = vma;
> >+
> >+ for (;;) {
> >+ /* Here start < (end|vma->vm_end). */
> >+ if (start < vma->vm_start) {
> >+ start = vma->vm_start;
> >+ if (start >= end)
> >+ goto out;
> >+ invalid = true;
> >+ }
> >+
> >+ /* Here vma->vm_start <= start < (end|vma->vm_end) */
> >+ tmp = vma->vm_end;
> >+ if (end < tmp)
> >+ tmp = end;
> >+
> >+ /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> >+ error = do_mvolatile(vma, &prev, start, tmp);
> >+ if (error == -ENOMEM) {
> >+ up_write(¤t->mm->mmap_sem);
> >+ return error;
> >+ }
> >+ if (error == -EINVAL)
> >+ invalid = true;
> >+ else
> >+ error = 0;
> >+ start = tmp;
> >+ if (prev && start < prev->vm_end)
> >+ start = prev->vm_end;
> >+ if (start >= end)
> >+ break;
> >+
> >+ vma = prev->vm_next;
> >+ if (!vma)
> >+ break;
> >+ }
> >+out:
> >+ up_write(¤t->mm->mmap_sem);
> >+ return invalid ? 1 : 0;
> >+}
>
> The error logic here is really strange. If any of the early error
> cases are triggered (ie: (start & ~PAGE_MASK), etc), then we jump to
> out and return 0 (instead of EINVAL). I don't think that's what you
> intended.
Need fixing.
>
>
> >+/*
> >+ * Return -ENOMEM with interrupting range opeartion if memory is not enough
> >+ * to merge/split vmas.
> >+ * Return 1 if part of range includes purged's one, otherwise, return 0
> >+ */
> >+SYSCALL_DEFINE2(mnovolatile, unsigned long, start, size_t, len)
> >+{
> >+ unsigned long end, tmp;
> >+ struct vm_area_struct *vma, *prev;
> >+ int ret, error = -EINVAL;
> >+ bool is_purged = false;
> >+
> >+ down_write(¤t->mm->mmap_sem);
> >+ if (start & ~PAGE_MASK)
> >+ goto out;
> >+
> >+ len &= PAGE_MASK;
> >+ if (!len)
> >+ goto out;
> >+
> >+ end = start + len;
> >+ if (end < start)
> >+ goto out;
> >+
> >+ vma = find_vma_prev(current->mm, start, &prev);
> >+ if (!vma)
> >+ goto out;
> >+
> >+ if (start > vma->vm_start)
> >+ prev = vma;
> >+
> >+ for (;;) {
> >+ /* Here start < (end|vma->vm_end). */
> >+ if (start < vma->vm_start) {
> >+ start = vma->vm_start;
> >+ if (start >= end)
> >+ goto out;
> >+ }
> >+
> >+ /* Here vma->vm_start <= start < (end|vma->vm_end) */
> >+ tmp = vma->vm_end;
> >+ if (end < tmp)
> >+ tmp = end;
> >+
> >+ /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> >+ error = do_mnovolatile(vma, &prev, start, tmp, &is_purged);
> >+ if (error) {
> >+ WARN_ON(error != -ENOMEM);
> >+ goto out;
> >+ }
> >+ start = tmp;
> >+ if (prev && start < prev->vm_end)
> >+ start = prev->vm_end;
> >+ if (start >= end)
> >+ break;
> >+
> >+ vma = prev->vm_next;
> >+ if (!vma)
> >+ break;
> >+ }
>
> I'm still not sure how this logic improves over the madvise case. If
> we catch an error mid-way through setting a series of vmas to
> non-volatile, we end up exiting and losing state (ie: if only the
> first vma was purged, but half way through 10 vmas we get a ENOMEM
> error. So the first vma is now non-volatile, but we do not return
> the purged flag ).
Right.
>
> If we're going to have a new syscall for this (which I'm not sure is
> the right approach), we should make use of multiple arguments so we
> can return if data was purged, even if we hit an error midway).
It would be easier method to achieve our goal than below suggestion
in case of VMA-basd approach because it's hard to expect how many
we need vmas with atomically.
Will do it in next version.
>
> Alternatively, if we can find a way to allocate any necessary memory
> before we do any vma volatility state changes, then we can return
> ENOMEM then and be confident we won't end up with failed partial
> state change (this is the approach I used in my fallocate-volatile
> patches).
Thanks for the review, John.
>
> thanks
> -john
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: John Stultz <john.stultz@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Michael Kerrisk <mtk.manpages@gmail.com>,
Arun Sharma <asharma@fb.com>,
sanjay@google.com, Paul Turner <pjt@google.com>,
David Rientjes <rientjes@google.com>,
Christoph Lameter <cl@linux.com>,
Android Kernel Team <kernel-team@android.com>,
Robert Love <rlove@google.com>, Mel Gorman <mel@csn.ul.ie>,
Hugh Dickins <hughd@google.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Rik van Riel <riel@redhat.com>,
Dave Chinner <david@fromorbit.com>, Neil Brown <neilb@suse.de>,
Mike Hommey <mh@glandium.org>, Taras Glek <tglek@mozilla.com>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC 1/8] Introduce new system call mvolatile
Date: Fri, 18 Jan 2013 14:30:06 +0900 [thread overview]
Message-ID: <20130118053006.GB31368@blaptop> (raw)
In-Reply-To: <50F75875.30909@linaro.org>
On Wed, Jan 16, 2013 at 05:48:37PM -0800, John Stultz wrote:
> On 01/02/2013 08:27 PM, Minchan Kim wrote:
> >This patch adds new system call m[no]volatile.
> >If someone asks is_volatile system call, it could be added, too.
>
> So some nits below from my initial playing around with this patchset.
>
> >+/*
> >+ * Return -EINVAL if range doesn't include a right vma at all.
> >+ * Return -ENOMEM with interrupting range opeartion if memory is not enough to
> >+ * merge/split vmas.
> >+ * Return 0 if range consists of only proper vmas.
> >+ * Return 1 if part of range includes inavlid area(ex, hole/huge/ksm/mlock/
> >+ * special area)
> >+ */
> >+SYSCALL_DEFINE2(mvolatile, unsigned long, start, size_t, len)
> >+{
> >+ unsigned long end, tmp;
> >+ struct vm_area_struct *vma, *prev;
> >+ bool invalid = false;
> >+ int error = -EINVAL;
> >+
> >+ down_write(¤t->mm->mmap_sem);
> >+ if (start & ~PAGE_MASK)
> >+ goto out;
> >+
> >+ len &= PAGE_MASK;
> >+ if (!len)
> >+ goto out;
> >+
> >+ end = start + len;
> >+ if (end < start)
> >+ goto out;
> >+
> >+ vma = find_vma_prev(current->mm, start, &prev);
> >+ if (!vma)
> >+ goto out;
> >+
> >+ if (start > vma->vm_start)
> >+ prev = vma;
> >+
> >+ for (;;) {
> >+ /* Here start < (end|vma->vm_end). */
> >+ if (start < vma->vm_start) {
> >+ start = vma->vm_start;
> >+ if (start >= end)
> >+ goto out;
> >+ invalid = true;
> >+ }
> >+
> >+ /* Here vma->vm_start <= start < (end|vma->vm_end) */
> >+ tmp = vma->vm_end;
> >+ if (end < tmp)
> >+ tmp = end;
> >+
> >+ /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> >+ error = do_mvolatile(vma, &prev, start, tmp);
> >+ if (error == -ENOMEM) {
> >+ up_write(¤t->mm->mmap_sem);
> >+ return error;
> >+ }
> >+ if (error == -EINVAL)
> >+ invalid = true;
> >+ else
> >+ error = 0;
> >+ start = tmp;
> >+ if (prev && start < prev->vm_end)
> >+ start = prev->vm_end;
> >+ if (start >= end)
> >+ break;
> >+
> >+ vma = prev->vm_next;
> >+ if (!vma)
> >+ break;
> >+ }
> >+out:
> >+ up_write(¤t->mm->mmap_sem);
> >+ return invalid ? 1 : 0;
> >+}
>
> The error logic here is really strange. If any of the early error
> cases are triggered (ie: (start & ~PAGE_MASK), etc), then we jump to
> out and return 0 (instead of EINVAL). I don't think that's what you
> intended.
Need fixing.
>
>
> >+/*
> >+ * Return -ENOMEM with interrupting range opeartion if memory is not enough
> >+ * to merge/split vmas.
> >+ * Return 1 if part of range includes purged's one, otherwise, return 0
> >+ */
> >+SYSCALL_DEFINE2(mnovolatile, unsigned long, start, size_t, len)
> >+{
> >+ unsigned long end, tmp;
> >+ struct vm_area_struct *vma, *prev;
> >+ int ret, error = -EINVAL;
> >+ bool is_purged = false;
> >+
> >+ down_write(¤t->mm->mmap_sem);
> >+ if (start & ~PAGE_MASK)
> >+ goto out;
> >+
> >+ len &= PAGE_MASK;
> >+ if (!len)
> >+ goto out;
> >+
> >+ end = start + len;
> >+ if (end < start)
> >+ goto out;
> >+
> >+ vma = find_vma_prev(current->mm, start, &prev);
> >+ if (!vma)
> >+ goto out;
> >+
> >+ if (start > vma->vm_start)
> >+ prev = vma;
> >+
> >+ for (;;) {
> >+ /* Here start < (end|vma->vm_end). */
> >+ if (start < vma->vm_start) {
> >+ start = vma->vm_start;
> >+ if (start >= end)
> >+ goto out;
> >+ }
> >+
> >+ /* Here vma->vm_start <= start < (end|vma->vm_end) */
> >+ tmp = vma->vm_end;
> >+ if (end < tmp)
> >+ tmp = end;
> >+
> >+ /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> >+ error = do_mnovolatile(vma, &prev, start, tmp, &is_purged);
> >+ if (error) {
> >+ WARN_ON(error != -ENOMEM);
> >+ goto out;
> >+ }
> >+ start = tmp;
> >+ if (prev && start < prev->vm_end)
> >+ start = prev->vm_end;
> >+ if (start >= end)
> >+ break;
> >+
> >+ vma = prev->vm_next;
> >+ if (!vma)
> >+ break;
> >+ }
>
> I'm still not sure how this logic improves over the madvise case. If
> we catch an error mid-way through setting a series of vmas to
> non-volatile, we end up exiting and losing state (ie: if only the
> first vma was purged, but half way through 10 vmas we get a ENOMEM
> error. So the first vma is now non-volatile, but we do not return
> the purged flag ).
Right.
>
> If we're going to have a new syscall for this (which I'm not sure is
> the right approach), we should make use of multiple arguments so we
> can return if data was purged, even if we hit an error midway).
It would be easier method to achieve our goal than below suggestion
in case of VMA-basd approach because it's hard to expect how many
we need vmas with atomically.
Will do it in next version.
>
> Alternatively, if we can find a way to allocate any necessary memory
> before we do any vma volatility state changes, then we can return
> ENOMEM then and be confident we won't end up with failed partial
> state change (this is the approach I used in my fallocate-volatile
> patches).
Thanks for the review, John.
>
> thanks
> -john
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind regards,
Minchan Kim
next prev parent reply other threads:[~2013-01-18 5:30 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-01-03 4:27 [RFC v5 0/8] Support volatile for anonymous range Minchan Kim
2013-01-03 4:27 ` Minchan Kim
2013-01-03 4:27 ` [RFC 1/8] Introduce new system call mvolatile Minchan Kim
2013-01-03 4:27 ` Minchan Kim
2013-01-03 18:35 ` Taras Glek
2013-01-03 18:35 ` Taras Glek
2013-01-04 4:25 ` Minchan Kim
2013-01-04 4:25 ` Minchan Kim
2013-01-17 1:48 ` John Stultz
2013-01-17 1:48 ` John Stultz
2013-01-18 5:30 ` Minchan Kim [this message]
2013-01-18 5:30 ` Minchan Kim
2013-01-03 4:28 ` [RFC 2/8] Don't allow volatile attribute on THP and KSM Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 16:27 ` Dave Hansen
2013-01-03 16:27 ` Dave Hansen
2013-01-04 2:51 ` Minchan Kim
2013-01-04 2:51 ` Minchan Kim
2013-01-03 4:28 ` [RFC 3/8] bail out when the page is in VOLATILE vma Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 4:28 ` [RFC 4/8] add page_locked parameter in free_swap_and_cache Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 4:28 ` [RFC 5/8] Discard volatile page Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 4:28 ` [RFC 6/8] add PGVOLATILE vmstat count Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 4:28 ` [RFC 7/8] add volatile page discard hook to kswapd Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 4:28 ` [RFC 8/8] extend PGVOLATILE vmstat " Minchan Kim
2013-01-03 4:28 ` Minchan Kim
2013-01-03 17:19 ` [RFC v5 0/8] Support volatile for anonymous range Sanjay Ghemawat
2013-01-03 17:19 ` Sanjay Ghemawat
2013-01-04 5:15 ` Minchan Kim
2013-01-04 5:15 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130118053006.GB31368@blaptop \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=asharma@fb.com \
--cc=cl@linux.com \
--cc=dave@linux.vnet.ibm.com \
--cc=david@fromorbit.com \
--cc=hughd@google.com \
--cc=john.stultz@linaro.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kernel-team@android.com \
--cc=kosaki.motohiro@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mh@glandium.org \
--cc=mtk.manpages@gmail.com \
--cc=neilb@suse.de \
--cc=pjt@google.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=rlove@google.com \
--cc=sanjay@google.com \
--cc=tglek@mozilla.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.