From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
linux-mm@kvack.org, linux-next@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
Date: Fri, 3 Jun 2016 15:51:54 +0200 [thread overview]
Message-ID: <20160603135154.GD29930@redhat.com> (raw)
In-Reply-To: <20160602122109.GM1995@dhcp22.suse.cz>
On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote:
> Testing with the patch makes some sense as well, but I would like to
> hear from Andrea whether the approach is good because I am wondering why
> he hasn't done that before - it feels so much simpler than the current
> code.
The down_write in the exit path comes from __ksm_exit. If you don't
like it there I'd suggest to also remove it from __ksm_exit.
This is a proposed cleanup correct?
The first thing that I can notice is that khugepaged_test_exit() then
can only be called and provide the expected retval, after
atomic_inc_not_zero(mm_users). Also note mmget_not_zero() should be
used instead.
However the code still uses khugepaged_test_exit in __khugepage_enter
that won't increase the mm_users, so then the patch relaxes that check
too much, albeit only for a debug check not strictly a bug.
The cons of this change purely that it'll decrease the responsiveness
in releasing the RAM of a killed task a bit.
To me the fewer time we hold the mm_users the better and I don't see
an obvious runtime improvement coming from this change. It's a bit
simpler yes, but the down_write in the exit path is well understood,
ksm does the same thing and it's in a slow path (it only happens if
the mm that exited is the current one under scan by either ksmd or
khugepaged, so normally the down_write is not executed in the exit
path and the "mm" is collected right away both as a mm_users and
mm_count).
In short I think it's a tradeoff: pros) removes down_write in a slow
path of the the mm exit which may simplify the code a bit, cons) it
could increase the latency in freeing memory as result of a task
exiting or being killed during the khugepaged scan, for example while
the THP is being allocated. While compaction runs to allocate the THP
in collapse_huge_page, if the task is killed currently the memory is
released right away, without waiting for the allocation to succeed or
fail.
I don't see a big enough problem with the down_write in a slow path of
khugepaged_exit to justify the increased latency in releasing memory.
I was very happy by Oleg's patch reducing the mm_users holding of
userfaultfd too. That was controlled by userland so it would only be
an issue for non-cooperative usage which isn't upstream yet, and it
was also much wider than this one would become with the patch applied,
but I liked the direction.
If prefer instead to remove the down_write, you probably could move
the test_exit before the down_read/write to bail out before taking the
lock: you don't need the mmap_sem to do test_exit anymore. The only
reason the text_exit would remain in fact is just to reduce the
latency of the memory freeing, it then becomes a voluntary preempt
cond_resched() to release the memory to make a parallel ;), but unable
to let the kernel free the memory while the THP allocation runs.
Thanks,
Andrea
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Stephen Rothwell <sfr@canb.auug.org.au>,
linux-mm@kvack.org, linux-next@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup
Date: Fri, 3 Jun 2016 15:51:54 +0200 [thread overview]
Message-ID: <20160603135154.GD29930@redhat.com> (raw)
In-Reply-To: <20160602122109.GM1995@dhcp22.suse.cz>
On Thu, Jun 02, 2016 at 02:21:10PM +0200, Michal Hocko wrote:
> Testing with the patch makes some sense as well, but I would like to
> hear from Andrea whether the approach is good because I am wondering why
> he hasn't done that before - it feels so much simpler than the current
> code.
The down_write in the exit path comes from __ksm_exit. If you don't
like it there I'd suggest to also remove it from __ksm_exit.
This is a proposed cleanup correct?
The first thing that I can notice is that khugepaged_test_exit() then
can only be called and provide the expected retval, after
atomic_inc_not_zero(mm_users). Also note mmget_not_zero() should be
used instead.
However the code still uses khugepaged_test_exit in __khugepage_enter
that won't increase the mm_users, so then the patch relaxes that check
too much, albeit only for a debug check not strictly a bug.
The cons of this change purely that it'll decrease the responsiveness
in releasing the RAM of a killed task a bit.
To me the fewer time we hold the mm_users the better and I don't see
an obvious runtime improvement coming from this change. It's a bit
simpler yes, but the down_write in the exit path is well understood,
ksm does the same thing and it's in a slow path (it only happens if
the mm that exited is the current one under scan by either ksmd or
khugepaged, so normally the down_write is not executed in the exit
path and the "mm" is collected right away both as a mm_users and
mm_count).
In short I think it's a tradeoff: pros) removes down_write in a slow
path of the the mm exit which may simplify the code a bit, cons) it
could increase the latency in freeing memory as result of a task
exiting or being killed during the khugepaged scan, for example while
the THP is being allocated. While compaction runs to allocate the THP
in collapse_huge_page, if the task is killed currently the memory is
released right away, without waiting for the allocation to succeed or
fail.
I don't see a big enough problem with the down_write in a slow path of
khugepaged_exit to justify the increased latency in releasing memory.
I was very happy by Oleg's patch reducing the mm_users holding of
userfaultfd too. That was controlled by userland so it would only be
an issue for non-cooperative usage which isn't upstream yet, and it
was also much wider than this one would become with the patch applied,
but I liked the direction.
If prefer instead to remove the down_write, you probably could move
the test_exit before the down_read/write to bail out before taking the
lock: you don't need the mmap_sem to do test_exit anymore. The only
reason the text_exit would remain in fact is just to reduce the
latency of the memory freeing, it then becomes a voluntary preempt
cond_resched() to release the memory to make a parallel ;), but unable
to let the kernel free the memory while the THP allocation runs.
Thanks,
Andrea
next prev parent reply other threads:[~2016-06-03 13:51 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-01 3:11 linux-next: Tree for Jun 1 Stephen Rothwell
2016-06-02 1:48 ` [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write_failed lockup Sergey Senozhatsky
2016-06-02 1:48 ` Sergey Senozhatsky
2016-06-02 9:21 ` Michal Hocko
2016-06-02 9:21 ` Michal Hocko
2016-06-02 12:08 ` Sergey Senozhatsky
2016-06-02 12:08 ` Sergey Senozhatsky
2016-06-02 12:21 ` Michal Hocko
2016-06-02 12:21 ` Michal Hocko
2016-06-03 13:51 ` Andrea Arcangeli [this message]
2016-06-03 13:51 ` Andrea Arcangeli
2016-06-03 14:46 ` Michal Hocko
2016-06-03 14:46 ` Michal Hocko
2016-06-03 15:10 ` Andrea Arcangeli
2016-06-03 15:10 ` Andrea Arcangeli
2016-06-07 7:34 ` Michal Hocko
2016-06-07 7:34 ` Michal Hocko
2016-06-08 8:19 ` Vlastimil Babka
2016-06-08 8:19 ` Vlastimil Babka
2016-06-03 7:15 ` Sergey Senozhatsky
2016-06-03 7:15 ` Sergey Senozhatsky
2016-06-03 7:25 ` Michal Hocko
2016-06-03 7:25 ` Michal Hocko
2016-06-03 8:43 ` Sergey Senozhatsky
2016-06-03 8:43 ` Sergey Senozhatsky
2016-06-03 9:55 ` Michal Hocko
2016-06-03 9:55 ` Michal Hocko
2016-06-03 10:05 ` Michal Hocko
2016-06-03 10:05 ` Michal Hocko
2016-06-03 13:38 ` Sergey Senozhatsky
2016-06-03 13:38 ` Sergey Senozhatsky
2016-06-03 13:45 ` Michal Hocko
2016-06-03 13:45 ` Michal Hocko
2016-06-03 13:49 ` Michal Hocko
2016-06-03 13:49 ` Michal Hocko
2016-06-03 13:49 ` Michal Hocko
2016-06-04 7:51 ` Sergey Senozhatsky
2016-06-04 7:51 ` Sergey Senozhatsky
2016-06-06 8:39 ` Michal Hocko
2016-06-06 8:39 ` Michal Hocko
2016-06-02 13:24 ` Vlastimil Babka
2016-06-02 13:24 ` Vlastimil Babka
2016-06-02 18:58 ` Ebru Akagunduz
2016-06-02 18:58 ` Ebru Akagunduz
2016-06-03 1:00 ` Sergey Senozhatsky
2016-06-03 1:00 ` Sergey Senozhatsky
2016-06-03 1:29 ` Sergey Senozhatsky
2016-06-03 1:29 ` Sergey Senozhatsky
2016-06-03 4:14 ` Sergey Senozhatsky
2016-06-03 4:14 ` Sergey Senozhatsky
2016-06-03 12:28 ` [PATCH] mm, thp: fix locking inconsistency in collapse_huge_page Ebru Akagunduz
2016-06-03 12:28 ` Ebru Akagunduz
2016-06-06 13:05 ` Vlastimil Babka
2016-06-06 13:05 ` Vlastimil Babka
2016-06-09 3:51 ` Sergey Senozhatsky
2016-06-09 3:51 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160603135154.GD29930@redhat.com \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-next@vger.kernel.org \
--cc=mhocko@kernel.org \
--cc=sergey.senozhatsky.work@gmail.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=sfr@canb.auug.org.au \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.