From: "Huang, Ying" <ying.huang@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out
Date: Sat, 01 Apr 2017 10:54:39 +0800
Message-ID: <874ly8suq8.fsf@yhuang-dev.intel.com>
In-Reply-To: <20170331144948.GA6408@cmpxchg.org> (Johannes Weiner's message of "Fri, 31 Mar 2017 10:49:48 -0400")

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Thu, Mar 30, 2017 at 12:15:13PM +0800, Huang, Ying wrote:
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> > On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote:
>> >> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head *list)
>> >>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>> >>  	VM_BUG_ON_PAGE(!PageUptodate(page), page);
>> >>  
>> >> +	if (unlikely(PageTransHuge(page))) {
>> >> +		err = add_to_swap_trans_huge(page, list);
>> >> +		switch (err) {
>> >> +		case 1:
>> >> +			return 1;
>> >> +		case 0:
>> >> +			/* fallback to split firstly if return 0 */
>> >> +			break;
>> >> +		default:
>> >> +			return 0;
>> >> +		}
>> >> +	}
>> >>  	entry = get_swap_page();
>> >>  	if (!entry.val)
>> >>  		return 0;
>> >
>> > add_to_swap_trans_huge() is too close a copy of add_to_swap(), which
>> > makes the code error prone for future modifications to the swap slot
>> > allocation protocol.
>> >
>> > This should read:
>> >
>> > retry:
>> > 	entry = get_swap_page(page);
>> > 	if (!entry.val) {
>> > 		if (PageTransHuge(page)) {
>> > 			split_huge_page_to_list(page, list);
>> > 			goto retry;
>> > 		}
>> > 		return 0;
>> > 	}
>> 
>> If the swap space is used up, that is, get_swap_page() cannot allocate
>> even 1 swap entry for a normal page.  We will split THP unnecessarily
>> with the change, but in the original code, we just skip the THP.  There
>> may be a performance regression here.  Similar problem exists for
>> mem_cgroup_try_charge_swap() too.  If the mem cgroup exceeds the swap
>> limit, the THP will be split unnecessary with the change too.
>
> If we skip the page, we're swapping out another page hotter than this
> one. Giving THP preservation priority over LRU order is an issue best
> kept for a separate patch set;

In my original patch, if we fail to allocate swap space for a whole THP
but can still allocate swap space for a normal page, we split the THP.
We skip the page only if we cannot allocate swap space even for a
normal page, that is, when nr_swap_pages is 0.  So the patch does not
give THP preservation priority over LRU order.

> this one is supposed to be a mechanical
> implementation of THP swapping. Let's nail down the basics first.

Yes.  So I tried to keep the original behavior for the case where we
cannot allocate the swap space (a swap cluster) for a whole THP.

As I understand it, the only difference between what you suggested and
the original behavior is whether the THP is split when nr_swap_pages
is 0.
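
To make that difference concrete, here is a rough user-space sketch of
the two policies as described in this thread.  It is not kernel code:
the enum, the function names, and the "cluster_available" flag are all
made up for illustration, and memcg swap limits are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome for one page in add_to_swap(); names are illustrative only. */
enum swap_action {
	SWAP_WHOLE,       /* slot(s) allocated, page swapped out as is */
	SWAP_SPLIT_RETRY, /* THP split, then retried as normal pages */
	SWAP_SPLIT_FAIL,  /* THP split, but no slot for a normal page either */
	SWAP_SKIP,        /* nothing allocated, page left untouched */
};

/*
 * Posted patch, as described above: a THP is split only if a normal
 * page could still get a swap slot.
 * cluster_available is a hypothetical stand-in for "a whole free swap
 * cluster exists for the THP".
 */
static enum swap_action policy_posted(long nr_swap_pages,
				      bool cluster_available, bool is_thp)
{
	assert(nr_swap_pages >= 0);
	if (is_thp && cluster_available)
		return SWAP_WHOLE;
	if (nr_swap_pages == 0)
		return SWAP_SKIP;	/* the THP is NOT split here */
	return is_thp ? SWAP_SPLIT_RETRY : SWAP_WHOLE;
}

/* Suggested retry loop: any failed THP allocation leads to a split. */
static enum swap_action policy_suggested(long nr_swap_pages,
					 bool cluster_available, bool is_thp)
{
	assert(nr_swap_pages >= 0);
	if (is_thp && cluster_available)
		return SWAP_WHOLE;
	if (is_thp)
		return nr_swap_pages ? SWAP_SPLIT_RETRY : SWAP_SPLIT_FAIL;
	return nr_swap_pages ? SWAP_WHOLE : SWAP_SKIP;
}
```

In this model the two functions return the same value for every input
except a THP with nr_swap_pages == 0, where the posted patch skips the
page and the suggested loop splits it before failing.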

> Such a decision would need proof that splitting THPs on full swap
> devices is a concern for real applications. I would assume that we're
> pretty close to OOM anyway; it's much more likely that a single slot
> frees up than a full cluster, at which point we'll be splitting THPs
> anyway; etc. I have my doubts that this would be measurable.
>
> But even if so, I don't think we'd have to duplicate the main code
> flow to handle this corner case. You can extend get_swap_page() to
> return an error code that tells add_to_swap() whether to split and
> retry, or to fail and move on. So this way should be future proof.

Yes.  I will try to merge add_to_swap_trans_huge() into add_to_swap() in
the next version.  But if we want to keep the original behavior,
mem_cgroup_try_charge_swap() will need an extra "nr_entries" parameter.
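
A rough user-space model of what the merged flow might look like, with
the allocator returning a three-way result as suggested.  Everything
here is an assumption made for illustration: the model_* names, the
slot_result enum, and the split between "free_slots" (single slots
outside whole free clusters) and "free_clusters" do not exist in the
kernel, and the memcg charge is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HPAGE_NR 512 /* normal pages per THP: 2 MiB / 4 KiB on x86_64 */

/* Hypothetical three-way result from the slot allocator. */
enum slot_result { SLOT_OK, SLOT_SPLIT_RETRY, SLOT_FAIL };

struct model_page { bool huge; };

struct model_swap {
	long free_slots;    /* single free slots outside whole free clusters */
	long free_clusters; /* completely free clusters */
};

/* Stand-in for an extended get_swap_page(); not the real interface. */
static enum slot_result model_get_swap_page(struct model_swap *s,
					    const struct model_page *p)
{
	if (p->huge) {
		if (s->free_clusters > 0) {
			s->free_clusters--;
			return SLOT_OK;
		}
		/* Split only if a normal page could still get a slot. */
		return s->free_slots > 0 ? SLOT_SPLIT_RETRY : SLOT_FAIL;
	}
	if (s->free_slots == 0 && s->free_clusters > 0) {
		/* Break up a free cluster to serve single slots. */
		s->free_clusters--;
		s->free_slots += MODEL_HPAGE_NR;
	}
	if (s->free_slots > 0) {
		s->free_slots--;
		return SLOT_OK;
	}
	return SLOT_FAIL;
}

/* Single code path for both THP and normal pages; returns 1 on success. */
static int model_add_to_swap(struct model_swap *s, struct model_page *p,
			     int *nr_splits)
{
	assert(s != NULL && p != NULL && nr_splits != NULL);
	for (;;) {
		switch (model_get_swap_page(s, p)) {
		case SLOT_OK:
			return 1;
		case SLOT_SPLIT_RETRY:
			/* Stands in for split_huge_page_to_list(). */
			p->huge = false;
			(*nr_splits)++;
			continue; /* retry as a normal page */
		case SLOT_FAIL:
			return 0;
		}
	}
}
```

With no free slots and no free clusters, model_add_to_swap() fails for
a THP without splitting it, which matches the original behavior while
keeping one allocation path.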

Best Regards,
Huang, Ying


