All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Huang\, Ying" <ying.huang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: "Huang,
	Ying" <ying.huang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrea Arcangeli
	<aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Ebru Akagunduz
	<ebru.akagunduz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Shaohua Li <shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH -mm -v8 1/3] mm, THP, swap: Delay splitting THP during swap out
Date: Tue, 18 Apr 2017 08:33:16 +0800	[thread overview]
Message-ID: <87efwqtv03.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20170417182410.GA26500-druUgvl0LCNAfugRpC6u6w@public.gmane.org> (Johannes Weiner's message of "Mon, 17 Apr 2017 14:24:10 -0400")

Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> writes:

> On Sat, Apr 15, 2017 at 09:17:04AM +0800, Huang, Ying wrote:
>> Hi, Johannes,
>> 
>> Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> writes:
>> 
>> > Hi Huang,
>> >
>> > I reviewed this patch based on the feedback I already provided, but
>> > eventually gave up and rewrote it. Please take review feedback more
>> > seriously in the future.
>> 
>> Thanks a lot for your help!  I do respect all your review and effort.
>> The -v8 patch doesn't take all your comments, just because I thought we
>> have not reach consensus for some points and I want to use -v8 patch to
>> discuss them.
>> 
>> One concern I have before is whether to split THP firstly when swap
>> space or memcg swap is used up.  Now I think your solution is
>> acceptable. And if we receive any regression report for that in the
>> future, it's not very hard to deal with.
>
> If you look at get_scan_count(), we'll stop scanning anonymous pages
> altogether when swap space runs out. So it might happen to a few THP,
> but it shouldn't be a big deal.

Yes.  It only influences a few THP.

> And yes, in that case I'd really rather wait for any real problems to
> materialize before we complicate things.
>
>> > Attached below is the reworked patch. Most changes are to the layering
>> > (page functions, cluster functions, range functions) so that we don't
>> > make the lowest swap range code require a notion of huge pages, or
>> > make the memcg page functions take size information that can be
>> > gathered from the page itself. I turned the config symbol into a
>> > generic THP_SWAP that can later be extended when we add 2MB IO. The
>> > rest is function naming, #ifdef removal etc.
>> 
>> For some #ifdef in swapfile.c, it is to avoid unnecessary code size
>> increase for !CONFIG_TRANSPARENT_HUGEPAGE or platform with THP swap
>> optimization disabled.  Is it an issue?
>
> It saves some code size, but it looks like the biggest cost comes from
> bloating PageSwapCache(). This is mm/builtin.o with !CONFIG_THP:
>
> add/remove: 1/0 grow/shrink: 34/5 up/down: 920/-311 (609)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> migrate_page_move_mapping                   1404    1479     +75
> shrink_page_list                            3573    3632     +59
> __delete_from_swap_cache                     235     293     +58
> __test_set_page_writeback                    626     678     +52
> __swap_writepage                             766     812     +46
> try_to_unuse                                1763    1795     +32
> madvise_free_pte_range                       882     912     +30
> __set_page_dirty_nobuffers                   245     268     +23
> migrate_page_copy                            565     587     +22
> swap_slot_free_notify                        133     151     +18
> shmem_replace_page                           616     633     +17
> try_to_free_swap                             135     151     +16
> test_clear_page_writeback                    512     528     +16
> swap_set_page_dirty                          109     125     +16
> swap_readpage                                384     400     +16
> shmem_unuse                                 1535    1551     +16
> reuse_swap_page                              340     356     +16
> page_mapping                                 144     160     +16
> migrate_huge_page_move_mapping               483     499     +16
> free_swap_and_cache                          409     425     +16
> free_pages_and_swap_cache                    161     177     +16
> free_page_and_swap_cache                     145     161     +16
> do_swap_page                                1216    1232     +16
> __remove_mapping                             408     424     +16
> __page_file_mapping                           82      98     +16
> __page_file_index                             70      85     +15
> try_to_unmap_one                            1324    1337     +13
> shmem_getpage_gfp.isra                      2358    2371     +13
> add_to_swap_cache                             47      60     +13
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> __add_to_swap_cache                          445     406     -39
> delete_from_swap_cache                       149     104     -45
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=455144, chg +0.13%
>
> If I make the compound_head() in there conditional, this patch
> actually ends up shrinking the code due to the refactoring of the
> cluster functions:
>
> add/remove: 1/0 grow/shrink: 10/5 up/down: 302/-327 (-25)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> __delete_from_swap_cache                     235     277     +42
> shmem_replace_page                           616     633     +17
> migrate_page_move_mapping                   1404    1418     +14
> add_to_swap_cache                             47      60     +13
> migrate_page_copy                            565     571      +6
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> delete_from_swap_cache                       149     104     -45
> __add_to_swap_cache                          445     390     -55
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=454510, chg -0.01%

This looks great!  Thanks!

> But PageSwapCache() is somewhat ugly either way. Even with THP_SWAP
> compiled in, it seems like most callsites wouldn't test tailpages?
> Can we get rid of the compound_head() and annotate any callsites
> working on potential tail pages?

I think we can keep the current #ifdef version and make some cleanup in
the next step?

>> > Please review whether this is an acceptable version for you.
>> 
>> Yes.  It is good for me.  I will give it more test on next Monday.
>
> Thanks

I think we will fold the below patch into the original one?

Best Regards,
Huang, Ying

> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f4acd6c4f808..d33e3280c8ad 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -326,7 +326,9 @@ PAGEFLAG_FALSE(HighMem)
>  #ifdef CONFIG_SWAP
>  static __always_inline int PageSwapCache(struct page *page)
>  {
> +#ifdef CONFIG_THP_SWAP
>  	page = compound_head(page);
> +#endif
>  	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
>  
>  }
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 34adac6e9457..a4dba6975e7b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -401,7 +401,6 @@ extern int swap_duplicate(swp_entry_t);
>  extern int swapcache_prepare(swp_entry_t);
>  extern void swap_free(swp_entry_t);
>  extern void swapcache_free(swp_entry_t);
> -extern void swapcache_free_cluster(swp_entry_t);
>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
>  extern int free_swap_and_cache(swp_entry_t);
>  extern int swap_type_of(dev_t, sector_t, struct block_device **);
> @@ -467,10 +466,6 @@ static inline void swapcache_free(swp_entry_t swp)
>  {
>  }
>  
> -static inline void swapcache_free_cluster(swp_entry_t swp)
> -{
> -}
> -
>  static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
>  			struct vm_area_struct *vma, unsigned long addr)
>  {
> @@ -592,5 +587,13 @@ static inline bool mem_cgroup_swap_full(struct page *page)
>  }
>  #endif
>  
> +#ifdef CONFIG_THP_SWAP
> +extern void swapcache_free_cluster(swp_entry_t);
> +#else
> +static inline void swapcache_free_cluster(swp_entry_t swp)
> +{
> +}
> +#endif
> +
>  #endif /* __KERNEL__*/
>  #endif /* _LINUX_SWAP_H */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f597cabcaab7..eeaf145b2a20 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -825,6 +825,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  	return n_ret;
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
>  {
>  	unsigned long idx;
> @@ -862,6 +863,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>  	unlock_cluster(ci);
>  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
>  }
> +#else
> +static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
> +{
> +	VM_WARN_ON_ONCE(1);
> +	return 0;
> +}
> +#endif /* CONFIG_THP_SWAP */
>  
>  static unsigned long scan_swap_map(struct swap_info_struct *si,
>  				   unsigned char usage)
> @@ -1145,6 +1153,7 @@ void swapcache_free(swp_entry_t entry)
>  	}
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  void swapcache_free_cluster(swp_entry_t entry)
>  {
>  	unsigned long offset = swp_offset(entry);
> @@ -1170,6 +1179,7 @@ void swapcache_free_cluster(swp_entry_t entry)
>  	swap_free_cluster(si, idx);
>  	spin_unlock(&si->lock);
>  }
> +#endif /* CONFIG_THP_SWAP */
>  
>  void swapcache_free_entries(swp_entry_t *entries, int n)
>  {

WARNING: multiple messages have this Message-ID (diff)
From: "Huang\, Ying" <ying.huang@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ebru Akagunduz <ebru.akagunduz@gmail.com>,
	Michal Hocko <mhocko@kernel.org>, Tejun Heo <tj@kernel.org>,
	Hugh Dickins <hughd@google.com>, Shaohua Li <shli@kernel.org>,
	Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
	cgroups@vger.kernel.org
Subject: Re: [PATCH -mm -v8 1/3] mm, THP, swap: Delay splitting THP during swap out
Date: Tue, 18 Apr 2017 08:33:16 +0800	[thread overview]
Message-ID: <87efwqtv03.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20170417182410.GA26500@cmpxchg.org> (Johannes Weiner's message of "Mon, 17 Apr 2017 14:24:10 -0400")

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Sat, Apr 15, 2017 at 09:17:04AM +0800, Huang, Ying wrote:
>> Hi, Johannes,
>> 
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> 
>> > Hi Huang,
>> >
>> > I reviewed this patch based on the feedback I already provided, but
>> > eventually gave up and rewrote it. Please take review feedback more
>> > seriously in the future.
>> 
>> Thanks a lot for your help!  I do respect all your review and effort.
>> The -v8 patch doesn't take all your comments, just because I thought we
>> have not reach consensus for some points and I want to use -v8 patch to
>> discuss them.
>> 
>> One concern I have before is whether to split THP firstly when swap
>> space or memcg swap is used up.  Now I think your solution is
>> acceptable. And if we receive any regression report for that in the
>> future, it's not very hard to deal with.
>
> If you look at get_scan_count(), we'll stop scanning anonymous pages
> altogether when swap space runs out. So it might happen to a few THP,
> but it shouldn't be a big deal.

Yes.  It only influences a few THP.

> And yes, in that case I'd really rather wait for any real problems to
> materialize before we complicate things.
>
>> > Attached below is the reworked patch. Most changes are to the layering
>> > (page functions, cluster functions, range functions) so that we don't
>> > make the lowest swap range code require a notion of huge pages, or
>> > make the memcg page functions take size information that can be
>> > gathered from the page itself. I turned the config symbol into a
>> > generic THP_SWAP that can later be extended when we add 2MB IO. The
>> > rest is function naming, #ifdef removal etc.
>> 
>> For some #ifdef in swapfile.c, it is to avoid unnecessary code size
>> increase for !CONFIG_TRANSPARENT_HUGEPAGE or platform with THP swap
>> optimization disabled.  Is it an issue?
>
> It saves some code size, but it looks like the biggest cost comes from
> bloating PageSwapCache(). This is mm/builtin.o with !CONFIG_THP:
>
> add/remove: 1/0 grow/shrink: 34/5 up/down: 920/-311 (609)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> migrate_page_move_mapping                   1404    1479     +75
> shrink_page_list                            3573    3632     +59
> __delete_from_swap_cache                     235     293     +58
> __test_set_page_writeback                    626     678     +52
> __swap_writepage                             766     812     +46
> try_to_unuse                                1763    1795     +32
> madvise_free_pte_range                       882     912     +30
> __set_page_dirty_nobuffers                   245     268     +23
> migrate_page_copy                            565     587     +22
> swap_slot_free_notify                        133     151     +18
> shmem_replace_page                           616     633     +17
> try_to_free_swap                             135     151     +16
> test_clear_page_writeback                    512     528     +16
> swap_set_page_dirty                          109     125     +16
> swap_readpage                                384     400     +16
> shmem_unuse                                 1535    1551     +16
> reuse_swap_page                              340     356     +16
> page_mapping                                 144     160     +16
> migrate_huge_page_move_mapping               483     499     +16
> free_swap_and_cache                          409     425     +16
> free_pages_and_swap_cache                    161     177     +16
> free_page_and_swap_cache                     145     161     +16
> do_swap_page                                1216    1232     +16
> __remove_mapping                             408     424     +16
> __page_file_mapping                           82      98     +16
> __page_file_index                             70      85     +15
> try_to_unmap_one                            1324    1337     +13
> shmem_getpage_gfp.isra                      2358    2371     +13
> add_to_swap_cache                             47      60     +13
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> __add_to_swap_cache                          445     406     -39
> delete_from_swap_cache                       149     104     -45
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=455144, chg +0.13%
>
> If I make the compound_head() in there conditional, this patch
> actually ends up shrinking the code due to the refactoring of the
> cluster functions:
>
> add/remove: 1/0 grow/shrink: 10/5 up/down: 302/-327 (-25)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> __delete_from_swap_cache                     235     277     +42
> shmem_replace_page                           616     633     +17
> migrate_page_move_mapping                   1404    1418     +14
> add_to_swap_cache                             47      60     +13
> migrate_page_copy                            565     571      +6
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> delete_from_swap_cache                       149     104     -45
> __add_to_swap_cache                          445     390     -55
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=454510, chg -0.01%

This looks great!  Thanks!

> But PageSwapCache() is somewhat ugly either way. Even with THP_SWAP
> compiled in, it seems like most callsites wouldn't test tailpages?
> Can we get rid of the compound_head() and annotate any callsites
> working on potential tail pages?

I think we can keep the current #ifdef version and make some cleanup in
the next step?

>> > Please review whether this is an acceptable version for you.
>> 
>> Yes.  It is good for me.  I will give it more test on next Monday.
>
> Thanks

I think we will fold the below patch into the original one?

Best Regards,
Huang, Ying

> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f4acd6c4f808..d33e3280c8ad 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -326,7 +326,9 @@ PAGEFLAG_FALSE(HighMem)
>  #ifdef CONFIG_SWAP
>  static __always_inline int PageSwapCache(struct page *page)
>  {
> +#ifdef CONFIG_THP_SWAP
>  	page = compound_head(page);
> +#endif
>  	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
>  
>  }
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 34adac6e9457..a4dba6975e7b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -401,7 +401,6 @@ extern int swap_duplicate(swp_entry_t);
>  extern int swapcache_prepare(swp_entry_t);
>  extern void swap_free(swp_entry_t);
>  extern void swapcache_free(swp_entry_t);
> -extern void swapcache_free_cluster(swp_entry_t);
>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
>  extern int free_swap_and_cache(swp_entry_t);
>  extern int swap_type_of(dev_t, sector_t, struct block_device **);
> @@ -467,10 +466,6 @@ static inline void swapcache_free(swp_entry_t swp)
>  {
>  }
>  
> -static inline void swapcache_free_cluster(swp_entry_t swp)
> -{
> -}
> -
>  static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
>  			struct vm_area_struct *vma, unsigned long addr)
>  {
> @@ -592,5 +587,13 @@ static inline bool mem_cgroup_swap_full(struct page *page)
>  }
>  #endif
>  
> +#ifdef CONFIG_THP_SWAP
> +extern void swapcache_free_cluster(swp_entry_t);
> +#else
> +static inline void swapcache_free_cluster(swp_entry_t swp)
> +{
> +}
> +#endif
> +
>  #endif /* __KERNEL__*/
>  #endif /* _LINUX_SWAP_H */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f597cabcaab7..eeaf145b2a20 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -825,6 +825,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  	return n_ret;
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
>  {
>  	unsigned long idx;
> @@ -862,6 +863,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>  	unlock_cluster(ci);
>  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
>  }
> +#else
> +static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
> +{
> +	VM_WARN_ON_ONCE(1);
> +	return 0;
> +}
> +#endif /* CONFIG_THP_SWAP */
>  
>  static unsigned long scan_swap_map(struct swap_info_struct *si,
>  				   unsigned char usage)
> @@ -1145,6 +1153,7 @@ void swapcache_free(swp_entry_t entry)
>  	}
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  void swapcache_free_cluster(swp_entry_t entry)
>  {
>  	unsigned long offset = swp_offset(entry);
> @@ -1170,6 +1179,7 @@ void swapcache_free_cluster(swp_entry_t entry)
>  	swap_free_cluster(si, idx);
>  	spin_unlock(&si->lock);
>  }
> +#endif /* CONFIG_THP_SWAP */
>  
>  void swapcache_free_entries(swp_entry_t *entries, int n)
>  {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Huang\, Ying" <ying.huang@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huang\, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Ebru Akagunduz" <ebru.akagunduz@gmail.com>,
	Michal Hocko <mhocko@kernel.org>, "Tejun Heo" <tj@kernel.org>,
	Hugh Dickins <hughd@google.com>, Shaohua Li <shli@kernel.org>,
	Minchan Kim <minchan@kernel.org>, Rik van Riel <riel@redhat.com>,
	<cgroups@vger.kernel.org>
Subject: Re: [PATCH -mm -v8 1/3] mm, THP, swap: Delay splitting THP during swap out
Date: Tue, 18 Apr 2017 08:33:16 +0800	[thread overview]
Message-ID: <87efwqtv03.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20170417182410.GA26500@cmpxchg.org> (Johannes Weiner's message of "Mon, 17 Apr 2017 14:24:10 -0400")

Johannes Weiner <hannes@cmpxchg.org> writes:

> On Sat, Apr 15, 2017 at 09:17:04AM +0800, Huang, Ying wrote:
>> Hi, Johannes,
>> 
>> Johannes Weiner <hannes@cmpxchg.org> writes:
>> 
>> > Hi Huang,
>> >
>> > I reviewed this patch based on the feedback I already provided, but
>> > eventually gave up and rewrote it. Please take review feedback more
>> > seriously in the future.
>> 
>> Thanks a lot for your help!  I do respect all your review and effort.
>> The -v8 patch doesn't take all your comments, just because I thought we
>> have not reach consensus for some points and I want to use -v8 patch to
>> discuss them.
>> 
>> One concern I have before is whether to split THP firstly when swap
>> space or memcg swap is used up.  Now I think your solution is
>> acceptable. And if we receive any regression report for that in the
>> future, it's not very hard to deal with.
>
> If you look at get_scan_count(), we'll stop scanning anonymous pages
> altogether when swap space runs out. So it might happen to a few THP,
> but it shouldn't be a big deal.

Yes.  It only influences a few THP.

> And yes, in that case I'd really rather wait for any real problems to
> materialize before we complicate things.
>
>> > Attached below is the reworked patch. Most changes are to the layering
>> > (page functions, cluster functions, range functions) so that we don't
>> > make the lowest swap range code require a notion of huge pages, or
>> > make the memcg page functions take size information that can be
>> > gathered from the page itself. I turned the config symbol into a
>> > generic THP_SWAP that can later be extended when we add 2MB IO. The
>> > rest is function naming, #ifdef removal etc.
>> 
>> For some #ifdef in swapfile.c, it is to avoid unnecessary code size
>> increase for !CONFIG_TRANSPARENT_HUGEPAGE or platform with THP swap
>> optimization disabled.  Is it an issue?
>
> It saves some code size, but it looks like the biggest cost comes from
> bloating PageSwapCache(). This is mm/builtin.o with !CONFIG_THP:
>
> add/remove: 1/0 grow/shrink: 34/5 up/down: 920/-311 (609)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> migrate_page_move_mapping                   1404    1479     +75
> shrink_page_list                            3573    3632     +59
> __delete_from_swap_cache                     235     293     +58
> __test_set_page_writeback                    626     678     +52
> __swap_writepage                             766     812     +46
> try_to_unuse                                1763    1795     +32
> madvise_free_pte_range                       882     912     +30
> __set_page_dirty_nobuffers                   245     268     +23
> migrate_page_copy                            565     587     +22
> swap_slot_free_notify                        133     151     +18
> shmem_replace_page                           616     633     +17
> try_to_free_swap                             135     151     +16
> test_clear_page_writeback                    512     528     +16
> swap_set_page_dirty                          109     125     +16
> swap_readpage                                384     400     +16
> shmem_unuse                                 1535    1551     +16
> reuse_swap_page                              340     356     +16
> page_mapping                                 144     160     +16
> migrate_huge_page_move_mapping               483     499     +16
> free_swap_and_cache                          409     425     +16
> free_pages_and_swap_cache                    161     177     +16
> free_page_and_swap_cache                     145     161     +16
> do_swap_page                                1216    1232     +16
> __remove_mapping                             408     424     +16
> __page_file_mapping                           82      98     +16
> __page_file_index                             70      85     +15
> try_to_unmap_one                            1324    1337     +13
> shmem_getpage_gfp.isra                      2358    2371     +13
> add_to_swap_cache                             47      60     +13
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> __add_to_swap_cache                          445     406     -39
> delete_from_swap_cache                       149     104     -45
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=455144, chg +0.13%
>
> If I make the compound_head() in there conditional, this patch
> actually ends up shrinking the code due to the refactoring of the
> cluster functions:
>
> add/remove: 1/0 grow/shrink: 10/5 up/down: 302/-327 (-25)
> function                                     old     new   delta
> __free_cluster                                 -     106    +106
> get_swap_pages                               465     555     +90
> __delete_from_swap_cache                     235     277     +42
> shmem_replace_page                           616     633     +17
> migrate_page_move_mapping                   1404    1418     +14
> add_to_swap_cache                             47      60     +13
> migrate_page_copy                            565     571      +6
> inc_cluster_info_page                        204     210      +6
> get_swap_page                                411     415      +4
> shmem_writepage                              922     925      +3
> sys_swapon                                  4210    4211      +1
> swapcache_free_entries                       786     768     -18
> delete_from_swap_cache                       149     104     -45
> __add_to_swap_cache                          445     390     -55
> scan_swap_map_slots                         1953    1889     -64
> swap_do_scheduled_discard                    713     568    -145
> Total: Before=454535, After=454510, chg -0.01%

This looks great!  Thanks!

> But PageSwapCache() is somewhat ugly either way. Even with THP_SWAP
> compiled in, it seems like most callsites wouldn't test tailpages?
> Can we get rid of the compound_head() and annotate any callsites
> working on potential tail pages?

I think we can keep the current #ifdef version and make some cleanup in
the next step?

>> > Please review whether this is an acceptable version for you.
>> 
>> Yes.  It is good for me.  I will give it more test on next Monday.
>
> Thanks

I think we will fold the below patch into the original one?

Best Regards,
Huang, Ying

> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index f4acd6c4f808..d33e3280c8ad 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -326,7 +326,9 @@ PAGEFLAG_FALSE(HighMem)
>  #ifdef CONFIG_SWAP
>  static __always_inline int PageSwapCache(struct page *page)
>  {
> +#ifdef CONFIG_THP_SWAP
>  	page = compound_head(page);
> +#endif
>  	return PageSwapBacked(page) && test_bit(PG_swapcache, &page->flags);
>  
>  }
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 34adac6e9457..a4dba6975e7b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -401,7 +401,6 @@ extern int swap_duplicate(swp_entry_t);
>  extern int swapcache_prepare(swp_entry_t);
>  extern void swap_free(swp_entry_t);
>  extern void swapcache_free(swp_entry_t);
> -extern void swapcache_free_cluster(swp_entry_t);
>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
>  extern int free_swap_and_cache(swp_entry_t);
>  extern int swap_type_of(dev_t, sector_t, struct block_device **);
> @@ -467,10 +466,6 @@ static inline void swapcache_free(swp_entry_t swp)
>  {
>  }
>  
> -static inline void swapcache_free_cluster(swp_entry_t swp)
> -{
> -}
> -
>  static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
>  			struct vm_area_struct *vma, unsigned long addr)
>  {
> @@ -592,5 +587,13 @@ static inline bool mem_cgroup_swap_full(struct page *page)
>  }
>  #endif
>  
> +#ifdef CONFIG_THP_SWAP
> +extern void swapcache_free_cluster(swp_entry_t);
> +#else
> +static inline void swapcache_free_cluster(swp_entry_t swp)
> +{
> +}
> +#endif
> +
>  #endif /* __KERNEL__*/
>  #endif /* _LINUX_SWAP_H */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f597cabcaab7..eeaf145b2a20 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -825,6 +825,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  	return n_ret;
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
>  {
>  	unsigned long idx;
> @@ -862,6 +863,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>  	unlock_cluster(ci);
>  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
>  }
> +#else
> +static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
> +{
> +	VM_WARN_ON_ONCE(1);
> +	return 0;
> +}
> +#endif /* CONFIG_THP_SWAP */
>  
>  static unsigned long scan_swap_map(struct swap_info_struct *si,
>  				   unsigned char usage)
> @@ -1145,6 +1153,7 @@ void swapcache_free(swp_entry_t entry)
>  	}
>  }
>  
> +#ifdef CONFIG_THP_SWAP
>  void swapcache_free_cluster(swp_entry_t entry)
>  {
>  	unsigned long offset = swp_offset(entry);
> @@ -1170,6 +1179,7 @@ void swapcache_free_cluster(swp_entry_t entry)
>  	swap_free_cluster(si, idx);
>  	spin_unlock(&si->lock);
>  }
> +#endif /* CONFIG_THP_SWAP */
>  
>  void swapcache_free_entries(swp_entry_t *entries, int n)
>  {

  parent reply	other threads:[~2017-04-18  0:33 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06  5:35 [PATCH -mm -v8 0/3] THP swap: Delay splitting THP during swapping out Huang, Ying
2017-04-06  5:35 ` Huang, Ying
2017-04-06  5:35 ` [PATCH -mm -v8 1/3] mm, THP, swap: Delay splitting THP during swap out Huang, Ying
2017-04-06  5:35   ` Huang, Ying
2017-04-06  5:35   ` Huang, Ying
2017-04-14 14:58   ` Johannes Weiner
2017-04-14 14:58     ` Johannes Weiner
2017-04-14 14:58     ` Johannes Weiner
2017-04-15  1:17     ` Huang, Ying
2017-04-15  1:17       ` Huang, Ying
2017-04-17 18:24       ` Johannes Weiner
2017-04-17 18:24         ` Johannes Weiner
     [not found]         ` <20170417182410.GA26500-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2017-04-18  0:33           ` Huang, Ying [this message]
2017-04-18  0:33             ` Huang, Ying
2017-04-18  0:33             ` Huang, Ying
2017-04-06  5:35 ` [PATCH -mm -v8 2/3] mm, THP, swap: Check whether THP can be split firstly Huang, Ying
2017-04-06  5:35   ` Huang, Ying
2017-04-06  5:35 ` [PATCH -mm -v8 3/3] mm, THP, swap: Enable THP swap optimization only if has compound map Huang, Ying
2017-04-06  5:35   ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87efwqtv03.fsf@yhuang-dev.intel.com \
    --to=ying.huang-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=aarcange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=ebru.akagunduz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
    --cc=hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    --cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.