linux-mm.kvack.org archive mirror
From: Simon Jeons <simon.jeons@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Nitin Gupta <ngupta@vflare.org>,
	Konrad Rzeszutek Wilk <konrad@darnok.org>,
	Shaohua Li <shli@kernel.org>,
	Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC] mm: remove swapcache page early
Date: Tue, 02 Apr 2013 21:40:31 +0800
Message-ID: <515ADFCF.4010209@gmail.com>
In-Reply-To: <alpine.LNX.2.00.1303271230210.29687@eggly.anvils>

Hi Hugh,
On 03/28/2013 05:41 AM, Hugh Dickins wrote:
> On Wed, 27 Mar 2013, Minchan Kim wrote:
>
>> Swap subsystem does lazy swap slot free with expecting the page
>> would be swapped out again so we can't avoid unnecessary write.
>                               so we can avoid unnecessary write.

If the page can be swapped out again, which code avoids the unnecessary
write? Could you point me to it? Thanks in advance. ;-)

>> But the problem with in-memory swap is that it consumes memory space
>> until the vm_swap_full() condition (i.e., half of all swap space used)
>> is met. That can be bad if we use multiple swap devices (a small
>> in-memory swap plus a big storage swap) or in-memory swap alone.
> That is a very good realization: it's surprising that none of us
> thought of it before - no disrespect to you, well done, thank you.
>
> And I guess swap readahead is utterly unhelpful in this case too.
>
>> This patch changes the vm_swap_full() logic slightly so it can free the
>> swap slot early if the backing device is really fast.
>> For that, I used SWP_SOLIDSTATE, but it might be controversial.
> But I strongly disagree with almost everything in your patch :)
> I disagree with addressing it in vm_swap_full(), I disagree that
> it can be addressed by device, I disagree that it has anything to
> do with SWP_SOLIDSTATE.
>
> This is not a problem with swapping to /dev/ram0 or to /dev/zram0,
> is it?  In those cases, a fixed amount of memory has been set aside
> for swap, and it works out just like with disk block devices.  The
> memory set aside may be wasted, but that is accepted upfront.
>
> Similarly, this is not a problem with swapping to SSD.  There might
> or might not be other reasons for adjusting the vm_swap_full() logic
> for SSD or generally, but those have nothing to do with this issue.
>
> The problem here is peculiar to frontswap, and the variably sized
> memory behind it, isn't it?  We are accustomed to using swap to free
> up memory by transferring its data to some other, cheaper but slower
> resource.
>
> But in the case of frontswap and zmem (I'll say that to avoid thinking
> through which backends are actually involved), it is not a cheaper and
> slower resource, but the very same memory we are trying to save: swap
> is stolen from the memory under reclaim, so any duplication becomes
> counter-productive (if we ignore cpu compression/decompression costs:
> I have no idea how fair it is to do so, but anyone who chooses zmem
> is prepared to pay some cpu price for that).
>
> And because it's a frontswap thing, we cannot decide this by device:
> frontswap may or may not stand in front of each device.  There is no
> problem with swapcache duplicated on disk (until that area approaches
> being full or fragmented), but at the higher level we cannot see what
> is in zmem and what is on disk: we only want to free up the zmem dup.
>
> I believe the answer is for frontswap/zmem to invalidate the frontswap
> copy of the page (to free up the compressed memory when possible) and
> SetPageDirty on the PageUptodate PageSwapCache page when swapping in
> (setting page dirty so nothing will later go to read it from the
> unfreed location on backing swap disk, which was never written).
>
> We cannot rely on freeing the swap itself, because in general there
> may be multiple references to the swap, and we only satisfy the one
> which has faulted.  It may or may not be a good idea to use rmap to
> locate the other places to insert pte in place of swap entry, to
> resolve them all at once; but we have chosen not to do so in the
> past, and there's no need for that, if the zmem gets invalidated
> and the swapcache page set dirty.
>
> Hugh
>
>> So I'm Cc'ing Shaohua and Hugh.
>> If it's a problem for SSD, I'd like to create a new type, SWP_INMEMORY
>> or something, for the z* family.
>>
>> Another problem is that zram is a block device, so it can set
>> SWP_INMEMORY or SWP_SOLIDSTATE easily (actually, zram already does so),
>> but I have no idea how to do it for frontswap.
>>
>> Any idea?
>>
>> Another optimization point: we could remove it unconditionally when we
>> find it's exclusive at swap-in time.
>> It could help the frontswap family, too.
>> What do you think about it?
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
>> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> Cc: Nitin Gupta <ngupta@vflare.org>
>> Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
>> Cc: Shaohua Li <shli@kernel.org>
>> Signed-off-by: Minchan Kim <minchan@kernel.org>
>> ---
>>   include/linux/swap.h | 11 ++++++++---
>>   mm/memory.c          |  3 ++-
>>   mm/swapfile.c        | 11 +++++++----
>>   mm/vmscan.c          |  2 +-
>>   4 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index 2818a12..1f4df66 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -359,9 +359,14 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
>>   extern atomic_long_t nr_swap_pages;
>>   extern long total_swap_pages;
>>   
>> -/* Swap 50% full? Release swapcache more aggressively.. */
>> -static inline bool vm_swap_full(void)
>> +/*
>> + * Swap 50% full or fast backed device?
>> + * Release swapcache more aggressively.
>> + */
>> +static inline bool vm_swap_full(struct swap_info_struct *si)
>>   {
>> +	if (si->flags & SWP_SOLIDSTATE)
>> +		return true;
>>   	return atomic_long_read(&nr_swap_pages) * 2 < total_swap_pages;
>>   }
>>   
>> @@ -405,7 +410,7 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
>>   #define get_nr_swap_pages()			0L
>>   #define total_swap_pages			0L
>>   #define total_swapcache_pages()			0UL
>> -#define vm_swap_full()				0
>> +#define vm_swap_full(si)			0
>>   
>>   #define si_swapinfo(val) \
>>   	do { (val)->freeswap = (val)->totalswap = 0; } while (0)
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 705473a..1ca21a9 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3084,7 +3084,8 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
>>   	mem_cgroup_commit_charge_swapin(page, ptr);
>>   
>>   	swap_free(entry);
>> -	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>> +	if (likely(PageSwapCache(page)) && (vm_swap_full(page_swap_info(page))
>> +			|| (vma->vm_flags & VM_LOCKED) || PageMlocked(page)))
>>   		try_to_free_swap(page);
>>   	unlock_page(page);
>>   	if (page != swapcache) {
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 1bee6fa..f9cc701 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -293,7 +293,7 @@ checks:
>>   		scan_base = offset = si->lowest_bit;
>>   
>>   	/* reuse swap entry of cache-only swap if not busy. */
>> -	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> +	if (vm_swap_full(si) && si->swap_map[offset] == SWAP_HAS_CACHE) {
>>   		int swap_was_freed;
>>   		spin_unlock(&si->lock);
>>   		swap_was_freed = __try_to_reclaim_swap(si, offset);
>> @@ -382,7 +382,8 @@ scan:
>>   			spin_lock(&si->lock);
>>   			goto checks;
>>   		}
>> -		if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> +		if (vm_swap_full(si) &&
>> +			si->swap_map[offset] == SWAP_HAS_CACHE) {
>>   			spin_lock(&si->lock);
>>   			goto checks;
>>   		}
>> @@ -397,7 +398,8 @@ scan:
>>   			spin_lock(&si->lock);
>>   			goto checks;
>>   		}
>> -		if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
>> +		if (vm_swap_full(si) &&
>> +			si->swap_map[offset] == SWAP_HAS_CACHE) {
>>   			spin_lock(&si->lock);
>>   			goto checks;
>>   		}
>> @@ -763,7 +765,8 @@ int free_swap_and_cache(swp_entry_t entry)
>>   		 * Also recheck PageSwapCache now page is locked (above).
>>   		 */
>>   		if (PageSwapCache(page) && !PageWriteback(page) &&
>> -				(!page_mapped(page) || vm_swap_full())) {
>> +				(!page_mapped(page) ||
>> +				  vm_swap_full(page_swap_info(page)))) {
>>   			delete_from_swap_cache(page);
>>   			SetPageDirty(page);
>>   		}
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index df78d17..145c59c 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -933,7 +933,7 @@ cull_mlocked:
>>   
>>   activate_locked:
>>   		/* Not a candidate for swapping, so reclaim swap space. */
>> -		if (PageSwapCache(page) && vm_swap_full())
>> +		if (PageSwapCache(page) && vm_swap_full(page_swap_info(page)))
>>   			try_to_free_swap(page);
>>   		VM_BUG_ON(PageActive(page));
>>   		SetPageActive(page);
>> -- 
>> 1.8.2
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: dont@kvack.org

Thread overview: 25+ messages
2013-03-27  2:22 [RFC] mm: remove swapcache page early Minchan Kim
2013-03-27  5:03 ` Kyungmin Park
2013-03-27  5:15 ` Kamezawa Hiroyuki
2013-03-27  7:05   ` Minchan Kim
2013-03-27 17:19 ` Seth Jennings
2013-03-28  1:36   ` Minchan Kim
2013-03-27 21:41 ` Hugh Dickins
2013-03-27 22:24   ` Dan Magenheimer
2013-03-27 23:16     ` Hugh Dickins
2013-03-28  1:18       ` Minchan Kim
2013-03-28  1:54         ` Shaohua Li
2013-03-28 17:35       ` Dan Magenheimer
2013-03-28  1:07     ` Minchan Kim
2013-03-28 18:19       ` Dan Magenheimer
2013-03-29  1:18         ` Minchan Kim
2013-03-29 20:01           ` Hugh Dickins
2013-04-02  2:04             ` Minchan Kim
2013-04-02  5:13               ` Hugh Dickins
2013-04-02  5:56                 ` Minchan Kim
2013-03-28  0:36   ` Minchan Kim
2013-04-02 13:40   ` Simon Jeons [this message]
2013-04-07  7:26     ` Simon Jeons
2013-04-08  1:48       ` Minchan Kim
2013-04-08  1:51         ` Simon Jeons
     [not found] <<1364350932-12853-1-git-send-email-minchan@kernel.org>
2013-03-27 21:20 ` Dan Magenheimer
