linux-mm.kvack.org archive mirror
* [RFC]swap: add a simple random read swapin detection
@ 2012-08-22  3:40 Shaohua Li
  2012-08-22 15:47 ` Rik van Riel
  2012-08-23 22:01 ` Minchan Kim
  0 siblings, 2 replies; 3+ messages in thread
From: Shaohua Li @ 2012-08-22  3:40 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, riel, fengguang.wu

The swapin readahead does a blind readahead regardless of whether the swapin is
sequential. This is fine for hard disks, even with random reads, because a large
read carries no real penalty on a hard disk, and if the readahead pages turn out
to be garbage, they can be reclaimed quickly. On an SSD, however, a large read is
more expensive than a small one, so if the readahead pages are garbage, the
readahead is pure overhead.

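This patch adds a simple random read detection, similar to what file mmap
readahead does. Each VMA keeps a miss counter that is incremented on every
swapin_readahead() call and decremented on swap cache hits; once the counter
exceeds SWAPRA_MISS, swapin readahead is skipped for that VMA. This gives a
large improvement for a swap workload with random IO on a fast SSD.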
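A minimal single-threaded model of this heuristic may help when reasoning about
its thresholds. It is a userspace sketch, not kernel code: struct ra_model,
ra_on_hit(), ra_on_miss() and SWAPRA_MISS_CAP are hypothetical names, a plain
int stands in for the patch's atomic_t, and the constants mirror SWAPRA_MISS and
the SWAPRA_MISS * 10 cap from the swap_state.c hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAPRA_MISS 100                     /* same threshold as the patch */
#define SWAPRA_MISS_CAP (SWAPRA_MISS * 10)  /* hypothetical name for the cap */

/* Hypothetical model of the per-VMA state: swap cache misses minus hits,
 * clamped to [0, SWAPRA_MISS_CAP]. */
struct ra_model {
	int swapra_miss;
};

/* Fault satisfied from the swap cache: earlier readahead paid off. */
static void ra_on_hit(struct ra_model *m)
{
	if (m->swapra_miss > 0)
		m->swapra_miss--;
}

/* Fault that reaches swapin_readahead(). Returns true if the readahead
 * cluster would be issued, false if the patch would jump to skip:. */
static bool ra_on_miss(struct ra_model *m)
{
	if (m->swapra_miss < SWAPRA_MISS_CAP)
		m->swapra_miss++;
	return m->swapra_miss <= SWAPRA_MISS;
}
```

Starting from zero, the first 100 misses still read ahead and the 101st is the
first to skip. Because the counter saturates at 1000, a VMA that has been purely
random for a long time needs 901 hits before its next miss reads ahead again,
which gives the detector some hysteresis. Note that while readahead is skipped,
the VMA's own faults no longer pull neighbouring pages into the swap cache, so
it appears those hits would have to come from pages that are in the swap cache
for other reasons.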

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 include/linux/mm_types.h |    1 +
 mm/memory.c              |    3 ++-
 mm/swap_state.c          |    9 +++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

Index: linux/mm/swap_state.c
===================================================================
--- linux.orig/mm/swap_state.c	2012-08-21 23:01:43.825613437 +0800
+++ linux/mm/swap_state.c	2012-08-22 10:38:36.687902916 +0800
@@ -351,6 +351,7 @@ struct page *read_swap_cache_async(swp_e
 	return found_page;
 }
 
+#define SWAPRA_MISS  (100)
 /**
  * swapin_readahead - swap in pages in hope we need them soon
  * @entry: swap entry of this memory
@@ -379,6 +380,13 @@ struct page *swapin_readahead(swp_entry_
 	unsigned long mask = (1UL << page_cluster) - 1;
 	struct blk_plug plug;
 
+	if (vma) {
+		if (atomic_read(&vma->swapra_miss) < SWAPRA_MISS * 10)
+			atomic_inc(&vma->swapra_miss);
+		if (atomic_read(&vma->swapra_miss) > SWAPRA_MISS)
+			goto skip;
+	}
+
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
 	end_offset = offset | mask;
@@ -397,5 +405,6 @@ struct page *swapin_readahead(swp_entry_
 	blk_finish_plug(&plug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
+skip:
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
Index: linux/include/linux/mm_types.h
===================================================================
--- linux.orig/include/linux/mm_types.h	2012-08-21 23:02:01.969385586 +0800
+++ linux/include/linux/mm_types.h	2012-08-22 10:37:59.028376385 +0800
@@ -279,6 +279,7 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
+	atomic_t swapra_miss;
 };
 
 struct core_thread {
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c	2012-08-21 23:01:20.861907922 +0800
+++ linux/mm/memory.c	2012-08-22 10:39:58.638872631 +0800
@@ -2953,7 +2953,8 @@ static int do_swap_page(struct mm_struct
 		ret = VM_FAULT_HWPOISON;
 		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 		goto out_release;
-	}
+	} else if (!(flags & FAULT_FLAG_TRIED))
+		atomic_dec_if_positive(&vma->swapra_miss);
 
 	locked = lock_page_or_retry(page, mm, flags);
 

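The unchanged start_offset/end_offset lines in swapin_readahead() compute the
cluster that is read when readahead is not skipped. A small userspace sketch of
that mask arithmetic (the helper names ra_cluster_mask(), ra_cluster_start() and
ra_cluster_end() are hypothetical) shows what the goto skip path saves: with
page_cluster = 3, the usual default, the window is 8 swap slots, whereas the
skip path reads only the faulting slot through read_swap_cache_async().

```c
#include <assert.h>
#include <limits.h>

/* Mask covering one page_cluster-sized, aligned block of swap slots,
 * as in "mask = (1UL << page_cluster) - 1" in swapin_readahead(). */
static unsigned long ra_cluster_mask(int page_cluster)
{
	assert(page_cluster >= 0 &&
	       page_cluster < (int)(sizeof(unsigned long) * CHAR_BIT));
	return (1UL << page_cluster) - 1;
}

/* First slot of the aligned cluster around offset (start_offset). */
static unsigned long ra_cluster_start(unsigned long offset, int page_cluster)
{
	return offset & ~ra_cluster_mask(page_cluster);
}

/* Last slot of the aligned cluster around offset (end_offset). */
static unsigned long ra_cluster_end(unsigned long offset, int page_cluster)
{
	return offset | ra_cluster_mask(page_cluster);
}
```

For a fault on swap offset 13 with page_cluster = 3, the mask is 7, so the
window runs from slot 8 to slot 15 inclusive.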
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [RFC]swap: add a simple random read swapin detection
  2012-08-22  3:40 [RFC]swap: add a simple random read swapin detection Shaohua Li
@ 2012-08-22 15:47 ` Rik van Riel
  2012-08-23 22:01 ` Minchan Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Rik van Riel @ 2012-08-22 15:47 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, fengguang.wu

On 08/21/2012 11:40 PM, Shaohua Li wrote:

> +#define SWAPRA_MISS  (100)
>   /**
>    * swapin_readahead - swap in pages in hope we need them soon
>    * @entry: swap entry of this memory
> @@ -379,6 +380,13 @@ struct page *swapin_readahead(swp_entry_
>   	unsigned long mask = (1UL << page_cluster) - 1;
>   	struct blk_plug plug;
>
> +	if (vma) {
> +		if (atomic_read(&vma->swapra_miss) < SWAPRA_MISS * 10)
> +			atomic_inc(&vma->swapra_miss);
> +		if (atomic_read(&vma->swapra_miss) > SWAPRA_MISS)
> +			goto skip;
> +	}

> --- linux.orig/mm/memory.c	2012-08-21 23:01:20.861907922 +0800
> +++ linux/mm/memory.c	2012-08-22 10:39:58.638872631 +0800
> @@ -2953,7 +2953,8 @@ static int do_swap_page(struct mm_struct
>   		ret = VM_FAULT_HWPOISON;
>   		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
>   		goto out_release;
> -	}
> +	} else if (!(flags & FAULT_FLAG_TRIED))
> +		atomic_dec_if_positive(&vma->swapra_miss);

The approach makes sense when viewed together with
the changelog, but I fear it will be non-obvious
to anyone who just looks at the code later in time.

Please hide these increments and decrements behind
some simple accessor functions, e.g.:

swap_cache_hit()
swap_cache_miss()
swap_cache_skip_readahead()

These small functions can then be placed together
(maybe in swap.c?) and get a good comment documenting
exactly what they are supposed to do.
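One possible shape for these helpers is sketched below. It keeps the three
suggested names, but the signatures, the struct vma_stub stand-in for struct
vm_area_struct, and the use of C11 <stdatomic.h> in place of the kernel's
atomic_t are assumptions made so the sketch compiles in userspace; the behaviour
follows the two hunks of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SWAPRA_MISS 100

/* Hypothetical stand-in for struct vm_area_struct: only the patch's field. */
struct vma_stub {
	atomic_int swapra_miss;
};

/* Swap cache hit: a read-ahead page was used, so forgive one miss.
 * Decrements only while positive, like atomic_dec_if_positive(). */
static void swap_cache_hit(struct vma_stub *vma)
{
	int old = atomic_load(&vma->swapra_miss);

	/* On failure, compare_exchange reloads 'old' and we re-check it. */
	while (old > 0 &&
	       !atomic_compare_exchange_weak(&vma->swapra_miss, &old, old - 1))
		;
}

/* Swap cache miss: count it, saturating near 10 * SWAPRA_MISS. As in the
 * patch, the check and the increment are separate atomic operations, so
 * concurrent faults may overshoot the cap slightly. */
static void swap_cache_miss(struct vma_stub *vma)
{
	if (atomic_load(&vma->swapra_miss) < SWAPRA_MISS * 10)
		atomic_fetch_add(&vma->swapra_miss, 1);
}

/* Skip readahead once misses outnumber hits by more than SWAPRA_MISS. */
static bool swap_cache_skip_readahead(struct vma_stub *vma)
{
	return atomic_load(&vma->swapra_miss) > SWAPRA_MISS;
}
```

With helpers like these, the if (vma) block in swapin_readahead() would become a
swap_cache_miss(vma) call followed by if (swap_cache_skip_readahead(vma)) goto
skip;, and the do_swap_page() hunk would call swap_cache_hit(vma).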

As an aside, how well do these patches work?

What kind of performance changes have you seen, both
on SSDs and hard disks?

* Re: [RFC]swap: add a simple random read swapin detection
  2012-08-22  3:40 [RFC]swap: add a simple random read swapin detection Shaohua Li
  2012-08-22 15:47 ` Rik van Riel
@ 2012-08-23 22:01 ` Minchan Kim
  1 sibling, 0 replies; 3+ messages in thread
From: Minchan Kim @ 2012-08-23 22:01 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, riel, fengguang.wu

Hi Shaohua,

On Wed, Aug 22, 2012 at 11:40:44AM +0800, Shaohua Li wrote:
> Index: linux/include/linux/mm_types.h
> ===================================================================
> --- linux.orig/include/linux/mm_types.h	2012-08-21 23:02:01.969385586 +0800
> +++ linux/include/linux/mm_types.h	2012-08-22 10:37:59.028376385 +0800
> @@ -279,6 +279,7 @@ struct vm_area_struct {
>  #ifdef CONFIG_NUMA
>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
>  #endif
> +	atomic_t swapra_miss;

#ifdef CONFIG_SWAP
	atomic_t swapra_miss;
#endif

Many embedded devices don't have swap.
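This suggestion can be sketched in userspace as follows. Guarding the field
alone may not be enough: the memory.c hunk touches vma->swapra_miss with no
#ifdef around it, so if mm/memory.c is built without CONFIG_SWAP that access
would also need a guard, or helpers that compile to no-ops. The helper names
swapra_note_miss() and swapra_should_skip() and the struct vma_stub type are
hypothetical, and a plain int replaces atomic_t for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Remove this line to model a build without swap support. */
#define CONFIG_SWAP 1

#define SWAPRA_MISS 100

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_stub {
	unsigned long vm_start, vm_end;
#ifdef CONFIG_SWAP
	int swapra_miss;	/* field only exists when swap is configured */
#endif
};

#ifdef CONFIG_SWAP
static inline void swapra_note_miss(struct vma_stub *vma)
{
	if (vma->swapra_miss < SWAPRA_MISS * 10)
		vma->swapra_miss++;
}

static inline bool swapra_should_skip(const struct vma_stub *vma)
{
	return vma->swapra_miss > SWAPRA_MISS;
}
#else
/* No swap: nothing to track, so the helpers do nothing. */
static inline void swapra_note_miss(struct vma_stub *vma)
{
	(void)vma;
}

static inline bool swapra_should_skip(const struct vma_stub *vma)
{
	(void)vma;
	return false;
}
#endif
```

With the CONFIG_SWAP line removed, the same callers still compile, the struct
loses the field, and swapra_should_skip() always returns false.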
