From mboxrd@z Thu Jan 1 00:00:00 1970
From: ehrhardt@linux.vnet.ibm.com
Subject: [PATCH 0/2] swap: improve swap I/O rate
Date: Mon, 14 May 2012 13:58:27 +0200
Message-Id: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: axboe@kernel.dk, Ehrhardt Christian

From: Ehrhardt Christian

From: Christian Ehrhardt

In a memory overcommitment scenario with KVM I ran into a lot of waits for
swap. While checking the I/O done on the swap disks I found almost all I/Os
to be done as single page 4k requests. Despite the fact that swap in is a
batch of 1< [...]

From mboxrd@z Thu Jan 1 00:00:00 1970
From: ehrhardt@linux.vnet.ibm.com
Subject: [PATCH 1/2] swap: allow swap readahead to be merged
Date: Mon, 14 May 2012 13:58:28 +0200
Message-Id: <1336996709-8304-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: axboe@kernel.dk, Christian Ehrhardt

From: Christian Ehrhardt

Swap readahead works fine, but the I/O to disk is almost always done in page
size requests, despite the fact that readahead submits 1< [...]
---
 mm/swap_state.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 4c5ff7f..c85b559 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -376,6 +377,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	unsigned long offset = swp_offset(entry);
 	unsigned long start_offset, end_offset;
 	unsigned long mask = (1UL << page_cluster) - 1;
+	struct blk_plug plug;
 
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
@@ -383,6 +385,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!start_offset)	/* First page is swap header. */
 		start_offset++;
 
+	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
@@ -391,6 +394,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			continue;
 		page_cache_release(page);
 	}
+	blk_finish_plug(&plug);
+
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
--
1.7.0.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: ehrhardt@linux.vnet.ibm.com
Subject: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
Date: Mon, 14 May 2012 13:58:29 +0200
Message-Id: <1336996709-8304-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: axboe@kernel.dk, Christian Ehrhardt

From: Christian Ehrhardt

Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
the code and add some comments about what the tunable will change in that
behavior.

Signed-off-by: Christian Ehrhardt
---
 Documentation/sysctl/vm.txt |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee8..4d87dc0 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -574,16 +574,24 @@ of physical RAM. See above.
 
 page-cluster
 
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages (if
+available) are read in from swap in a single attempt. This is the swap
+counterpart to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.

 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
 
 The default value is three (eight pages at a time). There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.
 
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
 =============================================================
 
 panic_on_oom
--
1.7.0.4

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB1DDE0.2020007@kernel.org>
Date: Tue, 15 May 2012 13:38:56 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: [PATCH 1/2] swap: allow swap readahead to be merged
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1336996709-8304-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, axboe@kernel.dk, Hugh Dickins, Rik van Riel

On 05/14/2012 08:58 PM, ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Ehrhardt
>
> Swap readahead works fine, but the I/O to disk is almost always done in page
> size requests, despite the fact that readahead submits 1< at a time.
> On older kernels the old per device plugging behavior might have captured
> this and merged the requests, but currently all comes down to much more I/Os
> than required.
>
> On a single device this might not be an issue, but as soon as a server runs
> on shared SAN resources saving I/Os not only improves swapin throughput but
> also provides a lower resource utilization.
>
> With a load running KVM in a lot of memory overcommitment (the hot memory
> is 1.5 times the host memory) swapping throughput improves significantly
> and the load feels more responsive as well as achieves more throughput.
>
> In a test setup with 16 swap disks running blocktrace on one of those disks
> shows the improved merging:
> Prior:
> Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
> Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
> Reads Requeued:         0               Writes Requeued:         0
> Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
> Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
> IO unplugs:       149,614               Timer unplugs:       2,940
>
> With the patch:
> Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
> Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
> Reads Requeued:         0               Writes Requeued:         0
> Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
> Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
> IO unplugs:       337,130               Timer unplugs:      11,184
>
> Signed-off-by: Christian Ehrhardt

Reviewed-by: Minchan Kim

It does make sense to me.
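One way to read the two summaries quoted above is as an average read-dispatch size: total read volume divided by the number of read dispatches. The sketch below is only a back-of-the-envelope check and not part of the patch. The helper name avg_kib_per_dispatch is made up for illustration, and the inputs are the rounded MiB totals from the summaries, so the results are approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): average KiB moved per dispatch. */
static double avg_kib_per_dispatch(double total_mib, double dispatches)
{
	return total_mib * 1024.0 / dispatches;
}

/* Prints the read-side averages for the two summaries quoted in the mail. */
static void print_read_dispatch_sizes(void)
{
	/* Prior: 2,243MiB over 544,701 read dispatches */
	printf("prior:   %.1f KiB/dispatch\n",
	       avg_kib_per_dispatch(2243, 544701));
	/* With the patch: 2,937MiB over 214,972 read dispatches */
	printf("patched: %.1f KiB/dispatch\n",
	       avg_kib_per_dispatch(2937, 214972));
}
```

Under these assumptions the average read dispatch grows from a little over one 4k page to roughly 14 KiB, which is consistent with the jump in "Read Merges" between the two runs.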
> ---
>  mm/swap_state.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 4c5ff7f..c85b559 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -14,6 +14,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -376,6 +377,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	unsigned long offset = swp_offset(entry);
>  	unsigned long start_offset, end_offset;
>  	unsigned long mask = (1UL << page_cluster) - 1;
> +	struct blk_plug plug;
>
>  	/* Read a page_cluster sized and aligned cluster around offset. */
>  	start_offset = offset & ~mask;
> @@ -383,6 +385,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  	if (!start_offset)	/* First page is swap header. */
>  		start_offset++;
>
> +	blk_start_plug(&plug);
>  	for (offset = start_offset; offset <= end_offset ; offset++) {
>  		/* Ok, do the async read-ahead now */
>  		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
> @@ -391,6 +394,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
>  			continue;
>  		page_cache_release(page);
>  	}
> +	blk_finish_plug(&plug);
> +
>  	lru_add_drain();	/* Push any new pages onto the LRU now */
>  	return read_swap_cache_async(entry, gfp_mask, vma, addr);
>  }

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB1E00F.2000903@kernel.org>
Date: Tue, 15 May 2012 13:48:15 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1336996709-8304-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, axboe@kernel.dk, Rik van Riel, Hugh Dickins, Andrew Morton

On 05/14/2012 08:58 PM, ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Ehrhardt
>
> Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
> the code and add some comments about what the tunable will change in that
> behavior.
>
> Signed-off-by: Christian Ehrhardt
> ---
>  Documentation/sysctl/vm.txt |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 96f0ee8..4d87dc0 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -574,16 +574,24 @@ of physical RAM. See above.
>
>  page-cluster
>
> -page-cluster controls the number of pages which are written to swap in
> -a single attempt. The swap I/O size.
> +page-cluster controls the number of pages up to which consecutive pages (if
> +available) are read in from swap in a single attempt. This is the swap

"If available" would be wrong in the next kernel because Rik recently
submitted the following patch:

mm: make swapin readahead skip over holes
http://marc.info/?l=linux-mm&m=132743264912987&w=4

> +counterpart to page cache readahead.
> +The mentioned consecutivity is not in terms of virtual/physical addresses,
> +but consecutive on swap space - that means they were swapped out together.
>
>  It is a logarithmic value - setting it to zero means "1 page", setting
>  it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
> +Zero disables swap readahead completely.
>
>  The default value is three (eight pages at a time). There may be some
>  small benefits in tuning this to a different value if your workload is
>  swap-intensive.
>
> +Lower values mean lower latencies for initial faults, but at the same time
> +extra faults and I/O delays for following faults if they would have been part of
> +that consecutive pages readahead would have brought in.
> +
>  =============================================================
>
>  panic_on_oom

Otherwise, looks good to me.

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB1E2A0.9050900@kernel.org>
Date: Tue, 15 May 2012 13:59:12 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: [PATCH 0/2] swap: improve swap I/O rate
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, axboe@kernel.dk, Andrew Morton, Hugh Dickins, Rik van Riel

On 05/14/2012 08:58 PM, ehrhardt@linux.vnet.ibm.com wrote:
> From: Ehrhardt Christian
>
> From: Christian Ehrhardt
>
> In a memory overcommitment scenario with KVM I ran into a lot of waits for
> swap. While checking the I/O done on the swap disks I found almost all I/Os
> to be done as single page 4k requests. Despite the fact that swap in is a
> batch of 1< pages written in shrink_page_list.
>
> [1/2 swap in improvement]
> The read patch shows improvements of up to 50% swap throughput, much happier
> guest systems and, even when running with comparable throughput, a lot of I/Os
> per second saved, leaving resources in the SAN for other consumers.
>
> [2/2 documentation]
> While doing so I also realized that the documentation for
> /proc/sys/vm/page-cluster no longer matches the code
>
> [missing patch #3]
> I tried to get a similar patch working for swap out in shrink_page_list. And
> it worked in functional terms, but the additional merging was negligible.

I think we have already done it.
Look at shrink_mem_cgroup_zone, which ends up calling shrink_page_list, so we
already have applied I/O plugging.
> Maybe the cond_resched triggers much more often than I expected, I'm open for
> suggestions regarding improving the pageout I/O sizes as well.

We could enhance write-out by batching, like ext4_bio_write_page.

>
> Kind regards,
> Christian Ehrhardt
>
>
> Christian Ehrhardt (2):
>   swap: allow swap readahead to be merged
>   documentation: update how page-cluster affects swap I/O
>
>  Documentation/sysctl/vm.txt |   12 ++++++++++--
>  mm/swap_state.c             |    5 +++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
>

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB295C5.7080008@redhat.com>
Date: Tue, 15 May 2012 13:43:33 -0400
From: Rik van Riel
MIME-Version: 1.0
Subject: Re: [PATCH 1/2] swap: allow swap readahead to be merged
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1336996709-8304-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, axboe@kernel.dk

On 05/14/2012 07:58 AM, ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Ehrhardt
>
> Swap readahead works fine, but the I/O to disk is almost always done in page
> size requests, despite the fact that readahead submits 1< at a time.
> On older kernels the old per device plugging behavior might have captured
> this and merged the requests, but currently all comes down to much more I/Os
> than required.
>
> On a single device this might not be an issue, but as soon as a server runs
> on shared SAN resources saving I/Os not only improves swapin throughput but
> also provides a lower resource utilization.

> Signed-off-by: Christian Ehrhardt

Acked-by: Rik van Riel

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB29F51.8060605@kernel.dk>
Date: Tue, 15 May 2012 20:24:17 +0200
From: Jens Axboe
MIME-Version: 1.0
Subject: Re: [PATCH 0/2] swap: improve swap I/O rate
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org

On 2012-05-14 13:58, ehrhardt@linux.vnet.ibm.com wrote:
> From: Ehrhardt Christian
>
> From: Christian Ehrhardt
>
> In a memory overcommitment scenario with KVM I ran into a lot of waits for
> swap. While checking the I/O done on the swap disks I found almost all I/Os
> to be done as single page 4k requests. Despite the fact that swap in is a
> batch of 1< pages written in shrink_page_list.
>
> [1/2 swap in improvement]
> The read patch shows improvements of up to 50% swap throughput, much happier
> guest systems and, even when running with comparable throughput, a lot of I/Os
> per second saved, leaving resources in the SAN for other consumers.
>
> [2/2 documentation]
> While doing so I also realized that the documentation for
> /proc/sys/vm/page-cluster no longer matches the code
>
> [missing patch #3]
> I tried to get a similar patch working for swap out in shrink_page_list. And
> it worked in functional terms, but the additional merging was negligible.
> Maybe the cond_resched triggers much more often than I expected, I'm open for
> suggestions regarding improving the pageout I/O sizes as well.
>
> Kind regards,
> Christian Ehrhardt
>
>
> Christian Ehrhardt (2):
>   swap: allow swap readahead to be merged
>   documentation: update how page-cluster affects swap I/O

Looks good to me, you can add my acked-by to both of them.

--
Jens Axboe

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB9EDC1.3070401@linux.vnet.ibm.com>
Date: Mon, 21 May 2012 09:24:49 +0200
From: Christian Ehrhardt
MIME-Version: 1.0
Subject: Re: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <1336996709-8304-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <4FB1E00F.2000903@kernel.org>
In-Reply-To: <4FB1E00F.2000903@kernel.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
To: Minchan Kim
Cc: linux-mm@kvack.org, axboe@kernel.dk, Rik van Riel, Hugh Dickins, Andrew Morton

On 05/15/2012 06:48 AM, Minchan Kim wrote:
> On 05/14/2012 08:58 PM, ehrhardt@linux.vnet.ibm.com wrote:
>
>> From: Christian Ehrhardt
>>
>> Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
>> the code and add some comments about what the tunable will change in that
>> behavior.
>>
>> Signed-off-by: Christian Ehrhardt
>> ---
>>  Documentation/sysctl/vm.txt |   12 ++++++++++--
>>  1 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
>> index 96f0ee8..4d87dc0 100644
>> --- a/Documentation/sysctl/vm.txt
>> +++ b/Documentation/sysctl/vm.txt
>> @@ -574,16 +574,24 @@ of physical RAM. See above.
>>
>>  page-cluster
>>
>> -page-cluster controls the number of pages which are written to swap in
>> -a single attempt. The swap I/O size.
>> +page-cluster controls the number of pages up to which consecutive pages (if
>> +available) are read in from swap in a single attempt. This is the swap
>
>
> "If available" would be wrong in the next kernel because Rik recently
> submitted the following patch:
>
> mm: make swapin readahead skip over holes
> http://marc.info/?l=linux-mm&m=132743264912987&w=4
>
>

You're right - it's not severely wrong, but if we are fixing the
documentation we can do it right.
I'll send a 2nd version of the patch series with this adapted and all
the acks I got so far added.

--

Grüße / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB9F3FF.7030709@linux.vnet.ibm.com>
Date: Mon, 21 May 2012 09:51:27 +0200
From: Christian Ehrhardt
MIME-Version: 1.0
Subject: Re: [PATCH 0/2] swap: improve swap I/O rate
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
 <4FB1E2A0.9050900@kernel.org>
In-Reply-To: <4FB1E2A0.9050900@kernel.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
To: Minchan Kim
Cc: linux-mm@kvack.org, axboe@kernel.dk, Andrew Morton, Hugh Dickins, Rik van Riel

[...]

>> [missing patch #3]
>> I tried to get a similar patch working for swap out in shrink_page_list. And
>> it worked in functional terms, but the additional merging was negligible.
>
>
> I think we have already done it.
> Look at shrink_mem_cgroup_zone which ends up calling shrink_page_list so we already have applied
> I/O plugging.
>

I saw that code and it is part of the kernel I used to test my patches.
But despite that code and my additional experiments of plug/unplug in
shrink_page_list the effective I/O size of swap write stays at almost 4k.

So far I can tell you that the plugs in shrink_page_list and
shrink_mem_cgroup_zone aren't sufficient - at least for my case.
You saw the blocktrace summaries in my first mail; an excerpt of a write
submission stream looks like this:

 94,4   10      465     0.023520923   116  A   W 28868648 + 8 <- (94,5) 28868456
 94,5   10      466     0.023521173   116  Q   W 28868648 + 8 [kswapd0]
 94,5   10      467     0.023522048   116  G   W 28868648 + 8 [kswapd0]
 94,5   10      468     0.023522235   116  P   N [kswapd0]
 94,5   10      469     0.023759892   116  I   W 28868648 + 8 (  237844) [kswapd0]
 94,5   10      470     0.023760079   116  U   N [kswapd0] 1
 94,5   10      471     0.023760360   116  D   W 28868648 + 8 (     468) [kswapd0]
 94,4   10      472     0.023891235   116  A   W 28868656 + 8 <- (94,5) 28868464
 94,5   10      473     0.023891454   116  Q   W 28868656 + 8 [kswapd0]
 94,5   10      474     0.023892110   116  G   W 28868656 + 8 [kswapd0]
 94,5   10      475     0.023944610   116  I   W 28868656 + 8 (   52500) [kswapd0]
 94,5   10      476     0.023944735   116  U   N [kswapd0] 1
 94,5   10      477     0.023944892   116  D   W 28868656 + 8 (     282) [kswapd0]
 94,5   16       19     0.024023192 16033  C   W 28868648 + 8 (  262832) [0]
 94,5   24       37     0.024196752 14526  C   W 28868656 + 8 (  251860) [0]
[...]

But we can split this discussion from my other two patches and I would
be happy to provide my test environment for further tests if there are
new suggestions/patches/...

>> Maybe the cond_resched triggers much more often than I expected, I'm open for
>> suggestions regarding improving the pageout I/O sizes as well.
>
>
> We could enhance write out by batch like ext4_bio_write_page.
>

Do you mean the changes brought by "bd2d0210 ext4: use bio layer instead
of buffer layer in mpage_da_submit_io" ?

--

Grüße / regards, Christian Ehrhardt
IBM Linux Technology Center, System z Linux Performance

From mboxrd@z Thu Jan 1 00:00:00 1970
From: ehrhardt@linux.vnet.ibm.com
Subject: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
Date: Mon, 21 May 2012 10:09:15 +0200
Message-Id: <1337587755-4743-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1337587755-4743-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
References: <1337587755-4743-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org
Cc: axboe@kernel.dk, Christian Ehrhardt

From: Christian Ehrhardt

Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
the code and add some comments about what the tunable will change in that
behavior.

Signed-off-by: Christian Ehrhardt
Acked-by: Jens Axboe
---
 Documentation/sysctl/vm.txt |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee8..4d87dc0 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -574,16 +574,24 @@ of physical RAM. See above.

 page-cluster
 
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
 
 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
 
 The default value is three (eight pages at a time). There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.
 
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
 =============================================================
 
 panic_on_oom
--
1.7.0.4
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx108.postini.com [74.125.245.108]) by kanga.kvack.org (Postfix) with SMTP id B4E9B6B0081 for ; Mon, 21 May 2012 04:46:27 -0400 (EDT)
Message-ID: <4FBA00E1.2020103@kernel.org>
Date: Mon, 21 May 2012 17:46:25 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: [PATCH 0/2] swap: improve swap I/O rate
References: <1336996709-8304-1-git-send-email-ehrhardt@linux.vnet.ibm.com> <4FB1E2A0.9050900@kernel.org> <4FB9F3FF.7030709@linux.vnet.ibm.com>
In-Reply-To: <4FB9F3FF.7030709@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christian Ehrhardt
Cc: linux-mm@kvack.org, axboe@kernel.dk, Andrew Morton, Hugh Dickins, Rik van Riel

On 05/21/2012 04:51 PM, Christian Ehrhardt wrote:
> [...]
>
>>> [missing patch #3]
>>> I tried to get a similar patch working for swap out in
>>> shrink_page_list. And
>>> it worked in functional terms, but the additional merging was negligible.
>>
>>
>> I think we have already done it.
>> Look at shrink_mem_cgroup_zone which ends up calling shrink_page_list
>> so we already have applied
>> I/O plugging.
>>
>
> I saw that code and it is part of the kernel I used to test my patches.
> But despite that code and my additional experiments of plug/unplug in
> shrink_page_list the effective I/O size of swap write stays at almost 4k.

I meant your plugging in shrink_page_list is redundant.

>
> Thereby so far I can tell you that the plugs in shrink_page_list and
> shrink_mem_cgroup_zone aren't sufficient - at least for my case.

Yeb.

> You saw the blocktrace summaries in my first mail, an excerpt of a write
> submission stream looks like that:
>
> 94,4  10  465  0.023520923    116  A  W 28868648 + 8 <- (94,5) 28868456
> 94,5  10  466  0.023521173    116  Q  W 28868648 + 8 [kswapd0]
> 94,5  10  467  0.023522048    116  G  W 28868648 + 8 [kswapd0]
> 94,5  10  468  0.023522235    116  P  N [kswapd0]
> 94,5  10  469  0.023759892    116  I  W 28868648 + 8 ( 237844) [kswapd0]
> 94,5  10  470  0.023760079    116  U  N [kswapd0] 1
> 94,5  10  471  0.023760360    116  D  W 28868648 + 8 ( 468) [kswapd0]
> 94,4  10  472  0.023891235    116  A  W 28868656 + 8 <- (94,5) 28868464
> 94,5  10  473  0.023891454    116  Q  W 28868656 + 8 [kswapd0]
> 94,5  10  474  0.023892110    116  G  W 28868656 + 8 [kswapd0]
> 94,5  10  475  0.023944610    116  I  W 28868656 + 8 ( 52500) [kswapd0]
> 94,5  10  476  0.023944735    116  U  N [kswapd0] 1
> 94,5  10  477  0.023944892    116  D  W 28868656 + 8 ( 282) [kswapd0]
> 94,5  16   19  0.024023192  16033  C  W 28868648 + 8 ( 262832) [0]
> 94,5  24   37  0.024196752  14526  C  W 28868656 + 8 ( 251860) [0]
> [...]
>
> But we can split this discussion from my other two patches and I would
> be happy to provide my test environment for further tests if there are
> new suggestions/patches/...
>
>>> Maybe the cond_resched triggers much more often than I expected, I'm
>>> open for
>>> suggestions regarding improving the pageout I/O sizes as well.
>>
>>
>> We could enhance write out by batch like ext4_bio_write_page.
>>
>
> Do you mean the changes brought by "bd2d0210 ext4: use bio layer instead
> of buffer layer in mpage_da_submit_io" ?

Yeb, I think it's helpful for your case but it's not trivial to implement it, IMHO.

>
>
>

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx150.postini.com [74.125.245.150]) by kanga.kvack.org (Postfix) with SMTP id D0D9F6B0081 for ; Mon, 21 May 2012 04:48:10 -0400 (EDT)
Message-ID: <4FBA0148.10205@kernel.org>
Date: Mon, 21 May 2012 17:48:08 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
References: <1337587755-4743-1-git-send-email-ehrhardt@linux.vnet.ibm.com> <1337587755-4743-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1337587755-4743-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: ehrhardt@linux.vnet.ibm.com
Cc: linux-mm@kvack.org, axboe@kernel.dk

On 05/21/2012 05:09 PM, ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Ehrhardt
>
> Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
> the code and add some comments about what the tunable will change in that
> behavior.
>
> Signed-off-by: Christian Ehrhardt
> Acked-by: Jens Axboe

Reviewed-by: Minchan Kim

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187]) by kanga.kvack.org (Postfix) with SMTP id C87116B0062 for ; Mon, 4 Jun 2012 04:34:05 -0400 (EDT)
Received: from /spool/local by e06smtp15.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
Violators will be prosecuted for from ; Mon, 4 Jun 2012 09:34:03 +0100
Received: from d06av12.portsmouth.uk.ibm.com (d06av12.portsmouth.uk.ibm.com [9.149.37.247]) by d06nrmr1707.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q548XUov2388042 for ; Mon, 4 Jun 2012 09:33:30 +0100
Received: from d06av12.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av12.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q548XT75007331 for ; Mon, 4 Jun 2012 02:33:30 -0600
From: ehrhardt@linux.vnet.ibm.com
Subject: [PATCH 2/2] documentation: update how page-cluster affects swap I/O
Date: Mon, 4 Jun 2012 10:33:23 +0200
Message-Id: <1338798803-5009-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
In-Reply-To: <1338798803-5009-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
References: <1338798803-5009-1-git-send-email-ehrhardt@linux.vnet.ibm.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: axboe@kernel.dk, hughd@google.com, minchan@kernel.org, Christian Ehrhardt

From: Christian Ehrhardt

Fix of the documentation of /proc/sys/vm/page-cluster to match the behavior of
the code and add some comments about what the tunable will change in that
behavior.

Signed-off-by: Christian Ehrhardt
Acked-by: Jens Axboe
Reviewed-by: Minchan Kim
---
 Documentation/sysctl/vm.txt | 12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee8..4d87dc0 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -574,16 +574,24 @@ of physical RAM. See above.

 page-cluster

-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.

 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.

 The default value is three (eight pages at a time). There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.

+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
 =============================================================

 panic_on_oom
--
1.7.0.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org
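
To illustrate the page-cluster arithmetic described in the documentation patch
above, here is a minimal user-space sketch, not kernel code. The names
cluster_pages(), readahead_window() and struct swap_window are hypothetical and
exist only for this example. The mask and start computation mirror the
swapin_readahead() context lines visible in patch 1/2; the end of the window
(offset | mask) is an assumption about the surrounding code that the quoted
hunks do not show.

```c
#include <stdio.h>

/* Hypothetical type: inclusive range of swap slots one readahead covers. */
struct swap_window {
	unsigned long start;
	unsigned long end;
};

/* page-cluster is logarithmic: 0 -> 1 page, 1 -> 2 pages, 3 -> 8 pages. */
static unsigned long cluster_pages(unsigned int page_cluster)
{
	return 1UL << page_cluster;
}

/*
 * Compute the aligned window of swap slots around the faulting offset.
 * With page_cluster == 0 the mask is 0, so the window is just the
 * faulting slot itself, i.e. no readahead takes place.
 */
static struct swap_window readahead_window(unsigned long offset,
					   unsigned int page_cluster)
{
	unsigned long mask = (1UL << page_cluster) - 1;
	struct swap_window w;

	w.start = offset & ~mask;	/* round down to the cluster boundary */
	w.end = offset | mask;		/* assumed: last slot of the same cluster */
	if (!w.start)			/* slot 0 holds the swap header */
		w.start++;
	return w;
}

/* Print one window; intended to be called from a small test program. */
static void print_window(unsigned long offset, unsigned int page_cluster)
{
	struct swap_window w = readahead_window(offset, page_cluster);

	printf("page-cluster=%u offset=%lu: slots %lu..%lu (cluster of %lu)\n",
	       page_cluster, offset, w.start, w.end,
	       cluster_pages(page_cluster));
}
```

Calling print_window(21, 3) from a test program would report slots 16..23 for
the default setting, while print_window(21, 0) reports only slot 21, which
matches the statement that zero disables swap readahead. The window is defined
in swap slots, not in virtual or physical addresses, which is the point the
documentation text makes about consecutivity.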