From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7297ae3b-f3e1-480b-838f-69b0e09a733d@default>
Date: Mon, 16 Apr 2012 11:34:12 -0700 (PDT)
From: Dan Magenheimer
Subject: Followup: [PATCH -mm] make swapin readahead skip over holes
To: riel@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-mm

Hi Rik --

I saw this patch in 3.4-rc1 (because it caused a minor merge
conflict with frontswap) and wondered about its impact.
Since I had a server still set up from running benchmarks
before LSFMM, I ran my kernel compile -jN workload (with
N varying from 4 to 40) on 1GB of RAM, on 3.4-rc2 both with
and without this patch.

For values of N=24 and N=28, your patch made the workload
run 4-9% faster.  For N=16 and N=20, it was 5-10%
slower.  And for N=36 and N=40, it was 30%-40% slower!

Is this expected?  Since the swap "disk" is a partition
on the one active drive, maybe the advantage is lost due
to contention?

Thanks,
Dan

commit removed 67f96aa252e606cdf6c3cf1032952ec207ec0cf0

Workload:
    kernel compile "make -jN" with varying N
    measurements in elapsed seconds
    boot kernel: 3.4-rc2
    Oracle Linux 6 distro with ext4
    fresh reboot for each test run
    all tests run as root in multi-user mode

Hardware:
    Dell Optiplex 790 = ~$500
    Intel Core i5-2400 @ 3.10 GHz, 4coreX2thread, 6M cache
    1GB RAM DDR3 1333MHz (to force swapping)
    One 7200rpm SATA 6.0Gb/s drive with 8MB cache
    10GB swap partition

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F8C7D59.1000402@redhat.com>
Date: Mon, 16 Apr 2012 16:13:13 -0400
From: Rik van Riel
Subject: Re: Followup: [PATCH -mm] make swapin readahead skip over holes
References: <7297ae3b-f3e1-480b-838f-69b0e09a733d@default>
In-Reply-To: <7297ae3b-f3e1-480b-838f-69b0e09a733d@default>
To: Dan Magenheimer
Cc: linux-kernel@vger.kernel.org, linux-mm

On 04/16/2012 02:34 PM, Dan Magenheimer wrote:
> Hi Rik --
>
> I saw this patch in 3.4-rc1 (because it caused a minor merge
> conflict with frontswap) and wondered about its impact.
> Since I had a server still set up from running benchmarks
> before LSFMM, I ran my kernel compile -jN workload (with
> N varying from 4 to 40) on 1GB of RAM, on 3.4-rc2 both with
> and without this patch.
>
> For values of N=24 and N=28, your patch made the workload
> run 4-9% faster.  For N=16 and N=20, it was 5-10%
> slower.  And for N=36 and N=40, it was 30%-40% slower!
>
> Is this expected?  Since the swap "disk" is a partition
> on the one active drive, maybe the advantage is lost due
> to contention?

There are several things going on here:

1) you are running a workload that thrashes

2) the speed at which data is swapped in is increased
   with this patch

3) with only 1GB memory, the inactive anon list is
   the same size as the active anon list

4) the above points combined mean that less of the
   working set could be in memory at once

One solution may be to decrease the swap cluster for
small systems, when they are thrashing.
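Points 2 and 4 above can be illustrated with a small model of how a
readahead window gets filled. This is a minimal Python sketch, not
kernel code: the function names and the `swap_map` list are
hypothetical, and it assumes that the window is the aligned block of
2^page_cluster swap slots around the faulting slot, that the pre-patch
logic stopped at the first hole on either side of the fault, and that
the patched logic reads every in-use slot in the window.

```python
def readahead_window(offset, page_cluster):
    """Aligned window of swap slots around a faulting slot (inclusive)."""
    mask = (1 << page_cluster) - 1
    start = offset & ~mask
    end = offset | mask
    if start == 0:
        start = 1  # assumption: slot 0 holds the swap header, not data
    return start, end


def slots_contiguous(swap_map, offset, page_cluster):
    """Pre-patch model: stop at the first hole on either side."""
    start, end = readahead_window(offset, page_cluster)
    end = min(end, len(swap_map) - 1)
    lo = offset
    while lo - 1 >= start and swap_map[lo - 1]:
        lo -= 1
    hi = offset
    while hi + 1 <= end and swap_map[hi + 1]:
        hi += 1
    return list(range(lo, hi + 1))


def slots_skip_holes(swap_map, offset, page_cluster):
    """Post-patch model: read every in-use slot, skipping holes."""
    start, end = readahead_window(offset, page_cluster)
    end = min(end, len(swap_map) - 1)
    return [s for s in range(start, end + 1) if swap_map[s]]


# Hypothetical fragmented swap area: 1 = slot in use, 0 = hole.
example_map = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]
print(slots_contiguous(example_map, 12, 3))
print(slots_skip_holes(example_map, 12, 3))
```

In this example a fault on slot 12 brings in 2 pages under the first
model and 6 under the second, so a fragmented swap area sees much
larger speculative reads once holes are skipped. Lowering
`page_cluster` shrinks the window in both models.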
On the other hand, for most systems swap is very much
a special circumstance, and you want to focus on quickly
moving excess stuff into swap, and moving it back into
memory when needed.

Workloads that thrash are very much an exception, and
probably not what we should optimize for.

--
All rights reversed

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 17 Apr 2012 08:20:58 -0700 (PDT)
From: Dan Magenheimer
Subject: RE: Followup: [PATCH -mm] make swapin readahead skip over holes
References: <7297ae3b-f3e1-480b-838f-69b0e09a733d@default> <4F8C7D59.1000402@redhat.com>
In-Reply-To: <4F8C7D59.1000402@redhat.com>
To: Rik van Riel
Cc: linux-kernel@vger.kernel.org, linux-mm

> From: Rik van Riel [mailto:riel@redhat.com]
> Subject: Re: Followup: [PATCH -mm] make swapin readahead skip over holes
>
> On 04/16/2012 02:34 PM, Dan Magenheimer wrote:
> > Hi Rik --
> >
> > For values of N=24 and N=28, your patch made the workload
> > run 4-9% faster.  For N=16 and N=20, it was 5-10%
> > slower.  And for N=36 and N=40, it was 30%-40% slower!
> >
> > Is this expected?  Since the swap "disk" is a partition
> > on the one active drive, maybe the advantage is lost due
> > to contention?
>
> There are several things going on here:
>
> 1) you are running a workload that thrashes
>
> 2) the speed at which data is swapped in is increased
>    with this patch
>
> 3) with only 1GB memory, the inactive anon list is
>    the same size as the active anon list
>
> 4) the above points combined mean that less of the
>    working set could be in memory at once
>
> One solution may be to decrease the swap cluster for
> small systems, when they are thrashing.
>
> On the other hand, for most systems swap is very much
> a special circumstance, and you want to focus on quickly
> moving excess stuff into swap, and moving it back into
> memory when needed.

Hmmm... as I look at this patch more, I think I get a
picture of what's going on and I'm still concerned.
Please correct me if I am misunderstanding:

What the patch does is increase the average size of
a "cluster" of sequential pages brought in per "read"
from the swap device.  As a result there are more pages
brought back into memory "speculatively" because it is
presumably cheaper to bring in more pages per disk seek,
even if it results in a lower "swapcache hit rate".
In effect, you've done the equivalent of increasing the
default swap cluster size (on average).

If the above is wrong, please cut here and ignore
the following. :-)  But in case it is right (or
close enough), let me continue...

In other words, you are both presuming a "swap workload"
that is more sequential than random for which this patch
improves performance, and assuming a "swap device"
for which the cost of a seek is high enough to overcome
the costs of filling the swap cache with pages that won't
be used.

While it is easy to write a simple test/benchmark that
swaps a lot (and we probably all have similar test code
that writes data into a huge bigger-than-RAM array and then
reads it back), such a test/benchmark is usually sequential,
so one would assume most swap testing is done with a
sequential-favoring workload.
The kernbench workload
apparently exercises swap quite a bit more randomly and
your patch makes it run slower for low and high levels
of swapping, while faster for moderate swapping.

I also suspect (without proof) that the patch will
result in lower performance on non-rotating devices, such
as SSDs.

(Sure one can change the swap cluster size to 1, but how
many users or even sysadmins know such a thing even
exists... so the default is important.)

I'm no I/O expert, but I suspect if one of the Linux I/O
developers proposed a patch that unilaterally made all
sequential I/O faster and all random I/O slower, it
would get torn to pieces.

I'm certainly not trying to tear your patch to pieces,
just trying to evaluate it.  Hope that's OK.

Thanks,
Dan

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F8DC3DC.7040408@redhat.com>
Date: Tue, 17 Apr 2012 15:26:20 -0400
From: Rik van Riel
Subject: Re: Followup: [PATCH -mm] make swapin readahead skip over holes
References: <7297ae3b-f3e1-480b-838f-69b0e09a733d@default> <4F8C7D59.1000402@redhat.com>
To: Dan Magenheimer
Cc: linux-kernel@vger.kernel.org, linux-mm

On 04/17/2012 11:20 AM, Dan Magenheimer wrote:

> In other words, you are both presuming a "swap workload"
> that is more sequential than random for which this patch
> improves performance, and assuming a "swap device"
> for which the cost of a seek is high enough to overcome
> the costs of filling the swap cache with pages that won't
> be used.

Indeed, on spinning media the cost of seeking to
a cluster and reading one page is essentially the
same as the cost of seeking to a cluster and reading
the whole thing.

> While it is easy to write a simple test/benchmark that
> swaps a lot (and we probably all have similar test code
> that writes data into a huge bigger-than-RAM array and then
> reads it back), such a test/benchmark is usually sequential,
> so one would assume most swap testing is done with a
> sequential-favoring workload.

Lots of programs allocate fairly large memory objects,
and access them again in the same large chunks.

Take a look at a desktop application like a web browser,
for example.

> The kernbench workload
> apparently exercises swap quite a bit more randomly and
> your patch makes it run slower for low and high levels
> of swapping, while faster for moderate swapping.

The kernbench workload consists of a large number of
fairly small, short-lived processes. I suspect this
is a very non-typical workload to run into swap, on
today's systems.

A more typical workload consists of multiple large
processes, with the working set moving from one part
of memory (now inactive) to somewhere else.

We need to maximize swap IO throughput in order to
allow the system to quickly move to the new working
set.

> I also suspect (without proof) that the patch will
> result in lower performance on non-rotating devices, such
> as SSDs.
>
> (Sure one can change the swap cluster size to 1, but how
> many users or even sysadmins know such a thing even
> exists... so the default is important.)

If the default should be changed for some systems,
that is worth doing.

How does your test run with smaller swap cluster
sizes?

Would a swap cluster of 4 or 5 be closer to optimal
for a 1GB system?

--
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.
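Rik's claim that, on spinning media, reading a whole cluster costs
about the same as reading one page can be checked with rough
arithmetic. The sketch below is illustrative only: the 8.5 ms average
seek and 100 MB/s sustained transfer rate are assumed figures, not
measurements from this thread; only the 7200rpm spindle speed comes
from Dan's hardware list. `read_cost_ms` is a hypothetical helper. The
"swap cluster" discussed in the thread presumably corresponds to the
`vm.page-cluster` sysctl, which is a log2 value (3, i.e. 8 pages, by
default).

```python
def read_cost_ms(pages, seek_ms=8.5, rpm=7200,
                 transfer_mb_s=100.0, page_kb=4):
    """Estimated time for one seek plus a sequential read of `pages`."""
    # Average rotational latency is half a revolution.
    rotational_ms = 0.5 * 60_000 / rpm
    # Time to stream the pages once the head is in position.
    transfer_ms = pages * page_kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotational_ms + transfer_ms


# Compare readahead sizes for page-cluster values 0 through 5.
for page_cluster in range(6):
    pages = 1 << page_cluster
    cost = read_cost_ms(pages)
    print(f"page-cluster={page_cluster} pages={pages:2d} "
          f"cost={cost:.2f} ms  per-page={cost / pages:.2f} ms")
```

Under these assumptions the positioning time dominates: an 8-page read
costs only a few percent more than a 1-page read. That only pays off
when the extra pages are actually used; on a thrashing workload they
also displace part of the working set, which is Dan's concern, and on
a device with negligible seek cost the per-read advantage largely
disappears.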