* HIGHMEM4G config for 1GB RAM on desktop?
@ 2004-08-02 21:02 Steve Snyder
2004-08-02 21:32 ` Bart Alewijnse
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Steve Snyder @ 2004-08-02 21:02 UTC (permalink / raw)
To: Linux Kernel Mailing List
There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
kernel configuration. After reading many posts on the subject, I still
don't know which setting is best for me.
My x86 system has 1.0GB of installed memory and is primarily used as a
desktop environment. I don't have any SCSI devices that might require a
high-memory buffer. Should I enable the CONFIG_HIGHMEM4G config for this
environment or not?
Thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-02 21:02 HIGHMEM4G config for 1GB RAM on desktop? Steve Snyder
@ 2004-08-02 21:32 ` Bart Alewijnse
2004-08-02 22:05 ` Barry K. Nathan
[not found] ` <1094030083l.3189l.2l@traveler>
2 siblings, 0 replies; 29+ messages in thread
From: Bart Alewijnse @ 2004-08-02 21:32 UTC (permalink / raw)
To: Linux Kernel Mailing List

Last time I checked (which was on a somewhat older kernel), without
the 4G option you can only address nine hundred something out of
your thousand and twenty four megs. And I believe the highmem thing
creates some overhead.
So if you can live without the ~100MB, keep it off. If you want it, turn it
on.

Other people will probably tell you the exact effects of highmem. Probably
in hard-to-decipher English too :)

I kept it off for ages myself, basically because I rarely use all of my
1GB. I think it's on now.

--Bart
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-02 21:02 HIGHMEM4G config for 1GB RAM on desktop? Steve Snyder
2004-08-02 21:32 ` Bart Alewijnse
@ 2004-08-02 22:05 ` Barry K. Nathan
2004-08-03 13:30 ` Jens Axboe
[not found] ` <1094030083l.3189l.2l@traveler>
2 siblings, 1 reply; 29+ messages in thread
From: Barry K. Nathan @ 2004-08-02 22:05 UTC (permalink / raw)
To: Steve Snyder; +Cc: Linux Kernel Mailing List

On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
> There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
> kernel configuration. After reading many posts on the subject, I still
> don't know which setting is best for me.

On my own desktop system with 1GB RAM, any highmem slowdown seems to be
outweighed by the fact that more disk data stays cached in RAM (so I hit
the disk much less often).

Everyone else I know has also found the extra RAM to greatly outweigh
the highmem slowdown, although those people are running clusters &
servers, not desktops, with this much RAM.

-Barry K. Nathan <barryn@pobox.com>
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-02 22:05 ` Barry K. Nathan
@ 2004-08-03 13:30 ` Jens Axboe
2004-08-03 14:13 ` Prakash K. Cheemplavam
2004-08-03 14:29 ` Con Kolivas
0 siblings, 2 replies; 29+ messages in thread
From: Jens Axboe @ 2004-08-03 13:30 UTC (permalink / raw)
To: Barry K. Nathan; +Cc: Steve Snyder, Linux Kernel Mailing List

On Mon, Aug 02 2004, Barry K. Nathan wrote:
> On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
> > There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
> > kernel configuration. After reading many posts on the subject, I still
> > don't know which setting is best for me.
>
> On my own desktop system with 1GB RAM, any highmem slowdown seems to be
> outweighed by the fact that more disk data stays cached in RAM (so I hit
> the disk much less often).

There's also the option of moving the mapping only slightly, so that all
of the 1G fits in low memory. That's the best option for 1G desktop
machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
probably be enough.

Then you can have your cake and eat it too.

--
Jens Axboe
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-03 13:30 ` Jens Axboe
@ 2004-08-03 14:13 ` Prakash K. Cheemplavam
2004-08-03 14:29 ` Con Kolivas
1 sibling, 0 replies; 29+ messages in thread
From: Prakash K. Cheemplavam @ 2004-08-03 14:13 UTC (permalink / raw)
To: Jens Axboe; +Cc: Barry K. Nathan, Steve Snyder, Linux Kernel Mailing List

Jens Axboe wrote:
| On Mon, Aug 02 2004, Barry K. Nathan wrote:
|
|>On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
|>
|>>There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
|>>kernel configuration. After reading many posts on the subject, I still
|>>don't know which setting is best for me.
|>
|>On my own desktop system with 1GB RAM, any highmem slowdown seems to be
|>outweighed by the fact that more disk data stays cached in RAM (so I hit
|>the disk much less often).
|
|
| There's also the option of moving the mapping only slightly, so that all
| of the 1G fits in low memory. That's the best option for 1G desktop
| machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
| probably be enough.
|
| Then you can have your cake and eat it too.

This works nicely for me. I wonder why this doesn't become standard
behaviour in kernel. At least a lot of people would be happy about it.

Prakash
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-03 13:30 ` Jens Axboe
2004-08-03 14:13 ` Prakash K. Cheemplavam
@ 2004-08-03 14:29 ` Con Kolivas
2004-08-04 6:06 ` Jens Axboe
1 sibling, 1 reply; 29+ messages in thread
From: Con Kolivas @ 2004-08-03 14:29 UTC (permalink / raw)
To: Jens Axboe; +Cc: Barry K. Nathan, Steve Snyder, Linux Kernel Mailing List

[-- Attachment #1.1: Type: text/plain, Size: 838 bytes --]

Jens Axboe wrote:
> On Mon, Aug 02 2004, Barry K. Nathan wrote:
>
>>On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
>>
>>>There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
>>>kernel configuration. After reading many posts on the subject, I still
>>>don't know which setting is best for me.

No idea what the performance hit is of highmem these days - it seems
insignificant compared to 2.4 so I've had it enabled for 1Gb ram.

> There's also the option of moving the mapping only slightly, so that all
> of the 1G fits in low memory. That's the best option for 1G desktop
> machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
> probably be enough.
>
> Then you can have your cake and eat it too.

Something like this attached patch? Seems to work nicely. Thanks!

Cheers,
Con

[-- Attachment #1.2: 1g_lowmem_i386.diff --]
[-- Type: text/x-patch, Size: 1038 bytes --]

Index: linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6.8-rc2-mm2.orig/arch/i386/kernel/vmlinux.lds.S 2004-05-23 12:54:46.000000000 +1000
+++ linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S 2004-08-04 00:20:02.219462913 +1000
@@ -11,7 +11,7 @@
 jiffies = jiffies_64;
 SECTIONS
 {
-  . = 0xC0000000 + 0x100000;
+  . = 0xB0000000 + 0x100000;
   /* read-only */
   _text = .;   /* Text and read-only data */
   .text : {
Index: linux-2.6.8-rc2-mm2/include/asm-i386/page.h
===================================================================
--- linux-2.6.8-rc2-mm2.orig/include/asm-i386/page.h 2004-08-03 01:29:28.000000000 +1000
+++ linux-2.6.8-rc2-mm2/include/asm-i386/page.h 2004-08-03 23:58:16.000000000 +1000
@@ -123,9 +123,9 @@
 #endif /* __ASSEMBLY__ */

 #ifdef __ASSEMBLY__
-#define __PAGE_OFFSET  (0xC0000000)
+#define __PAGE_OFFSET  (0xB0000000)
 #else
-#define __PAGE_OFFSET  (0xC0000000UL)
+#define __PAGE_OFFSET  (0xB0000000UL)
 #endif
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-03 14:29 ` Con Kolivas
@ 2004-08-04 6:06 ` Jens Axboe
2004-08-04 11:14 ` Eric Bambach
0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2004-08-04 6:06 UTC (permalink / raw)
To: Con Kolivas; +Cc: Barry K. Nathan, Steve Snyder, Linux Kernel Mailing List

On Wed, Aug 04 2004, Con Kolivas wrote:
> Jens Axboe wrote:
> >On Mon, Aug 02 2004, Barry K. Nathan wrote:
> >
> >>On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
> >>
> >>>There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
> >>>kernel configuration. After reading many posts on the subject, I still
> >>>don't know which setting is best for me.
>
> No idea what the performance hit is of highmem these days - it seems
> insignificant compared to 2.4 so I've had it enabled for 1Gb ram.
>
> >There's also the option of moving the mapping only slightly, so that all
> >of the 1G fits in low memory. That's the best option for 1G desktop
> >machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
> >probably be enough.
> >
> >Then you can have your cake and eat it too.
>
> Something like this attached patch? Seems to work nicely. Thanks!
>
> Cheers,
> Con
> Index: linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S
> ===================================================================
> --- linux-2.6.8-rc2-mm2.orig/arch/i386/kernel/vmlinux.lds.S 2004-05-23 12:54:46.000000000 +1000
> +++ linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S 2004-08-04 00:20:02.219462913 +1000
> @@ -11,7 +11,7 @@
>  jiffies = jiffies_64;
>  SECTIONS
>  {
> -  . = 0xC0000000 + 0x100000;
> +  . = 0xB0000000 + 0x100000;
>    /* read-only */
>    _text = .;   /* Text and read-only data */
>    .text : {
> Index: linux-2.6.8-rc2-mm2/include/asm-i386/page.h
> ===================================================================
> --- linux-2.6.8-rc2-mm2.orig/include/asm-i386/page.h 2004-08-03 01:29:28.000000000 +1000
> +++ linux-2.6.8-rc2-mm2/include/asm-i386/page.h 2004-08-03 23:58:16.000000000 +1000
> @@ -123,9 +123,9 @@
>  #endif /* __ASSEMBLY__ */
>
>  #ifdef __ASSEMBLY__
> -#define __PAGE_OFFSET  (0xC0000000)
> +#define __PAGE_OFFSET  (0xB0000000)
>  #else
> -#define __PAGE_OFFSET  (0xC0000000UL)
> +#define __PAGE_OFFSET  (0xB0000000UL)
>  #endif

Yup precisely. I agree that there probably isn't a whole lot of
performance hit on a 1GB, it just seems silly that we need highmem on
such a standard memory configuration these days. Especially when just
moving the offset slightly removes that need.

--
Jens Axboe
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 6:06 ` Jens Axboe
@ 2004-08-04 11:14 ` Eric Bambach
2004-08-04 13:07 ` Jens Axboe
0 siblings, 1 reply; 29+ messages in thread
From: Eric Bambach @ 2004-08-04 11:14 UTC (permalink / raw)
To: Jens Axboe
Cc: Con Kolivas, Barry K. Nathan, Steve Snyder, Linux Kernel Mailing List

On Wednesday 04 August 2004 01:06 am, Jens Axboe wrote:
> On Wed, Aug 04 2004, Con Kolivas wrote:
> > Jens Axboe wrote:
> > >On Mon, Aug 02 2004, Barry K. Nathan wrote:
> > >>On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
> > >>>There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
> > >>>kernel configuration. After reading many posts on the subject, I
> > >>> still don't know which setting is best for me.
> >
> > No idea what the performance hit is of highmem these days - it seems
> > insignificant compared to 2.4 so I've had it enabled for 1Gb ram.
> >
> > >There's also the option of moving the mapping only slightly, so that all
> > >of the 1G fits in low memory. That's the best option for 1G desktop
> > >machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
> > >probably be enough.
> > >
> > >Then you can have your cake and eat it too.
> >
> > Something like this attached patch? Seems to work nicely. Thanks!
> >
> > Cheers,
> > Con
> >
> > Index: linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S
> > ===================================================================
> > --- linux-2.6.8-rc2-mm2.orig/arch/i386/kernel/vmlinux.lds.S 2004-05-23
> > 12:54:46.000000000 +1000 +++
> > linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S 2004-08-04
> > 00:20:02.219462913 +1000 @@ -11,7 +11,7 @@
> >  jiffies = jiffies_64;
> >  SECTIONS
> >  {
> > -  . = 0xC0000000 + 0x100000;
> > +  . = 0xB0000000 + 0x100000;
> >    /* read-only */
> >    _text = .;   /* Text and read-only data */
> >    .text : {
> > Index: linux-2.6.8-rc2-mm2/include/asm-i386/page.h
> > ===================================================================
> > --- linux-2.6.8-rc2-mm2.orig/include/asm-i386/page.h 2004-08-03
> > 01:29:28.000000000 +1000 +++
> > linux-2.6.8-rc2-mm2/include/asm-i386/page.h 2004-08-03 23:58:16.000000000
> > +1000 @@ -123,9 +123,9 @@
> >  #endif /* __ASSEMBLY__ */
> >
> >  #ifdef __ASSEMBLY__
> > -#define __PAGE_OFFSET  (0xC0000000)
> > +#define __PAGE_OFFSET  (0xB0000000)
> >  #else
> > -#define __PAGE_OFFSET  (0xC0000000UL)
> > +#define __PAGE_OFFSET  (0xB0000000UL)
> >  #endif
>
> Yup precisely. I agree that there probably isn't a whole lot of
> performance hit on a 1GB, it just seems silly that we need highmem on
> such a standard memory configuration these days. Especially when just
> moving the offset slightly removes that need.

As a desktop user with 1024MB ram I agree that HIGHMEM has a silly threshold
and should not need to be enabled in this case. It's becoming common,
especially with dual channel memory systems, to use 2x512MB sticks. On a
hunch I bet 2x512 is more common than 1x512 and 1x256, so why not merge this
up? Who would we submit this patch to?

--
-EB
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 11:14 ` Eric Bambach
@ 2004-08-04 13:07 ` Jens Axboe
2004-08-04 19:06 ` Andrew Morton
0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2004-08-04 13:07 UTC (permalink / raw)
To: Eric Bambach
Cc: Con Kolivas, Barry K. Nathan, Steve Snyder, Linux Kernel Mailing List, Andrew Morton

On Wed, Aug 04 2004, Eric Bambach wrote:
> On Wednesday 04 August 2004 01:06 am, Jens Axboe wrote:
> > On Wed, Aug 04 2004, Con Kolivas wrote:
> > > Jens Axboe wrote:
> > > >On Mon, Aug 02 2004, Barry K. Nathan wrote:
> > > >>On Mon, Aug 02, 2004 at 04:02:34PM -0500, Steve Snyder wrote:
> > > >>>There seems to be a controversy about the use of the CONFIG_HIGHMEM4G
> > > >>>kernel configuration. After reading many posts on the subject, I
> > > >>> still don't know which setting is best for me.
> > >
> > > No idea what the performance hit is of highmem these days - it seems
> > > insignificant compared to 2.4 so I've had it enabled for 1Gb ram.
> > >
> > > >There's also the option of moving the mapping only slightly, so that all
> > > >of the 1G fits in low memory. That's the best option for 1G desktop
> > > >machines, imho. Changing PAGE_OFFSET from 0xc0000000 to 0xb0000000 would
> > > >probably be enough.
> > > >
> > > >Then you can have your cake and eat it too.
> > >
> > > Something like this attached patch? Seems to work nicely. Thanks!
> > >
> > > Cheers,
> > > Con
> > >
> > > Index: linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S
> > > ===================================================================
> > > --- linux-2.6.8-rc2-mm2.orig/arch/i386/kernel/vmlinux.lds.S 2004-05-23
> > > 12:54:46.000000000 +1000 +++
> > > linux-2.6.8-rc2-mm2/arch/i386/kernel/vmlinux.lds.S 2004-08-04
> > > 00:20:02.219462913 +1000 @@ -11,7 +11,7 @@
> > >  jiffies = jiffies_64;
> > >  SECTIONS
> > >  {
> > > -  . = 0xC0000000 + 0x100000;
> > > +  . = 0xB0000000 + 0x100000;
> > >    /* read-only */
> > >    _text = .;   /* Text and read-only data */
> > >    .text : {
> > > Index: linux-2.6.8-rc2-mm2/include/asm-i386/page.h
> > > ===================================================================
> > > --- linux-2.6.8-rc2-mm2.orig/include/asm-i386/page.h 2004-08-03
> > > 01:29:28.000000000 +1000 +++
> > > linux-2.6.8-rc2-mm2/include/asm-i386/page.h 2004-08-03 23:58:16.000000000
> > > +1000 @@ -123,9 +123,9 @@
> > >  #endif /* __ASSEMBLY__ */
> > >
> > >  #ifdef __ASSEMBLY__
> > > -#define __PAGE_OFFSET  (0xC0000000)
> > > +#define __PAGE_OFFSET  (0xB0000000)
> > >  #else
> > > -#define __PAGE_OFFSET  (0xC0000000UL)
> > > +#define __PAGE_OFFSET  (0xB0000000UL)
> > >  #endif
> >
> > Yup precisely. I agree that there probably isn't a whole lot of
> > performance hit on a 1GB, it just seems silly that we need highmem on
> > such a standard memory configuration these days. Especially when just
> > moving the offset slightly removes that need.
>
> As a desktop user with 1024MB ram I agree that HIGHMEM has a silly threshold
> and should not need to be enabled in this case. It's becoming common,
> especially with dual channel memory systems, to use 2x512MB sticks. On a
> hunch I bet 2x512 is more common than 1x512 and 1x256, so why not merge this
> up? Who would we submit this patch to?

One way would be to ask Andrew what he thinks?

--
Jens Axboe
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 13:07 ` Jens Axboe
@ 2004-08-04 19:06 ` Andrew Morton
2004-08-04 19:21 ` Marc-Christian Petersen
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Andrew Morton @ 2004-08-04 19:06 UTC (permalink / raw)
To: Jens Axboe; +Cc: eric, kernel, barryn, swsnyder, linux-kernel

Jens Axboe <axboe@suse.de> wrote:
>
> > > > -#define __PAGE_OFFSET  (0xC0000000)
> > > > +#define __PAGE_OFFSET  (0xB0000000)
> > > >  #else
> > > > -#define __PAGE_OFFSET  (0xC0000000UL)
> > > > +#define __PAGE_OFFSET  (0xB0000000UL)
> > > >  #endif
> > >
> > > Yup precisely. I agree that there probably isn't a whole lot of
> > > performance hit on a 1GB, it just seems silly that we need highmem on
> > > such a standard memory configuration these days. Especially when just
> > > moving the offset slightly removes that need.
> >
> > As a desktop user with 1024MB ram I agree that HIGHMEM has a silly threshold
> > and should not need to be enabled in this case. It's becoming common,
> > especially with dual channel memory systems, to use 2x512MB sticks. On a
> > hunch I bet 2x512 is more common than 1x512 and 1x256, so why not merge this
> > up? Who would we submit this patch to?
>
> One way would be to ask Andrew what he thinks?

The 896M/128M split has a bit of a problem now each zone has its own LRU:
the size of the highmem zone is less than the amount of memory which is
described by the default /proc/sys/vm/dirty_ratio. So it is easy to
completely fill highmem with dirty pages. This causes a fairly large
amount of writeback via vmscan.c's writepage(). This causes poor I/O
submission patterns. This causes a simple large, linear `dd' write to run
at only 50-70% of disk bandwidth. (This was 6-12 months ago - it might be
a bit better now)

But I seem to be the only person who has noticed this yet ;) A workaround
is to decrease dirty_ratio and dirty_background_ratio.

Decreasing PAGE_OFFSET as above is attractive, but I believe 0xc0000000 is
part of the ABI, and although we know (from the 4g/4g and other such
patches) that everything will work OK, I wonder if it's really worth doing,
especially as it's a compile-time thing.

But hey, if someone can identify specific benefits from it then perhaps
sneaking in a config option, or maintaining an external patch would be
worthwhile.
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:06 ` Andrew Morton
@ 2004-08-04 19:21 ` Marc-Christian Petersen
2004-08-04 19:30 ` Martin J. Bligh
2004-08-12 0:53 ` Timothy Miller
2 siblings, 0 replies; 29+ messages in thread
From: Marc-Christian Petersen @ 2004-08-04 19:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jens Axboe, eric, kernel, barryn, swsnyder, linux-kernel

On Wednesday 04 August 2004 21:06, Andrew Morton wrote:

Hi Andrew,

> The 896M/128M split has a bit of a problem now each zone has its own LRU:
> the size of the highmem zone is less than the amount of memory which is
> described by the default /proc/sys/vm/dirty_ratio. So it is easy to
> completely fill highmem with dirty pages. This causes a fairly large
> amount of writeback via vmscan.c's writepage(). This causes poor I/O
> submission patterns. This causes a simple large, linear `dd' write to run
> at only 50-70% of disk bandwidth. (This was 6-12 months ago - it might be
> a bit better now)
> But I seem to be the only person who has noticed this yet ;) A workaround
> is to decrease dirty_ratio and dirty_background_ratio.

Hmm, I never tested changing the split with 2.6.x, but on 2.4 I didn't
notice any disk I/O regressions. Maybe due to a different VM ;)

> Decreasing PAGE_OFFSET as above is attractive, but I believe 0xc0000000 is
> part of the ABI, and although we know (from the 4g/4g and other such
> patches) that everything will work OK, I wonder if it's really worth doing,
> especially as it's a compile-time thing.
> But hey, if someone can identify specific benefits from it then perhaps
> sneaking in a config option, or maintaining an external patch would be
> worthwhile.

Maybe we can introduce something like the 3.5GB patch that 2.4-aa and
2.4-wolk have?

For reference:
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.23aa3/00_3.5G-address-space-5

Let me know and I'll cook up a 2.6 version.

Grmpf, that reminds me of my Documentation cleanup patches ;(

ciao, Marc
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:06 ` Andrew Morton
2004-08-04 19:21 ` Marc-Christian Petersen
@ 2004-08-04 19:30 ` Martin J. Bligh
2004-08-04 19:51 ` Andrew Morton
2004-08-04 20:09 ` Roland Dreier
2004-08-12 0:53 ` Timothy Miller
2 siblings, 2 replies; 29+ messages in thread
From: Martin J. Bligh @ 2004-08-04 19:30 UTC (permalink / raw)
To: Andrew Morton, Jens Axboe; +Cc: eric, kernel, barryn, swsnyder, linux-kernel

> The 896M/128M split has a bit of a problem now each zone has its own LRU:
> the size of the highmem zone is less than the amount of memory which is
> described by the default /proc/sys/vm/dirty_ratio. So it is easy to
> completely fill highmem with dirty pages. This causes a fairly large
> amount of writeback via vmscan.c's writepage(). This causes poor I/O
> submission patterns. This causes a simple large, linear `dd' write to run
> at only 50-70% of disk bandwidth. (This was 6-12 months ago - it might be
> a bit better now)
>
> But I seem to be the only person who has noticed this yet ;) A workaround
> is to decrease dirty_ratio and dirty_background_ratio.
>
> Decreasing PAGE_OFFSET as above is attractive, but I believe 0xc0000000 is
> part of the ABI, and although we know (from the 4g/4g and other such
> patches) that everything will work OK, I wonder if it's really worth doing,
> especially as it's a compile-time thing.
>
> But hey, if someone can identify specific benefits from it then perhaps
> sneaking in a config option, or maintaining an external patch would be
> worthwhile.

I had a patch for a config option, ported forward by someone at IBM (I forget
who, possibly Dave) from Andrea's original. I think we finally decided
(in consultation with Andrea) we could drop the complicated stuff he had
in the asm code, so it's pretty simple ... something like this:

diff -purN -X /home/mbligh/.diff.exclude 200-config_hz/arch/i386/Kconfig 210-config_page_offset/arch/i386/Kconfig
--- 200-config_hz/arch/i386/Kconfig 2004-03-14 09:48:36.000000000 -0800
+++ 210-config_page_offset/arch/i386/Kconfig 2004-03-14 09:49:04.000000000 -0800
@@ -763,6 +763,44 @@ config HIGHMEM64G

 endchoice

+choice
+ help
+   On i386, a process can only virtually address 4GB of memory. This
+   lets you select how much of that virtual space you would like to
+   devoted to userspace, and how much to the kernel.
+
+   Some userspace programs would like to address as much as possible and
+   have few demands of the kernel other than it get out of the way. These
+   users may opt to use the 3.5GB option to give their userspace program
+   as much room as possible. Due to alignment issues imposed by PAE,
+   the "3.5GB" option is unavailable if "64GB" high memory support is
+   enabled.
+
+   Other users (especially those who use PAE) may be running out of
+   ZONE_NORMAL memory. Those users may benefit from increasing the
+   kernel's virtual address space size by taking it away from userspace,
+   which may not need all of its space. An indicator that this is
+   happening is when /proc/Meminfo's "LowFree:" is a small percentage of
+   "LowTotal:" while "HighFree:" is very large.
+
+   If unsure, say "3GB"
+ prompt "User address space size"
+ default 1GB
+
+config 05GB
+ bool "3.5 GB"
+ depends on !HIGHMEM64G
+
+config 1GB
+ bool "3 GB"
+
+config 2GB
+ bool "2 GB"
+
+config 3GB
+ bool "1 GB"
+endchoice
+
 config HIGHMEM
  bool
  depends on HIGHMEM64G || HIGHMEM4G
diff -purN -X /home/mbligh/.diff.exclude 200-config_hz/arch/i386/Makefile 210-config_page_offset/arch/i386/Makefile
--- 200-config_hz/arch/i386/Makefile 2004-03-12 11:06:23.000000000 -0800
+++ 210-config_page_offset/arch/i386/Makefile 2004-03-14 09:49:04.000000000 -0800
@@ -114,6 +114,7 @@ drivers-$(CONFIG_PM) += arch/i386/powe

 CFLAGS += $(mflags-y)
 AFLAGS += $(mflags-y)
+AFLAGS_vmlinux.lds.o += -include $(TOPDIR)/include/asm-i386/page.h

 boot := arch/i386/boot

diff -purN -X /home/mbligh/.diff.exclude 200-config_hz/include/asm-i386/page.h 210-config_page_offset/include/asm-i386/page.h
--- 200-config_hz/include/asm-i386/page.h 2004-03-12 11:07:27.000000000 -0800
+++ 210-config_page_offset/include/asm-i386/page.h 2004-03-14 09:49:04.000000000 -0800
@@ -97,9 +97,20 @@ typedef struct { unsigned long pgprot; }
 #ifdef CONFIG_X86_4G_VM_LAYOUT
 #define __PAGE_OFFSET  (0x02000000)
 #define TASK_SIZE  (0xff000000)
-#else
+#elif defined(CONFIG_05GB)
+#define __PAGE_OFFSET  (0xe0000000)
+#define TASK_SIZE  (0xe0000000)
+#elif defined(CONFIG_1GB)
 #define __PAGE_OFFSET  (0xc0000000)
 #define TASK_SIZE  (0xc0000000)
+#elif defined(CONFIG_2GB)
+#define __PAGE_OFFSET  (0x80000000)
+#define TASK_SIZE  (0x80000000)
+#elif defined(CONFIG_3GB)
+#define __PAGE_OFFSET  (0x40000000)
+#define TASK_SIZE  (0x40000000)
+#else
+#error I have no idea what VM layout to use
 #endif

 /*
diff -purN -X /home/mbligh/.diff.exclude 200-config_hz/include/asm-i386/processor.h 210-config_page_offset/include/asm-i386/processor.h
--- 200-config_hz/include/asm-i386/processor.h 2004-03-12 11:07:47.000000000 -0800
+++ 210-config_page_offset/include/asm-i386/processor.h 2004-03-14 09:49:04.000000000 -0800
@@ -294,7 +294,11 @@ extern unsigned int mca_pentium_flag;
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */
+#ifdef CONFIG_05GB
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 16))
+#else
 #define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))
+#endif

 /*
  * Size of io_bitmap, covering ports 0 to 0x3ff.
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:30 ` Martin J. Bligh
@ 2004-08-04 19:51 ` Andrew Morton
2004-08-04 20:09 ` Martin J. Bligh
2004-08-04 20:09 ` Roland Dreier
1 sibling, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2004-08-04 19:51 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: axboe, eric, kernel, barryn, swsnyder, linux-kernel

"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> I had a patch for a config option, ported forward by someone at IBM (I forget
> who, possibly Dave) from Andrea's original. I think we finally decided
> (in consultation with Andrea) we could drop the complicated stuff he had
> in the asm code, so it's pretty simple ... something like this:

I sent such a patch to the boss many moons ago and he said "go away, this
is a vendor-only thing".
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:51 ` Andrew Morton
@ 2004-08-04 20:09 ` Martin J. Bligh
0 siblings, 0 replies; 29+ messages in thread
From: Martin J. Bligh @ 2004-08-04 20:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: axboe, eric, kernel, barryn, swsnyder, linux-kernel

--On Wednesday, August 04, 2004 12:51:29 -0700 Andrew Morton <akpm@osdl.org> wrote:

> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
>>
>> I had a patch for a config option, ported forward by someone at IBM (I forget
>> who, possibly Dave) from Andrea's original. I think we finally decided
>> (in consultation with Andrea) we could drop the complicated stuff he had
>> in the asm code, so it's pretty simple ... something like this:
>
> I sent such a patch to the boss many moons ago and he said "go away, this
> is a vendor-only thing".

Yeah, I know. But then he hates 4/4 split even more, and going to 2/2 is
a trivial way to solve the 64GB machines ;-)

Maybe I should embed it in an ioctl patch for a mouse driver?

/me runs like hell.

M.
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:30 ` Martin J. Bligh
2004-08-04 19:51 ` Andrew Morton
@ 2004-08-04 20:09 ` Roland Dreier
2004-08-04 20:13 ` Martin J. Bligh
1 sibling, 1 reply; 29+ messages in thread
From: Roland Dreier @ 2004-08-04 20:09 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andrew Morton, Jens Axboe, eric, kernel, barryn, swsnyder, linux-kernel

Martin> I had a patch for a config option, ported forward by
Martin> someone at IBM (I forget who, possibly Dave) from Andrea's
Martin> original. I think we finally decided (in consultation with
Martin> Andrea) we could drop the complicated stuff he had in the
Martin> asm code, so it's pretty simple ... something like this:

Am I just being dense, or is this patch solving a different problem
from "do I really have to turn on HIGHMEM4G just to get the last 128MB
of my 1GB of RAM?"

It seems to me that none of the PAGE_OFFSET values offered (the patch
allows PAGE_OFFSET to be set to 0xe0000000, 0xc0000000, 0x80000000
or 0x40000000, in addition to the 4G/4G value of 0x02000000) are
exactly what someone with 1 GB of RAM wants. They'd be forced to go
down to a 2G/2G split which cheats userspace of quite a bit of address
space (admittedly with only 1 GB of RAM, a process bigger than 2 GB is
a bit of a stretch). In any case a 2.75G/1.25G split as suggested
earlier works fine with 1 GB of RAM.

Also I notice that Con's patch modifies vmlinux.lds.S to update the
kernel base address, while this patch doesn't. Is that intentional, or
does the patch depend on some other patches that use the defines in
page.h somehow, as controlled by the following change?

+AFLAGS_vmlinux.lds.o += -include $(TOPDIR)/include/asm-i386/page.h

Thanks,
Roland
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 20:09 ` Roland Dreier
@ 2004-08-04 20:13 ` Martin J. Bligh
0 siblings, 0 replies; 29+ messages in thread
From: Martin J. Bligh @ 2004-08-04 20:13 UTC (permalink / raw)
To: Roland Dreier
Cc: Andrew Morton, Jens Axboe, eric, kernel, barryn, swsnyder, linux-kernel

> Martin> I had a patch for a config option, ported forward by
> Martin> someone at IBM (I forget who, possibly Dave) from Andrea's
> Martin> original. I think we finally decided (in consultation with
> Martin> Andrea) we could drop the complicated stuff he had in the
> Martin> asm code, so it's pretty simple ... something like this:
>
> Am I just being dense, or is this patch solving a different problem
> from "do I really have to turn on HIGHMEM4G just to get the last 128MB
> of my 1GB of RAM?"
>
> It seems to me that none of the PAGE_OFFSET values offered (the patch
> allows PAGE_OFFSET to be set to 0xe0000000, 0xc0000000, 0x80000000
> or 0x40000000, in addition to the 4G/4G value of 0x02000000) are
> exactly what someone with 1 GB of RAM wants. They'd be forced to go
> down to a 2G/2G split, which cheats userspace of quite a bit of address
> space (admittedly with only 1 GB of RAM, a process bigger than 2 GB is
> a bit of a stretch). In any case a 2.75G/1.25G split as suggested
> earlier works fine with 1 GB of RAM.

In practice, I suspect 2/2 will do exactly what you want ... and what
99.9% of people want actually ;-) We could add more options, but be sure
to mark anything that's not 1GB aligned as not suitable for PAE (as the
0.5 split was).

> Also I notice that Con's patch modifies vmlinux.ld.S to update the
> kernel base address, while this patch doesn't. Is that intentional, or
> does the patch depend on some other patches that use the defines in
> page.h somehow, as controlled by the following change?
>
> +AFLAGS_vmlinux.lds.o += -include $(TOPDIR)/include/asm-i386/page.h

Dunno ... we should test it really ... it's been a while.

IIRC, vmlinux.ld.S was hardcoded, so I don't see how it'd work without
those mods.

M.
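The trade-off between the PAGE_OFFSET values discussed above comes down to address-space arithmetic. The following is a minimal sketch in C, not kernel code: it assumes the kernel sets aside 128 MB of its address space for vmalloc (the usual i386 default, and the reason the stock 3G/1G split yields 896 MB of directly mapped memory). The helper names `lowmem_mb` and `fits_without_highmem` are hypothetical.

```c
#include <stdint.h>

/* Assumption: 128 MB of kernel address space is kept for vmalloc/ioremap,
 * the usual i386 default. Other configurations will differ. */
#define VMALLOC_RESERVE_MB 128u

/* MB of RAM the kernel can map directly (lowmem) for a given PAGE_OFFSET. */
static unsigned lowmem_mb(uint32_t page_offset)
{
    /* Kernel virtual space runs from PAGE_OFFSET up to the 4 GB boundary. */
    uint64_t kernel_space = (1ULL << 32) - page_offset;

    return (unsigned)(kernel_space >> 20) - VMALLOC_RESERVE_MB;
}

/* Nonzero if ram_mb of RAM fits entirely in lowmem (no highmem needed). */
static int fits_without_highmem(uint32_t page_offset, unsigned ram_mb)
{
    return ram_mb <= lowmem_mb(page_offset);
}
```

Under these assumptions the default 0xc0000000 gives 896 MB, so 1 GB of RAM spills 128 MB into highmem, while both 0x80000000 (2G/2G) and 0xb0000000 (2.75G/1.25G) cover the full 1 GB.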
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-04 19:06 ` Andrew Morton
2004-08-04 19:21 ` Marc-Christian Petersen
2004-08-04 19:30 ` Martin J. Bligh
@ 2004-08-12 0:53 ` Timothy Miller
2004-08-30 18:06 ` Timothy Miller
2 siblings, 1 reply; 29+ messages in thread
From: Timothy Miller @ 2004-08-12 0:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jens Axboe, eric, kernel, barryn, swsnyder, linux-kernel

Andrew Morton wrote:

> The 896M/128M split has a bit of a problem now each zone has its own LRU:
> the size of the highmem zone is less than the amount of memory which is
> described by the default /proc/sys/vm/dirty_ratio. So it is easy to
> completely fill highmem with dirty pages. This causes a fairly large
> amount of writeback via vmscan.c's writepage(). This causes poor I/O
> submission patterns. This causes a simple large, linear `dd' write to run
> at only 50-70% of disk bandwidth. (This was 6-12 months ago - it might be
> a bit better now)

Hey, that rings a bell. I have a 3ware 7000-2 controller with two
WD1200JB drives in RAID1. I find that if I dd from the disk, I get
exactly the read throughput that is the max for the drives (47MB/sec).
However, if I do a WRITE test, the performance is miserable.

I have been going back and forth with 3ware for months, and what's odd
is that my drives with my controller in any machine other than the
primary box get great write throughput, BUT on my main box with 1G of
RAM, I get MISERABLE write throughput. When I should be getting
36MB/sec or faster, I get 8 to 12 MB/sec.

Now, I have tried limiting the memory with a mem= boot option, but that
doesn't change the performance any.
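Andrew Morton's point about dirty_ratio can be checked with rough numbers. The sketch below is an approximation only: it treats the dirty limit as a flat percentage of total RAM, whereas the kernel's real calculation is more involved. The helper names are hypothetical, and the value 40 used in the note after the code is an assumed default for dirty_ratio on kernels of that period, not a figure taken from this thread.

```c
/* MB of dirty page cache allowed before writers are throttled,
 * approximated as a flat percentage of total RAM (a simplification). */
static unsigned dirty_limit_mb(unsigned ram_mb, unsigned dirty_ratio_pct)
{
    return ram_mb * dirty_ratio_pct / 100;
}

/* MB of highmem with the default 3G/1G split, i.e. 896 MB of lowmem. */
static unsigned highmem_mb(unsigned ram_mb)
{
    const unsigned lowmem = 896;

    return ram_mb > lowmem ? ram_mb - lowmem : 0;
}

/* Nonzero if dirty pages alone could fill the entire highmem zone. */
static int highmem_can_fill_with_dirty(unsigned ram_mb, unsigned dirty_ratio_pct)
{
    unsigned high = highmem_mb(ram_mb);

    return high > 0 && high < dirty_limit_mb(ram_mb, dirty_ratio_pct);
}
```

With 1024 MB of RAM and an assumed ratio of 40, the limit is about 409 MB against a 128 MB highmem zone, which matches the situation described in the quoted text.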
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-12 0:53 ` Timothy Miller
@ 2004-08-30 18:06 ` Timothy Miller
2004-08-30 17:49 ` Miquel van Smoorenburg
0 siblings, 1 reply; 29+ messages in thread
From: Timothy Miller @ 2004-08-30 18:06 UTC (permalink / raw)
To: Timothy Miller
Cc: Andrew Morton, Jens Axboe, eric, kernel, barryn, swsnyder, linux-kernel

Timothy Miller wrote:
>
> Andrew Morton wrote:
>
>> The 896M/128M split has a bit of a problem now each zone has its own LRU:
>> the size of the highmem zone is less than the amount of memory which is
>> described by the default /proc/sys/vm/dirty_ratio. So it is easy to
>> completely fill highmem with dirty pages. This causes a fairly large
>> amount of writeback via vmscan.c's writepage(). This causes poor I/O
>> submission patterns. This causes a simple large, linear `dd' write to run
>> at only 50-70% of disk bandwidth. (This was 6-12 months ago - it might be
>> a bit better now)
>
> Hey, that rings a bell. I have a 3ware 7000-2 controller with two
> WD1200JB drives in RAID1. I find that if I dd from the disk, I get
> exactly the read throughput that is the max for the drives (47MB/sec).
> However, if I do a WRITE test, the performance is miserable.
>
> I have been going back and forth with 3ware for months, and what's odd
> is that my drives with my controller in any machine other than the
> primary box get great write throughput, BUT on my main box with 1G of
> RAM, I get MISERABLE write throughput. When I should be getting
> 36MB/sec or faster, I get 8 to 12 MB/sec.
>
> Now, I have tried limiting the memory with a mem= boot option, but that
> doesn't change the performance any.

Scratch all this. Even if I physically remove half the memory, I STILL
get the performance problem.
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-30 18:06 ` Timothy Miller
@ 2004-08-30 17:49 ` Miquel van Smoorenburg
2004-08-31 22:46 ` Timothy Miller
0 siblings, 1 reply; 29+ messages in thread
From: Miquel van Smoorenburg @ 2004-08-30 17:49 UTC (permalink / raw)
To: linux-kernel

In article <41336CB1.6030105@techsource.com>,
Timothy Miller <miller@techsource.com> wrote:
>Timothy Miller wrote:
>> Hey, that rings a bell. I have a 3ware 7000-2 controller with two
>> WD1200JB drives in RAID1. I find that if I dd from the disk, I get
>> exactly the read throughput that is the max for the drives (47MB/sec).
>> However, if I do a WRITE test, the performance is miserable.
>>
>> I have been going back and forth with 3ware for months, and what's odd
>> is that my drives with my controller in any machine other than the
>> primary box get great write throughput, BUT on my main box with 1G of
>> RAM, I get MISERABLE write throughput. When I should be getting
>> 36MB/sec or faster, I get 8 to 12 MB/sec.
>>
>> Now, I have tried limiting the memory with a mem= boot option, but that
>> doesn't change the performance any.
>
>Scratch all this. Even if I physically remove half the memory, I STILL
>get the performance problem.

3ware eh?

Try setting /sys/block/sda/queue/nr_requests to twice the number
in /sys/block/sda/device/queue_depth

Mike.
--
"In times of universal deceit, telling the truth becomes
a revolutionary act." -- George Orwell.
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-30 17:49 ` Miquel van Smoorenburg
@ 2004-08-31 22:46 ` Timothy Miller
2004-09-01 7:52 ` Miquel van Smoorenburg
2004-09-01 9:38 ` Matt Heler
0 siblings, 2 replies; 29+ messages in thread
From: Timothy Miller @ 2004-08-31 22:46 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: linux-kernel

Miquel van Smoorenburg wrote:
> In article <41336CB1.6030105@techsource.com>,
> Timothy Miller <miller@techsource.com> wrote:
>
>>Timothy Miller wrote:
>>
>>>Hey, that rings a bell. I have a 3ware 7000-2 controller with two
>>>WD1200JB drives in RAID1. I find that if I dd from the disk, I get
>>>exactly the read throughput that is the max for the drives (47MB/sec).
>>>However, if I do a WRITE test, the performance is miserable.
>>>
>>>I have been going back and forth with 3ware for months, and what's odd
>>>is that my drives with my controller in any machine other than the
>>>primary box get great write throughput, BUT on my main box with 1G of
>>>RAM, I get MISERABLE write throughput. When I should be getting
>>>36MB/sec or faster, I get 8 to 12 MB/sec.
>>>
>>>Now, I have tried limiting the memory with a mem= boot option, but that
>>>doesn't change the performance any.
>>
>>Scratch all this. Even if I physically remove half the memory, I STILL
>>get the performance problem.
>
> 3ware eh?
>
> Try setting /sys/block/sda/queue/nr_requests to twice the number
> in /sys/block/sda/device/queue_depth

This will improve write performance? And if this helps, how do I make
it permanent?

Thanks!
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-31 22:46 ` Timothy Miller
@ 2004-09-01 7:52 ` Miquel van Smoorenburg
2004-09-01 9:38 ` Matt Heler
1 sibling, 0 replies; 29+ messages in thread
From: Miquel van Smoorenburg @ 2004-09-01 7:52 UTC (permalink / raw)
To: Timothy Miller; +Cc: linux-kernel

On Wed, 01 Sep 2004 00:46:27, Timothy Miller wrote:
> Miquel van Smoorenburg wrote:
> > In article <41336CB1.6030105@techsource.com>,
> > Timothy Miller <miller@techsource.com> wrote:
> >
> >>Timothy Miller wrote:
> >>
> >>>Hey, that rings a bell. I have a 3ware 7000-2 controller with two
> >>>WD1200JB drives in RAID1. I find that if I dd from the disk, I get
> >>>exactly the read throughput that is the max for the drives (47MB/sec).
> >>>However, if I do a WRITE test, the performance is miserable.
> >
> > Try setting /sys/block/sda/queue/nr_requests to twice the number
> > in /sys/block/sda/device/queue_depth
>
> This will improve write performance?

You won't know before you try it, of course. It helps on my 85xx
controllers.

The problem is that the internal queue size of some 3ware controllers
(queue_depth) is larger than the I/O scheduler's nr_requests, so that
the I/O scheduler doesn't get much chance to properly order and merge
the requests.

I've sent patches to 3ware a couple of times to make queue_depth
writable so that you can tune that as well, but they were refused for
no good reason AFAICS. Very unfortunate - if you have 8 JBOD disks
attached, you want to set queue_depth for each of them to
(max_controller_queue_depth / 8) to prevent one disk from starving
the other ones, but oh well.

> And if this helps, how do I make
> it permanent?

Can't say, depends on your distribution. For recent Debian at least
you can use /etc/sysctl.conf

Mike.
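The relationship Mike describes between queue_depth and nr_requests can be written down in a few lines. This is an illustrative sketch only, with hypothetical helper names; it captures the "twice the queue depth" rule of thumb from the message above and shows how many requests are left for the I/O scheduler to sort and merge once the controller has taken its fill.

```c
/* Rule of thumb from the message above: the block layer queue should hold
 * at least twice as many requests as the device accepts at once. */
static unsigned min_nr_requests(unsigned queue_depth)
{
    return 2 * queue_depth;
}

/* Requests left in the block layer for the I/O scheduler to order and
 * merge after the controller has accepted queue_depth of them. */
static unsigned scheduler_backlog(unsigned nr_requests, unsigned queue_depth)
{
    return nr_requests > queue_depth ? nr_requests - queue_depth : 0;
}
```

Using the figures quoted elsewhere in this thread (a default nr_requests of 128 and a 3ware queue depth of 254), the backlog is zero, so the scheduler has nothing to reorder; raising nr_requests to 508 or lowering the depth to 64 both leave it some room.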
* Re: HIGHMEM4G config for 1GB RAM on desktop?
2004-08-31 22:46 ` Timothy Miller
2004-09-01 7:52 ` Miquel van Smoorenburg
@ 2004-09-01 9:38 ` Matt Heler
1 sibling, 0 replies; 29+ messages in thread
From: Matt Heler @ 2004-09-01 9:38 UTC (permalink / raw)
To: Timothy Miller; +Cc: Miquel van Smoorenburg, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 89 bytes --]

Tim,

Try this patch, it seems to help with my 3ware 7000-2 controller card.

matt h.

[-- Attachment #2: 3ware-64-lun.patch --]
[-- Type: text/x-diff, Size: 758 bytes --]

diff -urpN linux-2.6.9-base/drivers/scsi/3w-xxxx.h linux-2.6.9/drivers/scsi/3w-xxxx.h
--- linux-2.6.9-base/drivers/scsi/3w-xxxx.h	2004-08-28 22:03:22.000000000 -0700
+++ linux-2.6.9/drivers/scsi/3w-xxxx.h	2004-09-01 01:25:21.166428080 -0700
@@ -214,7 +214,7 @@ static unsigned char tw_sense_table[][4]
 #define TW_MAX_PCI_BUSES 255
 #define TW_MAX_RESET_TRIES 3
 #define TW_UNIT_INFORMATION_TABLE_BASE 0x300
-#define TW_MAX_CMDS_PER_LUN 254 /* 254 for io, 1 for
+#define TW_MAX_CMDS_PER_LUN 64 /* 64 for io, 1 for
 chrdev ioctl, one for
 internal aen post */
 #define TW_BLOCK_SIZE 0x200 /* 512-byte blocks */
* 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
[not found] ` <200409010233.31643.lkml@lpbproductions.com>
@ 2004-09-01 9:58 ` Miquel van Smoorenburg
2004-09-01 10:09 ` Christoph Hellwig
0 siblings, 1 reply; 29+ messages in thread
From: Miquel van Smoorenburg @ 2004-09-01 9:58 UTC (permalink / raw)
To: lkml; +Cc: Timothy Miller, linux-kernel

On 2004.09.01 11:33, Matt Heler wrote:
>
> I have a 3ware 7000-2 card. And I noticed the same problem.
>
> Actually what I just did now was change the max luns from 254 to 64.
> Recompiled and booted up. This seems to fix all my problems, and the speed
> seems to be quite faster then before.

Yes, that is because the queue_depth parameter gets set from
TW_MAX_CMDS_PER_LUN by the 3w-xxxx.c driver ...

I found the 3ware patch. The patch below makes the queue depth
an optional module parameter, makes sure that the initial
nr_requests is twice the size of the queue_depth, and
makes queue_depth writable for the 3ware driver.

Mike.

--- linux-2.6.5-rc2/drivers/scsi/3w-xxxx.c	2004-03-11 03:55:44.000000000 +0100
+++ linux-2.6.5-rc2-dmcong-tw/drivers/scsi/3w-xxxx.c	2004-03-23 14:56:41.000000000 +0100
@@ -13,6 +13,12 @@
 
    Further tiny build fixes and trivial hoovering Alan Cox
 
+   Parameters (and default):
+
+   3w-xxxx.queue_depth   Queue depth per connected device (254)
+   3w-xxxx.reverse_scan  Set to "1" if you want the driver to detect
+                         the 3ware cards in reverse order (0).
+
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; version 2 of the License.
@@ -179,6 +185,10 @@
    1.02.00.036 - Increase character ioctl timeout to 60 seconds.
    1.02.00.037 - Fix tw_ioctl() to handle all non-data ATA passthru cmds
                  for 'smartmontools' support.
+   1.02.00.XXX - Miquel van Smoorenburg - add command line parameters
+                 to set queue_depth/reverse_scan, make queue_depth
+                 sysfs parameter writable, adjust queue nr_requests.
+
 */
 
 #include <linux/module.h>
@@ -205,6 +215,7 @@
 #include <linux/reboot.h>
 #include <linux/spinlock.h>
 #include <linux/interrupt.h>
+#include <linux/moduleparam.h>
 
 #include <asm/errno.h>
 #include <asm/io.h>
@@ -246,6 +257,13 @@
 TW_Device_Extension *tw_device_extension_list[TW_MAX_SLOT];
 int tw_device_extension_count = 0;
 static int twe_major = -1;
+static int reverse_scan;
+static int queue_depth;
+
+module_param(reverse_scan, int, 0);
+MODULE_PARM_DESC(reverse_scan, "Scan PCI bus in reverse for 3ware cards");
+module_param(queue_depth, int, 0);
+MODULE_PARM_DESC(queue_depth, "Queue depth per device");
 
 /* Functions */
 
@@ -1029,7 +1047,15 @@
 	dprintk(KERN_NOTICE "3w-xxxx: tw_findcards()\n");
 
 	for (i=0;i<TW_NUMDEVICES;i++) {
-		while ((tw_pci_dev = pci_find_device(TW_VENDOR_ID, device[i], tw_pci_dev))) {
+		while (1) {
+			if (reverse_scan)
+				tw_pci_dev = pci_find_device_reverse(
+					TW_VENDOR_ID, device[i], tw_pci_dev);
+			else
+				tw_pci_dev = pci_find_device(
+					TW_VENDOR_ID, device[i], tw_pci_dev);
+			if (!tw_pci_dev)
+				break;
 			j++;
 			if (pci_enable_device(tw_pci_dev))
 				continue;
@@ -1141,14 +1167,6 @@
 			/* Set card status as online */
 			tw_dev->online = 1;
 
-#ifdef CONFIG_3W_XXXX_CMD_PER_LUN
-			tw_host->cmd_per_lun = CONFIG_3W_XXXX_CMD_PER_LUN;
-			if (tw_host->cmd_per_lun > TW_MAX_CMDS_PER_LUN)
-				tw_host->cmd_per_lun = TW_MAX_CMDS_PER_LUN;
-#else
-			/* Use SHT cmd_per_lun here */
-			tw_host->cmd_per_lun = TW_MAX_CMDS_PER_LUN;
-#endif
 			tw_dev->free_head = TW_Q_START;
 			tw_dev->free_tail = TW_Q_START;
 			tw_dev->free_wrap = TW_Q_LENGTH - 1;
@@ -3379,21 +3397,17 @@
 	return 0;
 } /* End tw_shutdown_device() */
 
-/* This function will configure individual target parameters */
+/* This function configures individual target parameters */
 int tw_slave_configure(Scsi_Device *SDptr)
 {
-	int max_cmds;
-
-	dprintk(KERN_WARNING "3w-xxxx: tw_slave_configure()\n");
-
-#ifdef CONFIG_3W_XXXX_CMD_PER_LUN
-	max_cmds = CONFIG_3W_XXXX_CMD_PER_LUN;
-	if (max_cmds > TW_MAX_CMDS_PER_LUN)
-		max_cmds = TW_MAX_CMDS_PER_LUN;
-#else
-	max_cmds = TW_MAX_CMDS_PER_LUN;
-#endif
-	scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, max_cmds);
+	/* Set SCSI queue depth to kernel/module param, or default. */
+	if (queue_depth < 1 || queue_depth > TW_MAX_CMDS_PER_LUN)
+		queue_depth = TW_MAX_CMDS_PER_LUN;
+	scsi_adjust_queue_depth(SDptr, 0, queue_depth);
+
+	/* make sure blockdev queue depth is at least 2 * scsi depth */
+	if (SDptr->request_queue->nr_requests < 2 * queue_depth)
+		SDptr->request_queue->nr_requests = 2 * queue_depth;
 
 	return 0;
 } /* End tw_slave_configure() */
@@ -3478,6 +3492,34 @@
 	outl(control_reg_value, control_reg_addr);
 } /* End tw_unmask_command_interrupt() */
 
+static ssize_t
+tw_store_queue_depth(struct device *dev, const char *buf, size_t count)
+{
+	int depth;
+
+	struct scsi_device *SDp = to_scsi_device(dev);
+	if (sscanf(buf, "%d", &depth) != 1)
+		return -EINVAL;
+	if (depth < 1 || depth > TW_MAX_CMDS_PER_LUN)
+		return -EINVAL;
+	scsi_adjust_queue_depth(SDp, 0, depth);
+
+	return count;
+}
+
+static struct device_attribute tw_queue_depth_attr = {
+	.attr = {
+		.name = "queue_depth",
+		.mode = S_IWUSR,
+	},
+	.store = tw_store_queue_depth,
+};
+
+static struct device_attribute *tw_dev_attrs[] = {
+	&tw_queue_depth_attr,
+	NULL,
+};
+
 static Scsi_Host_Template driver_template = {
 	.proc_name = "3w-xxxx",
 	.proc_info = tw_scsi_proc_info,
@@ -3488,12 +3530,14 @@
 	.eh_abort_handler = tw_scsi_eh_abort,
 	.eh_host_reset_handler = tw_scsi_eh_reset,
 	.bios_param = tw_scsi_biosparam,
+	.slave_configure = tw_slave_configure,
 	.can_queue = TW_Q_LENGTH-2,
 	.this_id = -1,
 	.sg_tablesize = TW_MAX_SGL_LENGTH,
 	.max_sectors = TW_MAX_SECTORS,
 	.cmd_per_lun = TW_MAX_CMDS_PER_LUN,
 	.use_clustering = ENABLE_CLUSTERING,
+	.sdev_attrs = tw_dev_attrs,
 	.emulated = 1
 };
 #include "scsi_module.c"
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 9:58 ` 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?] Miquel van Smoorenburg
@ 2004-09-01 10:09 ` Christoph Hellwig
2004-09-01 11:08 ` Miquel van Smoorenburg
0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2004-09-01 10:09 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: lkml, Timothy Miller, linux-kernel, linux-scsi

On Wed, Sep 01, 2004 at 09:58:55AM +0000, Miquel van Smoorenburg wrote:
> On 2004.09.01 11:33, Matt Heler wrote:
> >
> > I have a 3ware 7000-2 card. And I noticed the same problem.
> >
> > Actually what I just did now was change the max luns from 254 to 64.
> > Recompiled and booted up. This seems to fix all my problems, and the speed
> > seems to be quite faster then before.
>
> Yes, that is because the queue_depth parameter gets set from
> TW_MAX_CMDS_PER_LUN by the 3w-xxxx.c driver ...
>
> I found the 3ware patch. The patch below makes the queue depth
> an optional module parameter, makes sure that the initial
> nr_requests is twice the size of the queue_depth, and
> makes queue_depth writable for the 3ware driver.

- the writeable queue_depth sysfs attr is fine,
- the reverse_scan option is vetoed because it can't be supported when
  the driver will be converted to the pci_driver interface (soon)
- I'm not so sure about the module parameter, what's the problem with
  being able to change the queue depth only once sysfs is mounted?
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 10:09 ` Christoph Hellwig
@ 2004-09-01 11:08 ` Miquel van Smoorenburg
2004-09-01 11:43 ` Christoph Hellwig
2004-09-01 19:43 ` Patrick Mansfield
0 siblings, 2 replies; 29+ messages in thread
From: Miquel van Smoorenburg @ 2004-09-01 11:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Miquel van Smoorenburg, lkml, Timothy Miller, linux-kernel, linux-scsi

On 2004.09.01 12:09, Christoph Hellwig wrote:
> On Wed, Sep 01, 2004 at 09:58:55AM +0000, Miquel van Smoorenburg wrote:
> > On 2004.09.01 11:33, Matt Heler wrote:
> > >
> > > I have a 3ware 7000-2 card. And I noticed the same problem.
> > >
> > > Actually what I just did now was change the max luns from 254 to 64.
> > > Recompiled and booted up. This seems to fix all my problems, and the speed
> > > seems to be quite faster then before.
> >
> > Yes, that is because the queue_depth parameter gets set from
> > TW_MAX_CMDS_PER_LUN by the 3w-xxxx.c driver ...
> >
> > I found the 3ware patch. The patch below makes the queue depth
> > an optional module parameter, makes sure that the initial
> > nr_requests is twice the size of the queue_depth, and
> > makes queue_depth writable for the 3ware driver.
>
> - the writeable queue_depth sysfs attr is fine,
> - the reverse_scan option is vetoed because it can't be supported when
>   the driver will be converted to the pci_driver interface (soon)

Sure. That was more an experiment (the BIOS of the Tyan mobo I use
detects PCI cards in the reverse order from the kernel ...)

> - I'm not so sure about the module parameter, what's the problem of beeing
>   able to only change the queue depth once sysfs is mounted?

Nothing much, I guess. Just ease of use, or "there's more than
one way to do it".

Hey wait, that tunable is already a module parameter in at least
2.6.9-rc1, only there it's called 'cmds_per_lun'. Of course
cmds_per_lun and queue_depth are the same.

Anyway here's the minimal patch against 2.6.9-rc1

[PATCH] 3w-xxxx.c queue depth

make 3w-xxxx.c queue_depth sysfs parameter writable, adjust initial
queue nr_requests to 2*queue_depth

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>

--- linux-2.6.9-rc1/drivers/scsi/3w-xxxx.c.orig	2004-08-17 22:07:49.000000000 +0200
+++ linux-2.6.9-rc1/drivers/scsi/3w-xxxx.c	2004-09-01 13:07:32.000000000 +0200
@@ -184,6 +184,8 @@
    1.26.00.039 - Fix bug in tw_chrdev_ioctl() polling code.
                  Fix data_buffer_length usage in tw_chrdev_ioctl().
                  Update contact information.
+   1.02.00.XXX - Miquel van Smoorenburg - make queue_depth sysfs parameter
+                 writable, adjust initial queue nr_requests to 2*queue_depth
 */
 
 #include <linux/module.h>
@@ -3388,8 +3390,6 @@
 {
 	int max_cmds;
 
-	dprintk(KERN_WARNING "3w-xxxx: tw_slave_configure()\n");
-
 	if (cmds_per_lun) {
 		max_cmds = cmds_per_lun;
 		if (max_cmds > TW_MAX_CMDS_PER_LUN)
@@ -3399,6 +3399,10 @@
 	}
 	scsi_adjust_queue_depth(SDptr, MSG_ORDERED_TAG, max_cmds);
 
+	/* make sure blockdev queue depth is at least 2 * scsi depth */
+	if (SDptr->request_queue->nr_requests < 2 * max_cmds)
+		SDptr->request_queue->nr_requests = 2 * max_cmds;
+
 	return 0;
 } /* End tw_slave_configure() */
 
@@ -3482,6 +3486,34 @@
 	outl(control_reg_value, control_reg_addr);
 } /* End tw_unmask_command_interrupt() */
 
+static ssize_t
+tw_store_queue_depth(struct device *dev, const char *buf, size_t count)
+{
+	int depth;
+
+	struct scsi_device *SDp = to_scsi_device(dev);
+	if (sscanf(buf, "%d", &depth) != 1)
+		return -EINVAL;
+	if (depth < 1 || depth > TW_MAX_CMDS_PER_LUN)
+		return -EINVAL;
+	scsi_adjust_queue_depth(SDp, MSG_ORDERED_TAG, depth);
+
+	return count;
+}
+
+static struct device_attribute tw_queue_depth_attr = {
+	.attr = {
+		.name = "queue_depth",
+		.mode = S_IWUSR,
+	},
+	.store = tw_store_queue_depth,
+};
+
+static struct device_attribute *tw_dev_attrs[] = {
+	&tw_queue_depth_attr,
+	NULL,
+};
+
 static Scsi_Host_Template driver_template = {
 	.proc_name = "3w-xxxx",
 	.proc_info = tw_scsi_proc_info,
@@ -3499,6 +3531,7 @@
 	.max_sectors = TW_MAX_SECTORS,
 	.cmd_per_lun = TW_MAX_CMDS_PER_LUN,
 	.use_clustering = ENABLE_CLUSTERING,
+	.sdev_attrs = tw_dev_attrs,
 	.emulated = 1
 };
 #include "scsi_module.c"
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 11:08 ` Miquel van Smoorenburg
@ 2004-09-01 11:43 ` Christoph Hellwig
2004-09-01 19:43 ` Patrick Mansfield
1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2004-09-01 11:43 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Christoph Hellwig, lkml, Timothy Miller, linux-kernel, linux-scsi

On Wed, Sep 01, 2004 at 11:08:39AM +0000, Miquel van Smoorenburg wrote:
> Anyway here's the minimal patch against 2.6.9-rc1
>
> [PATCH] 3w-xxxx.c queue depth
>
> make 3w-xxxx.c queue_depth sysfs parameter writable, adjust initial
> queue nr_requests to 2*queue_depth

Looks good to me.
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 11:08 ` Miquel van Smoorenburg
2004-09-01 11:43 ` Christoph Hellwig
@ 2004-09-01 19:43 ` Patrick Mansfield
2004-09-01 22:23 ` Miquel van Smoorenburg
1 sibling, 1 reply; 29+ messages in thread
From: Patrick Mansfield @ 2004-09-01 19:43 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Christoph Hellwig, lkml, Timothy Miller, Jens Axboe, linux-kernel, linux-scsi

On Wed, Sep 01, 2004 at 11:08:39AM +0000, Miquel van Smoorenburg wrote:

> +	/* make sure blockdev queue depth is at least 2 * scsi depth */
> +	if (SDptr->request_queue->nr_requests < 2 * max_cmds)
> +		SDptr->request_queue->nr_requests = 2 * max_cmds;

Why would you want nr_requests different (and larger) only for this
driver?

Is modifying nr_requests allowed?

-- Patrick Mansfield
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 19:43 ` Patrick Mansfield
@ 2004-09-01 22:23 ` Miquel van Smoorenburg
2004-09-04 10:10 ` Jens Axboe
0 siblings, 1 reply; 29+ messages in thread
From: Miquel van Smoorenburg @ 2004-09-01 22:23 UTC (permalink / raw)
To: Patrick Mansfield
Cc: Miquel van Smoorenburg, Christoph Hellwig, lkml, Timothy Miller, Jens Axboe, linux-kernel, linux-scsi

On Wed, 01 Sep 2004 21:43:25, Patrick Mansfield wrote:
> On Wed, Sep 01, 2004 at 11:08:39AM +0000, Miquel van Smoorenburg wrote:
>
> > +	/* make sure blockdev queue depth is at least 2 * scsi depth */
> > +	if (SDptr->request_queue->nr_requests < 2 * max_cmds)
> > +		SDptr->request_queue->nr_requests = 2 * max_cmds;
>
> Why would you want nr_requests different (and larger) only for this
> driver?

Because for the Linux I/O scheduler to work, nr_requests needs to
be at least twice as big as the scsi queue depth.

For all other scsi drivers, the scsi queue depth is somewhere
between 0 and 63. Most are between 1 and 8. Default nr_requests
is 128, so this problem exists only with the 3ware driver/controller
that has a queue depth of 254 ..

It's more complicated than that though when you have more than one
scsi device attached to the 3ware controller (multiple raid arrays
or JBODs defined), since the total queue depth of the controller
is 254. In that case one scsi device can starve others on the same
controller, so you want to tune down the queue depth per device ..
e.g. with 8 JBODs set queue_depth per device to 32, set
nr_requests to 128.

Perhaps the initial queue_depth per device should be set to
254 / tw_dev->tw_num_units, that would be optimal. Something like

	max_cmds = tw_host->can_queue / tw_dev->tw_num_units;
	if (max_cmds > TW_MAX_CMDS_PER_LUN)
		max_cmds = TW_MAX_CMDS_PER_LUN;

I think such a change should be submitted through the people
at 3ware, though.

> Is modifying nr_requests allowed?

Well we need to do the same things that ll_rw_blk::queue_requests_store()
does, only we don't need to worry about locking or existing queue
contents since the queue has been instantiated but the scsi device
is not active yet.

I do notice now however, that between 2.6.4 and 2.6.9-rc1
blk_queue_congestion_threshold() has been added which we should
probably call after adjusting nr_requests. Unfortunately it's
a static function in ll_rw_blk.c ..

Perhaps we should export the functionality of queue_requests_store()
as, say, queue_adjust_nr_requests() (like scsi_adjust_queue_depth) ?
Jens ?

Anyway, for now, perhaps the mucking with nr_requests should be
taken out and a change like the above should be sent to the
people at 3ware.

I'll submit the sysfs code for inclusion in -mm and the nr_requests
stuff to 3ware.

Mike.
* Re: 3ware queue depth [was: Re: HIGHMEM4G config for 1GB RAM on desktop?]
2004-09-01 22:23 ` Miquel van Smoorenburg
@ 2004-09-04 10:10 ` Jens Axboe
0 siblings, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2004-09-04 10:10 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Patrick Mansfield, Christoph Hellwig, lkml, Timothy Miller, linux-kernel, linux-scsi

On Wed, Sep 01 2004, Miquel van Smoorenburg wrote:
> On Wed, 01 Sep 2004 21:43:25, Patrick Mansfield wrote:
> >On Wed, Sep 01, 2004 at 11:08:39AM +0000, Miquel van Smoorenburg wrote:
> >
> >> +	/* make sure blockdev queue depth is at least 2 * scsi depth */
> >> +	if (SDptr->request_queue->nr_requests < 2 * max_cmds)
> >> +		SDptr->request_queue->nr_requests = 2 * max_cmds;
> >
> >Why would you want nr_requests different (and larger) only for this
> >driver?
>
> Because for the Linux I/O scheduler to work, nr_requests needs to
> be at least twice as big as the scsi queue depth.

Well, basically if you want to have a chance to do any io scheduling
anywhere, you need to have more than 1 request to play with really. And
if the drive is swallowing all your requests all the time, you are
screwed.

I do think the best option (as some people mentioned in this thread) is
to limit the 3ware queue depth, not increase the io scheduler depth. At
least for most of the current io schedulers, this will kill your latency
quite a bit.

> >Is modifying nr_requests allowed?
>
> Well we need to do the same things that ll_rw_blk::queue_requests_store()
> does, only we don't need to worry about locking or existing queue
> contents since the queue has been instantiated but the scsi device
> is not active yet.
>
> I do notice now however, that between 2.6.4 and 2.6.9-rc1
> blk_queue_congestion_threshold() has been added which we should
> probably call after adjusting nr_requests. Unfortunately it's
> a static function in ll_rw_blk.c ..
>
> Perhaps we should export the functionality of queue_requests_store()
> as, say, queue_adjust_nr_requests() (like scsi_adjust_queue_depth) ?
> Jens ?

Yes, if you want to do this, we need to export a function to do it that
takes care of updating the block layer congestion (etc) data.

> Anyway, for now, perhaps the mucking with nr_requests should be
> taken out and a change like the above should be sent to the
> people at 3ware.

Indeed.

> I'll submit the sysfs code for inclusion in -mm and the nr_requests
> stuff to 3ware.

Great!

--
Jens Axboe
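The per-unit split Miquel suggests, and which Jens prefers over enlarging nr_requests, can be tried out as a standalone function. This is a userspace sketch rather than driver code: `per_unit_queue_depth` is a hypothetical name, the 254 cap mirrors TW_MAX_CMDS_PER_LUN as quoted in the thread, and the handling of zero units and the floor of one command per unit are added assumptions that the original snippet does not address.

```c
/* Per-device cap, mirroring TW_MAX_CMDS_PER_LUN as quoted in the thread. */
#define TW_MAX_CMDS_PER_LUN 254u

/* Split the controller's total queue (can_queue) evenly across its units,
 * clamped to the per-LUN maximum. */
static unsigned per_unit_queue_depth(unsigned can_queue, unsigned num_units)
{
    unsigned max_cmds;

    if (num_units == 0)          /* assumption: no units means no depth */
        return 0;

    max_cmds = can_queue / num_units;
    if (max_cmds > TW_MAX_CMDS_PER_LUN)
        max_cmds = TW_MAX_CMDS_PER_LUN;
    if (max_cmds < 1)            /* assumption: always allow one command */
        max_cmds = 1;

    return max_cmds;
}
```

For a controller queue of 254 shared by 8 JBODs this yields 31 per unit, close to the 32 suggested in the thread and well below the default nr_requests of 128, so the scheduler keeps requests to reorder without any change to the block layer.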