* Re: Any non-BS VM work queued for 2.5?
From: Arnaldo Carvalho de Melo @ 2002-01-14 2:04 UTC
To: Alan Cox; +Cc: Duraid Madina, linux-kernel
In-Reply-To: <E16PsBq-00083i-00@the-village.bc.nu>
On Sun, Jan 13, 2002 at 09:29:18PM +0000, Alan Cox wrote:
> > Is this true? Judging by the ease with which AA's hackwork made it into
> > 2.4, I think we may all be, well, fucked.
>
> 2.4 is now in good hands. Now be careful before the sun comes up and you get
> turned to stone ;)
Oh, this has something to do with that "don't feed the..." thing?
;)
- Arnaldo
* Re: [LARTC] SFQ improvement ideas
From: John Huttley @ 2002-01-14 2:02 UTC
To: lartc
In-Reply-To: <marc-lartc-101094979511667@msgid-missing>
> To change, can't you just do:
> tc del dev eth0 root handle 1:
>
> (or whatever your handles are, you get the idea)
>
> Shouldn't that essentially "flush" all the shaping under the 1: handle?
This is not a valid command.
The other way is to stop and start networking. Really, I'd just rather
reboot and be done with it.
Regards
John
_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://ds9a.nl/lartc/
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Robert Love @ 2002-01-14 1:56 UTC
To: Rik van Riel
Cc: Roman Zippel, Alan Cox, Kenneth Johansson, arjan, Rob Landley,
linux-kernel
In-Reply-To: <Pine.LNX.4.33L.0201132349450.32617-100000@imladris.surriel.com>
On Sun, 2002-01-13 at 20:50, Rik van Riel wrote:
> > So far I haven't seen any evidence that preempt introduces any _new_
> > serious problems, so I'd rather like to see us get the best out of
> > both.
>
> Are you seriously suggesting you haven't read a single
> email in this thread yet ?
No, I think he is suggesting he doesn't consider any of the problems
serious. A lot of it is just smoke. What is "bad" wrt 2.5?
Robert Love
* Re: Linux 2.4.18pre3-ac1-aia21 (IDE patches)
From: Anton Altaparmakov @ 2002-01-14 1:53 UTC
To: Alan Cox; +Cc: linux-kernel
In-Reply-To: <5.1.0.14.2.20020113232757.04f34ec0@pop.cus.cam.ac.uk>
At 01:53 14/01/02, Alan Cox wrote:
> > Alan's -ac series is back! To celebrate this I added in the IDE patches and
> > an NTFS update which dramatically reduces the number of vmalloc()s and have
> > posted the resulting (tested) patch (to be applied on top of
> > 2.4.18pre3-ac1) at below URL.
>
>Andre's IDE patch is in the ac2 cut. I took it out just to make testing easier
>in case other people found -ac1 wasn't as reliable as I did 8)
That's ok. -ac2 isn't out yet AFAICS... (-;
Do you have the configure help entries in there, too?
BTW, -ac1-aia1 is working very well on my desktop here. At least I wasn't
able to break it with anything I threw at it. In fact, it is working much
better than before, both interactivity-wise and I/O-wise. (-8
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
* Oops in kswapd (Kernel 2.4.17)
From: Patrick Burns @ 2002-01-14 1:55 UTC
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 767 bytes --]
Is there some kind of memory problem with kernel 2.4.17? I noticed in a
post at:
http://marc.theaimsgroup.com/?l=linux-kernel&m=101096234600708&w=2
and another at:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0201.1/0809.html
that other people have been getting oopses in kswapd. I had the same problem
this morning. The kernel froze up completely; not even the SysRq keys would
work. I am running an SMP machine with two 400 MHz PIIs and 512 MB of RAM. I
have attached the syslog of the oops and the output of running it through
ksymoops. I'm using 2.4.17 on an otherwise stock Red Hat 7.2 machine, built
with gcc 2.96 (the 2.96-98 version that ships with Red Hat 7.2).
Can anyone help me out? I'm not subscribed to this list, so please cc me
any advice. Thank you very much.
-Patrick
[-- Attachment #2: oops.txt --]
[-- Type: text/plain, Size: 1574 bytes --]
Jan 14 08:42:40 pegasus kernel: Unable to handle kernel paging request at virtual address 00300014
Jan 14 08:42:40 pegasus kernel: printing eip:
Jan 14 08:42:40 pegasus kernel: c0147d2f
Jan 14 08:42:40 pegasus kernel: *pde = 00000000
Jan 14 08:42:40 pegasus kernel: Oops: 0000
Jan 14 08:42:40 pegasus kernel: CPU: 0
Jan 14 08:42:40 pegasus kernel: EIP: 0010:[<c0147d2f>] Not tainted
Jan 14 08:42:40 pegasus kernel: EFLAGS: 00010206
Jan 14 08:42:40 pegasus kernel: eax: 00300000 ebx: d82f0e78 ecx: df84de50 edx: d82f0e90
Jan 14 08:42:40 pegasus kernel: esi: d82f0e60 edi: d82f2440 ebp: 0000ba58 esp: c1955f30
Jan 14 08:42:40 pegasus kernel: ds: 0018 es: 0018 ss: 0018
Jan 14 08:42:40 pegasus kernel: Process kswapd (pid: 5, stackpage=c1955000)
Jan 14 08:42:40 pegasus kernel: Stack: c012f67c dffe007c c1954000 ffffffff 000001d0 c0297b28 c1954000 00000000
Jan 14 08:42:40 pegasus kernel: 00000020 000001d0 00000006 00000006 c01480c0 0000bcfd c012f887 00000006
Jan 14 08:42:40 pegasus kernel: 000001d0 c0297b28 00000006 000001d0 c0297b28 00000000 c012f8ec 00000020
Jan 14 08:42:40 pegasus kernel: Call Trace: [<c012f67c>] [<c01480c0>] [<c012f887>] [<c012f8ec>] [<c012f991>]
Jan 14 08:42:40 pegasus kernel: [<c012fa06>] [<c012fb41>] [<c012faa0>] [<c0105000>] [<c0105836>] [<c012faa0>]
Jan 14 08:42:40 pegasus kernel:
Jan 14 08:42:40 pegasus kernel: Code: 8b 40 14 85 c0 74 0a 57 56 ff d0 5a 59 eb 1a 89 f6 57 e8 4a
Jan 14 08:42:47 pegasus kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 000001f8
[-- Attachment #3: trace.txt --]
[-- Type: text/plain, Size: 3900 bytes --]
ksymoops 2.4.3 on i686 2.4.17. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.17/ (default)
-m /usr/src/linux/System.map (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Jan 14 08:42:40 pegasus kernel: Unable to handle kernel paging request at virtual address 00300014
Jan 14 08:42:40 pegasus kernel: c0147d2f
Jan 14 08:42:40 pegasus kernel: *pde = 00000000
Jan 14 08:42:40 pegasus kernel: Oops: 0000
Jan 14 08:42:40 pegasus kernel: CPU: 0
Jan 14 08:42:40 pegasus kernel: EIP: 0010:[<c0147d2f>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jan 14 08:42:40 pegasus kernel: EFLAGS: 00010206
Jan 14 08:42:40 pegasus kernel: eax: 00300000 ebx: d82f0e78 ecx: df84de50 edx: d82f0e90
Jan 14 08:42:40 pegasus kernel: esi: d82f0e60 edi: d82f2440 ebp: 0000ba58 esp: c1955f30
Jan 14 08:42:40 pegasus kernel: ds: 0018 es: 0018 ss: 0018
Jan 14 08:42:40 pegasus kernel: Process kswapd (pid: 5, stackpage=c1955000)
Jan 14 08:42:40 pegasus kernel: Stack: c012f67c dffe007c c1954000 ffffffff 000001d0 c0297b28 c1954000 00000000
Jan 14 08:42:40 pegasus kernel: 00000020 000001d0 00000006 00000006 c01480c0 0000bcfd c012f887 00000006
Jan 14 08:42:40 pegasus kernel: 000001d0 c0297b28 00000006 000001d0 c0297b28 00000000 c012f8ec 00000020
Jan 14 08:42:40 pegasus kernel: Call Trace: [<c012f67c>] [<c01480c0>] [<c012f887>] [<c012f8ec>] [<c012f991>]
Jan 14 08:42:40 pegasus kernel: [<c012fa06>] [<c012fb41>] [<c012faa0>] [<c0105000>] [<c0105836>] [<c012faa0>]
Jan 14 08:42:40 pegasus kernel: Code: 8b 40 14 85 c0 74 0a 57 56 ff d0 5a 59 eb 1a 89 f6 57 e8 4a
>>EIP; c0147d2e <prune_dcache+be/160> <=====
Trace; c012f67c <shrink_cache+30c/3b0>
Trace; c01480c0 <shrink_dcache_memory+20/30>
Trace; c012f886 <shrink_caches+66/90>
Trace; c012f8ec <try_to_free_pages+3c/60>
Trace; c012f990 <kswapd_balance_pgdat+50/a0>
Trace; c012fa06 <kswapd_balance+26/40>
Trace; c012fb40 <kswapd+a0/c0>
Trace; c012faa0 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c0105836 <kernel_thread+26/30>
Trace; c012faa0 <kswapd+0/c0>
Code; c0147d2e <prune_dcache+be/160>
00000000 <_EIP>:
Code; c0147d2e <prune_dcache+be/160> <=====
0: 8b 40 14 mov 0x14(%eax),%eax <=====
Code; c0147d30 <prune_dcache+c0/160>
3: 85 c0 test %eax,%eax
Code; c0147d32 <prune_dcache+c2/160>
5: 74 0a je 11 <_EIP+0x11> c0147d3e <prune_dcache+ce/160>
Code; c0147d34 <prune_dcache+c4/160>
7: 57 push %edi
Code; c0147d36 <prune_dcache+c6/160>
8: 56 push %esi
Code; c0147d36 <prune_dcache+c6/160>
9: ff d0 call *%eax
Code; c0147d38 <prune_dcache+c8/160>
b: 5a pop %edx
Code; c0147d3a <prune_dcache+ca/160>
c: 59 pop %ecx
Code; c0147d3a <prune_dcache+ca/160>
d: eb 1a jmp 29 <_EIP+0x29> c0147d56 <prune_dcache+e6/160>
Code; c0147d3c <prune_dcache+cc/160>
f: 89 f6 mov %esi,%esi
Code; c0147d3e <prune_dcache+ce/160>
11: 57 push %edi
Code; c0147d40 <prune_dcache+d0/160>
12: e8 4a 00 00 00 call 61 <_EIP+0x61> c0147d8e <prune_dcache+11e/160>
Jan 14 08:42:47 pegasus kernel: <1>Unable to handle kernel NULL pointer dereference at virtual address 000001f8
1 warning issued. Results may not be reliable.
* Re: [linux-lvm] upgrade RH with lvm
From: Dirk Heinrichs @ 2002-01-14 1:51 UTC
To: linux-lvm
In-Reply-To: <1010772299.7571.14.camel@dumbo.zwecker.de>
On 11 Jan 2002, Christophe Zwecker wrote:
> hi,
>
>
> RH7.2 doesn't seem to be able to read lvm partitions when upgrading from
> 7.1, neither does the SGI iso (guess they didn't look into that).
>
> I wonder if any of you got the same problem, and if so, how you solved it?
You don't need to boot the installer to upgrade some RPMs. Use rpm
directly on the running system (after going single user with telinit 1, if
you like).
Bye...
Dirk
--
Dirk Heinrichs | Tel: +49 (0)241 413 260
Configuration Manager | Fax: +49 (0)241 413 2640
QIS Systemhaus GmbH | Mail: dheinrichs@qis-systemhaus.de
Juelicher Str. 338 | Web: http://www.qis-systemhaus.de
D-52070 Aachen | ICQ#: 110037733
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Rik van Riel @ 2002-01-14 1:50 UTC
To: Roman Zippel
Cc: Alan Cox, Robert Love, Kenneth Johansson, arjan, Rob Landley,
linux-kernel
In-Reply-To: <3C41A545.A903F24C@linux-m68k.org>
On Sun, 13 Jan 2002, Roman Zippel wrote:
> So far I haven't seen any evidence that preempt introduces any _new_
> serious problems, so I'd rather like to see us get the best out of
> both.
Are you seriously suggesting you haven't read a single
email in this thread yet ?
Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document
http://www.surriel.com/ http://distro.conectiva.com/
* Re: Linux 2.4.18pre3-ac1-aia21 (IDE patches)
From: Erik Andersen @ 2002-01-14 1:50 UTC
To: Alan Cox; +Cc: linux-kernel
In-Reply-To: <E16PwJO-0000F8-00@the-village.bc.nu>
On Mon Jan 14, 2002 at 01:53:22AM +0000, Alan Cox wrote:
> > Alan's -ac series is back! To celebrate this I added in the IDE patches and
> > an NTFS update which dramatically reduces the number of vmalloc()s and have
> > posted the resulting (tested) patch (to be applied on top of
> > 2.4.18pre3-ac1) at below URL.
>
> Andre's IDE patch is in the ac2 cut. I took it out just to make testing easier
> in case other people found -ac1 wasn't as reliable as I did 8)
Will -ac2 be hitting the mirrors shortly then? BTW, you
mentioned in your earlier email you excluded the low-latency
patches. Mind if I ask which of the many you are using?
mini-ll? full-ll? The sched-O1-H6 patch?
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Alan Cox @ 2002-01-14 1:54 UTC
To: Bill Davidsen; +Cc: Linux Kernel Mailing List
In-Reply-To: <Pine.LNX.3.96.1020113202508.17441L-100000@gatekeeper.tmr.com>
> Feel free to quantify the savings over the current setup with max power
> saving enabled in the kernel. I just don't see how "wonderful" it would
> be, given that an idle system currently uses very little battery if you
> set up the options to save power.
IBM have a tickless kernel patch set for the S/390. There it's not battery
life at stake but the VM overhead of sending timer interrupts to hundreds of
otherwise idle virtual machines.
* [LARTC] am i on the right track ?
From: Chandrashekhar Joshi @ 2002-01-14 1:44 UTC
To: lartc
Hi,
I am trying to set up bandwidth shaping on my Linux router.
I have a 64kbps link on which I want to restrict HTTP/FTP access and give
VPN access higher bandwidth priority. Thanks to Martin Devera, I am using
HTB for this.
|-----------|       |-----------|
| MAIN LINK |---+---|  CLASS A  |
|-----------|   |   |-----------|
                |
                |   |-----------|
                +---|  CLASS B  |
                    |-----------|
MAIN LINK = 64kbps
CLASS A   = 24kbps (for HTTP/FTP and other Internet-related activities)
CLASS B   = 40kbps (reserved for incoming and outgoing PPTP VPN access)
Requirements:
Each class should be able to borrow bandwidth from the other class when excess is available.
Incoming VPN connections should come through the class B bandwidth policy.
Outgoing VPN connections should go through the class B bandwidth policy.
Following is the command set (lifted from the HTB manual :-) ) I am
trying to use:
# tc qdisc add dev eth0 root handle 1: htb default 11
# tc class add dev eth0 parent 1: classid 1:1 htb rate 64kbps ceil 64kbps burst 2k
# tc class add dev eth0 parent 1: classid 1:10 htb rate 64kbps ceil 64kbps burst 2k
# tc class add dev eth0 parent 1: classid 1:11 htb rate 64kbps ceil 64kbps burst 2k
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.1.0 match tcp dst 21 0xffff flowid 1:10
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.1.0 match tcp dst 80 0xffff flowid 1:10
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.1.0 match tcp dst 443 0xffff flowid 1:10
# tc qdisc add dev eth0 parent 1:10 handle 20: pfifo limit 5
# tc qdisc add dev eth0 parent 1:11 handle 30: sfq perturb 10
First, I would like to know whether what I have done is correct in the
context of my problem. Am I on the right track?
Why I did what I did:
To meet the above requirements, I defined bandwidth shaping only for
HTTP, HTTPS and FTP (the major bandwidth guzzlers), assigning them to
class A, and made class B the default.
But because of the above rulesets, my other Internet services (like domain,
ssh, whois, ping, etc.) will go through the default class B policy, which I
want to avoid without adding more rulesets. Can I instead define a ruleset
for the VPN, something like the one below (removing the class A rulesets
and making class A the default)?
# tc filter add dev eth0 protocol 47 parent 1:0 prio 1 u32 match ip src 192.168.1.0 flowid 1:11
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip src 192.168.1.0 match tcp dst 1723 0xffff flowid 1:11
But how do I define the ruleset for incoming connections? Would the
following be correct?
# tc filter add dev eth0 protocol 47 parent 1:0 prio 1 u32 match ip dst 192.168.1.0 flowid 1:11
# tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dst 192.168.1.0 match tcp dst 1723 0xffff flowid 1:11
Thanks in advance.
Regards / shekhar
* IDE Patches bring amazing performance gain!!!
From: Anton Altaparmakov @ 2002-01-14 1:44 UTC
To: linux-kernel
In-Reply-To: <5.1.0.14.2.20020113232757.04f34ec0@pop.cus.cam.ac.uk>
As a heads up, Andre Hedrick's (Andre, sorry for the misspelling
previously!) IDE patch improved the throughput of my 7200rpm ATA100 IBM
IDE disk from 28 MB/s to 38 MB/s as measured by hdparm -t /dev/hda, which is
quite an improvement by anyone's standards! Also, hitting the disk with a lot
of I/O keeps latency low: my MP3s aren't dropping out and my X
session stays interactive. (-:
Considering that I have seen many good reports and ZERO bad reports about the
IDE patches, it is really astonishing that they are not being applied
to the 2.4.x kernel series... (especially as they were already in the -ac
series previously)
Best regards,
Anton
At 01:07 14/01/02, Anton Altaparmakov wrote:
>Alan's -ac series is back! To celebrate this I added in the IDE patches
>and an NTFS update which dramatically reduces the number of vmalloc()s and
>have posted the resulting (tested) patch (to be applied on top of
>2.4.18pre3-ac1) at below URL.
>
>http://www-stu.christs.cam.ac.uk/~aia21/linux/patch-2.4.18-pre3-ac1-aia1.bz2
>http://www-stu.christs.cam.ac.uk/~aia21/linux/patch-2.4.18-pre3-ac1-aia1.gz
>
>
>Linux 2.4.18pre3-ac1-aia1
>
>o IDE patch (taskfile, lba-48, ata133, etc)   Andre Hedrick
>o Configure help entries for above            Andre Hedrick, Rob Radez
>o Small IDE cleanups (code beauty only)       Pavel Machek, me
>o Reduce NTFS vmalloc() use (NTFS 1.1.22)     me
>
>Enjoy,
>
>Anton
* Re: Linux 2.4.18pre3-ac1-aia21 (IDE patches)
From: Alan Cox @ 2002-01-14 1:53 UTC
To: Anton Altaparmakov; +Cc: linux-kernel
In-Reply-To: <5.1.0.14.2.20020113232757.04f34ec0@pop.cus.cam.ac.uk>
> Alan's -ac series is back! To celebrate this I added in the IDE patches and
> an NTFS update which dramatically reduces the number of vmalloc()s and have
> posted the resulting (tested) patch (to be applied on top of
> 2.4.18pre3-ac1) at below URL.
Andre's IDE patch is in the ac2 cut. I took it out just to make testing easier
in case other people found -ac1 wasn't as reliable as I did 8)
Alan
* [PATCH] page coloring for 2.4.17 kernel
From: Jason Papadopoulos @ 2002-01-14 1:46 UTC
To: linux-mm
Hello. Please be patient with this, my first post to linux-mm.
The included patch modifies the free list in the 2.4.17 kernel
to support round-robin page coloring. It seems to work okay
on an Alpha and speeds up a lot of number-crunching code I
have lying around (lmbench reports some higher bandwidths too).
The patch is a port of the 2.2.20 version that I recently posted
to the linux-kernel mailing list.
I'd be grateful if the folks on this list could try out other
architectures and benchmarks. I've also been told that the code
which generates a random color needs to be more portable and cannot
rely on 64-bit data types. The coloring scheme is a little simplistic,
and it would be nice to use another scheme that is more efficient
but involves more extensive changes to the kernel.
Thanks in advance for your help.
jasonp
PS: Note that three of the "diff" command lines below wrap around
to the next line. Stupid Eudora...
-----------------------------------------------------------------
diff -ruN linux-2.4.17/drivers/char/Config.in linux-2.4.17a/drivers/char/Config.in
--- linux-2.4.17/drivers/char/Config.in Mon Nov 12 12:34:16 2001
+++ linux-2.4.17a/drivers/char/Config.in Sat Jan 12 09:35:03 2002
@@ -174,6 +174,8 @@
fi
endmenu
+tristate 'Page Coloring' CONFIG_PAGE_COLORING
+
if [ "$CONFIG_ARCH_NETWINDER" = "y" ]; then
tristate 'NetWinder thermometer support' CONFIG_DS1620
tristate 'NetWinder Button' CONFIG_NWBUTTON
diff -ruN linux-2.4.17/drivers/char/Makefile linux-2.4.17a/drivers/char/Makefile
--- linux-2.4.17/drivers/char/Makefile Sun Nov 11 13:09:32 2001
+++ linux-2.4.17a/drivers/char/Makefile Sat Jan 12 09:42:23 2002
@@ -240,6 +240,11 @@
obj-y += mwave/mwave.o
endif
+ifeq ($(CONFIG_PAGE_COLORING),m)
+ CONFIG_PAGE_COLORING_MODULE=y
+ obj-m += page_color.o
+endif
+
include $(TOPDIR)/Rules.make
fastdep:
diff -ruN linux-2.4.17/drivers/char/page_color.c linux-2.4.17a/drivers/char/page_color.c
--- linux-2.4.17/drivers/char/page_color.c Wed Dec 31 19:00:00 1969
+++ linux-2.4.17a/drivers/char/page_color.c Sun Jan 13 17:47:44 2002
@@ -0,0 +1,167 @@
+/*
+ * This module implements page coloring, a systematic way
+ * to get the most performance out of the expensive cache
+ * memory your computer has. At present the code is *only*
+ * to be built as a loadable kernel module.
+ *
+ * After building the kernel and rebooting, load the module
+ * and specify the cache size to use, like so:
+ *
+ * insmod <path to page_color.o> cache_size=X
+ *
+ * where X is the size of the largest cache your system has.
+ * For machines with three cache levels (Alpha 21164, AMD K6-III)
+ * this will be the size in bytes of the L3 cache, and for all
+ * others it will be the size of the L2 cache. If your system
+ * doesn't have at least L2 cache, fer cryin' out loud GET SOME!
+ * When specifying the cache size you can use 'K' or 'M' to signify
+ * kilobytes or megabytes, respectively. In any case, the cache
+ * size *must* be a power of two.
+ *
+ * insmod will create a module called 'page_color' which changes
+ * the way Linux allocates pages from the free list. It is always
+ * safe to start and stop the module while other processes are running.
+ *
+ * If linux is configured for a /proc filesystem, the module will
+ * also create /proc/page_color as a means of reporting statistics.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <asm/page.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+
+extern unsigned int page_miss_count;
+extern unsigned int page_hit_count;
+extern unsigned int page_colors;
+extern unsigned int page_alloc_count;
+extern struct list_head *page_color_table;
+
+void page_color_start(void);
+void page_color_stop(void);
+
+#if defined(__alpha__)
+#define CACHE_SIZE_GUESS (4*1024*1024)
+#elif defined(__i386__)
+#define CACHE_SIZE_GUESS (256*1024)
+#else
+#define CACHE_SIZE_GUESS (1*1024*1024)
+#endif
+
+#ifdef CONFIG_PROC_FS
+
+int page_color_getinfo(char *buf, char **start, off_t fpos, int length)
+{
+ int i, j, k, count, num_colors;
+ struct list_head *queue, *curr;
+ char *p = buf;
+
+ p += sprintf(p, "colors: %d\n", page_colors);
+ p += sprintf(p, "hits: %d\n", page_hit_count);
+ p += sprintf(p, "misses: %d\n", page_miss_count);
+ p += sprintf(p, "pages allocated: %d\n", page_alloc_count);
+
+ queue = page_color_table;
+ for(i=0; i<MAX_NR_ZONES; i++) {
+ num_colors = page_colors;
+
+ for(j=0; j<MAX_ORDER; j++) {
+ for(k=0; k<num_colors; k++) {
+ count = 0;
+ if (!queue->next)
+ goto getinfo_done;
+
+ list_for_each(curr, queue) {
+ count++;
+ }
+ p += sprintf(p, "%d ", count);
+ queue++;
+ }
+
+ p += sprintf(p, "\n");
+ if (num_colors > 1)
+ num_colors >>= 1;
+ }
+ }
+
+getinfo_done:
+ return p - buf;
+}
+
+#endif
+
+void cleanup_module(void)
+{
+ printk("page_color: terminating page coloring\n");
+
+#ifdef CONFIG_PROC_FS
+ remove_proc_entry("page_color", NULL);
+#endif
+
+ page_color_stop();
+ kfree(page_color_table);
+}
+
+static char *cache_size;
+MODULE_PARM(cache_size, "s");
+
+int init_module(void)
+{
+ unsigned int cache_size_int;
+ unsigned int alloc_size;
+
+ if (cache_size) {
+ cache_size_int = simple_strtoul(cache_size,
+ (char **)NULL, 10);
+ if ( strchr(cache_size, 'M') ||
+ strchr(cache_size, 'm') )
+ cache_size_int *= 1024*1024;
+
+ if ( strchr(cache_size, 'K') ||
+ strchr(cache_size, 'k') )
+ cache_size_int *= 1024;
+ }
+ else {
+ cache_size_int = CACHE_SIZE_GUESS;
+ }
+
+ if( (-cache_size_int & cache_size_int) != cache_size_int ) {
+ printk ("page_color: cache size is not a power of two\n");
+ return 1;
+ }
+
+ page_colors = cache_size_int / PAGE_SIZE;
+ page_hit_count = 0;
+ page_miss_count = 0;
+ page_alloc_count = 0;
+ alloc_size = MAX_NR_ZONES * sizeof(struct list_head) *
+ (2 * page_colors + MAX_ORDER);
+ page_color_table = (struct list_head *)kmalloc(alloc_size, GFP_KERNEL);
+ if (!page_color_table) {
+ printk("page_color: memory allocation failed\n");
+ return 1;
+ }
+ memset(page_color_table, 0, alloc_size);
+
+ page_color_start();
+
+#ifdef CONFIG_PROC_FS
+ create_proc_info_entry("page_color", 0, NULL, page_color_getinfo);
+#endif
+
+ printk("page_color: starting with %d colors\n", page_colors );
+ return 0;
+}
diff -ruN linux-2.4.17/include/linux/mmzone.h linux-2.4.17a/include/linux/mmzone.h
--- linux-2.4.17/include/linux/mmzone.h Sat Jan 12 10:21:34 2002
+++ linux-2.4.17a/include/linux/mmzone.h Sat Jan 12 23:34:28 2002
@@ -21,6 +21,12 @@
typedef struct free_area_struct {
struct list_head free_list;
unsigned long *map;
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+ unsigned long count;
+ struct list_head *color_list;
+#endif
+
} free_area_t;
struct pglist_data;
diff -ruN linux-2.4.17/include/linux/sched.h linux-2.4.17a/include/linux/sched.h
--- linux-2.4.17/include/linux/sched.h Sat Jan 12 10:21:34 2002
+++ linux-2.4.17a/include/linux/sched.h Sat Jan 12 23:34:28 2002
@@ -410,6 +410,11 @@
/* journalling filesystem info */
void *journal_info;
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+ unsigned int color_init;
+ unsigned int target_color;
+#endif
};
/*
diff -ruN linux-2.4.17/kernel/ksyms.c linux-2.4.17a/kernel/ksyms.c
--- linux-2.4.17/kernel/ksyms.c Fri Dec 28 21:51:25 2001
+++ linux-2.4.17a/kernel/ksyms.c Sun Jan 13 17:46:28 2002
@@ -559,3 +559,21 @@
EXPORT_SYMBOL(tasklist_lock);
EXPORT_SYMBOL(pidhash);
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+extern unsigned int page_miss_count;
+extern unsigned int page_hit_count;
+extern unsigned int page_alloc_count;
+extern unsigned int page_colors;
+extern struct list_head *page_color_table;
+void page_color_start(void);
+void page_color_stop(void);
+
+EXPORT_SYMBOL_NOVERS(page_miss_count);
+EXPORT_SYMBOL_NOVERS(page_hit_count);
+EXPORT_SYMBOL_NOVERS(page_alloc_count);
+EXPORT_SYMBOL_NOVERS(page_colors);
+EXPORT_SYMBOL_NOVERS(page_color_table);
+EXPORT_SYMBOL_NOVERS(page_color_start);
+EXPORT_SYMBOL_NOVERS(page_color_stop);
+#endif
diff -ruN linux-2.4.17/mm/page_alloc.c linux-2.4.17a/mm/page_alloc.c
--- linux-2.4.17/mm/page_alloc.c Mon Nov 19 19:35:40 2001
+++ linux-2.4.17a/mm/page_alloc.c Sun Jan 13 18:46:31 2002
@@ -56,6 +56,288 @@
*/
#define BAD_RANGE(zone,x) (((zone) != (x)->zone) || (((x)-mem_map) < (zone)->zone_start_mapnr) || (((x)-mem_map) >= (zone)->zone_start_mapnr+(zone)->size))
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+
+#ifdef CONFIG_DISCONTIGMEM
+#error "Page coloring implementation cannot handle NUMA architectures"
+#endif
+
+unsigned int page_coloring = 0;
+unsigned int page_miss_count;
+unsigned int page_hit_count;
+unsigned int page_alloc_count;
+unsigned int page_colors = 0;
+struct list_head *page_color_table;
+
+#define COLOR(x) ((x) & cache_mask)
+
+void page_color_start(void)
+{
+ /* Empty the free list in each zone. For each
+ queue in the free list, transfer the entries
+ in the queue over to another set of queues
+ (the destination queue is determined by the
+ color of each entry). */
+
+ int i, j, k;
+ unsigned int num_colors, cache_mask;
+ unsigned long index;
+ unsigned long flags[MAX_NR_ZONES];
+ struct list_head *color_list_start, *head, *curr;
+ free_area_t *area;
+ struct page *page;
+ zone_t *zone;
+ pg_data_t *pgdata;
+
+ cache_mask = page_colors - 1;
+ color_list_start = page_color_table;
+ pgdata = &contig_page_data;
+
+ /* Stop all allocation of free pages while the
+ reshuffling is taking place */
+
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ zone = pgdata->node_zones + i;
+ if (zone->size)
+ spin_lock_irqsave(&zone->lock, flags[i]);
+ }
+
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ num_colors = page_colors;
+ zone = pgdata->node_zones + i;
+
+ if (!zone->size)
+ continue;
+
+ for(j = 0; j < MAX_ORDER; j++) {
+ area = zone->free_area + j;
+ area->count = 0;
+ area->color_list = color_list_start;
+ head = &area->free_list;
+ curr = memlist_next(head);
+
+ for(k = 0; k < num_colors; k++)
+ memlist_init(color_list_start + k);
+
+ while(curr != head) {
+ page = memlist_entry(curr, struct page, list);
+ memlist_del(curr);
+ index = page - zone->zone_mem_map;
+ memlist_add_head(curr, area->color_list +
+ (COLOR(index) >> j));
+ area->count++;
+ curr = memlist_next(head);
+ }
+
+ color_list_start += num_colors;
+ if (num_colors > 1)
+ num_colors >>= 1;
+ }
+ }
+
+ /* Allocation of free pages can continue */
+
+ page_coloring = 1;
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ zone = pgdata->node_zones + i;
+ if (zone->size)
+ spin_unlock_irqrestore(&zone->lock, flags[i]);
+ }
+}
+
+void page_color_stop(void)
+{
+ /* Reverse the operation of page_color_start(). */
+
+ int i, j, k;
+ unsigned int num_colors;
+ unsigned long flags[MAX_NR_ZONES];
+ struct list_head *head, *curr;
+ free_area_t *area;
+ zone_t *zone;
+ pg_data_t *pgdata;
+
+ pgdata = &contig_page_data;
+
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ zone = pgdata->node_zones + i;
+ if (zone->size)
+ spin_lock_irqsave(&zone->lock, flags[i]);
+ }
+
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ num_colors = page_colors;
+ zone = pgdata->node_zones + i;
+
+ if (!zone->size)
+ continue;
+
+ for(j = 0; j<MAX_ORDER; j++) {
+ area = zone->free_area + j;
+ area->count = 0;
+
+ for(k = 0; k < num_colors; k++) {
+ head = area->color_list + k;
+ curr = memlist_next(head);
+ while(curr != head) {
+ memlist_del(curr);
+ memlist_add_head(curr,
+ &area->free_list);
+ curr = memlist_next(head);
+ }
+ }
+
+ if (num_colors > 1)
+ num_colors >>= 1;
+ }
+ }
+
+ page_coloring = 0;
+ for(i = 0; i < MAX_NR_ZONES; i++) {
+ zone = pgdata->node_zones + i;
+ if (zone->size)
+ spin_unlock_irqrestore(&zone->lock, flags[i]);
+ }
+}
+
+unsigned int rand_carry = 0x01234567;
+unsigned int rand_seed = 0x89abcdef;
+
+#define MULT 2131995753
+
+static inline unsigned int get_rand(void)
+{
+ /* A multiply-with-carry random number generator by
+ George Marsaglia. The period is about 1<<63, and
+ each call to get_rand() returns 32 random bits */
+
+ unsigned long long prod;
+
+ prod = (unsigned long long)rand_seed *
+ (unsigned long long)MULT +
+ (unsigned long long)rand_carry;
+ rand_seed = (unsigned int)prod;
+ rand_carry = (unsigned int)(prod >> 32);
+
+ return rand_seed;
+}
+
+static struct page *alloc_pages_by_color(zone_t *zone, unsigned int order)
+{
+ unsigned int i;
+ unsigned int mask, color;
+ unsigned long page_idx;
+ free_area_t *area;
+ struct list_head *curr, *head;
+ struct page *page;
+ unsigned int cache_mask = page_colors - 1;
+
+ /* If this process hasn't asked for free pages
+ before, assign it a random starting color. */
+
+ if (current->color_init != current->pid) {
+ current->color_init = current->pid;
+ current->target_color = COLOR(get_rand());
+ }
+
+ /* Round the target color to look for up to the
+ next 1<<order boundary. */
+
+ mask = (1 << order) - 1;
+ color = current->target_color;
+ color = COLOR((color + mask) & ~mask);
+
+ /* Find out early if there are no free pages at all. */
+
+ for(i = order; i < MAX_ORDER; i++)
+ if (zone->free_area[i].count)
+ break;
+
+ if (i == MAX_ORDER)
+ return NULL;
+
+ /* The memory allocation is guaranteed to succeed
+ (although we may not find the correct color) */
+
+ while(1) {
+ area = zone->free_area + order;
+ for(i = order; i < MAX_ORDER; i++) {
+ head = area->color_list + (color >> i);
+ curr = memlist_next(head);
+ if (curr != head)
+ goto alloc_page_done;
+ area++;
+ }
+
+ page_miss_count++;
+ color = COLOR(color + (1<<order));
+ }
+
+alloc_page_done:
+ page = memlist_entry(curr, struct page, list);
+ if (BAD_RANGE(zone,page))
+ BUG();
+
+ memlist_del(curr);
+ page_idx = page - zone->zone_mem_map;
+ zone->free_pages -= 1 << order;
+ area->count--;
+
+ if (i < (MAX_ORDER - 1))
+ __change_bit(page_idx >> (1+i), area->map);
+
+ while (i > order) {
+
+ /* Return 1<<order contiguous pages out of
+ the 1<<i available now. Without page coloring
+ it would suffice to keep chopping the number of
+ pages in half and return the last 1<<order of
+ them. Here, the bottom bits of the index to
+ return must match the target color. We have to
+ keep chopping 1<<i in half but we can
+ only ignore the halves that don't match the
+ bit pattern of the target color. */
+
+ i--;
+ area--;
+ mask = 1 << i;
+ area->count++;
+ __change_bit(page_idx >> (1+i), area->map);
+ if (color & mask) {
+ if (BAD_RANGE(zone,page + mask))
+ BUG();
+
+ memlist_add_head(&page->list, area->color_list +
+ (COLOR(page_idx) >> i));
+ page_idx += mask;
+ page += mask;
+ }
+ else {
+ memlist_add_head(&(page + mask)->list,
+ area->color_list +
+ (COLOR(page_idx + mask) >> i));
+ }
+ }
+
+ set_page_count(page, 1);
+
+ if (BAD_RANGE(zone,page))
+ BUG();
+ if (PageLRU(page))
+ BUG();
+ if (PageActive(page))
+ BUG();
+
+ current->target_color = COLOR(color + (1<<order));
+ page_hit_count++;
+ page_alloc_count += 1 << order;
+ return page;
+}
+
+#endif /* CONFIG_PAGE_COLORING_MODULE */
+
+
/*
* Buddy system. Hairy. You really aren't expected to understand this
*
@@ -125,12 +407,26 @@
if (BAD_RANGE(zone,buddy2))
BUG();
+#ifdef CONFIG_PAGE_COLORING_MODULE
+ area->count--;
+ order++;
+#endif
memlist_del(&buddy1->list);
mask <<= 1;
area++;
index >>= 1;
page_idx &= mask;
}
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+ if (page_coloring == 1) {
+ unsigned long cache_mask = page_colors - 1;
+ memlist_add_head(&(base + page_idx)->list,
+ area->color_list + (COLOR(page_idx) >> order));
+ spin_unlock_irqrestore(&zone->lock, flags);
+ return;
+ }
+#endif
memlist_add_head(&(base + page_idx)->list, &area->free_list);
spin_unlock_irqrestore(&zone->lock, flags);
@@ -181,6 +477,15 @@
struct page *page;
spin_lock_irqsave(&zone->lock, flags);
+
+#ifdef CONFIG_PAGE_COLORING_MODULE
+ if (page_coloring == 1) {
+ page = alloc_pages_by_color(zone, order);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ return page;
+ }
+#endif
+
do {
head = &area->free_list;
curr = memlist_next(head);
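Below is a minimal user-space sketch of the two index calculations in alloc_pages_by_color() above. The first rounds the target color up to a 1<<order boundary. The second chooses, at each halving of a 1<<i block, the half whose index bit matches the target color. The sketch assumes 16 page colors and a COLOR(x) macro that masks with page_colors - 1; the patch's own COLOR definition is not part of this excerpt. round_color() and split_to_color() are hypothetical names, and the free-list bookkeeping is left out.

```c
/* Assumption for illustration: 16 colors, COLOR() keeps the low bits. */
#define PAGE_COLORS 16u
#define COLOR(x) ((x) & (PAGE_COLORS - 1))

/* Round the target color up to the next 1<<order boundary, wrapping
   modulo the number of colors, as alloc_pages_by_color() does. */
static unsigned int round_color(unsigned int color, unsigned int order)
{
    unsigned int mask = (1u << order) - 1;
    return COLOR((color + mask) & ~mask);
}

/* Model the split loop: start from a free block of 1<<i pages at
   page_idx and halve it until 1<<order pages remain. At each step keep
   the half whose index bit matches the same bit of color; the other
   half would go back on a free list. Returns the index handed out. */
static unsigned long split_to_color(unsigned long page_idx, unsigned int i,
                                    unsigned int order, unsigned int color)
{
    while (i > order) {
        i--;
        unsigned long mask = 1ul << i;
        if (color & mask)
            page_idx += mask;   /* keep the upper half */
        /* otherwise keep the lower half */
    }
    return page_idx;
}
```

For example, with order 2 a target color of 5 rounds up to 8, and a target of 14 wraps around to 0. Splitting an 8-page block at index 0 down to a single page with target color 5 hands out page 5, because each halving keeps the half whose index bit agrees with the color.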
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/
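The get_rand() routine in the patch above is a multiply-with-carry step. The 64-bit value seed * MULT + carry is split into a new 32-bit seed, which is also the output, and a new 32-bit carry. Here is a minimal user-space sketch of the same step. It assumes <stdint.h> fixed-width types in place of the kernel's unsigned int and unsigned long long. struct mwc and mwc_next() are hypothetical names.

```c
#include <stdint.h>

/* Generator state, mirroring rand_seed / rand_carry in the patch. */
struct mwc {
    uint32_t seed;
    uint32_t carry;
};

#define MWC_MULT 2131995753u  /* multiplier used in the patch */

/* One step: seed * MULT + carry fits in 64 bits (MULT < 2^31). The low
   32 bits become the new seed and the output; the high 32 bits become
   the new carry. */
static uint32_t mwc_next(struct mwc *g)
{
    uint64_t prod = (uint64_t)g->seed * MWC_MULT + g->carry;

    g->seed = (uint32_t)prod;
    g->carry = (uint32_t)(prod >> 32);
    return g->seed;
}
```

Keeping the state in a struct rather than in globals makes it easy to run independent generators. Note that seed = 0 with carry = 0 is a fixed point that returns 0 forever, so the initial state must avoid it; the patch's constants do.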
* Re: [PATCH][RFC] Lightweight user-level semaphores
From: Rusty Russell @ 2002-01-14 1:35 UTC (permalink / raw)
To: Alan Cox; +Cc: matthew, manfred, linux-kernel
In-Reply-To: <E16PmKA-0007BS-00@the-village.bc.nu>
On Sun, 13 Jan 2002 15:13:30 +0000 (GMT)
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Yep, that'd be fine. However, you then lose the neatness
> > of "lock==file descriptor", and need something other than
> > read/write for down/up.
>
> If I have to have 2000 pages and 2000 file handles for 2000 locks I've
> kind of lost interest. read/write syscalls take offsets. I can pread/pwrite
> a lock in a set of locks. The only reason for using an fd I can see is so
> you can poll on a lock. All the other neatness issues are wrapped in the
> library support code anyway.
My interest in this is for TDB (Trivial Database: see sourceforge). TDB
requires a lock per hash chain (ie. arbitrary number of locks), a number
which does not change. With an extended version of the fd code (ie. first
mmap controls size of map, and offset controls which semaphore you are
referring to, and semaphores created on demand), this requires:
Each TDB would have an associated ".locks" unix domain socket so you can
read the fd from the "master". This must be atomically created by the
first process to open the TDB, and must be asynchronously serviced by a
process at all times (ie. when the "master" exits, someone else takes
over).
Without even mentioning the impossibility of creating a Unix domain socket
with an arbitrary path name (can't chdir, might not be able to chdir
back), or the problem of cleaning up the socket when no one is using the
tdb, or the horror which is fd passing under Unix, I think it's clear that
the fd solutions are vastly inferior to the "magic cookie" approach.
Still, it was a cute hack.
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
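Alan's point that pread/pwrite offsets can address one lock within a set of locks has a close analogue in an existing interface: POSIX fcntl() record locks. With these, a byte offset in a shared file selects the lock. The sketch below gives each hash chain its own one-byte lock. It illustrates offset-addressed locks only; it is not the lightweight semaphores proposed in this thread. Every fcntl() call enters the kernel, and the proposed user-level fast path is meant to avoid that cost in the uncontended case. chain_lock() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take or drop the lock for one hash chain.
   The chain number is used as a byte offset, so chain N owns byte N
   of the lock file. type is F_RDLCK, F_WRLCK or F_UNLCK. */
static int chain_lock(int fd, unsigned int chain, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* l_start is an absolute offset */
    fl.l_start = (off_t)chain;
    fl.l_len = 1;               /* lock exactly one byte */

    /* F_SETLKW waits until the lock can be granted; returns 0 or -1. */
    return fcntl(fd, F_SETLKW, &fl);
}
```

Because each lock lives at an offset rather than in its own descriptor, any number of chains can share one fd. The trade-off is that fcntl() locks are per-process and are dropped when the process closes any descriptor for the file.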
* Re: [PATCH] Balanced Multi Queue Scheduler ...
From: Bill Davidsen @ 2002-01-14 1:33 UTC (permalink / raw)
To: Linux Kernel Mailing List
In-Reply-To: <200112301951.fBUJoxSr011753@svr3.applink.net>
On Sun, 30 Dec 2001, Timothy Covell wrote:
> Ummm, on my Dual P-III (650MHz with 524988416 Bytes), my current Seti
> efficiency is 5.35 CpF. That's a tad high/slower than an Ultra Sparc IIi
> according to their stats. So, it would appear that being SMP is hurting my
> performance a bit. Unless that is that you meant to run a seti instance for
> each CPU? And this reminds me of how "make -j3 bzlilo" is slower than
> "make -j2 bzlilo".
Compare
make -j3 bzlilo
with
make bzlilo MAKE='make -j3'
and see if they run differently on your machine; memory size may matter.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [PATCH] update: preemptive kernel for O(1) sched
From: William Lee Irwin III @ 2002-01-14 1:32 UTC (permalink / raw)
To: Robert Love; +Cc: Dieter Nützel, Linux Kernel List
In-Reply-To: <1010965697.813.25.camel@phantasy>
On Sun, 2002-01-13 at 18:22, Dieter Nützel wrote:
>> what about lock-break?
>> I am running your former one as always with
>> lock-break-rml-2.4.18-pre1-1.patch ...;-)
On Sun, Jan 13, 2002 at 06:48:17PM -0500, Robert Love wrote:
> I haven't tested O(1) together with lock-break personally, yet, but I
> have a confirmation of success from a couple of users. There are no
> reasons it shouldn't work.
I have at least run it on my laptop, together with rmap even.
No pathological behavior that I can tell. Of course, the interactive
response is wonderful, but I haven't measured anything precisely; I
have enough other things to measure precisely that it's a bit far afield.
Cheers,
Bill
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: yodaiken @ 2002-01-14 1:14 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Andrea Arcangeli, Linux Kernel Mailing List
In-Reply-To: <Pine.LNX.3.96.1020113193700.17441G-100000@gatekeeper.tmr.com>
On Sun, Jan 13, 2002 at 07:46:54PM -0500, Bill Davidsen wrote:
> Finally, I doubt that any of this will address my biggest problem with
> Linux, which is that as memory gets cheap a program doing significant disk
> writing can get buffers VERY full (perhaps a whole CD's worth) before the
> kernel decides to do the write, at which point the system becomes
> non-responsive for seconds at a time while the disk light comes on and
> stays on. That's another problem, and I did play with some patches this
> weekend without making myself really happy :-( Another topic,
> unfortunately.
I think this is a critical problem. I'd like to be able to have some
assurance that a task with a buffer of size N doing read-disk->write-disk
will maintain data flow at some minimal rate over intervals of 1 or 2
seconds or something like that.
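One way to make Victor's requirement testable is to sample the bytes moved during each one-second interval. Then check that every window of, say, two seconds averages at least some minimum rate. Below is a minimal sketch under those assumptions. meets_min_rate() and its parameters are hypothetical, and the window length and minimum rate are left as parameters.

```c
#include <stddef.h>

/* Given bytes moved in each one-second interval, return 1 if every run
   of `window` consecutive seconds averages at least `min_rate` bytes
   per second, 0 otherwise. A trace shorter than one window passes. */
static int meets_min_rate(const unsigned long *bytes_per_sec, size_t n,
                          size_t window, unsigned long min_rate)
{
    unsigned long long sum = 0;

    if (window == 0 || n < window)
        return 1;

    for (size_t k = 0; k < n; k++) {
        sum += bytes_per_sec[k];                 /* slide window right */
        if (k >= window)
            sum -= bytes_per_sec[k - window];    /* drop oldest second */
        if (k + 1 >= window &&
            sum < (unsigned long long)min_rate * window)
            return 0;                            /* this window stalled */
    }
    return 1;
}
```

Consider per-second samples {10, 10, 0, 0, 10}, a 2-second window and a floor of 5 bytes/s. The check fails on the two idle seconds in the middle, even though the overall average is 6 bytes/s. A two-second stall of this kind is exactly what an overall average hides.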
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Bill Davidsen @ 2002-01-14 1:28 UTC (permalink / raw)
To: Linux Kernel Mailing List
In-Reply-To: <20020109115859.C4902@borg.org>
On Wed, 9 Jan 2002, Kent Borg wrote:
> How does all this fit into doing a tick-less kernel?
>
> There is something appealing about doing stuff only when there is
> stuff to do, like: respond to input, handle some device that becomes
> ready, or let another process run for a while. Didn't IBM do some
> nice work on this for Linux? (*Was* it nice work?) I was under the
> impression that the current kernel isn't that far from being tickless.
>
> A tickless kernel would be wonderful for battery powered devices that
> could literally shut off when there be nothing to do, and it seems it
> would (trivially?) help performance on high end power hogs too.
>
> Why do we have regular HZ ticks? (Other than I think I remember Linus
> saying that he likes them.)
Feel free to quantify the savings over the current setup with maximum power
saving enabled in the kernel. I just don't see how "wonderful" it would
be, given that an idle system currently uses very little battery if you
set up the options to save power.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
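To put numbers on Bill's challenge: a periodic tick wakes an idle CPU HZ times per second whether or not anything is due. A tick-less design only needs to wake for the events that are actually pending. The sketch below counts wakeups over an interval under both policies. It assumes the only work is a sorted list of timer expiries in jiffies and ignores device interrupts, so it says nothing about how much battery either policy actually saves. The function names are hypothetical.

```c
#include <stddef.h>

/* Wakeups caused by a periodic tick over `seconds` at `hz` ticks/sec. */
static unsigned long periodic_wakeups(unsigned long hz, unsigned long seconds)
{
    return hz * seconds;
}

/* Wakeups for a one-shot timer always programmed for the next pending
   expiry: one per distinct expiry time in [start, end). `expiries`
   must be sorted in ascending order (in jiffies). */
static unsigned long oneshot_wakeups(const unsigned long *expiries, size_t n,
                                     unsigned long start, unsigned long end)
{
    unsigned long count = 0;

    for (size_t k = 0; k < n; k++) {
        if (expiries[k] < start || expiries[k] >= end)
            continue;
        if (k > 0 && expiries[k] == expiries[k - 1])
            continue;   /* timers due at the same jiffy share a wakeup */
        count++;
    }
    return count;
}
```

With HZ=100, one idle second costs 100 tick wakeups, while timers due at jiffies 10, 10 and 50 need only two one-shot wakeups. Whether that difference matters for battery life depends on how deep a sleep state the CPU reaches between wakeups, which is the measurement Bill asks for.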
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Bill Davidsen @ 2002-01-14 1:22 UTC (permalink / raw)
To: Linux Kernel Mailing List
In-Reply-To: <1010524653.3225.109.camel@phantasy>
On Tue, 8 Jan 2002, Robert Love wrote:
> On Tue, 2002-01-08 at 15:59, Daniel Phillips wrote:
>
> > And while I'm enumerating differences, the preemptable kernel (in this
> > incarnation) has a slight per-spinlock cost, while the non-preemptable kernel
> > has the fixed cost of checking for rescheduling, at intervals throughout all
> > 'interesting' kernel code, essentially all long-running loops. But by clever
> > coding it's possible to finesse away almost all the overhead of those loop
> > checks, so in the end, the non-preemptible low-latency patch has a slight
> > efficiency advantage here, with emphasis on 'slight'.
>
> True (re spinlock weight in preemptible kernel) but how is that not
> comparable to explicit scheduling points? Worse, the preempt-kernel
> typically does its preemption on a branch on return to interrupt
> (similar to user space's preemption). What better time to check and
> reschedule if needed?
I'm not sure that preempt and low latency really are attacking the same
problem. What I am finding is that LL improves overall performance when a
process does something which is physically slow, like a find in a
directory with 20k files. On the other hand, PK makes the system respond
better to changes. In particular, I see that DNS servers which have
other work running, even backups or reports, are more responsive with PK,
as are Usenet news servers. I find it hard to measure "feels faster" with
either approach, although, like the Supreme Court, "I know it when I see
it."
I'd like to hope that some of each will get into the main kernel. PK has
been stable for me for a while; LL has never been unstable, but I've run it
less.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
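A toy, single-threaded model may make the cost comparison in this exchange more concrete. Under the preemptible-kernel approach, taking a spinlock bumps a preemption counter. Releasing it drops the counter and checks for a pending reschedule, and an interrupt return does the same check. A preemption can only happen when the counter is zero. All names here (struct toy_cpu, toy_spin_lock(), and so on) are hypothetical. The model does not mirror the actual preempt-kernel patch, and it leaves out the low-latency patch's explicit checks inside long loops.

```c
/* Toy model of one CPU; hypothetical names throughout. */
struct toy_cpu {
    int preempt_count;          /* > 0 while any spinlock is held */
    int need_resched;           /* a higher-priority task is waiting */
    unsigned long preemptions;  /* how many times we rescheduled */
};

/* Reschedule only when no locks are held and one is pending. */
static void toy_preempt_check(struct toy_cpu *c)
{
    if (c->preempt_count == 0 && c->need_resched) {
        c->need_resched = 0;
        c->preemptions++;
    }
}

static void toy_spin_lock(struct toy_cpu *c)
{
    c->preempt_count++;         /* the per-lock cost: one increment */
}

static void toy_spin_unlock(struct toy_cpu *c)
{
    c->preempt_count--;         /* ...plus a decrement and a test */
    toy_preempt_check(c);
}

/* An interrupt wakes a waiting task; on return it preempts only if no
   lock is held, otherwise the reschedule waits for the unlock. */
static void toy_interrupt(struct toy_cpu *c)
{
    c->need_resched = 1;
    toy_preempt_check(c);
}
```

An interrupt that arrives while a lock is held does not preempt; the reschedule is deferred to toy_spin_unlock(). This is why long lock hold times still bound latency under a preemptible kernel, as Andrew argues elsewhere in this thread.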
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Robert Love @ 2002-01-14 1:19 UTC (permalink / raw)
To: Roman Zippel
Cc: yodaiken, Alan Cox, Kenneth Johansson, arjan, Rob Landley,
linux-kernel
In-Reply-To: <3C42293F.4962EC82@linux-m68k.org>
On Sun, 2002-01-13 at 19:41, Roman Zippel wrote:
> > That is exactly what Andrew Morton disputes. So why do you think he is
> > wrong?
Victor is saying that Andrew contends the hard parts of his low-latency
patch are just as hard to maintain with a preemptive kernel. This is
true for the places where spinlocks are held anyway, but it assumes we
continue to treat lock breaking and explicit scheduling as our only
solution. Under a preemptible kernel, it isn't.
Robert Love
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Robert Love @ 2002-01-14 1:17 UTC (permalink / raw)
To: Alan Cox
Cc: Stephan von Krawczynski, Roman Zippel, Kenneth Johansson, arjan,
Rob Landley, linux-kernel
In-Reply-To: <E16PvKx-00005L-00@the-village.bc.nu>
On Sun, 2002-01-13 at 19:50, Alan Cox wrote:
> Do you want a clean simple solution or complex elegance ? For 2.4 I definitely
> favour clean and simple. For 2.5 it's an open debate
Make no mistake, I do not intend to see preempt-kernel in 2.4. I will,
however, continue to maintain the patch for end users and others who use
it. In my opinion, a proper in-kernel solution for 2.4 is mini-ll,
perhaps extended with any other obviously-completely-utterly sane bits
from full-ll.
For 2.5, however, I tout preempt as the answer. This does not mean just
preempt. It means a preemptible kernel as a basis for further
low-latency work by means other than explicit scheduling statements.
Robert Love
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: yodaiken @ 2002-01-14 1:05 UTC (permalink / raw)
To: Roman Zippel
Cc: yodaiken, Alan Cox, Robert Love, Kenneth Johansson, arjan,
Rob Landley, linux-kernel
In-Reply-To: <3C42293F.4962EC82@linux-m68k.org>
On Mon, Jan 14, 2002 at 01:41:35AM +0100, Roman Zippel wrote:
> Hi,
>
> yodaiken@fsmlabs.com wrote:
>
> > > It's a useful patch for anyone, who needs good latencies now, but it's
> > > still a quick&dirty solution. Preempt offers a clean solution for a
> > > certain part of the problem, as it's possible to cleanly localize the
> > > needed changes for preemption (at least for UP). That means the ll patch
> > > becomes smaller and future work on ll becomes simpler, since a certain
> >
> > That is exactly what Andrew Morton disputes. So why do you think he is
> > wrong?
>
> Please explain, what do you mean?
I mean that these conversations are not very useful if you don't
read what other people write.
Here's a prior response by Andrew to a post by you.
From akpm@zip.com.au Sat Jan 12 13:15:22 2002
Roman Zippel wrote:
>
> Andrew's patch requires constant auditing and Andrew can't audit all
> drivers for possible problems. That doesn't mean Andrew's work is
> wasted, since it identifies problems, which preempting can't solve, but
> it will always be a hunt for the worst cases, where preempting goes for
> the general case.
Guys,
I've heard this so many times, and it just ain't so. The overwhelming
majority of problem areas are inside locks. All the complexity and
maintainability difficulties to which you refer exist in the preempt
patch as well. There just is no difference.
>
> bye, Roman
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com www.rtlinux.com
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Bill Davidsen @ 2002-01-14 1:08 UTC (permalink / raw)
To: Daniel Phillips; +Cc: Linux Kernel Mailing List
In-Reply-To: <E16O2hc-0000B3-00@starship.berlin>
On Tue, 8 Jan 2002, Daniel Phillips wrote:
> On January 8, 2002 08:47 pm, Andrew Morton wrote:
> > There's no point in just merging the preempt patch and saying "there,
> > that's done". It doesn't do anything.
> >
> > Instead, a decision needs to be made: "Linux will henceforth be a
> > low-latency kernel".
>
> I thought the intention was to make it a config option?
Irrelevant: it has to be implemented in order to be an option, so the
amount of work involved is the same either way. And if you want to make it
a runtime setting, you add a bit more work, plus some overhead for deciding
whether LL is wanted.
I'm not advocating that, but it would allow admins to enable LL when the
system was slow and see if it really made a change. Rebooting is bound to
change the load ;-)
> > Now, IF we can come to this decision, then
> > internal preemption is the way to do it. But it affects ALL kernel
> > developers. Because we'll need to introduce a new rule: "it is a
> > bug to spend more than five milliseconds holding any locks".
> >
> > So. Do we want a low-latency kernel? Are we prepared to mandate
> > the five-millisecond rule? It can be done, but won't be easy, and
> > we'll never get complete coverage. But I don't see the will around
> > here.
Really? You have people working on low latency, people working on preempt,
and at least a few of us trying to characterize the problems with large
memory and i/o. I would say latency has become a real issue, and you only
need enough "will" to have one person write useful code, this is a
committee.
Since changes of this type don't need to be perfect and address all cases,
just help some cases and not make others worse, I think we will see improvement
in 2.4.xx without waiting for 2.5 or 2.6. No one is complaining that
Linux's overall throughput is bad, that network performance is bad, etc. But
responsiveness has become an issue, and I'm sure there's enough will to
solve it. "Solve" means getting most of the delays to be caused by
hardware capacity instead of kernel ineptitude.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
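Andrew's proposed rule ("it is a bug to spend more than five milliseconds holding any locks") needs some way to notice violations. Below is a minimal user-space sketch of such instrumentation, assuming POSIX clock_gettime() with CLOCK_MONOTONIC. struct timed_lock and its functions are hypothetical. They record only timestamps rather than wrapping a real lock, and they say nothing about how the kernel itself would enforce the rule.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define MAX_HOLD_NS 5000000LL   /* the proposed 5 ms limit */

/* Hypothetical instrumentation state for one lock. */
struct timed_lock {
    struct timespec acquired;   /* when the current hold began */
    long long worst_ns;         /* longest hold seen so far */
    unsigned long violations;   /* holds longer than MAX_HOLD_NS */
};

/* Nanoseconds from a to b. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (b->tv_nsec - a->tv_nsec);
}

static void timed_acquire(struct timed_lock *l)
{
    /* A real lock would be taken here; this sketch only records time. */
    clock_gettime(CLOCK_MONOTONIC, &l->acquired);
}

static void timed_release(struct timed_lock *l)
{
    struct timespec now;
    long long held;

    clock_gettime(CLOCK_MONOTONIC, &now);
    held = ts_diff_ns(&l->acquired, &now);
    if (held > l->worst_ns)
        l->worst_ns = held;
    if (held > MAX_HOLD_NS)
        l->violations++;
}
```

Tracking the worst hold as well as the violation count shows how close well-behaved paths come to the limit, not just which ones cross it.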
* Linux 2.4.18pre3-ac1-aia21 (IDE patches)
From: Anton Altaparmakov @ 2002-01-14 1:07 UTC (permalink / raw)
To: linux-kernel
Alan's -ac series is back! To celebrate this I added in the IDE patches and
an NTFS update which dramatically reduces the number of vmalloc()s and have
posted the resulting (tested) patch (to be applied on top of
2.4.18pre3-ac1) at the URLs below.
http://www-stu.christs.cam.ac.uk/~aia21/linux/patch-2.4.18-pre3-ac1-aia1.bz2
http://www-stu.christs.cam.ac.uk/~aia21/linux/patch-2.4.18-pre3-ac1-aia1.gz
Linux 2.4.18pre3-ac1-aia1
o IDE patch (taskfile, lba-48, ata133, etc) Andre Hedrick
o Configure help entries for above Andre Hedrick, Rob Radez
o Small IDE cleanups (code beauty only) Pavel Machek, me
o Reduce NTFS vmalloc() use (NTFS 1.1.22) me
Enjoy,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/
* Re: [2.4.17/18pre] VM and swap - it's really unusable
From: Bill Davidsen @ 2002-01-14 0:55 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Linux Kernel Mailing List
In-Reply-To: <20020108163925.F1894@inspiron.school.suse.de>
On Tue, 8 Jan 2002, Andrea Arcangeli wrote:
> Note that some of them are bugfixes, without them an luser can hang the
> machine for several seconds on any box with some giga of ram by simple
> reading and writing into a large enough buffer. I think we never had
> time to care merging those bits into mainline yet and this is probably
> the main reason they're not integrated but it's something that should be
> in mainline IMHO.
Or just doing a large write while doing lots of reads... my personal
nemesis is "mkisofs" for backups, which reads lots of small files and
builds a CD image, which suddenly gets discovered by the kernel and
written, seemingly in a monolithic chunk. I MAY be able to improve this
with tuning the bdflush parameters, and I tried some tentative patches
which didn't make a huge gain.
I don't know if the solution lies in forcing writes to start once a certain
amount of buffered data is queued, regardless of percentages, or in better
scheduling of reads ahead of writes, or whatever.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
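Bill's first option, starting writeback once a fixed amount of data is dirty regardless of the percentage of memory, can be sketched as a threshold test. The function name, parameters and numbers here are hypothetical, and this is not the 2.4 bdflush logic. With a purely percentage-based trigger, the amount of dirty data allowed before writing grows with RAM. Taking the smaller of the percentage limit and an absolute cap bounds that burst.

```c
/* Return 1 if writeback should start: dirty data has reached either
   dirty_pct percent of memory or abs_cap bytes, whichever is smaller. */
static int should_start_writeback(unsigned long long dirty_bytes,
                                  unsigned long long total_bytes,
                                  unsigned int dirty_pct,
                                  unsigned long long abs_cap)
{
    /* Divide first so large memory sizes cannot overflow. */
    unsigned long long pct_limit = total_bytes / 100 * dirty_pct;
    unsigned long long limit = pct_limit < abs_cap ? pct_limit : abs_cap;

    return dirty_bytes >= limit;
}
```

Take a 1 GiB machine with a 40% trigger. About 410 MiB could be dirty before writeback starts, most of a 650 MB CD image. With a 32 MiB cap, writeback would begin at 32 MiB regardless of RAM size.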