* Scheduling problem with 2.4?
@ 2003-05-17 11:22 David Kastrup
  2003-05-17 17:41 ` Andrea Arcangeli
  0 siblings, 1 reply; 20+ messages in thread
From: David Kastrup @ 2003-05-17 11:22 UTC (permalink / raw)
To: linux-kernel

I have a problem with Emacs, which crawls when used as a shell. If I
call M-x shell RET and then do something like

  hexdump -v /dev/null|dd count=100k bs=1

then characters appear rather slowly at first, most of them in
single-character packets, with an occasional 1024-byte packet in
between, and more of them toward the end. The rate of the packets does
not exceed 100/sec. Other operating systems do not exhibit this
extreme case of throttling.

Emacs is using select calls, AFAICS, for dealing with arriving
packets, and it would appear that the delivering process causes a
context switch immediately when writing its single byte to the pipe,
or is taken from the run queue while the select call is pending.
Whatever. In the ensuing time slice, Emacs processes the single
arriving byte and then enters the select call again.

That the pipe is not allowed to fill up in most cases (most: in some,
increasingly more at the later time, the pipe gets filled alternately
completely and with single bytes) before a context switch occurs is
already unfortunate with regard to processing efficiency. What is much
more of a nuisance is the apparent throttling to a single byte per
clock tick: the CPU is idling most of the time. It would be
unfortunate enough if it were going full blast at processing 1-byte
packets, but most of the time it does nothing but wait for the next
tick.

Now I would want to

a) get sufficient information about what might go wrong inside of
   Linux as compared to other operating systems.
b) find a good workaround for not triggering this kind of behavior.

I don't have sufficient kernel experience to debug this one, and it
definitely impacts the speed of operations severely, and in close
cooperation with the operating system. So any pointers would be
welcome.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
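To make the reported pattern easy to reproduce outside Emacs, here is
a minimal C sketch of the same select()-driven read loop that the
Emacs process filter effectively performs (an illustration for the
reader, not code from the thread; the producer command uses /dev/zero,
as corrected in the followup below):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>

int main(void)
{
	/* same producer as in the report: 1-byte writes into a pipe */
	FILE *p = popen("hexdump -v /dev/zero | dd count=100k bs=1", "r");
	if (!p) { perror("popen"); return 1; }
	int fd = fileno(p);
	static long hist[4096 + 1];	/* histogram of read() chunk sizes */
	char buf[4096];

	for (;;) {
		fd_set rfds;
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
			break;			/* interrupted or error */
		ssize_t n = read(fd, buf, sizeof buf);
		if (n <= 0)
			break;			/* EOF: dd finished */
		hist[n]++;
	}
	for (int i = 1; i <= 4096; i++)
		if (hist[i])
			printf("%ld reads of size %d\n", hist[i], i);
	pclose(p);
	return 0;
}

On a kernel showing the reported behavior one would expect a long tail
of 1-byte reads; with a writer that is allowed to fill the pipe,
mostly pipe-sized chunks.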
* Re: Scheduling problem with 2.4?
  2003-05-17 11:22 Scheduling problem with 2.4? David Kastrup
@ 2003-05-17 17:41 ` Andrea Arcangeli
  2003-05-17 19:36   ` David Kastrup
  0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-17 17:41 UTC (permalink / raw)
To: dak; +Cc: linux-kernel

On Sat, May 17, 2003 at 01:22:23PM +0200, David Kastrup wrote:
> 
> I have a problem with Emacs, which crawls when used as a shell. If I
> call M-x shell RET and then do something like
> hexdump -v /dev/null|dd count=100k bs=1

this certainly can't be it:

andrea@dualathlon:~> hexdump -v /dev/null|dd count=100k bs=1
0+0 records in
0+0 records out
andrea@dualathlon:~>

how can that run slow? There's nothing to hexdump in /dev/null and
nothing to read for dd.

One related hint: since you're playing with >4k buffers with dd, make
sure to run a -aa kernel with the lowlatency fixes (latest is
2.4.21rc2aa1), or those large writes can loop in the kernel forbidding
preemption (in this case maybe not, if they block in a socket/pipe
before all 100k is written). Overall, people often notice an
improvement with preempt because they compare it against buggy kernels
(I know this has nothing to do with your report, but it's related to
the above).

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-17 17:41 ` Andrea Arcangeli
@ 2003-05-17 19:36   ` David Kastrup
  2003-05-17 20:30     ` Andrea Arcangeli
  0 siblings, 1 reply; 20+ messages in thread
From: David Kastrup @ 2003-05-17 19:36 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]

Andrea Arcangeli <andrea@suse.de> writes:

> On Sat, May 17, 2003 at 01:22:23PM +0200, David Kastrup wrote:
> > 
> > I have a problem with Emacs, which crawls when used as a shell. If I
> > call M-x shell RET and then do something like
> > hexdump -v /dev/null|dd count=100k bs=1
> 
> this certainly can't be it
> 
> andrea@dualathlon:~> hexdump -v /dev/null|dd count=100k bs=1
> 0+0 records in
> 0+0 records out

Argl. Substitute /dev/zero in the obvious place. My stupidity.

Here is a test file for generating further stats from within Emacs.
You can call it interactively from within Emacs with

  M-x load-library RET testio.el RET
  M-x make-test RET

and you can use it from the command line (if you really must) with

  emacs -batch -q -no-site-file -l testio.el -eval "(progn(make-test t 'kill-emacs)(sit-for 100000))"

[-- Attachment #2: testio.el --]
[-- Type: application/emacs-lisp, Size: 1557 bytes --]

[-- Attachment #3: Type: text/plain, Size: 901 bytes --]

It will then output the data read from the process, followed by
statistics of block sizes over each second it ran.

The interactive session tends to be somewhat worse, but the command
line version is bad enough for a demonstration. It is obvious that
during most of the time, the pipe only receives single characters.

Again, I am pretty sure that Emacs is at fault too, but I don't
understand what scheduling behavior causes the pipes to remain mostly
empty, with an occasional complete filling. It would be better if
Emacs were not context-switched in the moment something appears in the
pipe, but if the writer to the pipe kept the opportunity to fill'er'up
until it is either preempted or needs to wait. If there were some
possibility to force this behavior from within Emacs, this would be
good to know.

Thanks,

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4? 2003-05-17 19:36 ` David Kastrup @ 2003-05-17 20:30 ` Andrea Arcangeli 2003-05-17 20:44 ` David Kastrup 0 siblings, 1 reply; 20+ messages in thread From: Andrea Arcangeli @ 2003-05-17 20:30 UTC (permalink / raw) To: dak; +Cc: linux-kernel On Sat, May 17, 2003 at 09:36:29PM +0200, David Kastrup wrote: > Andrea Arcangeli <andrea@suse.de> writes: > > > On Sat, May 17, 2003 at 01:22:23PM +0200, David Kastrup wrote: > > > > > > I have a problem with Emacs which crawls when used as a shell. If I > > > call M-x shell RET and then do something like > > > hexdump -v /dev/null|dd count=100k bs=1 > > > > this certainly can't be it > > > > andrea@dualathlon:~> hexdump -v /dev/null|dd count=100k bs=1 > > 0+0 records in > > 0+0 records out > > Argl. Substitute /dev/zero in the obvious place. My stupidity. > > Here is a test file for generating further stats from within Emacs. > You can call it interactively from within Emacs with > M-x load-library RET testio.el RET > M-x make-test RET > > and you can use it from the command line (if you really must) with > emacs -batch -q -no-site-file -l testio.el -eval "(progn(make-test t > 'kill-emacs)(sit-for 100000))" > > > > It will then output the read stuff from the process followed by > statistics of block sizes over each second it ran. > > The interactive session tends to be somewhat worse, but the command > line version is bad enough for a demonstration. this is within emacs: 1 blocks with size 95 1 blocks with size 146 1 blocks with size 175 1 blocks with size 189 1 blocks with size 199 2 blocks with size 231 2 blocks with size 232 1 blocks with size 233 1 blocks with size 238 1 blocks with size 241 1 blocks with size 242 1 blocks with size 243 1 blocks with size 244 1 blocks with size 245 1 blocks with size 247 1 blocks with size 252 2 blocks with size 257 1 blocks with size 260 2 blocks with size 261 2 blocks with size 262 1 blocks with size 263 1 blocks with size 266 1 blocks with size 271 1 blocks with size 276 1 blocks with size 282 1 blocks with size 283 1 blocks with size 289 1 blocks with size 296 1 blocks with size 297 1 blocks with size 299 1 blocks with size 300 2 blocks with size 302 1 blocks with size 304 1 blocks with size 306 2 blocks with size 307 1 blocks with size 308 1 blocks with size 310 1 blocks with size 311 3 blocks with size 312 1 blocks with size 313 2 blocks with size 315 1 blocks with size 316 1 blocks with size 317 1 blocks with size 322 1 blocks with size 324 1 blocks with size 330 1 blocks with size 345 1 blocks with size 347 2 blocks with size 355 2 blocks with size 359 2 blocks with size 363 1 blocks with size 368 1 blocks with size 369 2 blocks with size 373 1 blocks with size 374 2 blocks with size 375 1 blocks with size 383 1 blocks with size 384 2 blocks with size 385 1 blocks with size 386 2 blocks with size 387 1 blocks with size 389 1 blocks with size 391 1 blocks with size 392 2 blocks with size 393 3 blocks with size 394 1 blocks with size 397 2 blocks with size 398 4 blocks with size 400 1 blocks with size 402 1 blocks with size 405 1 blocks with size 409 1 blocks with size 414 1 blocks with size 418 1 blocks with size 420 4 blocks with size 423 1 blocks with size 424 1 blocks with size 428 1 blocks with size 429 1 blocks with size 430 1 blocks with size 432 1 blocks with size 433 1 blocks with size 436 1 blocks with size 437 1 blocks with size 443 2 blocks with size 445 1 blocks with size 448 2 blocks with size 451 1 blocks with size 453 1 blocks with size 455 1 blocks with size 
472 2 blocks with size 485 1 blocks with size 490 1 blocks with size 498 1 blocks with size 504 1 blocks with size 505 1 blocks with size 510 1 blocks with size 512 2 blocks with size 533 2 blocks with size 537 1 blocks with size 540 1 blocks with size 545 1 blocks with size 551 1 blocks with size 557 2 blocks with size 562 1 blocks with size 571 1 blocks with size 572 1 blocks with size 573 1 blocks with size 578 1 blocks with size 579 1 blocks with size 582 1 blocks with size 588 2 blocks with size 596 1 blocks with size 598 1 blocks with size 599 1 blocks with size 600 1 blocks with size 602 2 blocks with size 610 1 blocks with size 616 1 blocks with size 620 2 blocks with size 623 1 blocks with size 628 1 blocks with size 629 1 blocks with size 630 2 blocks with size 638 1 blocks with size 639 2 blocks with size 640 1 blocks with size 641 1 blocks with size 642 1 blocks with size 644 1 blocks with size 648 1 blocks with size 650 1 blocks with size 651 1 blocks with size 652 1 blocks with size 654 1 blocks with size 656 1 blocks with size 657 1 blocks with size 659 1 blocks with size 662 1 blocks with size 665 1 blocks with size 667 2 blocks with size 670 1 blocks with size 672 1 blocks with size 679 1 blocks with size 687 1 blocks with size 720 1 blocks with size 735 1 blocks with size 746 1 blocks with size 754 1 blocks with size 775 1 blocks with size 834 1 blocks with size 844 1 blocks with size 894 1 blocks with size 896 1 blocks with size 906 1 blocks with size 928 1 blocks with size 980 2 blocks with size 987 1 blocks with size 991 1 blocks with size 998 1 blocks with size 1023 8 blocks with size 1024 from the shell prompt: 6 blocks with size 26 5 blocks with size 27 14 blocks with size 28 10 blocks with size 29 4 blocks with size 30 2 blocks with size 31 2 blocks with size 32 2 blocks with size 33 4 blocks with size 34 1 blocks with size 35 1 blocks with size 36 2 blocks with size 38 2 blocks with size 39 1 blocks with size 41 1 blocks with size 45 1 blocks with size 46 3 blocks with size 48 1 blocks with size 49 1 blocks with size 52 3 blocks with size 55 1 blocks with size 57 1 blocks with size 58 2 blocks with size 60 1 blocks with size 61 1 blocks with size 62 1 blocks with size 63 1 blocks with size 65 1 blocks with size 71 2 blocks with size 75 1 blocks with size 79 1 blocks with size 86 1 blocks with size 120 1 blocks with size 128 2 blocks with size 147 1 blocks with size 190 1 blocks with size 233 1 blocks with size 241 1 blocks with size 250 1 blocks with size 269 1 blocks with size 273 1 blocks with size 407 1 blocks with size 564 1 blocks with size 614 2 blocks with size 626 1 blocks with size 667 1 blocks with size 680 1 blocks with size 709 1 blocks with size 711 1 blocks with size 712 1 blocks with size 719 1 blocks with size 722 1 blocks with size 729 1 blocks with size 732 2 blocks with size 737 1 blocks with size 759 1 blocks with size 772 1 blocks with size 1014 5 blocks with size 1023 31 blocks with size 1024 my emacs is: GNU Emacs 21.2.1 Copyright (C) 2001 Free Software Foundation, Inc. GNU Emacs comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of Emacs under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING. > > It is obvious that during most of the time, the pipe only receives > single characters. 
> 
> Again, I am pretty sure that Emacs is at fault too, but I don't
> understand what scheduling behavior causes the pipes to remain
> mostly empty, with an occasional complete filling. It would be
> better if Emacs were not context-switched in the moment something
> appears in the pipe, but if the writer to the pipe kept the
> opportunity to fill'er'up until it is either preempted or needs to
> wait. If there were some possibility to force this behavior from
> within Emacs, this would be good to know.

I don't see any slowdown here. As said, the 100k bs will lead to just
1 syscall for lots of I/O; this made me think the lowlatency fixes
could be related. So I would suggest you try to reproduce it with my
2.4 tree, or with 2.5 + -preempt enabled.

Just for the record, this is the patch I'm talking about (you can
apply it easily to other 2.4 trees; this should really be merged into
mainline too):

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/kernels/v2.4/2.4.21rc2aa1/9998_lowlatency-fixes-12

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-17 20:30 ` Andrea Arcangeli
@ 2003-05-17 20:44   ` David Kastrup
  2003-05-17 21:53     ` Andrea Arcangeli
  2003-05-17 21:54     ` Barry K. Nathan
  0 siblings, 2 replies; 20+ messages in thread
From: David Kastrup @ 2003-05-17 20:44 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli <andrea@suse.de> writes:

> > The interactive session tends to be somewhat worse, but the command
> > line version is bad enough for a demonstration.
> 
> this is within emacs:
> 
> 1 blocks with size 95
> 1 blocks with size 146
> 1 blocks with size 175
[...]
> 1 blocks with size 980
> 2 blocks with size 987
> 1 blocks with size 991
> 1 blocks with size 998
> 1 blocks with size 1023
> 8 blocks with size 1024
> 
> from the shell prompt:
> [...]

Well, those are excellent results, just as it should be.

> my emacs is:
> 
> GNU Emacs 21.2.1

Same here.

I am talking about the kernel coming with RedHat 9; uname -a gives

Linux lola.goethe.zz 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

Unfortunately, this kernel is here to stay for quite a while, and I
want to find a way to let Emacs cooperate with it better without
telling people to recompile the kernel or wait a year.

> > Again, I am pretty sure that Emacs is at fault too, but I don't
> > understand what scheduling behavior causes the pipes to remain
> > mostly empty, with an occasional complete filling. It would be
> > better if Emacs were not context-switched in the moment something
> > appears in the pipe, but if the writer to the pipe kept the
> > opportunity to fill'er'up until it is either preempted or needs to
> > wait. If there were some possibility to force this behavior from
> > within Emacs, this would be good to know.
> 
> I don't see any slowdown here.

Which kernel? Oh, and of course: on an SMP machine the problem would
not be visible, since then the context switch to Emacs does not stop
the other process from stuffing the pipe.

> http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/kernels/v2.4/2.4.21rc2aa1/9998_lowlatency-fixes-12

Taking notice, but it will need some time to apply. And I really would
want to find a way to avoid triggering this problem.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-17 20:44 ` David Kastrup
@ 2003-05-17 21:53   ` Andrea Arcangeli
  2003-05-17 22:37     ` David Kastrup
  0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-17 21:53 UTC (permalink / raw)
To: dak; +Cc: linux-kernel

On Sat, May 17, 2003 at 10:44:16PM +0200, David Kastrup wrote:
> Which kernel? Oh, and of course: on an SMP machine the problem would

I ran it on SMP. But I understood what's going on.

--- testio.el.orig	2003-05-17 22:22:24.000000000 +0200
+++ testio.el	2003-05-17 23:39:49.000000000 +0200
@@ -19,7 +19,7 @@
     (setq test-pattern nil test-start (float-time))
     (let ((process (start-process
                     "test" (and (bufferp printer) printer) "sh"
-                    "-c" "od -v /dev/zero|dd bs=1 count=100k")))
+                    "-c" "od -v /dev/zero| dd bs=100k count=1")))
       (set-process-filter process `(lambda (process string)
                                      (test-filter process string
                                                   ',printer)))

this is what I thought you were really doing with this script, not the
other way around ;).

The reason you get the 1-char reads in your version is that dd will
only write 1 char at a time to emacs. This has nothing to do with
emacs; this is about dd writing 1 char at a time. If you don't write
1 char at a time, emacs won't read 1 char at a time.

The only thing you could do to decrease the ctx switch flood is to put
a buffer with a timeout in between, but having a buffer with a timeout
in the kernel by default would be overkill overhead, and depending on
the timeout it could be visible for normal shells. So the kernel
behaviour is right and it isn't going to change. If one important app
of yours is dumb and writes a flood of data 1 char at a time, then
just write a buffer in userspace yourself; in 99% of cases a dd bs=4k
will just work perfectly (you'll need a timeout-stdout-flush version
only if the pipe keeps staying open, i.e. if the app doesn't quit).

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
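Andrea's "buffer in userspace" suggestion is straightforward to build.
Below is a sketch of such a coalescing filter (an illustration, not
code from the thread; the 4k buffer and 50ms quiet timeout are
arbitrary choices), i.e. the "timeout-stdout-flush version" for the
case where the pipe stays open:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>

#define BUFSZ 4096

static char buf[BUFSZ];
static size_t len;

static void flush_buf(void)
{
	size_t off = 0;
	while (off < len) {
		ssize_t n = write(STDOUT_FILENO, buf + off, len - off);
		if (n < 0) { perror("write"); exit(1); }
		off += n;
	}
	len = 0;
}

int main(void)
{
	for (;;) {
		fd_set rfds;
		struct timeval tv = { 0, 50 * 1000 };	/* 50ms */
		FD_ZERO(&rfds);
		FD_SET(STDIN_FILENO, &rfds);
		/* only arm the timeout while flushed data is pending */
		int r = select(STDIN_FILENO + 1, &rfds, NULL, NULL,
			       len ? &tv : NULL);
		if (r < 0) { perror("select"); return 1; }
		if (r == 0) {		/* input went quiet: flush */
			flush_buf();
			continue;
		}
		ssize_t n = read(STDIN_FILENO, buf + len, BUFSZ - len);
		if (n <= 0) {		/* EOF: flush and quit */
			flush_buf();
			return 0;
		}
		len += n;
		if (len == BUFSZ)
			flush_buf();
	}
}

Used as, e.g., "od -v /dev/zero | ./coalesce | consumer", it would
turn a flood of 1-byte writes into a few large ones, at the cost of up
to 50ms of extra latency.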
* Re: Scheduling problem with 2.4?
  2003-05-17 21:53 ` Andrea Arcangeli
@ 2003-05-17 22:37   ` David Kastrup
  2003-05-17 23:50     ` Andrea Arcangeli
  0 siblings, 1 reply; 20+ messages in thread
From: David Kastrup @ 2003-05-17 22:37 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli <andrea@suse.de> writes:

> On Sat, May 17, 2003 at 10:44:16PM +0200, David Kastrup wrote:
> > Which kernel? Oh, and of course: on an SMP machine the problem would
> 
> I ran it on SMP.

Then the results are worthless. The problem is that the write to the
pipe causes an immediate context switch, keeping the pipe from
gathering useful amounts of information.

> but I understood what's going on.
> 
> --- testio.el.orig	2003-05-17 22:22:24.000000000 +0200
> +++ testio.el	2003-05-17 23:39:49.000000000 +0200
> @@ -19,7 +19,7 @@
>      (setq test-pattern nil test-start (float-time))
>      (let ((process (start-process
>                      "test" (and (bufferp printer) printer) "sh"
> -                    "-c" "od -v /dev/zero|dd bs=1 count=100k")))
> +                    "-c" "od -v /dev/zero| dd bs=100k count=1")))
>        (set-process-filter process `(lambda (process string)
>                                       (test-filter process string
>                                                    ',printer)))

That utterly and entirely defeats the purpose of my demonstration.
Emacs crawls as a terminal/shell/whatever exactly because programs
writing to a pty do not write in chunks of 100k.

> The reason you get the 1-char reads in your version is that dd will
> only write 1 char at a time to emacs.

Of course it does. I told it to do so. But there is no necessity to
do an immediate context switch: it would be completely sufficient if
Emacs (which is waiting on select) were put in the run queue and
scheduled when the time slice of dd was up. Performance gets better
after some time only because Emacs spends some time on other business
(like garbage collection) and gets preempted outside of the select
system call, and only _then_ can dd fill the pipe in a whiff.

> This has nothing to do with emacs; this is about dd writing 1 char
> at a time. If you don't write 1 char at a time, emacs won't read
> 1 char at a time.

But if I am doing process communication with other processes, the I/O
_will_ arrive in small portions, and when the generating processes
are running on the same CPU instead of being I/O bound, I don't stand
a chance of working efficiently, namely of utilizing the pipes, if a
quasi-synchronous context switch is forced even when the time slice
has hardly been touched. Pipes that never fill up cause overhead.

I find it amusing that a pipeline on a single-processor system will
yield _vastly_ increased performance the moment the receiving end
works so _slowly_ that it gets preempted. Only then will the sender
be able to fill the pipe, causing a large performance gain.

> The only thing you could do to decrease the ctx switch flood is to
> put a buffer with a timeout in between, but having a buffer with a
> timeout in the kernel by default would be overkill overhead, and
> depending on the timeout it could be visible for normal shells. So
> the kernel behaviour is right and it isn't going to change.

Visible for normal shells? Hardly. We are talking about letting the
output-generating process live out its time slice (10ms) or be
scheduled away because it is waiting for something else. Do you think
you will negatively notice that the first letter of the prompt appears
10ms later, together with the complete rest of the prompt, instead of
the letters appearing one by one, but the first immediately?
> If one important app of yours is dumb and writes a flood of data
> 1 char at a time

All interactive apps, which includes almost anything run in a pty,
write data in small chunks. If you context-switch them away from the
pipe the moment they have placed a small chunk in it, performance
will suffer.

Wasn't the idea behind the anticipatory I/O scheduler something like
that? I remember vaguely.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
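David's "one context switch per byte" claim is directly measurable. A
small C sketch (an illustration, not code from the thread) that pushes
100k single-byte writes into a pipe drained by a child, then prints
the writer's context-switch counters:

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main(void)
{
	int pfd[2];
	if (pipe(pfd)) { perror("pipe"); return 1; }
	if (fork() == 0) {			/* child: drain the pipe */
		char buf[4096];
		close(pfd[1]);
		while (read(pfd[0], buf, sizeof buf) > 0)
			;
		_exit(0);
	}
	close(pfd[0]);
	for (int i = 0; i < 100 * 1024; i++)	/* 100k 1-byte writes */
		write(pfd[1], "x", 1);
	close(pfd[1]);
	wait(NULL);

	struct rusage ru;
	getrusage(RUSAGE_SELF, &ru);
	printf("voluntary ctx switches:   %ld\n", ru.ru_nvcsw);
	printf("involuntary ctx switches: %ld\n", ru.ru_nivcsw);
	return 0;
}

If the scheduler preempts the writer on every wakeup of the reader,
the switch counts should be on the order of the number of writes; if
the writer may fill the pipe first, orders of magnitude lower.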
* Re: Scheduling problem with 2.4?
  2003-05-17 22:37 ` David Kastrup
@ 2003-05-17 23:50   ` Andrea Arcangeli
  2003-05-18  0:16     ` David Schwartz
  2003-05-18  0:24     ` David Kastrup
  0 siblings, 2 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-17 23:50 UTC (permalink / raw)
To: dak; +Cc: linux-kernel

On Sun, May 18, 2003 at 12:37:01AM +0200, David Kastrup wrote:
> Of course it does. I told it to do so. But there is no necessity to
> do an immediate context switch: it would be completely sufficient if
> Emacs (which is waiting on select) were put in the run queue and
> scheduled when the time slice of dd was up. Performance gets better

the switch happens because emacs has a higher dynamic priority, as it
was sleeping for the longest time. Without these special cases for the
interactive tasks we couldn't use these long timeslices without making
the system unresponsive.

> (like garbage collection) and gets preempted outside of the select
> system call, and only _then_ can dd fill the pipe in a whiff.
> 
> > This has nothing to do with emacs; this is about dd writing 1 char
> > at a time. If you don't write 1 char at a time, emacs won't read
> > 1 char at a time.
> 
> But if I am doing process communication with other processes, the I/O
> _will_ arrive in small portions, and when the generating processes
> are running on the same CPU instead of being I/O bound, I don't stand
> a chance of working efficiently, namely of utilizing the pipes, if a

writing 1 byte per syscall isn't very efficient in the first place (no
matter whether the ctx switch happens or not).

I see what you mean, but I still don't think it is a problem. If
bandwidth matters you will have to use large writes and reads anyway;
if bandwidth doesn't matter, the number of ctx switches doesn't matter
either, and latency is usually way more important with small messages.

You're applying small messages to a "throughput" test; this is why you
have a problem. If you really did interprocess communication (and not
a throughput benchmark) you would probably want the smallest delay in
the delivery of the signal/message.

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
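For readers without the source at hand, the mechanism Andrea describes
lives in 2.4's wakeup path. Roughly (a simplified paraphrase from
memory of kernel/sched.c, not a verbatim excerpt): goodness() is
dominated by the remaining timeslice in p->counter, a long sleeper
like emacs keeps most of it, so the woken task wins the comparison and
the writer is preempted at once.

/* simplified paraphrase of 2.4 kernel/sched.c, illustration only */
static inline int preemption_goodness(struct task_struct *prev,
				      struct task_struct *p, int cpu)
{
	return goodness(p, cpu, prev->active_mm) -
	       goodness(prev, cpu, prev->active_mm);
}

	/* in reschedule_idle(), roughly: */
	if (preemption_goodness(tsk, p, cpu) > 1)
		tsk->need_resched = 1;	/* preempt the current task now */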
* RE: Scheduling problem with 2.4?
  2003-05-17 23:50 ` Andrea Arcangeli
@ 2003-05-18  0:16   ` David Schwartz
  2003-05-18  1:06     ` Andrea Arcangeli
  2003-05-18  8:55     ` Mike Galbraith
  1 sibling, 2 replies; 20+ messages in thread
From: David Schwartz @ 2003-05-18 0:16 UTC (permalink / raw)
To: Andrea Arcangeli, dak; +Cc: linux-kernel

> I see what you mean, but I still don't think it is a problem. If
> bandwidth matters you will have to use large writes and reads anyway;
> if bandwidth doesn't matter, the number of ctx switches doesn't
> matter either, and latency is usually way more important with small
> messages.

> Andrea

	This is the danger of pre-emption based upon dynamic priorities. You
can get cases where two processes each are permitted to make a very
small amount of progress in alternation. This can happen just as well
with large writes as with small ones; the amount of data is
irrelevant. It's the amount of CPU time that's important, or to put it
another way, it's how far a process can get without suffering a
context switch.

	I suggest that a process be permitted to use up at least some portion
of its timeslice exempt from any pre-emption based solely on dynamic
priorities.

	DS

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-18  0:16 ` David Schwartz
@ 2003-05-18  1:06   ` Andrea Arcangeli
  2003-05-18  9:41     ` David Kastrup
  0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-18 1:06 UTC (permalink / raw)
To: David Schwartz; +Cc: dak, linux-kernel

On Sat, May 17, 2003 at 05:16:33PM -0700, David Schwartz wrote:
> 
> 	This is the danger of pre-emption based upon dynamic priorities. You
> can get cases where two processes each are permitted to make a very
> small amount of progress in alternation. This can happen just as well
> with large writes as with small ones; the amount of data is
> irrelevant. It's the amount of CPU time that's important, or to put
> it another way, it's how far a process can get without suffering a
> context switch.
> 
> 	I suggest that a process be permitted to use up at least some
> portion of its timeslice exempt from any pre-emption based solely on
> dynamic priorities.

That's the issue, yes. But then, when a multithreaded app sends a
signal to another process, it can take up to this "x" timeslice
portion before the signal will run (I mean on UP). The same goes for
mouse clicks etc. 1msec for mouse clicks should not be noticeable,
though, and overall I don't see a real big issue in the current code.

To try it, probably the simplest way is to add a need_resched_timeout
alongside need_resched, and to honour the need_resched only when the
timeout triggers; an immediate need_resched will set timeout = jiffies
so it'll trigger immediately, and the timer interrupt will check it.
The reschedule_idle on a non-idle cpu will be the only one to set the
timeout. Effectively I'm curious to see what will happen. Not all
archs would need to check against it (the entry.S code is the main
reader of need_resched); it'll be a hint only, and idle() for sure
must not check it at all. This would guarantee a minimal reserved
timeslice of up to 1/HZ by setting the timeout to jiffies + 2
(jiffies + 1 would give a mean of 1/HZ/2, but the worst case would be
~0; in practice even + 1 would be enough). Does anybody think this
could have value? If yes, I can make a quick hack to see what happens.

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
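A sketch of the experiment Andrea outlines (hypothetical kernel code;
the need_resched_timeout field and the helper names are invented here
for illustration and exist in no tree):

/* wakeup of a non-idle cpu: defer the reschedule instead of firing it */
static void resched_task_deferred(struct task_struct *curr)
{
	/* jiffies + 2 reserves up to ~1/HZ of timeslice for curr */
	curr->need_resched_timeout = jiffies + 2;
}

/* urgent reschedule: trigger immediately, as today */
static void resched_task_now(struct task_struct *curr)
{
	curr->need_resched_timeout = jiffies;
	curr->need_resched = 1;
}

/* called from the timer interrupt: promote an expired deferral */
static void check_resched_timeout(struct task_struct *curr)
{
	if (curr->need_resched_timeout &&
	    time_after_eq(jiffies, curr->need_resched_timeout)) {
		curr->need_resched_timeout = 0;
		curr->need_resched = 1;	/* entry.S acts on this as usual */
	}
}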
* Re: Scheduling problem with 2.4?
  2003-05-18  1:06 ` Andrea Arcangeli
@ 2003-05-18  9:41   ` David Kastrup
  0 siblings, 0 replies; 20+ messages in thread
From: David Kastrup @ 2003-05-18 9:41 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: David Schwartz, linux-kernel, monnier+gnu/emacs

Andrea Arcangeli <andrea@suse.de> writes:

> On Sat, May 17, 2003 at 05:16:33PM -0700, David Schwartz wrote:
> > 
> > 	This is the danger of pre-emption based upon dynamic priorities.
> > You can get cases where two processes each are permitted to make a
> > very small amount of progress in alternation. This can happen just
> > as well with large writes as with small ones; the amount of data
> > is irrelevant. It's the amount of CPU time that's important, or to
> > put it another way, it's how far a process can get without
> > suffering a context switch.
> > 
> > 	I suggest that a process be permitted to use up at least some
> > portion of its timeslice exempt from any pre-emption based solely
> > on dynamic priorities.
> 
> That's the issue, yes. But then, when a multithreaded app sends a
> signal to another process, it can take up to this "x" timeslice
> portion before the signal will run (I mean on UP). The same goes for
> mouse clicks etc. 1msec for mouse clicks should not be noticeable,
> though, and overall I don't see a real big issue in the current
> code.

Well, the problem we have is excessive synchronous context switching,
so the solution might be to simply throttle on that. It's no problem
the first few times round, but after some time the scheduler should
recognize the pattern and clamp down.

So I'd propose that we give a process that yields synchronously while
it could be ready to run an accumulating priority boost that gets
wasted pretty fast when the process does a full wait (aka the pipe is
full) or is preempted. That is, have the scheduler penalize/throttle
application-visible context switches between runnable processes, in
particular if the processes are dependent on one another, like in the
case of the pipe.

We can pride ourselves on optimizing the kernel to make a context
switch in 1us, but if each context switch incurs 2ms of avoidable
overhead in our application, the bottom line to the user will remain
ugly.

> To try it, probably the simplest way is to add a
> need_resched_timeout alongside need_resched, and to honour the
> need_resched only when the timeout triggers [...] This would
> guarantee a minimal reserved timeslice of up to 1/HZ by setting the
> timeout to jiffies + 2 (jiffies + 1 would give a mean of 1/HZ/2, but
> the worst case would be ~0; in practice even + 1 would be enough).
> Does anybody think this could have value? If yes, I can make a quick
> hack to see what happens.

We probably don't need immediate action. A good test case is an
xterm, I guess, and I noticed that od has an option for limiting its
length. So I propose that comparing the output of

  time od -vN 100k /dev/zero

to that of

  time od -vN 100k /dev/zero|dd obs=16k

in an xterm should provide some idea about the _typical_ overhead
incurred in an _application_ due to excessive context switching.
If you want to get really nasty, you can compare to

  time od -vN 100k /dev/zero|dd obs=1

Note that in all of these cases, it is by far the xterm that is
wasting the lion's share of the processing time (and so you need to
take a look at the _real_ time expended, since xterm will not be
measured in the time command). The kernel people and the generating
processes will wash their hands of any guilt: "_we_ do our work
efficiently and give xterm all the time in the world to be able to
empty the pipe, and let it take all the time it needs without pushing
it." Only that xterm could work much more efficiently if one pushed it
by piling things into the pipe.

Another silly exercise is the following:

  time od -v /dev/zero|dd obs=1

in an xterm. Keep it running. Now in a different term, enter

  while true;do :;done

The above xterm will freeze while the interactive shell still has its
bonus, and then will start working again, at a much higher speed than
when it has an idle CPU at its disposal. Kill off the busy loop, and
the xterm gets slow again.

Come to think of it, I have witnessed the phenomenon of "slow start
xterms" that get ragingly more efficient after several screenfuls,
when they _can't_ keep up on a busy system, for years, on a large
variety of Unices with a single CPU. It is one of those quirks one
does not think about too much.

So I repeat: I should think a fast-accumulating scheduling malus for a
synchronous context switch to a process woken by an event from a
still-runnable process would be appropriate. If anybody can think of a
scheme that magically converges to a good tradeoff between the average
fill level of a pipe and the estimated processing overhead of the
application at the receiving end, of course this would be appreciated.
The aim would be that running an additional

  while true;do :;done

should not usually be able to speed up the _real_ time performance of
an unrelated task.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
* RE: Scheduling problem with 2.4? 2003-05-18 0:16 ` David Schwartz 2003-05-18 1:06 ` Andrea Arcangeli @ 2003-05-18 8:55 ` Mike Galbraith 2003-05-18 17:46 ` David Schwartz ` (2 more replies) 1 sibling, 3 replies; 20+ messages in thread From: Mike Galbraith @ 2003-05-18 8:55 UTC (permalink / raw) To: David Schwartz; +Cc: Andrea Arcangeli, dak, linux-kernel At 05:16 PM 5/17/2003 -0700, David Schwartz wrote: > > I see what you mean, but I still don't think it is a problem. If > > bandwidth matters you will have to use large writes and reads anyways, > > if bandwidth doesn't matter the number of ctx switches doesn't matter > > either and latency usually is way more important with small messages. > > > Andrea > > This is the danger of pre-emption based upon dynamic priorities. > You can >get cases where two processes each are permitted to make a very small amount >of progress in alternation. This can happen just as well with large writes >as small ones, the amount of data is irrelevent, it's the amount of CPU time >that's important, or to put it another way, it's how far a process can get >without suffering a context switch. > > I suggest that a process be permitted to use up at least some > portion of >its timeslice exempt from any pre-emption based solely on dynamic >priorities. Cool. Thank you for the spiffy idea. I implemented this in my (hack/chop) mm5 tree in about 30 seconds, and it works just fine. Very simple time_after(jiffies, p->last_run + MIN_TIMESLICE) checks in wake_up_cpu() plus the obvious dinky change to schedule(), and it worked instantly. Hmm.. I wonder if this might help some of the other scheduler (seemingly) bad behavior as well. Is there any down-side to not preempting quite as often? It seems like there should be a bandwidth gain. -Mike ^ permalink raw reply [flat|nested] 20+ messages in thread
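An illustrative reconstruction of the check Mike describes (the actual
-mm tree code may differ; in the O(1) scheduler a lower ->prio value
means higher priority, so "p->prio < curr->prio" is the woken task
winning on dynamic priority):

static void wake_up_cpu(runqueue_t *rq, task_t *p)
{
	task_t *curr = rq->curr;

	if (p->prio < curr->prio &&
	    time_after(jiffies, curr->last_run + MIN_TIMESLICE))
		resched_task(curr);	/* curr already had its minimum */
	/* otherwise curr keeps the cpu; p runs at the next switch */
}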
* RE: Scheduling problem with 2.4? 2003-05-18 8:55 ` Mike Galbraith @ 2003-05-18 17:46 ` David Schwartz 2003-05-18 23:18 ` Andrea Arcangeli 2003-05-18 23:11 ` Andrea Arcangeli 2003-05-19 4:02 ` Mike Galbraith 2 siblings, 1 reply; 20+ messages in thread From: David Schwartz @ 2003-05-18 17:46 UTC (permalink / raw) To: Mike Galbraith; +Cc: Andrea Arcangeli, linux-kernel > Is there any down-side to not preempting quite as often? It seems like > there should be a bandwidth gain. > > -Mike The theoretical down-side is that interactivity might suffer a bit because a process isn't scheduled quite as quickly. Yes, the less-often you preempt a process, the faster the system will go in the sense of work done per unit time. But those pesky users want their characters to echo quickly and the mouse pointer to track their physical motions. Obviously, we must preempt when a process with a higher static priority becomes ready to run. However, preempting based on dynamic priorities has permitted time slices to be even longer, permitting a reduction in context switches without sacrificing interactivity. I still believe, however, that a process should be 'guaranteed' some slice of time every time it's scheduled unless circumstances make it impossible to allow the process to continue running. IMO, the pendulum has swung too far in favor of interactivity. Obviously, if the process faults, blocks, or a process with higher static priority becomes ready to run, then we must terminate the process' time slice early. DS ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-18 17:46 ` David Schwartz
@ 2003-05-18 23:18   ` Andrea Arcangeli
  2003-05-19  2:59     ` David Schwartz
  0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-18 23:18 UTC (permalink / raw)
To: David Schwartz; +Cc: Mike Galbraith, linux-kernel

On Sun, May 18, 2003 at 10:46:24AM -0700, David Schwartz wrote:
> 
> > Is there any down-side to not preempting quite as often? It seems
> > like there should be a bandwidth gain.
> > 
> > -Mike
> 
> The theoretical down-side is that interactivity might suffer a bit
> because a process isn't scheduled quite as quickly. Yes, the less
> often you preempt a process, the faster the system will go in the
> sense of work done per unit time. But those pesky users want their
> characters to echo quickly and the mouse pointer to track their
> physical motions.
> 
> Obviously, we must preempt when a process with a higher static
> priority

the static priority is the same for all tasks in the system unless you
use nice. I believe linux should do well without forcing the user to
set the GUI at high prio etc.

> becomes ready to run. However, preempting based on dynamic priorities
> has permitted time slices to be even longer, permitting a reduction
> in context

the dyn prio doesn't change the timeslice size; it only chooses
whether to wake up a task immediately or not. Timeslices are
calculated as a function of the static priority, not the dyn prio.
Those interactive tasks use nearly zero of their timeslice anyway; the
size of the timeslice has little meaning for interactive tasks. What
matters is "when" they run, and that's controlled by the dyn prio.

> switches without sacrificing interactivity.
> 
> I still believe, however, that a process should be 'guaranteed' some
> slice of time every time it's scheduled unless circumstances make it
> impossible to allow the process to continue running. IMO, the
> pendulum has swung too far in favor of interactivity. Obviously, if
> the process faults, blocks, or a process with higher static priority
> becomes ready to run, then we must

if the static priority is much higher the immediate switch already
happens. But if we choose to guarantee a min_timeslice to tasks to
avoid the ctx switch flood for your testcase, then the higher static
prio tasks will also have to wait for this min_timeslice. The issue is
no different with higher static prio, or you will complain next time
that you get a ctx switch flood from a dd bs=1 writing into a pty
connected to an xterm running at -20 prio.

> terminate the process' time slice early.
> 
> DS

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
* RE: Scheduling problem with 2.4?
  2003-05-18 23:18 ` Andrea Arcangeli
@ 2003-05-19  2:59   ` David Schwartz
  0 siblings, 0 replies; 20+ messages in thread
From: David Schwartz @ 2003-05-19 2:59 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Mike Galbraith, linux-kernel

> > Obviously, we must preempt when a process with a higher static
> > priority

> the static priority is the same for all tasks in the system unless
> you use nice. I believe linux should do well without forcing the
> user to set the GUI at high prio etc.

	Exactly. Pre-emption based on dynamic priorities is an "if it helps"
feature and not one required by the semantics. On the other hand,
pre-emption based on static priorities is pretty much required, and
not doing it would certainly violate the principle of least
astonishment.

> > becomes ready to run. However, preempting based on dynamic
> > priorities has permitted time slices to be even longer, permitting
> > a reduction in context

> the dyn prio doesn't change the timeslice size; it only chooses
> whether to wake up a task immediately or not. Timeslices are
> calculated as a function of the static priority, not the dyn prio.
> Those interactive tasks use nearly zero of their timeslice anyway;
> the size of the timeslice has little meaning for interactive tasks.
> What matters is "when" they run, and that's controlled by the dyn
> prio.

	You say "the dyn prio doesn't change the timeslice size", but that's
not true. If a process with low dynamic priority writes to a FIFO, its
timeslice will be cut short, whereas one with a higher dynamic
priority would enjoy a longer timeslice. Yes, priority doesn't affect
the *intended* timeslice, but it can affect the effective timeslice.

> > I still believe, however, that a process should be 'guaranteed'
> > some slice of time every time it's scheduled unless circumstances
> > make it impossible to allow the process to continue running. IMO,
> > the pendulum has swung too far in favor of interactivity.
> > Obviously, if the process faults, blocks, or a process with higher
> > static priority becomes ready to run, then we must

> if the static priority is much higher the immediate switch already
> happens.

	As it certainly should. The maximum amount of time we could justify
delaying a task with a higher static priority in the name of better
'bulk data' performance is so small that it's probably not feasible to
do it at all.

> But if we choose to guarantee a min_timeslice to tasks to avoid the
> ctx switch flood for your testcase, then the higher static prio
> tasks will also have to wait for this min_timeslice.

	No, I don't think that's appropriate. Higher static priority is an
explicit decision made by users for the specific reason of getting the
faster response. Under no circumstances should an NTP process that
received a UDP packet be prevented from getting that packet as quickly
as possible.

> The issue is no different with higher static prio, or you will
> complain next time that you get a ctx switch flood from a dd bs=1
> writing into a pty connected to an xterm running at -20 prio.

	With static priorities, you can always tell people "Don't do that
then, you got what you asked for. The system doesn't know what you
'really want'." With dynamic priorities, however, this is not the
case.

	I'm not sure what the best solution is. But I think it's along the
lines of deferring a deschedule based purely on dynamic priorities if
the task has used only a small fraction of its timeslice.
	Perhaps, for example, a normal timeslice should be four jiffies. If a
reschedule would normally occur during the first jiffy, and it's based
purely on dynamic priorities, we set a flag which we check when the
first jiffy ends. Now maybe in some cases where the static priorities
differ only slightly, we should defer even based on static priorities.
Perhaps immediate pre-emption during the beginning of a timeslice
should only occur when the winning process has an elevated priority
(and not just because the losing process has a reduced one).

	I don't really know. But I think what we're doing now is
fundamentally wrong, because processes should be allowed to use up
their timeslices when possible.

	DS

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-18  8:55 ` Mike Galbraith
  2003-05-18 17:46   ` David Schwartz
@ 2003-05-18 23:11   ` Andrea Arcangeli
  2003-05-19  7:16     ` Mike Galbraith
  2003-05-19  4:02   ` Mike Galbraith
  2 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2003-05-18 23:11 UTC (permalink / raw)
To: Mike Galbraith; +Cc: David Schwartz, dak, linux-kernel

On Sun, May 18, 2003 at 10:55:17AM +0200, Mike Galbraith wrote:
> At 05:16 PM 5/17/2003 -0700, David Schwartz wrote:
> 
> >	I suggest that a process be permitted to use up at least some
> >portion of its timeslice exempt from any pre-emption based solely
> >on dynamic priorities.
> 
> Cool.
> 
> Thank you for the spiffy idea. I implemented this in my (hack/chop)
> mm5 tree in about 30 seconds, and it works just fine. Very simple
> time_after(jiffies, p->last_run + MIN_TIMESLICE) checks in
> wake_up_cpu()

If I understand well, what you did is different (in functionality)
from what I described (and what I described in the last email
certainly takes more than 30 seconds no matter how smart you implement
it ;). I mean, you lose the whole "wakeup" information. Yeah, that
will fix it too, like deleting the need_resched = 1 after the check on
curr->prio entirely, but while it's so simple, you not only guarantee
the minimum timeslice, you also let the current task keep running even
after it has expired the minimum timeslice. That will most certainly
hurt interactivity way too much; I don't think it's an option, unless
you want to trim the timeslice length significantly too. The only
reason we can take these long timeslices is the interactivity hints;
we always had those in linux, all versions. If you start to ignore
them, things should not do too well; even throughput can decrease in a
multithreaded environment due to the slower delivery of events.

Delaying an interprocess message for 1msec (or even 10msec) [i.e.
1/HZ] may not be noticeable, but delaying it for a whole timeslice
will certainly be noticeable; that's an order of magnitude bigger
delay.
Actually, besides the way I described yesterday (which would require
arch asm changes) to experiment with the "minimum timeslice
guarantee", it could also be implemented by moving
timeslice - min_timeslice into a rest_timeslice field if
jiffies - last_run < MIN_TIMESLICE and rest_timeslice is zero, and
trimming the timeslice down to min_timeslice. Then the next schedule
would put rest_timeslice back into timeslice. This is along the lines
of what you implemented, but it also guarantees that the higher
dyn-prio task will be scheduled after this min_timeslice (to still
provide ~the same interactive behaviour, which is a must IMHO ;). This
will be a bit more difficult than the need_resched secondary field,
but it's probably conceptually cleaner, since it keeps the algorithm
in the scheduler and the asm fast paths small and simple.

> plus the obvious dinky change to schedule(), and it worked
> instantly. Hmm.. I wonder if this might help some of the other
> scheduler (seemingly) bad behavior as well.
> 
> Is there any down-side to not preempting quite as often? It seems
> like there should be a bandwidth gain.

it's certainly on the right track for this issue, in the sense that
there will be the wanted bandwidth gain in the 1-byte write case to a
pty ;) (like dropping the need_resched = 1, that even goes down to 5
sec and a 1-liner patch). However, I don't think the 30-sec
modification alone is a generic solution; delaying the reschedule to
min_timeslice may be acceptable instead.

Andrea

^ permalink raw reply	[flat|nested] 20+ messages in thread
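A sketch of the rest_timeslice variant (hypothetical code; the
rest_timeslice field follows Andrea's description above and exists in
no tree):

static void wake_up_cpu_min_slice(runqueue_t *rq, task_t *p)
{
	task_t *curr = rq->curr;

	if (p->prio >= curr->prio)
		return;				/* no preemption wanted */
	if (time_after_eq(jiffies, curr->last_run + MIN_TIMESLICE)) {
		resched_task(curr);		/* minimum already served */
	} else if (!curr->rest_timeslice &&
		   curr->time_slice > MIN_TIMESLICE) {
		/* bank the remainder; run only MIN_TIMESLICE more for now */
		curr->rest_timeslice = curr->time_slice - MIN_TIMESLICE;
		curr->time_slice = MIN_TIMESLICE;
	}
}

/* at curr's next schedule(), return what was banked */
static void restore_rest_timeslice(task_t *curr)
{
	if (curr->rest_timeslice) {
		curr->time_slice += curr->rest_timeslice;
		curr->rest_timeslice = 0;
	}
}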
* Re: Scheduling problem with 2.4? 2003-05-18 23:11 ` Andrea Arcangeli @ 2003-05-19 7:16 ` Mike Galbraith 0 siblings, 0 replies; 20+ messages in thread From: Mike Galbraith @ 2003-05-19 7:16 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: David Schwartz, dak, linux-kernel At 01:11 AM 5/19/2003 +0200, Andrea Arcangeli wrote: >On Sun, May 18, 2003 at 10:55:17AM +0200, Mike Galbraith wrote: > > At 05:16 PM 5/17/2003 -0700, David Schwartz wrote: > > > > >> I see what you mean, but I still don't think it is a problem. If > > >> bandwidth matters you will have to use large writes and reads anyways, > > >> if bandwidth doesn't matter the number of ctx switches doesn't matter > > >> either and latency usually is way more important with small messages. > > > > > >> Andrea > > > > > > This is the danger of pre-emption based upon dynamic priorities. > > >You can > > >get cases where two processes each are permitted to make a very small > > >amount > > >of progress in alternation. This can happen just as well with large writes > > >as small ones, the amount of data is irrelevent, it's the amount of CPU > > >time > > >that's important, or to put it another way, it's how far a process can get > > >without suffering a context switch. > > > > > > I suggest that a process be permitted to use up at least some > > >portion of > > >its timeslice exempt from any pre-emption based solely on dynamic > > >priorities. > > > > Cool. > > > > Thank you for the spiffy idea. I implemented this in my (hack/chop) mm5 > > tree in about 30 seconds, and it works just fine. Very simple > > time_after(jiffies, p->last_run + MIN_TIMESLICE) checks in wake_up_cpu() > >If I understand well, what you did is different (in functionalty) from >what I described (and what I described in the last email certainly takes >more than 30 seconds no matter how smart you implement it ;). I mean, The p->last_run is a typo, it's effectively current->last_run, with last_run also being updated upon task switch so you can see how long he's had the cpu. That can be done much quicker than 30 seconds by someone who can type with fingers instead of thumbs ;-) >you lose the whole "wakeup" information, yeah that will fix >it too like deleting the need_resched =1 after the check on the >curr->prio enterely, but while it's so simple you you don't only >guarantee the miniumum timeslice, but you let the current task running >even after it expired the minimum timeslice. That will most certainly >hurt interactivity way too much, I don't think it's an option, unless >you want to trim significantly the timeslice length too. The only reason >we can take these long timeslices are the interactivity hints, we always >had those in linux, all versions. If you start to ignore it, things >should not do too well, even throughput can decrease in a multithread >environment due the slower delivery of events. Throughput did decrease. I was thinking that there was likely to be another event before my ignoring the preempt could matter much. As mentioned privately, I guaranteed that no task could hold the cpu for more than 50ms max, so on the next schedule he'd get the cpu (if prio high enough) but the test results weren't encouraging. >Delaying a interprocess message for 1msec (or even 10msec) [i.e. 1/HZ] >may not be noticeable, but delaying it for a whole timeslice will >certainly be noticeable, that's an order of magnitude bigger delay. 
> >Actually besides the way I described yesterday (that would require arch >asm changes) to experiment the "miniumum timeslice guarantee", it could >also be implemented by moving the timeslice-min_timeslice in a >rest_timeslice field if jiffies - last_run < MIN_TIMESLICE and if >rest_timeslice is zero, and trimming the timeslice down to >min_timeslice. Then the next schedule would put rest_timeslice back in >timeslice. This is on the lines of what you implemented but it also >guaranteees that the higher dyn-prio task will be scheduled after this >min_timeslice (to still provide the ~same interactive behaviour, which >is a must IMHO ;) This will be a bit more difficult of the need_resched >secondary field, but it's probably conceptually cleaner, since it >restricts the algorithm in the scheduler keeping the asm fast paths >small and simple. I'd really like to try a deferred preempt. I don't know enough (yet anyway;) to not create spectacular explosions with that idea though. I'll go ponder your idea instead. -Mike ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Scheduling problem with 2.4?
  2003-05-18  8:55 ` Mike Galbraith
  2003-05-18 17:46   ` David Schwartz
  2003-05-18 23:11   ` Andrea Arcangeli
@ 2003-05-19  4:02   ` Mike Galbraith
  2 siblings, 0 replies; 20+ messages in thread
From: Mike Galbraith @ 2003-05-19 4:02 UTC (permalink / raw)
To: linux-kernel; +Cc: David Schwartz, Andrea Arcangeli, dak

At 10:55 AM 5/18/2003 +0200, Mike Galbraith wrote:
>At 05:16 PM 5/17/2003 -0700, David Schwartz wrote:
>
>>	I suggest that a process be permitted to use up at least some
>>portion of its timeslice exempt from any pre-emption based solely on
>>dynamic priorities.
>
>Is there any down-side to not preempting quite as often? It seems
>like there should be a bandwidth gain.

(my reply was supposed to be off-list, but..)

The answer appears to be yes, there are down-sides. Instead of the
expected throughput gain, I got a loss :-/ which makes sense from the
cache side, I suppose. While it was instant gratification for the pipe
testcase, as a general solution it leaves something to be desired...
at least when implemented in its simplest form.

Oh well.

(hmm.. maybe I should go stare at the cache decay stuff some more.
later)

	-Mike

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4?
  2003-05-17 23:50 ` Andrea Arcangeli
  2003-05-18  0:16   ` David Schwartz
@ 2003-05-18  0:24   ` David Kastrup
  1 sibling, 0 replies; 20+ messages in thread
From: David Kastrup @ 2003-05-18 0:24 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli <andrea@suse.de> writes:

> On Sun, May 18, 2003 at 12:37:01AM +0200, David Kastrup wrote:
> > Of course it does. I told it to do so. But there is no necessity
> > to do an immediate context switch: it would be completely
> > sufficient if Emacs (which is waiting on select) were put in the
> > run queue and scheduled when the time slice of dd was up.
> > Performance gets better
> 
> the switch happens because emacs has a higher dynamic priority, as
> it was sleeping for the longest time. Without these special cases
> for the interactive tasks we couldn't use these long timeslices
> without making the system unresponsive.

Then we will need a smaller timeslice for making this decision. If I
have a sending process able to write 1000000 characters per second, it
is wasteful if such a process is scheduled away after writing a single
line (most processes running inside a shell will work line-buffered,
won't they?).

> > But if I am doing process communication with other processes, the
> > I/O _will_ arrive in small portions, and when the generating
> > processes are running on the same CPU instead of being I/O bound,
> > I don't stand a chance of working efficiently, namely of utilizing
> > the pipes, if a
> 
> writing 1 byte per syscall isn't very efficient in the first place
> (no matter whether the ctx switch happens or not).

Sure, but we are more typically talking about writing a line at a
time, which is easily a factor of 50 smaller than the pipe capacity
for typical output.

> I see what you mean, but I still don't think it is a problem. If
> bandwidth matters you will have to use large writes and reads
> anyway; if bandwidth doesn't matter, the number of ctx switches
> doesn't matter either, and latency is usually way more important
> with small messages.
> 
> You're applying small messages to a "throughput" test; this is why
> you have a problem. If you really did interprocess communication
> (and not a throughput benchmark) you would probably want the
> smallest delay in the delivery of the signal/message.

Not when I could have my pipe filled within a fraction of a
millisecond _without_ idling (starting up the reading process the
moment the writing process has taken more than its due of time, or is
itself waiting, is of course perfectly sensible). In an xterm:

  time dd if=/dev/zero bs=16k count=16|od -v

  real    0m8.656s
  user    0m0.240s
  sys     0m0.090s

  time dd if=/dev/zero bs=16k count=16|od -v|dd obs=16k

  real    0m3.794s
  user    0m0.240s
  sys     0m0.060s

Do you really think that I would have appreciated the "smaller
latency" of a few milliseconds at best in the first case?

Note that this is exclusively a uni-processor problem: the fast
context switch will starve the writing process of CPU time and make
sure that even if it could crank out hundreds of writes in a
millisecond, it will only ever get a single one of them placed into
the pipe, unless the receiving end gets preempted before reaching
select, at which time the writer can finally stuff the pipe
completely. This all-or-nothing utilization of a pipe or pty is not a
sane tradeoff.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Scheduling problem with 2.4? 2003-05-17 20:44 ` David Kastrup 2003-05-17 21:53 ` Andrea Arcangeli @ 2003-05-17 21:54 ` Barry K. Nathan 1 sibling, 0 replies; 20+ messages in thread From: Barry K. Nathan @ 2003-05-17 21:54 UTC (permalink / raw) To: dak; +Cc: Andrea Arcangeli, linux-kernel On Sat, May 17, 2003 at 10:44:16PM +0200, David Kastrup wrote: > I am talking about the kernel coming with RedHat 9, uname -a gives > Linux lola.goethe.zz 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux > > Unfortunately, this kernel is here to stay for quite a while, and I > would want to find a way to let Emacs cooperate with it better without > telling people to recompile the kernel or wait a year. They should be upgrading to the kernel here, at least, for security reasons: https://rhn.redhat.com/errata/RHSA-2003-172.html I have no idea what effect, if any, it would have on your performance problems though. -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 20+ messages in thread