* Preempt Threshold Measurements
@ 2004-07-12 23:43 Gabriel Devenyi
2004-07-12 23:59 ` Con Kolivas
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Gabriel Devenyi @ 2004-07-12 23:43 UTC (permalink / raw)
To: ck, linux-kernel
Keeping in mind that I'm using the nvidia-kernel drivers, here are my preempt
threshold violations.
6ms non-preemptible critical section violated 4 ms preempt threshold starting
at kernel_fpu_begin+0xd/0x50 and ending at _mmx_memcpy+0x127/0x170
[<c0241987>] _mmx_memcpy+0x127/0x170
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c0241987>] _mmx_memcpy+0x127/0x170
[<c012d3b5>] load_module+0x835/0x900
[<c013c84e>] unmap_vmas+0x10e/0x1f0
[<c012d4fb>] sys_init_module+0x7b/0x230
[<c0103ee1>] sysenter_past_esp+0x52/0x71
nvidia: module license 'NVIDIA' taints kernel.
ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 11 (level, low) -> IRQ 11
NVRM: loading NVIDIA Linux x86 NVIDIA Kernel Module 1.0-6106 Wed Jun 23
08:14:01 PDT 2004
NTFS volume version 3.1.
14ms non-preemptible critical section violated 4 ms preempt threshold starting
at sys_ioctl+0x42/0x290 and ending at sys_ioctl+0xbd/0x290
[<c0158ded>] sys_ioctl+0xbd/0x290
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c0158ded>] sys_ioctl+0xbd/0x290
[<c0103ee1>] sysenter_past_esp+0x52/0x71
eth0: Setting full-duplex based on MII #1 link partner capability of 45e1.
nfs warning: mount version older than kernel
6ms non-preemptible critical section violated 4 ms preempt threshold starting
at pte_alloc_map+0x3d/0xb0 and ending at copy_mm+0x33b/0x3c0
[<c0116ecb>] copy_mm+0x33b/0x3c0
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c0116ecb>] copy_mm+0x33b/0x3c0
[<c01177c2>] copy_process+0x3b2/0xa90
[<c0117edd>] do_fork+0x3d/0x188
[<c012280f>] sigprocmask+0x4f/0xd0
[<c0242000>] copy_to_user+0x40/0x60
[<c0102b15>] sys_clone+0x35/0x40
[<c0103ee1>] sysenter_past_esp+0x52/0x71
hdd: DMA disabled
41ms non-preemptible critical section violated 4 ms preempt threshold starting
at chrdev_open+0xd9/0x210 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0126261>] worker_thread+0x1d1/0x240
[<c01262b2>] worker_thread+0x222/0x240
[<c0114d78>] activate_task+0x68/0x80
[<c0383fb0>] do_cache_clean+0x0/0x40
[<c01153e0>] default_wake_function+0x0/0x10
[<c03873f7>] schedule+0x237/0x480
[<c01153e0>] default_wake_function+0x0/0x10
[<c0126090>] worker_thread+0x0/0x240
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
248ms non-preemptible critical section violated 4 ms preempt threshold
starting at schedule+0x57/0x480 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0126261>] worker_thread+0x1d1/0x240
[<c01262b2>] worker_thread+0x222/0x240
[<c0114d78>] activate_task+0x68/0x80
[<c02d0740>] fb_flashcursor+0x0/0x80
[<c01153e0>] default_wake_function+0x0/0x10
[<c03873f7>] schedule+0x237/0x480
[<c01153e0>] default_wake_function+0x0/0x10
[<c0126090>] worker_thread+0x0/0x240
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
228ms non-preemptible critical section violated 4 ms preempt threshold
starting at schedule+0x57/0x480 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0126261>] worker_thread+0x1d1/0x240
[<c01262b2>] worker_thread+0x222/0x240
[<c0114d78>] activate_task+0x68/0x80
[<c02d0740>] fb_flashcursor+0x0/0x80
[<c01153e0>] default_wake_function+0x0/0x10
[<c03873f7>] schedule+0x237/0x480
[<c01153e0>] default_wake_function+0x0/0x10
[<c0126090>] worker_thread+0x0/0x240
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
36ms non-preemptible critical section violated 4 ms preempt threshold starting
at schedule+0x57/0x480 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0387b60>] schedule_timeout+0x60/0xc0
[<c01162f0>] dec_preempt_count+0x50/0x120
[<c0120110>] process_timeout+0x0/0x10
[<c01201fe>] sys_nanosleep+0xce/0x150
[<c0103ee1>] sysenter_past_esp+0x52/0x71
41ms non-preemptible critical section violated 4 ms preempt threshold starting
at chrdev_open+0xd9/0x210 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0387b60>] schedule_timeout+0x60/0xc0
[<c01162f0>] dec_preempt_count+0x50/0x120
[<c0120110>] process_timeout+0x0/0x10
[<c01201fe>] sys_nanosleep+0xce/0x150
[<c0103ee1>] sysenter_past_esp+0x52/0x71
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
250ms non-preemptible critical section violated 4 ms preempt threshold
starting at schedule+0x57/0x480 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0126261>] worker_thread+0x1d1/0x240
[<c01262b2>] worker_thread+0x222/0x240
[<c0114d78>] activate_task+0x68/0x80
[<c02677b0>] batch_entropy_process+0x0/0x130
[<c01153e0>] default_wake_function+0x0/0x10
[<c03873f7>] schedule+0x237/0x480
[<c01153e0>] default_wake_function+0x0/0x10
[<c0126090>] worker_thread+0x0/0x240
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
228ms non-preemptible critical section violated 4 ms preempt threshold
starting at schedule+0x57/0x480 and ending at schedule+0x237/0x480
[<c03873f7>] schedule+0x237/0x480
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c03873f7>] schedule+0x237/0x480
[<c0126261>] worker_thread+0x1d1/0x240
[<c01262b2>] worker_thread+0x222/0x240
[<c0114d78>] activate_task+0x68/0x80
[<c02d0740>] fb_flashcursor+0x0/0x80
[<c01153e0>] default_wake_function+0x0/0x10
[<c03873f7>] schedule+0x237/0x480
[<c01153e0>] default_wake_function+0x0/0x10
[<c0126090>] worker_thread+0x0/0x240
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
36ms non-preemptible critical section violated 4 ms preempt threshold starting
at schedule+0x57/0x480 and ending at chrdev_open+0x105/0x210
[<c0150695>] chrdev_open+0x105/0x210
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c0150695>] chrdev_open+0x105/0x210
[<c0147254>] dentry_open+0x134/0x210
[<c014711c>] filp_open+0x4c/0x50
[<c01162f0>] dec_preempt_count+0x50/0x120
[<c01162f0>] dec_preempt_count+0x50/0x120
[<c0147396>] get_unused_fd+0x66/0xd0
[<c014752d>] sys_open+0x4d/0x80
[<c0103ee1>] sysenter_past_esp+0x52/0x71
6ms non-preemptible critical section violated 4 ms preempt threshold starting
at ksoftirqd+0x59/0xc0 and ending at ksoftirqd+0x63/0xc0
[<c011c653>] ksoftirqd+0x63/0xc0
[<c01163b0>] dec_preempt_count+0x110/0x120
[<c011c5f0>] ksoftirqd+0x0/0xc0
[<c011c653>] ksoftirqd+0x63/0xc0
[<c01298c4>] kthread+0x94/0xa0
[<c0129830>] kthread+0x0/0xa0
[<c010227d>] kernel_thread_helper+0x5/0x18
atkbd.c: Spurious ACK on isa0060/serio0. Some program, like XFree86, might be
trying access hardware directly.
--
Gabriel Devenyi
devenyga@mcmaster.ca
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: Preempt Threshold Measurements 2004-07-12 23:43 Preempt Threshold Measurements Gabriel Devenyi @ 2004-07-12 23:59 ` Con Kolivas 2004-07-13 0:15 ` Gabriel Devenyi 2004-07-13 2:40 ` William Lee Irwin III 2004-07-14 10:15 ` [ck] " Jens Bergmann 2 siblings, 1 reply; 23+ messages in thread From: Con Kolivas @ 2004-07-12 23:59 UTC (permalink / raw) To: Gabriel Devenyi; +Cc: ck, linux-kernel Gabriel Devenyi writes: > Keeping in mind that I'm using the nvidia-kernel drivers, here are my preempt > threshold violations. Hi. Thanks for those. Can you confirm which version of the preempt threshold patch you're using? The first one I posted unfortunately had false positives. Those results unfortunately look like the first patch. You would have to be using either wli_preempttest2 + 2.1 separate patches or the snapshot-2.6.8-rc1-ck5-0407130047 or later which includes these patches for more accurate timekeeping (sorry). Cheers, Con ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-12 23:59 ` Con Kolivas @ 2004-07-13 0:15 ` Gabriel Devenyi 0 siblings, 0 replies; 23+ messages in thread From: Gabriel Devenyi @ 2004-07-13 0:15 UTC (permalink / raw) To: Con Kolivas; +Cc: ck, linux-kernel I'm using snapshot-2.6.8-rc1-ck5-0407130047 On Monday 12 July 2004 19:59, Con Kolivas wrote: > Gabriel Devenyi writes: > > Keeping in mind that I'm using the nvidia-kernel drivers, here are my > > preempt threshold violations. > > Hi. > > Thanks for those. Can you confirm which version of the preempt threshold > patch you're using? The first one I posted unfortunately had false > positives. Those results unfortunately look like the first patch. > > You would have to be using either wli_preempttest2 + 2.1 separate patches > or the snapshot-2.6.8-rc1-ck5-0407130047 or later which includes these > patches for more accurate timekeeping (sorry). > > Cheers, > Con -- Gabriel Devenyi devenyga@mcmaster.ca ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-12 23:43 Preempt Threshold Measurements Gabriel Devenyi 2004-07-12 23:59 ` Con Kolivas @ 2004-07-13 2:40 ` William Lee Irwin III 2004-07-13 2:48 ` [ck] " Gabriel Devenyi 2004-07-14 10:15 ` [ck] " Jens Bergmann 2 siblings, 1 reply; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 2:40 UTC (permalink / raw) To: Gabriel Devenyi; +Cc: ck, linux-kernel On Mon, Jul 12, 2004 at 07:43:25PM -0400, Gabriel Devenyi wrote: > Keeping in mind that I'm using the nvidia-kernel drivers, here are my preempt > threshold violations. > 6ms non-preemptible critical section violated 4 ms preempt threshold starting > at kernel_fpu_begin+0xd/0x50 and ending at _mmx_memcpy+0x127/0x170 > [<c0241987>] _mmx_memcpy+0x127/0x170 > [<c01163b0>] dec_preempt_count+0x110/0x120 > [<c0241987>] _mmx_memcpy+0x127/0x170 > [<c012d3b5>] load_module+0x835/0x900 > [<c013c84e>] unmap_vmas+0x10e/0x1f0 > [<c012d4fb>] sys_init_module+0x7b/0x230 > [<c0103ee1>] sysenter_past_esp+0x52/0x71 Things tend to be slow and stupid in the interest of robustness during system initialization. I'd suggest ignoring those unless you're specifically interested in boot time (in which case you should be doing things for yourself) and focusing on ones reported during normal usage after the system is up. -- wli ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ck] Re: Preempt Threshold Measurements 2004-07-13 2:40 ` William Lee Irwin III @ 2004-07-13 2:48 ` Gabriel Devenyi 2004-07-13 2:54 ` Con Kolivas 2004-07-13 2:55 ` [ck] " William Lee Irwin III 0 siblings, 2 replies; 23+ messages in thread From: Gabriel Devenyi @ 2004-07-13 2:48 UTC (permalink / raw) To: ck; +Cc: William Lee Irwin III, linux-kernel Well I'm not particularly educated in kernel internals yet, here's some reports from the system when its running. 6ms non-preemptible critical section violated 4 ms preempt threshold starting at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 [<c014007b>] do_munmap+0xeb/0x140 [<c01163b0>] dec_preempt_count+0x110/0x120 [<c014007b>] do_munmap+0xeb/0x140 [<c014010f>] sys_munmap+0x3f/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 6ms non-preemptible critical section violated 4 ms preempt threshold starting at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 [<c014007b>] do_munmap+0xeb/0x140 [<c01163b0>] dec_preempt_count+0x110/0x120 [<c014007b>] do_munmap+0xeb/0x140 [<c014010f>] sys_munmap+0x3f/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 6ms non-preemptible critical section violated 4 ms preempt threshold starting at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 [<c014007b>] do_munmap+0xeb/0x140 [<c01163b0>] dec_preempt_count+0x110/0x120 [<c014007b>] do_munmap+0xeb/0x140 [<c014010f>] sys_munmap+0x3f/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 5ms non-preemptible critical section violated 4 ms preempt threshold starting at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 [<c014007b>] do_munmap+0xeb/0x140 [<c01163b0>] dec_preempt_count+0x110/0x120 [<c014007b>] do_munmap+0xeb/0x140 [<c014010f>] sys_munmap+0x3f/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 -- Gabriel Devenyi devenyga@mcmaster.ca On Monday 12 July 2004 22:40, William Lee Irwin III wrote: > On Mon, Jul 12, 2004 at 07:43:25PM -0400, Gabriel Devenyi wrote: > > Keeping in mind that I'm using the nvidia-kernel drivers, here are my > > preempt threshold violations. > > 6ms non-preemptible critical section violated 4 ms preempt threshold > > starting at kernel_fpu_begin+0xd/0x50 and ending at > > _mmx_memcpy+0x127/0x170 [<c0241987>] _mmx_memcpy+0x127/0x170 > > [<c01163b0>] dec_preempt_count+0x110/0x120 > > [<c0241987>] _mmx_memcpy+0x127/0x170 > > [<c012d3b5>] load_module+0x835/0x900 > > [<c013c84e>] unmap_vmas+0x10e/0x1f0 > > [<c012d4fb>] sys_init_module+0x7b/0x230 > > [<c0103ee1>] sysenter_past_esp+0x52/0x71 > > Things tend to be slow and stupid in the interest of robustness during > system initialization. > > I'd suggest ignoring those unless you're specifically interested in boot > time (in which case you should be doing things for yourself) and focusing > on ones reported during normal usage after the system is up. > > > -- wli > _______________________________________________ > ck@vds.kolivas.org > ck mailing list - unmoderated. Please reply-to-all when posting. > http://bhhdoa.org.au/mailman/listinfo/ck ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 2:48 ` [ck] " Gabriel Devenyi @ 2004-07-13 2:54 ` Con Kolivas 2004-07-13 4:01 ` Andrew Morton 2004-07-13 2:55 ` [ck] " William Lee Irwin III 1 sibling, 1 reply; 23+ messages in thread From: Con Kolivas @ 2004-07-13 2:54 UTC (permalink / raw) To: Gabriel Devenyi; +Cc: ck, linux-kernel, William Lee Irwin III, akpm Gabriel Devenyi writes: > Well I'm not particularly educated in kernel internals yet, here's some > reports from the system when its running. > > 6ms non-preemptible critical section violated 4 ms preempt threshold starting > at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 Certainly the do_munmap and exit_mmap seem to be repeat offenders on my machine too (more the latter in my case). Con. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 2:54 ` Con Kolivas @ 2004-07-13 4:01 ` Andrew Morton 2004-07-13 5:51 ` Con Kolivas 2004-07-13 10:08 ` William Lee Irwin III 0 siblings, 2 replies; 23+ messages in thread From: Andrew Morton @ 2004-07-13 4:01 UTC (permalink / raw) To: Con Kolivas; +Cc: devenyga, ck, linux-kernel, wli Con Kolivas <kernel@kolivas.org> wrote: > > Gabriel Devenyi writes: > > > Well I'm not particularly educated in kernel internals yet, here's some > > reports from the system when its running. > > > > 6ms non-preemptible critical section violated 4 ms preempt threshold starting > > at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 > > Certainly the do_munmap and exit_mmap seem to be repeat offenders on my > machine too (more the latter in my case). > This is a false positive. Nothing is setting need_resched(), so unmap_vmas() doesn't bother dropping the lock. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 4:01 ` Andrew Morton @ 2004-07-13 5:51 ` Con Kolivas 2004-07-13 6:14 ` Andrew Morton 2004-07-13 10:08 ` William Lee Irwin III 1 sibling, 1 reply; 23+ messages in thread From: Con Kolivas @ 2004-07-13 5:51 UTC (permalink / raw) To: Andrew Morton; +Cc: devenyga, ck, linux-kernel, wli Andrew Morton writes: > Con Kolivas <kernel@kolivas.org> wrote: >> Certainly the do_munmap and exit_mmap seem to be repeat offenders on my >> machine too (more the latter in my case). >> > > This is a false positive. Nothing is setting need_resched(), so > unmap_vmas() doesn't bother dropping the lock. Ok well excluding do_munmap and exit_mmap the ones that have shown up (some more frequently than others) are: 6ms at ksoftirqd+0x6b 2ms at sys_ioctl+0x47 2ms at b44_open 6ms at fget+0x28 2ms at write_ordered_buffers+0x37 4ms at blkdev_put+0x48 5ms at add_wait_queue+0x21 4ms at blkdev_put+0x48 and you were right; eventually the reiserfs ones showed up again at 6-8ms Now which of the above are not false positives and should I try to extract the exact locations of them? Con ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 5:51 ` Con Kolivas @ 2004-07-13 6:14 ` Andrew Morton 2004-07-13 8:03 ` Lee Revell ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Andrew Morton @ 2004-07-13 6:14 UTC (permalink / raw) To: Con Kolivas; +Cc: devenyga, ck, linux-kernel, wli Con Kolivas <kernel@kolivas.org> wrote: > > Andrew Morton writes: > > > Con Kolivas <kernel@kolivas.org> wrote: > >> Certainly the do_munmap and exit_mmap seem to be repeat offenders on my > >> machine too (more the latter in my case). > >> > > > > This is a false positive. Nothing is setting need_resched(), so > > unmap_vmas() doesn't bother dropping the lock. > > Ok well excluding do_munmap and exit_mmap the ones that have shown up > (some more frequently than others) are: > > 6ms at ksoftirqd+0x6b Dunno. There's an unresolved RCU dentry reaping problem, but that's unlikely to occur within ksoftirqd context. > 2ms at sys_ioctl+0x47 uses lock_kernel() at the top level. Need to know the call trace to work out who the offender is. rtc-debug+amlat will tell you that, because it catches the CPU hog while it's being hoggy, rather than after it has finished. > 2ms at b44_open Lots of udelays() inside spin_lock_irq(). This is a "don't do that", I suspect. > 6ms at fget+0x28 Would need to see the amlat trace. > 2ms at write_ordered_buffers+0x37 reiserfs > 4ms at blkdev_put+0x48 This can run under one of two depths of lock_kernel. filemap_fdatawrite() and filemap_fdatawait() both do cond_resched(), so this is odd. Try this: --- 25/mm/truncate.c~truncate_inode_pages-latency-fix 2004-07-12 23:12:53.871816320 -0700 +++ 25-akpm/mm/truncate.c 2004-07-12 23:13:00.993733624 -0700 @@ -155,6 +155,7 @@ void truncate_inode_pages(struct address next = start; for ( ; ; ) { + cond_resched(); if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) { if (next == start) break; _ > 5ms at add_wait_queue+0x21 Need to see the whole trace. > Now which of the above are not false positives and should I try to extract > the exact locations of them? You'll get better traces from http://www.zip.com.au/~akpm/linux/patches/stuff/rtc-debug.patch and http://www.zip.com.au/~akpm/linux/amlat.tar.gz Just apply rtc-debug, set CONFIG_RTC=y and run `amlat' as root while doing testing. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 6:14 ` Andrew Morton @ 2004-07-13 8:03 ` Lee Revell 2004-07-13 10:26 ` Con Kolivas 2004-07-13 21:52 ` Con Kolivas 2 siblings, 0 replies; 23+ messages in thread From: Lee Revell @ 2004-07-13 8:03 UTC (permalink / raw) To: Andrew Morton; +Cc: Con Kolivas, devenyga, ck, linux-kernel, wli On Tue, 2004-07-13 at 02:14, Andrew Morton wrote: > Con Kolivas <kernel@kolivas.org> wrote: > > > > Andrew Morton writes: > > > > > Con Kolivas <kernel@kolivas.org> wrote: > > >> Certainly the do_munmap and exit_mmap seem to be repeat offenders on my > > >> machine too (more the latter in my case). > > >> > > > > > > This is a false positive. Nothing is setting need_resched(), so > > > unmap_vmas() doesn't bother dropping the lock. > > > > Ok well excluding do_munmap and exit_mmap the ones that have shown up > > (some more frequently than others) are: > > > > 6ms at ksoftirqd+0x6b > > Dunno. There's an unresolved RCU dentry reaping problem, but that's > unlikely to occur within ksoftirqd context. > > > 2ms at sys_ioctl+0x47 > > uses lock_kernel() at the top level. Need to know the call trace to work > out who the offender is. rtc-debug+amlat will tell you that, because it > catches the CPU hog while it's being hoggy, rather than after it has > finished. > > > 2ms at b44_open > > Lots of udelays() inside spin_lock_irq(). This is a "don't do that", I > suspect. > > > 6ms at fget+0x28 > > Would need to see the amlat trace. > > > 2ms at write_ordered_buffers+0x37 > > reiserfs > > > 4ms at blkdev_put+0x48 > > This can run under one of two depths of lock_kernel. filemap_fdatawrite() > and filemap_fdatawait() both do cond_resched(), so this is odd. > Reiserfs uses lock_kernel heavily, could this be related? ./include/linux/reiserfs_fs.h:/* Right now we are still falling back to (un)lock_kernel, but eventually that ./include/linux/reiserfs_fs.h:#define reiserfs_write_lock( sb ) lock_kernel() ./include/linux/reiserfs_fs.h:#define reiserfs_write_unlock( sb ) unlock_kernel() Lee ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 6:14 ` Andrew Morton 2004-07-13 8:03 ` Lee Revell @ 2004-07-13 10:26 ` Con Kolivas 2004-07-13 21:52 ` Con Kolivas 2 siblings, 0 replies; 23+ messages in thread From: Con Kolivas @ 2004-07-13 10:26 UTC (permalink / raw) To: Andrew Morton; +Cc: ck, linux-kernel, wli [-- Attachment #1: Type: text/plain, Size: 2121 bytes --] Andrew Morton wrote: > Con Kolivas <kernel@kolivas.org> wrote: > >>4ms at blkdev_put+0x48 > > > This can run under one of two depths of lock_kernel. filemap_fdatawrite() > and filemap_fdatawait() both do cond_resched(), so this is odd. > > Try this: > Thanks! I'll certainly give that a shot. > Need to see the whole trace. Do you mean the traces from your patch or the ones that come from this preempt test like this? Jul 13 07:22:18 pc kernel: 4ms non-preemptible critical section violated 1 ms preempt threshold starting at exit_mmap+0x1c/0x188 and ending at exit_mmap+0x 118/0x188 Jul 13 07:22:18 pc kernel: [<c011d769>] dec_preempt_count+0x14f/0x151 Jul 13 07:22:18 pc kernel: [<c014d0d4>] exit_mmap+0x118/0x188 Jul 13 07:22:18 pc kernel: [<c014d0d4>] exit_mmap+0x118/0x188 Jul 13 07:22:18 pc kernel: [<c011df0a>] mmput+0x61/0x7b Jul 13 07:22:18 pc kernel: [<c01634dc>] exec_mmap+0x11d/0x248 Jul 13 07:22:18 pc kernel: [<c016364c>] flush_old_exec+0x45/0x284 Jul 13 07:22:18 pc kernel: [<c018119f>] load_elf_binary+0x3e8/0xd00 Jul 13 07:22:18 pc kernel: [<c013a9b1>] do_generic_mapping_read+0x295/0x4f2 Jul 13 07:22:18 pc kernel: [<c0164312>] search_binary_handler+0xd0/0x34c Jul 13 07:22:18 pc kernel: [<c0180db7>] load_elf_binary+0x0/0xd00 Jul 13 07:22:18 pc kernel: [<c0164329>] search_binary_handler+0xe7/0x34c Jul 13 07:22:18 pc kernel: [<c0164759>] do_execve+0x1cb/0x23c Jul 13 07:22:18 pc kernel: [<c0165f39>] getname+0x55/0xb8 Jul 13 07:22:18 pc kernel: [<c0104bcf>] sys_execve+0x35/0x68 Jul 13 07:22:18 pc kernel: [<c0105fd5>] sysenter_past_esp+0x52/0x71 I can try and extract the exact points from the disassembly after some guidance I've received (most of them seem to be buried deep in inlined functions.) The exit_mmap one seemed to work out (as you probably saw in my previous email) to be set off here: profile_exit_mmap(mm); lru_add_drain(); c014cfce: e8 18 72 ff ff call c01441eb <lru_add_drain> spin_lock(&mm->page_table_lock); c014cfd3: e8 16 06 fd ff call c011d5ee <inc_preempt_count> Cheers, Con [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 256 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 6:14 ` Andrew Morton 2004-07-13 8:03 ` Lee Revell 2004-07-13 10:26 ` Con Kolivas @ 2004-07-13 21:52 ` Con Kolivas 2 siblings, 0 replies; 23+ messages in thread From: Con Kolivas @ 2004-07-13 21:52 UTC (permalink / raw) To: Andrew Morton; +Cc: ck, linux-kernel, wli [-- Attachment #1: Type: text/plain, Size: 1926 bytes --] Andrew Morton wrote: > This can run under one of two depths of lock_kernel. filemap_fdatawrite() > and filemap_fdatawait() both do cond_resched(), so this is odd. > > Try this: > > --- 25/mm/truncate.c~truncate_inode_pages-latency-fix 2004-07-12 23:12:53.871816320 -0700 > +++ 25-akpm/mm/truncate.c 2004-07-12 23:13:00.993733624 -0700 > @@ -155,6 +155,7 @@ void truncate_inode_pages(struct address > > next = start; > for ( ; ; ) { > + cond_resched(); > if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) { > if (next == start) > break; > _ I had a few of these with this patch. Is it cond_resched'ing twice? Con 2ms non-preemptible critical section violated 1 ms preempt threshold starting at flush_commit_list+0x11d/0x3ee and ending at schedule+0x216/0x841 [<c011d769>] dec_preempt_count+0x14f/0x151 [<c031d716>] schedule+0x216/0x841 [<c031d716>] schedule+0x216/0x841 [<c01085c7>] do_IRQ+0x176/0x1eb [<c01040b3>] cpu_idle+0x3a/0x3c [<c040094c>] start_kernel+0x19c/0x1e8 [<c0400402>] unknown_bootoption+0x0/0x15e 2ms non-preemptible critical section violated 1 ms preempt threshold starting at do_journal_end+0x539/0xae8 and ending at schedule+0x216/0x841 [<c011d769>] dec_preempt_count+0x14f/0x151 [<c031d716>] schedule+0x216/0x841 [<c031d716>] schedule+0x216/0x841 [<c0116284>] smp_apic_timer_interrupt+0x97/0x166 [<c01040b3>] cpu_idle+0x3a/0x3c [<c0120659>] call_console_drivers+0xcd/0xea [<c01209da>] release_console_sem+0xa3/0x101 [<c012088a>] printk+0x13d/0x19c 2ms non-preemptible critical section violated 1 ms preempt threshold starting at flush_async_commits+0x1b/0xf7 and ending at schedule+0x216/0x841 [<c011d769>] dec_preempt_count+0x14f/0x151 [<c031d716>] schedule+0x216/0x841 [<c031d716>] schedule+0x216/0x841 [<c011b88c>] load_balance+0x147/0x1ff [<c0116284>] smp_apic_timer_interrupt+0x97/0x166 [<c010604e>] work_resched+0x5/0x16 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 256 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 4:01 ` Andrew Morton 2004-07-13 5:51 ` Con Kolivas @ 2004-07-13 10:08 ` William Lee Irwin III 2004-07-13 10:40 ` William Lee Irwin III 1 sibling, 1 reply; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 10:08 UTC (permalink / raw) To: Andrew Morton; +Cc: Con Kolivas, devenyga, ck, linux-kernel On Mon, Jul 12, 2004 at 09:01:07PM -0700, Andrew Morton wrote: > This is a false positive. Nothing is setting need_resched(), so > unmap_vmas() doesn't bother dropping the lock. I guess I sent too many updates and the whole thing got dropped. The false positives were fixed in this way: --- timing-2.6.7.orig/mm/memory.c 2004-06-15 22:19:22.000000000 -0700 +++ timing-2.6.7/mm/memory.c 2004-07-12 05:33:17.000000000 -0700 @@ -558,14 +558,17 @@ zap_bytes -= block; if ((long)zap_bytes > 0) continue; - if (!atomic && need_resched()) { + zap_bytes = ZAP_BLOCK_SIZE; + if (!atomic) + continue; + touch_preempt_timing(); + if (need_resched()) { int fullmm = tlb_is_full_mm(*tlbp); tlb_finish_mmu(*tlbp, tlb_start, start); cond_resched_lock(&mm->page_table_lock); *tlbp = tlb_gather_mmu(mm, fullmm); tlb_start_valid = 0; } - zap_bytes = ZAP_BLOCK_SIZE; } } return ret; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 10:08 ` William Lee Irwin III @ 2004-07-13 10:40 ` William Lee Irwin III 2004-07-13 11:07 ` Lenar Lõhmus 0 siblings, 1 reply; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 10:40 UTC (permalink / raw) To: Andrew Morton, Con Kolivas, devenyga, ck, linux-kernel On Mon, Jul 12, 2004 at 09:01:07PM -0700, Andrew Morton wrote: >> This is a false positive. Nothing is setting need_resched(), so >> unmap_vmas() doesn't bother dropping the lock. On Tue, Jul 13, 2004 at 03:08:15AM -0700, William Lee Irwin III wrote: > I guess I sent too many updates and the whole thing got dropped. The false > positives were fixed in this way: > - if (!atomic && need_resched()) { > + zap_bytes = ZAP_BLOCK_SIZE; > + if (!atomic) > + continue; > + touch_preempt_timing(); > + if (need_resched()) { That's not quite right. Amazing it didn't catch might_sleep() warnings. Index: mm7-2.6.7/mm/memory.c =================================================================== --- mm7-2.6.7.orig/mm/memory.c 2004-07-13 03:06:12.784491200 -0700 +++ mm7-2.6.7/mm/memory.c 2004-07-13 03:39:45.843459720 -0700 @@ -568,16 +568,16 @@ if ((long)zap_bytes > 0) continue; zap_bytes = ZAP_BLOCK_SIZE; - if (!atomic) + if (atomic) continue; - touch_preempt_timing(); if (need_resched()) { int fullmm = tlb_is_full_mm(*tlbp); tlb_finish_mmu(*tlbp, tlb_start, start); cond_resched_lock(&mm->page_table_lock); *tlbp = tlb_gather_mmu(mm, fullmm); tlb_start_valid = 0; - } + } else + touch_preempt_timing(); } } return ret; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 10:40 ` William Lee Irwin III @ 2004-07-13 11:07 ` Lenar Lõhmus 2004-07-13 11:13 ` William Lee Irwin III 0 siblings, 1 reply; 23+ messages in thread From: Lenar Lõhmus @ 2004-07-13 11:07 UTC (permalink / raw) To: William Lee Irwin III, linux-kernel With all those little patches added and 2.6.8-rc1-ck5 I got this in my dmesg (preempt=0 voluntary=1 if it makes difference): bad: ld(3362) scheduling while atomic (1)! [<c02797ff>] schedule+0x43f/0x480 [<c01161a3>] kmap_atomic+0x13/0x70 [<c0117b5b>] __touch_preempt_timing+0xb/0x30 [<c0146ef7>] shmem_file_write+0x2f7/0x320 [<c014d890>] vfs_write+0xb0/0x100 [<c014d978>] sys_write+0x38/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 bad: krootimage(3483) scheduling while atomic (1)! [<c02797ff>] schedule+0x43f/0x480 [<c01161a3>] kmap_atomic+0x13/0x70 [<c0117b5b>] __touch_preempt_timing+0xb/0x30 [<c0146ef7>] shmem_file_write+0x2f7/0x320 [<c014d890>] vfs_write+0xb0/0x100 [<c014d978>] sys_write+0x38/0x60 [<c0103ee1>] sysenter_past_esp+0x52/0x71 Lenar William Lee Irwin III wrote: >That's not quite right. Amazing it didn't catch might_sleep() warnings. > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 11:07 ` Lenar Lõhmus @ 2004-07-13 11:13 ` William Lee Irwin III 0 siblings, 0 replies; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 11:13 UTC (permalink / raw) To: Lenar L?hmus; +Cc: linux-kernel On Tue, Jul 13, 2004 at 02:07:56PM +0300, Lenar L?hmus wrote: > With all those little patches added and 2.6.8-rc1-ck5 I got this in > my dmesg (preempt=0 voluntary=1 if it makes difference): > bad: ld(3362) scheduling while atomic (1)! > [<c02797ff>] schedule+0x43f/0x480 > [<c01161a3>] kmap_atomic+0x13/0x70 > [<c0117b5b>] __touch_preempt_timing+0xb/0x30 > [<c0146ef7>] shmem_file_write+0x2f7/0x320 > [<c014d890>] vfs_write+0xb0/0x100 > [<c014d978>] sys_write+0x38/0x60 > [<c0103ee1>] sysenter_past_esp+0x52/0x71 > bad: krootimage(3483) scheduling while atomic (1)! > [<c02797ff>] schedule+0x43f/0x480 > [<c01161a3>] kmap_atomic+0x13/0x70 > [<c0117b5b>] __touch_preempt_timing+0xb/0x30 > [<c0146ef7>] shmem_file_write+0x2f7/0x320 > [<c014d890>] vfs_write+0xb0/0x100 > [<c014d978>] sys_write+0x38/0x60 > [<c0103ee1>] sysenter_past_esp+0x52/0x71 Amazing. I'll get a coherent patch out vs. 2.6.8-rc1 and so on. -- wli ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ck] Re: Preempt Threshold Measurements 2004-07-13 2:48 ` [ck] " Gabriel Devenyi 2004-07-13 2:54 ` Con Kolivas @ 2004-07-13 2:55 ` William Lee Irwin III 2004-07-13 3:09 ` Gabriel Devenyi 2004-07-13 4:07 ` Andrew Morton 1 sibling, 2 replies; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 2:55 UTC (permalink / raw) To: Gabriel Devenyi; +Cc: ck, linux-kernel On Mon, Jul 12, 2004 at 10:48:50PM -0400, Gabriel Devenyi wrote: > Well I'm not particularly educated in kernel internals yet, here's some > reports from the system when its running. > 6ms non-preemptible critical section violated 4 ms preempt threshold starting > at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 > [<c014007b>] do_munmap+0xeb/0x140 > [<c01163b0>] dec_preempt_count+0x110/0x120 > [<c014007b>] do_munmap+0xeb/0x140 > [<c014010f>] sys_munmap+0x3f/0x60 > [<c0103ee1>] sysenter_past_esp+0x52/0x71 Looks like ZAP_BLOCK_SIZE may be too large for you. Lowering that some may "help" this. It's probably harmless, but try lowering that to half of whatever it is now, or maybe 64*PAGE_SIZE. It may be worthwhile to restructure how the preemption points are done in unmap_vmas() so we don't end up in some kind of tuning nightmare. -- wli ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ck] Re: Preempt Threshold Measurements 2004-07-13 2:55 ` [ck] " William Lee Irwin III @ 2004-07-13 3:09 ` Gabriel Devenyi 2004-07-13 4:07 ` Andrew Morton 1 sibling, 0 replies; 23+ messages in thread From: Gabriel Devenyi @ 2004-07-13 3:09 UTC (permalink / raw) To: ck; +Cc: William Lee Irwin III, linux-kernel Ah good, thanks for the suggestion on how to improve this... Now, what exactly is that, and where/how do I change it.... (I really should start/finish that "understanding the linux kernel" book of mine) -- Gabriel Devenyi devenyga@mcmaster.ca On Monday 12 July 2004 22:55, William Lee Irwin III wrote: > On Mon, Jul 12, 2004 at 10:48:50PM -0400, Gabriel Devenyi wrote: > > Well I'm not particularly educated in kernel internals yet, here's some > > reports from the system when its running. > > 6ms non-preemptible critical section violated 4 ms preempt threshold > > starting at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 > > [<c014007b>] do_munmap+0xeb/0x140 > > [<c01163b0>] dec_preempt_count+0x110/0x120 > > [<c014007b>] do_munmap+0xeb/0x140 > > [<c014010f>] sys_munmap+0x3f/0x60 > > [<c0103ee1>] sysenter_past_esp+0x52/0x71 > > Looks like ZAP_BLOCK_SIZE may be too large for you. Lowering that some > may "help" this. It's probably harmless, but try lowering that to half > of whatever it is now, or maybe 64*PAGE_SIZE. It may be worthwhile > to restructure how the preemption points are done in unmap_vmas() so > we don't end up in some kind of tuning nightmare. > > > -- wli > _______________________________________________ > ck@vds.kolivas.org > ck mailing list - unmoderated. Please reply-to-all when posting. > http://bhhdoa.org.au/mailman/listinfo/ck ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ck] Re: Preempt Threshold Measurements 2004-07-13 2:55 ` [ck] " William Lee Irwin III 2004-07-13 3:09 ` Gabriel Devenyi @ 2004-07-13 4:07 ` Andrew Morton 2004-07-13 5:34 ` Con Kolivas 1 sibling, 1 reply; 23+ messages in thread From: Andrew Morton @ 2004-07-13 4:07 UTC (permalink / raw) To: William Lee Irwin III; +Cc: devenyga, ck, linux-kernel William Lee Irwin III <wli@holomorphy.com> wrote: > > On Mon, Jul 12, 2004 at 10:48:50PM -0400, Gabriel Devenyi wrote: > > Well I'm not particularly educated in kernel internals yet, here's some > > reports from the system when its running. > > 6ms non-preemptible critical section violated 4 ms preempt threshold starting > > at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 > > [<c014007b>] do_munmap+0xeb/0x140 > > [<c01163b0>] dec_preempt_count+0x110/0x120 > > [<c014007b>] do_munmap+0xeb/0x140 > > [<c014010f>] sys_munmap+0x3f/0x60 > > [<c0103ee1>] sysenter_past_esp+0x52/0x71 > > Looks like ZAP_BLOCK_SIZE may be too large for you. Lowering that some > may "help" this. It's probably harmless, but try lowering that to half > of whatever it is now, or maybe 64*PAGE_SIZE. It may be worthwhile > to restructure how the preemption points are done in unmap_vmas() so > we don't end up in some kind of tuning nightmare. Does that instrumentation patch have the cond_resched_lock() fixups? If not, this is a false positive. The current setting of ZAP_BLOCK_SIZE is good for sub-500usec latencies on a recentish CPU. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 4:07 ` Andrew Morton @ 2004-07-13 5:34 ` Con Kolivas 2004-07-13 5:48 ` Nick Piggin 0 siblings, 1 reply; 23+ messages in thread From: Con Kolivas @ 2004-07-13 5:34 UTC (permalink / raw) To: Andrew Morton; +Cc: William Lee Irwin III, ck, devenyga, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1337 bytes --] Andrew Morton writes: > William Lee Irwin III <wli@holomorphy.com> wrote: >> >> On Mon, Jul 12, 2004 at 10:48:50PM -0400, Gabriel Devenyi wrote: >> > Well I'm not particularly educated in kernel internals yet, here's some >> > reports from the system when its running. >> > 6ms non-preemptible critical section violated 4 ms preempt threshold starting >> > at do_munmap+0xd2/0x140 and ending at do_munmap+0xeb/0x140 >> > [<c014007b>] do_munmap+0xeb/0x140 >> > [<c01163b0>] dec_preempt_count+0x110/0x120 >> > [<c014007b>] do_munmap+0xeb/0x140 >> > [<c014010f>] sys_munmap+0x3f/0x60 >> > [<c0103ee1>] sysenter_past_esp+0x52/0x71 >> >> Looks like ZAP_BLOCK_SIZE may be too large for you. Lowering that some >> may "help" this. It's probably harmless, but try lowering that to half >> of whatever it is now, or maybe 64*PAGE_SIZE. It may be worthwhile >> to restructure how the preemption points are done in unmap_vmas() so >> we don't end up in some kind of tuning nightmare. > > Does that instrumentation patch have the cond_resched_lock() fixups? If > not, this is a false positive. > > The current setting of ZAP_BLOCK_SIZE is good for sub-500usec latencies on > a recentish CPU. Here's what the full patch against 2.6.8-rc1 currently looks like (equivalent is in the snapshot used by Gabriel). Cheers, Con [-- Attachment #2: wli_preempttest2.1 --] [-- Type: text/plain, Size: 6134 bytes --] Index: linux-2.6.8-rc1/include/linux/preempt.h =================================================================== --- linux-2.6.8-rc1.orig/include/linux/preempt.h 2004-07-13 15:29:53.619112346 +1000 +++ linux-2.6.8-rc1/include/linux/preempt.h 2004-07-13 15:30:21.398811516 +1000 @@ -9,17 +9,17 @@ #include <linux/config.h> #include <linux/linkage.h> -#define preempt_count() (current_thread_info()->preempt_count) - -#define inc_preempt_count() \ -do { \ - preempt_count()++; \ -} while (0) +#ifdef CONFIG_PREEMPT_TIMING +void inc_preempt_count(void); +void dec_preempt_count(void); +void touch_preempt_timing(void); +#else +#define touch_preempt_timing() do { } while (0) +#define inc_preempt_count() do { preempt_count()++; } while (0) +#define dec_preempt_count() do { preempt_count()--; } while (0) +#endif -#define dec_preempt_count() \ -do { \ - preempt_count()--; \ -} while (0) +#define preempt_count() (current_thread_info()->preempt_count) #ifdef CONFIG_PREEMPT @@ -51,9 +51,15 @@ #else +#ifdef CONFIG_PREEMPT_TIMING +#define preempt_disable() inc_preempt_count() +#define preempt_enable_no_resched() dec_preempt_count() +#define preempt_enable() dec_preempt_count() +#else #define preempt_disable() do { } while (0) #define preempt_enable_no_resched() do { } while (0) #define preempt_enable() do { } while (0) +#endif #define preempt_check_resched() do { } while (0) #endif Index: linux-2.6.8-rc1/include/linux/sched.h =================================================================== --- linux-2.6.8-rc1.orig/include/linux/sched.h 2004-07-12 16:11:50.000000000 +1000 +++ linux-2.6.8-rc1/include/linux/sched.h 2004-07-13 15:31:00.547749905 +1000 @@ -1021,6 +1021,7 @@ extern void __cond_resched(void); static inline void cond_resched(void) { + touch_preempt_timing(); if (need_resched()) __cond_resched(); } Index: linux-2.6.8-rc1/init/Kconfig =================================================================== --- linux-2.6.8-rc1.orig/init/Kconfig 2004-07-12 16:11:50.000000000 +1000 +++ linux-2.6.8-rc1/init/Kconfig 2004-07-13 15:30:21.423807646 +1000 @@ -279,6 +279,14 @@ Disabling this option will cause the kernel to be built without support for epoll family of system calls. +config PREEMPT_TIMING + bool "Non-preemptible critical section timing" + default n + help + This option measures the time spent in non-preemptible critical + sections and reports warnings when a boot-time configurable + latency threshold is exceeded. + source "drivers/block/Kconfig.iosched" config CC_OPTIMIZE_FOR_SIZE Index: linux-2.6.8-rc1/kernel/printk.c =================================================================== --- linux-2.6.8-rc1.orig/kernel/printk.c 2004-07-12 16:11:50.000000000 +1000 +++ linux-2.6.8-rc1/kernel/printk.c 2004-07-13 15:31:00.549749595 +1000 @@ -650,10 +650,8 @@ */ void console_conditional_schedule(void) { - if (console_may_schedule && need_resched()) { - set_current_state(TASK_RUNNING); - schedule(); - } + if (console_may_schedule) + cond_resched(); } EXPORT_SYMBOL(console_conditional_schedule); Index: linux-2.6.8-rc1/kernel/sched.c =================================================================== --- linux-2.6.8-rc1.orig/kernel/sched.c 2004-07-12 16:11:50.000000000 +1000 +++ linux-2.6.8-rc1/kernel/sched.c 2004-07-13 15:30:21.456802538 +1000 @@ -4040,3 +4040,74 @@ EXPORT_SYMBOL(__preempt_write_lock); #endif /* defined(CONFIG_SMP) && defined(CONFIG_PREEMPT) */ + +#ifdef CONFIG_PREEMPT_TIMING +#include <linux/kallsyms.h> + +static int preempt_thresh; +static DEFINE_PER_CPU(u64, preempt_timings); +static DEFINE_PER_CPU(unsigned long, preempt_entry); + +static int setup_preempt_thresh(char *s) +{ + int thresh; + + get_option(&s, &thresh); + if (thresh > 0) { + preempt_thresh = thresh; + printk("Preemption threshold = %dms\n", preempt_thresh); + } + return 1; +} +__setup("preempt_thresh=", setup_preempt_thresh); + +static void __touch_preempt_timing(void *addr) +{ + __get_cpu_var(preempt_timings) = sched_clock(); + __get_cpu_var(preempt_entry) = (unsigned long)addr; +} + +void touch_preempt_timing(void) +{ + if (preempt_count() > 0 && system_state == SYSTEM_RUNNING && + __get_cpu_var(preempt_entry)) + __touch_preempt_timing(__builtin_return_address(0)); +} +EXPORT_SYMBOL(touch_preempt_timing); + +void inc_preempt_count(void) +{ + preempt_count()++; + if (preempt_count() == 1 && system_state == SYSTEM_RUNNING) + __touch_preempt_timing(__builtin_return_address(0)); +} +EXPORT_SYMBOL(inc_preempt_count); + +void dec_preempt_count(void) +{ + if (preempt_count() == 1 && system_state == SYSTEM_RUNNING && + __get_cpu_var(preempt_entry)) { + u64 hold; + unsigned long preempt_exit + = (unsigned long)__builtin_return_address(0); + hold = sched_clock() - __get_cpu_var(preempt_timings) + 999999; + do_div(hold, 1000000); + if (preempt_thresh && hold > preempt_thresh && + printk_ratelimit()) { + printk("%lums non-preemptible critical section " + "violated %d ms preempt threshold " + "starting at ", + (unsigned long)hold, + preempt_thresh); + print_symbol("%s", __get_cpu_var(preempt_entry)); + printk(" and ending at "); + print_symbol("%s", preempt_exit); + printk("\n"); + dump_stack(); + } + __get_cpu_var(preempt_entry) = 0; + } + preempt_count()--; +} +EXPORT_SYMBOL(dec_preempt_count); +#endif Index: linux-2.6.8-rc1/mm/memory.c =================================================================== --- linux-2.6.8-rc1.orig/mm/memory.c 2004-07-12 16:11:50.000000000 +1000 +++ linux-2.6.8-rc1/mm/memory.c 2004-07-13 15:31:00.550749440 +1000 @@ -567,14 +567,17 @@ zap_bytes -= block; if ((long)zap_bytes > 0) continue; - if (!atomic && need_resched()) { + zap_bytes = ZAP_BLOCK_SIZE; + if (!atomic) + continue; + touch_preempt_timing(); + if (need_resched()) { int fullmm = tlb_is_full_mm(*tlbp); tlb_finish_mmu(*tlbp, tlb_start, start); cond_resched_lock(&mm->page_table_lock); *tlbp = tlb_gather_mmu(mm, fullmm); tlb_start_valid = 0; } - zap_bytes = ZAP_BLOCK_SIZE; } } return ret; ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 5:34 ` Con Kolivas @ 2004-07-13 5:48 ` Nick Piggin 2004-07-13 10:14 ` William Lee Irwin III 0 siblings, 1 reply; 23+ messages in thread From: Nick Piggin @ 2004-07-13 5:48 UTC (permalink / raw) To: Con Kolivas Cc: Andrew Morton, William Lee Irwin III, ck, devenyga, linux-kernel Con Kolivas wrote: > --- linux-2.6.8-rc1.orig/include/linux/sched.h 2004-07-12 16:11:50.000000000 +1000 > +++ linux-2.6.8-rc1/include/linux/sched.h 2004-07-13 15:31:00.547749905 +1000 > @@ -1021,6 +1021,7 @@ > extern void __cond_resched(void); > static inline void cond_resched(void) > { > + touch_preempt_timing(); > if (need_resched()) > __cond_resched(); > } cond_resched_lock just below this needs something similar. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Preempt Threshold Measurements 2004-07-13 5:48 ` Nick Piggin @ 2004-07-13 10:14 ` William Lee Irwin III 0 siblings, 0 replies; 23+ messages in thread From: William Lee Irwin III @ 2004-07-13 10:14 UTC (permalink / raw) To: Nick Piggin; +Cc: Con Kolivas, Andrew Morton, ck, devenyga, linux-kernel On Tue, Jul 13, 2004 at 03:48:45PM +1000, Nick Piggin wrote: > cond_resched_lock just below this needs something similar. Maybe it does. Index: mm7-2.6.7/include/linux/sched.h =================================================================== --- mm7-2.6.7.orig/include/linux/sched.h 2004-07-13 03:06:12.759495000 -0700 +++ mm7-2.6.7/include/linux/sched.h 2004-07-13 03:14:00.122445032 -0700 @@ -1053,7 +1053,8 @@ preempt_enable_no_resched(); __cond_resched(); spin_lock(lock); - } + } else + touch_preempt_timing(); } /* Reevaluate whether the task has signals pending delivery. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [ck] Preempt Threshold Measurements 2004-07-12 23:43 Preempt Threshold Measurements Gabriel Devenyi 2004-07-12 23:59 ` Con Kolivas 2004-07-13 2:40 ` William Lee Irwin III @ 2004-07-14 10:15 ` Jens Bergmann 2 siblings, 0 replies; 23+ messages in thread From: Jens Bergmann @ 2004-07-14 10:15 UTC (permalink / raw) To: Gabriel Devenyi; +Cc: ck, linux-kernel Hi, this is what i got running -ck (downloaded today so including wli's updated preempt latency testing patch): 8ms non-preemptible critical section violated 2 ms preempt threshold starting at generic_shutdown_super+0x73/0x1b0 and ending at schedule+0x255/0x4a0 [<c0354b35>] schedule+0x255/0x4a0 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c0354b35>] schedule+0x255/0x4a0 [<c0354b35>] schedule+0x255/0x4a0 [<c011e7b2>] __mod_timer+0x112/0x180 [<c0355283>] schedule_timeout+0x63/0xc0 [<c011f2c0>] process_timeout+0x0/0x10 [<c0161237>] do_select+0x187/0x2f0 [<c0160ee0>] __pollwait+0x0/0xd0 [<c01616bb>] sys_select+0x2db/0x500 [<c0104237>] syscall_call+0x7/0xb 3ms non-preemptible critical section violated 2 ms preempt threshold starting at schedule+0x65/0x4a0 and ending at schedule+0x255/0x4a0 [<c0354b35>] schedule+0x255/0x4a0 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c0354b35>] schedule+0x255/0x4a0 [<c0354b35>] schedule+0x255/0x4a0 [<c01170b5>] do_syslog+0x205/0x430 [<c0114e30>] autoremove_wake_function+0x0/0x60 [<c0114942>] dec_preempt_count+0x52/0x120 [<c0114e30>] autoremove_wake_function+0x0/0x60 [<c0169a63>] dnotify_parent+0x93/0xc0 [<c014def8>] vfs_read+0xb8/0x130 [<c014e192>] sys_read+0x42/0x70 [<c0104237>] syscall_call+0x7/0xb 6ms non-preemptible critical section violated 2 ms preempt threshold starting at sock_def_readable+0x10/0x90 and ending at sock_def_readable+0x31/0x90 [<c02ec391>] sock_def_readable+0x31/0x90 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c02ec391>] sock_def_readable+0x31/0x90 [<c02ec391>] sock_def_readable+0x31/0x90 [<c033a66f>] unix_stream_sendmsg+0x28f/0x410 [<c02e8d8d>] sock_sendmsg+0x9d/0xc0 [<c0112f4a>] slice+0x2a/0x50 [<c0112fb9>] effective_prio+0x49/0xe0 [<c01131ae>] activate_task+0x6e/0x90 [<c0113271>] try_to_wake_up+0x71/0xc0 [<c01132dd>] wake_up_process+0x1d/0x30 [<c02566dc>] vt_ioctl+0x21c/0x1c40 [<c02e91b5>] sock_readv_writev+0x75/0xb0 [<c02e929f>] sock_writev+0x4f/0x60 [<c014e59c>] do_readv_writev+0x22c/0x280 [<c016071d>] sys_ioctl+0xbd/0x2a0 [<c0114942>] dec_preempt_count+0x52/0x120 [<c014e6b8>] vfs_writev+0x58/0x70 [<c014e782>] sys_writev+0x42/0x70 [<c014e6b8>] vfs_writev+0x58/0x70 [<c014e782>] sys_writev+0x42/0x70 [<c0104237>] syscall_call+0x7/0xb 6ms non-preemptible critical section violated 2 ms preempt threshold starting at exit_mmap+0x1a/0x170 and ending at exit_mmap+0x112/0x170 [<c0142862>] exit_mmap+0x112/0x170 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c0142862>] exit_mmap+0x112/0x170 [<c0142862>] exit_mmap+0x112/0x170 [<c0115154>] mmput+0x74/0xa0 [<c01194ab>] do_exit+0xfb/0x430 [<c0121cd6>] get_signal_to_deliver+0x236/0x380 [<c0114942>] dec_preempt_count+0x52/0x120 [<c011988e>] do_group_exit+0x3e/0xc0 [<c0121d09>] get_signal_to_deliver+0x269/0x380 [<c0103fd3>] do_signal+0x93/0x120 [<c01345c8>] free_hot_cold_page+0xa8/0x120 [<c0160ed4>] poll_freewait+0x44/0x50 [<c0161c5c>] sys_poll+0x20c/0x240 [<c0160ee0>] __pollwait+0x0/0xd0 [<c0104097>] do_notify_resume+0x37/0x3c [<c0104282>] work_notifysig+0x13/0x15 6ms non-preemptible critical section violated 2 ms preempt threshold starting at install_arg_page+0x39/0x100 and ending at pte_alloc_map+0x45/0xb0 [<c013dc95>] pte_alloc_map+0x45/0xb0 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c013dc95>] pte_alloc_map+0x45/0xb0 [<c013dc95>] pte_alloc_map+0x45/0xb0 [<c0114858>] __touch_preempt_timing+0x8/0x20 [<c0158918>] install_arg_page+0x48/0x100 [<c0158b34>] setup_arg_pages+0x164/0x1b0 [<c01788eb>] load_elf_binary+0x41b/0xd50 [<c0130f95>] __generic_file_aio_read+0x205/0x240 [<c014de10>] do_sync_read+0x80/0xb0 [<c0169a63>] dnotify_parent+0x93/0xc0 [<c01784d0>] load_elf_binary+0x0/0xd50 [<c0159a91>] search_binary_handler+0x191/0x2c0 [<c0177b55>] load_script+0x215/0x250 [<c0169a63>] dnotify_parent+0x93/0xc0 [<c0134b20>] __alloc_pages+0x320/0x390 [<c0159a75>] search_binary_handler+0x175/0x2c0 [<c0114942>] dec_preempt_count+0x52/0x120 [<c0177940>] load_script+0x0/0x250 [<c0159a91>] search_binary_handler+0x191/0x2c0 [<c0159d8d>] do_execve+0x1cd/0x250 [<c0102d22>] sys_execve+0x42/0x80 [<c0104237>] syscall_call+0x7/0xb [<c012583b>] ____call_usermodehelper+0x7b/0xb0 [<c0104237>] syscall_call+0x7/0xb [<c012583b>] ____call_usermodehelper+0x7b/0xb0 [<c01257c0>] ____call_usermodehelper+0x0/0xb0 [<c010237d>] kernel_thread_helper+0x5/0x18 6ms non-preemptible critical section violated 2 ms preempt threshold starting at unix_accept+0xbe/0x140 and ending at unix_accept+0xfc/0x140 [<c033a50c>] unix_accept+0xfc/0x140 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c033a50c>] unix_accept+0xfc/0x140 [<c033a50c>] unix_accept+0xfc/0x140 [<c02ea8a1>] sys_accept+0xb1/0x170 [<c0161a5f>] do_select+0x1af/0x2f0 [<c011e7d2>] __mod_timer+0x112/0x180 [<c0114942>] dec_preempt_count+0x52/0x120 [<c011e7d2>] __mod_timer+0x112/0x180 [<c011a51c>] do_setitimer+0x1bc/0x1f0 [<c0224308>] copy_from_user+0x48/0x80 [<c02eb495>] sys_socketcall+0xd5/0x260 [<c0104237>] syscall_call+0x7/0xb 5ms non-preemptible critical section violated 2 ms preempt threshold starting at sys_ioctl+0x41/0x2a0 and ending at sys_ioctl+0xbd/0x2a0 [<c0160f1d>] sys_ioctl+0xbd/0x2a0 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c0160f1d>] sys_ioctl+0xbd/0x2a0 [<c0104648>] show_trace+0x18/0xa0 [<c0160f1d>] sys_ioctl+0xbd/0x2a0 [<c0104648>] show_trace+0x18/0xa0 [<c0104237>] syscall_call+0x7/0xb [<c0104648>] show_trace+0x18/0xa0 115ms non-preemptible critical section violated 2 ms preempt threshold starting at schedule+0x65/0x4a0 and ending at chrdev_open+0x105/0x240 [<c0157af5>] chrdev_open+0x105/0x240 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c0157af5>] chrdev_open+0x105/0x240 [<c0157af5>] chrdev_open+0x105/0x240 [<c014d75a>] dentry_open+0x15a/0x240 [<c014d5f8>] filp_open+0x68/0x70 [<c014d8b4>] get_unused_fd+0x74/0xe0 [<c014da5b>] sys_open+0x5b/0x90 [<c0104237>] syscall_call+0x7/0xb 300ms non-preemptible critical section violated 2 ms preempt threshold starting at chrdev_open+0xd3/0x240 and ending at schedule+0x255/0x4a0 [<c03553d5>] schedule+0x255/0x4a0 [<c0114a08>] dec_preempt_count+0x118/0x120 [<c03553d5>] schedule+0x255/0x4a0 [<c03553d5>] schedule+0x255/0x4a0 [<c011e7d2>] __mod_timer+0x112/0x180 [<c0355b23>] schedule_timeout+0x63/0xc0 [<c011e7d2>] __mod_timer+0x112/0x180 [<c0355b23>] schedule_timeout+0x63/0xc0 [<c011f2e0>] process_timeout+0x0/0x10 [<c0161a37>] do_select+0x187/0x2f0 [<c01616e0>] __pollwait+0x0/0xd0 [<c0161ebb>] sys_select+0x2db/0x500 [<c0104237>] syscall_call+0x7/0xb System specs: P3 1.13 GHz, 512 MB RAM, 1024 MB Swap, Ext3, 20GB IBM DTLA305020 Normal Desktop Maschine running X. -- it's just a game... ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2004-07-14 10:16 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-07-12 23:43 Preempt Threshold Measurements Gabriel Devenyi 2004-07-12 23:59 ` Con Kolivas 2004-07-13 0:15 ` Gabriel Devenyi 2004-07-13 2:40 ` William Lee Irwin III 2004-07-13 2:48 ` [ck] " Gabriel Devenyi 2004-07-13 2:54 ` Con Kolivas 2004-07-13 4:01 ` Andrew Morton 2004-07-13 5:51 ` Con Kolivas 2004-07-13 6:14 ` Andrew Morton 2004-07-13 8:03 ` Lee Revell 2004-07-13 10:26 ` Con Kolivas 2004-07-13 21:52 ` Con Kolivas 2004-07-13 10:08 ` William Lee Irwin III 2004-07-13 10:40 ` William Lee Irwin III 2004-07-13 11:07 ` Lenar Lõhmus 2004-07-13 11:13 ` William Lee Irwin III 2004-07-13 2:55 ` [ck] " William Lee Irwin III 2004-07-13 3:09 ` Gabriel Devenyi 2004-07-13 4:07 ` Andrew Morton 2004-07-13 5:34 ` Con Kolivas 2004-07-13 5:48 ` Nick Piggin 2004-07-13 10:14 ` William Lee Irwin III 2004-07-14 10:15 ` [ck] " Jens Bergmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox