* kswapd continuously active
From: Jan Engelhardt @ 2010-01-22 23:52 UTC
To: Linux Kernel Mailing List; +Cc: jens.axboe

Hi,

with 2.6.32.2 on sparc64 I am seeing a sync(1) process busy in D state,
with the following trace:

sync          D 000000000079299c  7552  4851      1 0x208061101000004
Call Trace:
 [000000000053ca58] bdi_sched_wait+0xc/0x1c
 [000000000079299c] __wait_on_bit+0x58/0xb8
 [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
 [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
 [000000000053ca78] sync_inodes_sb+0x10/0xfc
 [0000000000540dd0] __sync_filesystem+0x50/0x88
 [0000000000540ec8] sync_filesystems+0xc0/0x124
 [0000000000540f80] sys_sync+0x1c/0x48
 [0000000000406294] linux_sparc_syscall+0x34/0x44

kswapd is also active all the time, writing something to disk - the disk
LED is blinking - and that has been going on for over half an hour even
though the box is not busy. How do I see what kswapd is still flushing
to disk? Even if all RAM (8 GB) were filled with dirty data, syncing it
out would not take that long; that is to say, the sync process should
long since have exited.
* Re: kswapd continuously active
From: Jens Axboe @ 2010-01-25 13:06 UTC
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Sat, Jan 23 2010, Jan Engelhardt wrote:
> Hi,
>
> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
> busy in D state, with the following trace:
>
> sync          D 000000000079299c  7552  4851      1 0x208061101000004
> Call Trace:
>  [000000000053ca58] bdi_sched_wait+0xc/0x1c
>  [000000000079299c] __wait_on_bit+0x58/0xb8
>  [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
>  [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
>  [000000000053ca78] sync_inodes_sb+0x10/0xfc
>  [0000000000540dd0] __sync_filesystem+0x50/0x88
>  [0000000000540ec8] sync_filesystems+0xc0/0x124
>  [0000000000540f80] sys_sync+0x1c/0x48
>  [0000000000406294] linux_sparc_syscall+0x34/0x44
>
> kswapd is also active all the time, writing something to disk - LED is
> blinking, and that's been going on for over half an hour despite the box
> being not busy. How do I see what kswapd is still flushing to disk? Even
> if all RAM (8 GB) was filled with dirty data, syncing it out would not
> take that long - that is to say, the sync process should have long
> exited.

That doesn't sound good. What does /proc/meminfo say? What file systems
are you using?

-- 
Jens Axboe
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-01-25 13:22 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Monday 2010-01-25 14:06, Jens Axboe wrote:
>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
>> busy in D state, with the following trace:
>>
>> sync          D 000000000079299c  7552  4851      1 0x208061101000004
>> Call Trace:
>> [000000000053ca58] bdi_sched_wait+0xc/0x1c
>> [...]
>> [0000000000406294] linux_sparc_syscall+0x34/0x44
>>
>> kswapd is also active all the time, writing something to disk [...]
>
>That doesn't sound good. What does /proc/meminfo say? What file systems
>are you using?

Eventually that day, the sync finished; I am not sure what triggered that.

This is what meminfo usually looks like when the box is doing something:

14:08 ares:~ > cat /proc/meminfo
MemTotal:        8166752 kB
MemFree:         3243552 kB
Buffers:          207968 kB
Cached:          2728216 kB
SwapCached:            0 kB
Active:          2203136 kB
Inactive:        2152544 kB
Active(anon):    1167256 kB
Inactive(anon):   252952 kB
Active(file):    1035880 kB
Inactive(file):  1899592 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:            141624 kB
Writeback:             0 kB
AnonPages:       1421448 kB
Mapped:            49904 kB
Shmem:               680 kB
Slab:             429784 kB
SReclaimable:     315760 kB
SUnreclaim:       114024 kB
KernelStack:        9248 kB
PageTables:         6120 kB
Quicklists:        10560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4083376 kB
Committed_AS:    1626280 kB
VmallocTotal: 1069547520 kB
VmallocUsed:       28816 kB
VmallocChunk: 1069518664 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB

14:16 ares:~ > w
 14:17:13 up 6:26, 2 users, load average: 21.09, 24.41, 22.49

This machine is used as a compile box, and lots of files are created and
deleted when there is work - basically a new chroot for every package,
with a total of about 300K such "floating" files. The filesystem is ext4:

/dev/sda4 / ext4 rw,relatime,barrier=1,data=ordered 0 0

Now that I see barrier=1, maybe I should turn that off in the future,
like I did with XFS around 2.6.17 [1].

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0605.2/1110.html
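(A side note for readers reproducing the check above: the effective mount
options, including the barrier setting, can be read back from /proc/mounts.
This is only a sketch - the mount point is the thread's example, and the
remount line is left commented out because running ext4 without barriers on
a write-back-cached disk risks corruption on power loss.)

```shell
# Print device, fstype and options for the root filesystem; on ext4 of
# this vintage, look for barrier=1 (the default) or barrier=0 in the list.
awk '$2 == "/" {print $1, $3, $4}' /proc/mounts

# Illustrative only - trades integrity for speed, do not run blindly:
# mount -o remount,barrier=0 /
```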
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-05 12:35 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Monday 2010-01-25 14:22, Jan Engelhardt wrote:
>On Monday 2010-01-25 14:06, Jens Axboe wrote:
>>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
>>> busy in D state, with the following trace:
>>>
>>> sync          D 000000000079299c  7552  4851      1 0x208061101000004
>>> Call Trace:
>>> [000000000053ca58] bdi_sched_wait+0xc/0x1c [...]
>>> [000000000053ca78] sync_inodes_sb+0x10/0xfc
>>>
>>> kswapd is also active all the time, writing something to disk [...]
>>
>>That doesn't sound good. What does /proc/meminfo say? What file systems
>>are you using?

Today this happened again. So I looked at /proc/meminfo to paste today's
values next to those from January:

>                 January 25     Feb-05
>MemTotal:        8166752 kB    8166752
>MemFree:         3243552 kB    3781776
>Buffers:          207968 kB       4912
>Cached:          2728216 kB    2684400
>SwapCached:            0 kB          0
>Active:          2203136 kB     495624
>Inactive:        2152544 kB    3263136
>Active(anon):    1167256 kB     488168
>Inactive(anon):   252952 kB     583912
>Active(file):    1035880 kB       7456
>Inactive(file):  1899592 kB    2679224
>Unevictable:           0 kB          0
>Mlocked:               0 kB          0
>SwapTotal:             0 kB          0
>SwapFree:              0 kB          0
>Dirty:            141624 kB    2662184
>Writeback:             0 kB         ..

That is when I noticed the "Dirty" value - and thus I ran

    watch -d -n 1 'grep Dirty /proc/meminfo'

What I see is that the dirty amount - a sync is currently running -
decreases by at most 400 kB/sec, often less than that.
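(The watch invocation above eyeballs the rate; a small sketch that computes
it directly - assuming only a standard Linux /proc/meminfo, nothing specific
to this box - could look like this:)

```shell
#!/bin/sh
# Sample the Dirty counter twice, one second apart, and print the
# apparent writeback rate in kB/s (positive = dirty data shrinking).
dirty_kb() { awk '/^Dirty:/ {print $2}' /proc/meminfo; }

before=$(dirty_kb)
sleep 1
after=$(dirty_kb)
echo "Dirty: ${before} kB -> ${after} kB ($((before - after)) kB/s written back)"
```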
* Re: kswapd continuously active
From: Jens Axboe @ 2010-02-05 13:00 UTC
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, Feb 05 2010, Jan Engelhardt wrote:
> >                 January 25     Feb-05
> >MemTotal:        8166752 kB    8166752
> >MemFree:         3243552 kB    3781776
> >[...]
> >Dirty:            141624 kB    2662184
> >Writeback:             0 kB         ..
>
> Today this happened again. So I looked at /proc/meminfo to paste today's
> values next to those from January. That is when I noticed the "Dirty"
> value - and thus I ran
>
>     watch -d -n 1 'grep Dirty /proc/meminfo'
>
> What I see is that the dirty amount - a sync is currently running -
> only decreases with at most 400 KB/sec, often less than that.

I'm guessing the barriers and commits are what is killing your
performance. What happens with barrier=0?

-- 
Jens Axboe
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-05 13:09 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Friday 2010-02-05 14:00, Jens Axboe wrote:
>> What I see is that the dirty amount - a sync is currently running -
>> only decreases with at most 400 KB/sec, often less than that.
>
>I'm guessing the barriers and commits are what is killing your
>performance. What happens with barrier=0?

The ext4 filesystem is already mounted with barrier=0. If there are any
block-level barriers I can also turn off, what would be the command?

thanks,
Jan
* Re: kswapd continuously active
From: Jens Axboe @ 2010-02-05 13:12 UTC
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, Feb 05 2010, Jan Engelhardt wrote:
> The ext4 filesystem is already mounted with barrier=0. If there
> is any block-level barriers I also can turn off, what would be
> the command?

barrier=0 is enough. I do wonder why your writeback rate is that slow,
then. The disk has write back caching enabled?

-- 
Jens Axboe
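(Context for the writeback-rate question: besides barriers, the kernel's own
dirty thresholds decide when background flushing starts and when writers get
throttled. A sketch for inspecting them - the sysctl paths are standard
Linux; the values and the commented-out tuning line are illustrative, not
from the thread:)

```shell
# Dump the VM writeback knobs relevant to a multi-GB Dirty backlog.
for f in dirty_background_ratio dirty_ratio \
         dirty_expire_centisecs dirty_writeback_centisecs; do
    printf '%-26s %s\n' "$f" "$(cat /proc/sys/vm/$f)"
done

# To start background flushing earlier (needs root), e.g.:
# sysctl -w vm.dirty_background_ratio=2
```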
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-05 13:24 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Friday 2010-02-05 14:12, Jens Axboe wrote:
>> The ext4 filesystem is already mounted with barrier=0. If there
>> is any block-level barriers I also can turn off, what would be
>> the command?
>
>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>then. The disk has write back caching enabled?

Yes, that seems to be the case at least. In fact the box is all fluffy
when nobody runs sync(1), which is what makes it so strange.

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-07 10:50 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
>>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>>>then. The disk has write back caching enabled?
>
>Yes, that seems to be the case at least. In fact the box is all fluffy
>when nobody runs sync(1), which is what makes it so strange.
>
>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
>support DPO or FUA

Here is an alternate trace that just happened:

INFO: task flush-8:0:343 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
Call Trace:
 [00000000005555f4] start_this_handle+0x324/0x4b0
 [0000000000555930] jbd2_journal_start+0x94/0xc0
 [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
 [000000000049bf30] do_writepages+0x28/0x48
 [00000000004e6d58] writeback_single_inode+0xf0/0x330
 [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
 [00000000004e7da4] wb_writeback+0x170/0x1ec
 [00000000004e8074] wb_do_writeback+0x188/0x1a4
 [00000000004e80b8] bdi_writeback_task+0x28/0xa0
 [00000000004a76c8] bdi_start_fn+0x64/0xc4
 [0000000000475f84] kthread+0x58/0x6c
 [000000000042ade0] kernel_thread+0x30/0x48
 [0000000000475ee0] kthreadd+0xb8/0x104
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-08 0:07 UTC
To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
>Here is an alternate trace that just happened.
>
>INFO: task flush-8:0:343 blocked for more than 120 seconds.
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
>Call Trace:
> [00000000005555f4] start_this_handle+0x324/0x4b0
> [0000000000555930] jbd2_journal_start+0x94/0xc0
> [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> [...]
> [0000000000475ee0] kthreadd+0xb8/0x104

Could it be that there is something synchronize_rcu()-like in the game
that — as a result of how RCU works — just takes ages with 24 VCPUs?
* Re: kswapd continuously active
From: Jens Axboe @ 2010-02-08 14:08 UTC
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Mon, Feb 08 2010, Jan Engelhardt wrote:
> On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
> >Here is an alternate trace that just happened.
> >
> >INFO: task flush-8:0:343 blocked for more than 120 seconds.
> >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
> >Call Trace:
> > [00000000005555f4] start_this_handle+0x324/0x4b0
> > [0000000000555930] jbd2_journal_start+0x94/0xc0
> > [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> > [000000000049bf30] do_writepages+0x28/0x48
> > [00000000004e6d58] writeback_single_inode+0xf0/0x330
> > [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
> > [00000000004e7da4] wb_writeback+0x170/0x1ec
> > [00000000004e8074] wb_do_writeback+0x188/0x1a4
> > [00000000004e80b8] bdi_writeback_task+0x28/0xa0
> > [00000000004a76c8] bdi_start_fn+0x64/0xc4
> > [0000000000475f84] kthread+0x58/0x6c
> > [000000000042ade0] kernel_thread+0x30/0x48
> > [0000000000475ee0] kthreadd+0xb8/0x104
>
> Could it be that there is something synchronize_rcu()-like in the
> game that — as a result of how RCU works — just takes ages with 24
> VCPUs?

The only synchronize_rcu() involved in the writeback code happens when
a bdi exits, so you should not hit that. It'll do call_rcu() for work
completions, but 1) you should not see a lot of work entries, and 2)
lots of other kernel code will do that, too.

Are you seeing a lot of CPU usage? What does eg perf top -a say? And
what setup is this, I didn't realize you were running a virtualized
setup?

-- 
Jens Axboe
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-10 15:18 UTC
To: jens.axboe; +Cc: Linux Kernel Mailing List

Hi,

here are the perf top -a dumps.

Light fs activity (/bin/rm of old build roots):

------------------------------------------------------------------------------
   PerfTop:    4800 irqs/sec  kernel:94.1%  [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------

 12352.00 - 17.7% : __lock_acquire
  4975.00 -  7.1% : lock_release_holdtime
  4793.00 -  6.9% : lock_release
  3586.00 -  5.1% : lock_acquired
  3544.00 -  5.1% : stick_get_tick
  2245.00 -  3.2% : sched_clock
  1938.00 -  2.8% : sched_clock_cpu
  1851.00 -  2.7% : lock_acquire
  1800.00 -  2.6% : tl0_linux64
  1475.00 -  2.1% : ext4_mark_iloc_dirty
  1041.00 -  1.5% : fget_light
  1035.00 -  1.5% : find_busiest_group
   977.00 -  1.4% : getnstimeofday
   939.00 -  1.3% : do_sys_poll
   926.00 -  1.3% : perf_poll

(Starting real work - file creation etc. - now.) Calling sync a minute
later, when Dirty > 200 MB in /proc/meminfo:

------------------------------------------------------------------------------
   PerfTop:   28822 irqs/sec  kernel:22.2%  [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------

  samples  pcnt   kernel function
  _______  _____  _______________

 22952.00 - 14.9% : __lock_acquire
  9142.00 -  5.9% : lock_release_holdtime
  8881.00 -  5.8% : stick_get_tick
  8375.00 -  5.4% : lock_release
  7401.00 -  4.8% : lock_acquired
  5522.00 -  3.6% : tl0_linux64
  4325.00 -  2.8% : sched_clock
  3634.00 -  2.4% : ext4_rec_len_from_disk
  3600.00 -  2.3% : sched_clock_cpu
  3261.00 -  2.1% : lock_acquire
  3012.00 -  2.0% : NGcopy_to_user
  2988.00 -  1.9% : ext4_check_dir_entry
  2909.00 -  1.9% : getnstimeofday
  2010.00 -  1.3% : __delay
  1915.00 -  1.2% : cpu_clock

Stopping all write processes so that sync can complete:

------------------------------------------------------------------------------
   PerfTop:    2679 irqs/sec  kernel:92.6%  [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------

  samples  pcnt   kernel function
  _______  _____  _______________

  6102.00 - 19.7% : __lock_acquire
  2504.00 -  8.1% : lock_release_holdtime
  2346.00 -  7.6% : lock_release
  1685.00 -  5.5% : stick_get_tick
  1532.00 -  5.0% : lock_acquired
  1135.00 -  3.7% : sched_clock
  1068.00 -  3.5% : find_busiest_group
  1034.00 -  3.3% : sched_clock_cpu
  1023.00 -  3.3% : tl0_linux64
   832.00 -  2.7% : lock_acquire
   658.00 -  2.1% : fget_light
   656.00 -  2.1% : perf_poll
   629.00 -  2.0% : journal_clean_one_cp_list
   572.00 -  1.9% : find_next_bit
   525.00 -  1.7% : getnstimeofday

------------------------------------------------------------------------------
   PerfTop:    3182 irqs/sec  kernel:90.5%  [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------

  samples  pcnt   kernel function
  _______  _____  _______________

  9076.00 - 19.9% : __lock_acquire
  3702.00 -  8.1% : lock_release_holdtime
  3420.00 -  7.5% : lock_release
  2522.00 -  5.5% : stick_get_tick
  2198.00 -  4.8% : lock_acquired
  1700.00 -  3.7% : sched_clock
  1539.00 -  3.4% : find_busiest_group
  1507.00 -  3.3% : sched_clock_cpu
  1402.00 -  3.1% : tl0_linux64
  1277.00 -  2.8% : lock_acquire
  1089.00 -  2.4% : journal_clean_one_cp_list
  1085.00 -  2.4% : fget_light
  1004.00 -  2.2% : perf_poll
   860.00 -  1.9% : do_sys_poll
   782.00 -  1.7% : find_next_bit

There's that journal_clean_one_cp_list thing. I have a gut feeling that
sys_sync just waits until all data has been flushed. That however won't
fly, because there is always new dirty data arriving when the box is not
idle. Of course I am not calling /bin/sync on purpose, but some software
packages do; rrdtool for example has an autoconf runtime check for
fsync(2), so it is inevitable that sync is called at some point by
something.

I also did

    # perf top -a -k vmlinux-... -p $(that pid of jbd2-sda6)

------------------------------------------------------------------------------
   PerfTop:     105 irqs/sec  kernel:100.0%  [100000 cycles],  (target_pid: 574)
------------------------------------------------------------------------------

  samples  pcnt   kernel function
  _______  _____  _______________

    42.00 - 12.8% : journal_clean_one_cp_list
    34.00 - 10.4% : __lock_acquire
    33.00 - 10.1% : __jbd2_journal_clean_checkpoint_list
    28.00 -  8.5% : find_busiest_group
    20.00 -  6.1% : clear_bit
    16.00 -  4.9% : lock_release
    16.00 -  4.9% : stick_get_tick
    14.00 -  4.3% : test_and_set_bit
    13.00 -  4.0% : lock_release_holdtime
    12.00 -  3.7% : lock_acquired
     9.00 -  2.7% : find_next_bit
     7.00 -  2.1% : lock_acquire
     7.00 -  2.1% : jbd2_journal_write_revoke_records
     6.00 -  1.8% : jbd2_journal_switch_revoke_table
     6.00 -  1.8% : cpu_clock
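(Jan's gut feeling - that sync(1) keeps chasing newly dirtied data - can be
illustrated at toy scale. The sketch below is an assumption-laden
demonstration, not taken from the thread: it re-dirties a small temporary
file in the background while timing sync; on a loaded box this is the effect
that stretches sync out indefinitely.)

```shell
#!/bin/sh
# Time sync(1) while a background loop keeps re-dirtying one file.
tmp=$(mktemp) || exit 1

( i=0
  while [ "$i" -lt 50 ]; do
      # conv=notrunc rewrites (re-dirties) the same blocks each pass
      dd if=/dev/zero of="$tmp" bs=64k count=4 conv=notrunc 2>/dev/null
      i=$((i + 1))
  done ) &
writer=$!

t0=$(date +%s)
sync                 # must also wait for pages dirtied after it started
t1=$(date +%s)
echo "sync took $((t1 - t0)) s"

wait "$writer"
rm -f "$tmp"
```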
* Re: kswapd continuously active
From: Jan Engelhardt @ 2010-02-10 15:23 UTC
To: jens.axboe

Hi,

>On Mon, Feb 08 2010, Jan Engelhardt wrote:
>
>> Could it be that there is something synchronize_rcu()-like in the
>> game that — as a result of how RCU works — just takes ages with 24
>> VCPUs?
>
>The only synchronize_rcu() involved in the writeback code happens when
>a bdi exits, so you should not hit that. It'll do call_rcu() for work
>completions, but 1) you should not see a lot of work entries, and 2)
>lots of other kernel code will do that, too.
>
>Are you seeing a lot of CPU usage? What does eg perf top -a say?

top - 16:19:42 up 1:02, 3 users, load average: 3.05, 3.87, 3.51
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 98.2%id, 1.6%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   8063800k total,  6676472k used,  1387328k free,   767040k buffers
Swap:        0k total,        0k used,        0k free,  4775464k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
26354 root     20  0  3472 1704 1184 R    4  0.0  0:00.33 top
26198 jengelh  20  0  2864 1368 1152 S    4  0.0  0:03.72 watch
  574 root     20  0     0    0    0 D    3  0.0  0:52.20 jbd2/sda6-8
  312 root     20  0     0    0    0 R    1  0.0  0:29.29 flush-8:0
    1 root     20  0  2408  936  808 S    0  0.0  0:01.13 init

When it comes down hard, i.e. when there is always new data arriving
while sys_sync is working, the load average is usually in the range of
1.0*CPUs up to 2.0*CPUs. The top dump shown here is from after quiescing
the writers.

>And what setup is this, I didn't realize you were running a virtualized
>setup?

It is real hardware; I usually use the term VCPU because this iron has
1 CPU package with 6 cores and 24 threads, not 24 CPU (packages).