public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* kswapd continuously active
@ 2010-01-22 23:52 Jan Engelhardt
  2010-01-25 13:06 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-01-22 23:52 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: jens.axboe

Hi,


with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process 
busy in D state, with the following trace:

sync          D 000000000079299c  7552  4851      1 0x208061101000004
Call Trace:
 [000000000053ca58] bdi_sched_wait+0xc/0x1c
 [000000000079299c] __wait_on_bit+0x58/0xb8
 [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
 [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
 [000000000053ca78] sync_inodes_sb+0x10/0xfc
 [0000000000540dd0] __sync_filesystem+0x50/0x88
 [0000000000540ec8] sync_filesystems+0xc0/0x124
 [0000000000540f80] sys_sync+0x1c/0x48
 [0000000000406294] linux_sparc_syscall+0x34/0x44

kswapd is also active all the time, writing something to disk - LED is 
blinking, and that's been going on for over half an hour despite the box 
being not busy. How do I see what kswapd is still flushing to disk? Even 
if all RAM (8 GB) was filled with dirty data, syncing it out would not 
take that long - that is to say, the sync process should have long 
exited.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-01-22 23:52 kswapd continuously active Jan Engelhardt
@ 2010-01-25 13:06 ` Jens Axboe
  2010-01-25 13:22   ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2010-01-25 13:06 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Sat, Jan 23 2010, Jan Engelhardt wrote:
> Hi,
> 
> 
> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process 
> busy in D state, with the following trace:
> 
> sync          D 000000000079299c  7552  4851      1 0x208061101000004
> Call Trace:
>  [000000000053ca58] bdi_sched_wait+0xc/0x1c
>  [000000000079299c] __wait_on_bit+0x58/0xb8
>  [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
>  [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
>  [000000000053ca78] sync_inodes_sb+0x10/0xfc
>  [0000000000540dd0] __sync_filesystem+0x50/0x88
>  [0000000000540ec8] sync_filesystems+0xc0/0x124
>  [0000000000540f80] sys_sync+0x1c/0x48
>  [0000000000406294] linux_sparc_syscall+0x34/0x44
> 
> kswapd is also active all the time, writing something to disk - LED is 
> blinking, and that's been going on for over half an hour despite the box 
> being not busy. How do I see what kswapd is still flushing to disk? Even 
> if all RAM (8 GB) was filled with dirty data, syncing it out would not 
> take that long - that is to say, the sync process should have long 
> exited.

That doesn't sound good. What does /proc/meminfo say? What file systems
are you using?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-01-25 13:06 ` Jens Axboe
@ 2010-01-25 13:22   ` Jan Engelhardt
  2010-02-05 12:35     ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-01-25 13:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List


On Monday 2010-01-25 14:06, Jens Axboe wrote:
>> 
>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process 
>> busy in D state, with the following trace:
>> 
>> sync          D 000000000079299c  7552  4851      1 0x208061101000004
>> Call Trace:
>>  [000000000053ca58] bdi_sched_wait+0xc/0x1c
>>  [000000000079299c] __wait_on_bit+0x58/0xb8
>>  [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
>>  [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
>>  [000000000053ca78] sync_inodes_sb+0x10/0xfc
>>  [0000000000540dd0] __sync_filesystem+0x50/0x88
>>  [0000000000540ec8] sync_filesystems+0xc0/0x124
>>  [0000000000540f80] sys_sync+0x1c/0x48
>>  [0000000000406294] linux_sparc_syscall+0x34/0x44
>> 
>> kswapd is also active all the time, writing something to disk - LED is 
>> blinking, and that's been going on for over half an hour despite the box 
>> being not busy. How do I see what kswapd is still flushing to disk? Even 
>> if all RAM (8 GB) was filled with dirty data, syncing it out would not 
>> take that long - that is to say, the sync process should have long 
>> exited.
>
>That doesn't sound good. What does /proc/meminfo say? What file systems
>are you using?

Eventually that day, the sync finished; not sure what triggered that.
Usually, meminfo looks like when the box is doing something:

14:08 ares:~ > cat /proc/meminfo 
MemTotal:        8166752 kB
MemFree:         3243552 kB
Buffers:          207968 kB
Cached:          2728216 kB
SwapCached:            0 kB
Active:          2203136 kB
Inactive:        2152544 kB
Active(anon):    1167256 kB
Inactive(anon):   252952 kB
Active(file):    1035880 kB
Inactive(file):  1899592 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:            141624 kB
Writeback:             0 kB
AnonPages:       1421448 kB
Mapped:            49904 kB
Shmem:               680 kB
Slab:             429784 kB
SReclaimable:     315760 kB
SUnreclaim:       114024 kB
KernelStack:        9248 kB
PageTables:         6120 kB
Quicklists:        10560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4083376 kB
Committed_AS:    1626280 kB
VmallocTotal:   1069547520 kB
VmallocUsed:       28816 kB
VmallocChunk:   1069518664 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       4096 kB
14:16 ares:~ > w
 14:17:13 up  6:26,  2 users,  load average: 21.09, 24.41, 22.49

This is used as a compile box, and there are lots of files created
and deleted when there is work - a new chroot for every package
basically, with a total of about 300K such "floating" files.
Filesystem is ext4.

/dev/sda4 / ext4 rw,relatime,barrier=1,data=ordered 0 0

Now that I see barrier=1, maybe I should turn that off in the future
like I did with XFS around 2.6.17[1].

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0605.2/1110.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-01-25 13:22   ` Jan Engelhardt
@ 2010-02-05 12:35     ` Jan Engelhardt
  2010-02-05 13:00       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-05 12:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List


On Monday 2010-01-25 14:22, Jan Engelhardt wrote:
>On Monday 2010-01-25 14:06, Jens Axboe wrote:
>>> 
>>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process 
>>> busy in D state, with the following trace:
>>> 
>>> sync          D 000000000079299c  7552  4851      1 0x208061101000004
>>> Call Trace:
>>>  [000000000053ca58] bdi_sched_wait+0xc/0x1c[...]
>>>  [000000000053ca78] sync_inodes_sb+0x10/0xfc
>>> 
>>> kswapd is also active all the time, writing something to disk[...]
>>
>>That doesn't sound good. What does /proc/meminfo say? What file systems
>>are you using?

>January 25			Feb-05
>MemTotal:        8166752 kB	8166752
>MemFree:         3243552 kB	3781776
>Buffers:          207968 kB	4912
>Cached:          2728216 kB	2684400
>SwapCached:            0 kB	0
>Active:          2203136 kB	495624
>Inactive:        2152544 kB	3263136
>Active(anon):    1167256 kB	488168
>Inactive(anon):   252952 kB	583912
>Active(file):    1035880 kB	7456
>Inactive(file):  1899592 kB	2679224
>Unevictable:           0 kB	0
>Mlocked:               0 kB	0
>SwapTotal:             0 kB	0
>SwapFree:              0 kB	0
>Dirty:            141624 kB	2662184
>Writeback:             0 kB	..

Today this happened again. So I looked at /proc/meminfo to paste today's
values next to those from January. That is when I noticed the "Dirty"
value - and thus I ran

	watch -d -n 1 'grep Dirty /proc/meminfo'

What I see is that the dirty amount - a sync is currently running -
only decreases with at most 400 KB/sec, often less than that.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-05 12:35     ` Jan Engelhardt
@ 2010-02-05 13:00       ` Jens Axboe
  2010-02-05 13:09         ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2010-02-05 13:00 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, Feb 05 2010, Jan Engelhardt wrote:
> 
> On Monday 2010-01-25 14:22, Jan Engelhardt wrote:
> >On Monday 2010-01-25 14:06, Jens Axboe wrote:
> >>> 
> >>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process 
> >>> busy in D state, with the following trace:
> >>> 
> >>> sync          D 000000000079299c  7552  4851      1 0x208061101000004
> >>> Call Trace:
> >>>  [000000000053ca58] bdi_sched_wait+0xc/0x1c[...]
> >>>  [000000000053ca78] sync_inodes_sb+0x10/0xfc
> >>> 
> >>> kswapd is also active all the time, writing something to disk[...]
> >>
> >>That doesn't sound good. What does /proc/meminfo say? What file systems
> >>are you using?
> 
> >January 25			Feb-05
> >MemTotal:        8166752 kB	8166752
> >MemFree:         3243552 kB	3781776
> >Buffers:          207968 kB	4912
> >Cached:          2728216 kB	2684400
> >SwapCached:            0 kB	0
> >Active:          2203136 kB	495624
> >Inactive:        2152544 kB	3263136
> >Active(anon):    1167256 kB	488168
> >Inactive(anon):   252952 kB	583912
> >Active(file):    1035880 kB	7456
> >Inactive(file):  1899592 kB	2679224
> >Unevictable:           0 kB	0
> >Mlocked:               0 kB	0
> >SwapTotal:             0 kB	0
> >SwapFree:              0 kB	0
> >Dirty:            141624 kB	2662184
> >Writeback:             0 kB	..
> 
> Today this happened again. So I looked at /proc/meminfo to paste today's
> values next to those from January. That is when I noticed the "Dirty"
> value - and thus I ran
> 
> 	watch -d -n 1 'grep Dirty /proc/meminfo'
> 
> What I see is that the dirty amount - a sync is currently running -
> only decreases with at most 400 KB/sec, often less than that.

I'm guessing the barriers and commits are what is killing your
performance. What happens with barrier=0?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-05 13:00       ` Jens Axboe
@ 2010-02-05 13:09         ` Jan Engelhardt
  2010-02-05 13:12           ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-05 13:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List


On Friday 2010-02-05 14:00, Jens Axboe wrote:
>> >January 25			Feb-05
>> >MemTotal:        8166752 kB	8166752
>> >MemFree:         3243552 kB	3781776
>> >Buffers:          207968 kB	4912
>> >Cached:          2728216 kB	2684400
>> >SwapCached:            0 kB	0
>> >Active:          2203136 kB	495624
>> >Inactive:        2152544 kB	3263136
>> >Active(anon):    1167256 kB	488168
>> >Inactive(anon):   252952 kB	583912
>> >Active(file):    1035880 kB	7456
>> >Inactive(file):  1899592 kB	2679224
>> >Unevictable:           0 kB	0
>> >Mlocked:               0 kB	0
>> >SwapTotal:             0 kB	0
>> >SwapFree:              0 kB	0
>> >Dirty:            141624 kB	2662184
>> >Writeback:             0 kB	..
>> 
>> Today this happened again. So I looked at /proc/meminfo to paste today's
>> values next to those from January. That is when I noticed the "Dirty"
>> value - and thus I ran
>> 
>> 	watch -d -n 1 'grep Dirty /proc/meminfo'
>> 
>> What I see is that the dirty amount - a sync is currently running -
>> only decreases with at most 400 KB/sec, often less than that.
>
>I'm guessing the barriers and commits are what is killing your
>performance. What happens with barrier=0?

The ext4 filesystem is already mounted with barrier=0. If there
is any block-level barriers I also can turn off, what would be
the command?


thanks,
Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-05 13:09         ` Jan Engelhardt
@ 2010-02-05 13:12           ` Jens Axboe
  2010-02-05 13:24             ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2010-02-05 13:12 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Fri, Feb 05 2010, Jan Engelhardt wrote:
> 
> On Friday 2010-02-05 14:00, Jens Axboe wrote:
> >> >January 25			Feb-05
> >> >MemTotal:        8166752 kB	8166752
> >> >MemFree:         3243552 kB	3781776
> >> >Buffers:          207968 kB	4912
> >> >Cached:          2728216 kB	2684400
> >> >SwapCached:            0 kB	0
> >> >Active:          2203136 kB	495624
> >> >Inactive:        2152544 kB	3263136
> >> >Active(anon):    1167256 kB	488168
> >> >Inactive(anon):   252952 kB	583912
> >> >Active(file):    1035880 kB	7456
> >> >Inactive(file):  1899592 kB	2679224
> >> >Unevictable:           0 kB	0
> >> >Mlocked:               0 kB	0
> >> >SwapTotal:             0 kB	0
> >> >SwapFree:              0 kB	0
> >> >Dirty:            141624 kB	2662184
> >> >Writeback:             0 kB	..
> >> 
> >> Today this happened again. So I looked at /proc/meminfo to paste today's
> >> values next to those from January. That is when I noticed the "Dirty"
> >> value - and thus I ran
> >> 
> >> 	watch -d -n 1 'grep Dirty /proc/meminfo'
> >> 
> >> What I see is that the dirty amount - a sync is currently running -
> >> only decreases with at most 400 KB/sec, often less than that.
> >
> >I'm guessing the barriers and commits are what is killing your
> >performance. What happens with barrier=0?
> 
> The ext4 filesystem is already mounted with barrier=0. If there
> is any block-level barriers I also can turn off, what would be
> the command?

barrier=0 is enough. I do wonder why your writeback rate is that slow,
then. The disk has write back caching enabled?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-05 13:12           ` Jens Axboe
@ 2010-02-05 13:24             ` Jan Engelhardt
  2010-02-07 10:50               ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-05 13:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List

On Friday 2010-02-05 14:12, Jens Axboe wrote:
>> >> Today this happened again. So I looked at /proc/meminfo to paste today's
>> >> values next to those from January. That is when I noticed the "Dirty"
>> >> value - and thus I ran
>> >> 
>> >> 	watch -d -n 1 'grep Dirty /proc/meminfo'
>> >> 
>> >> What I see is that the dirty amount - a sync is currently running -
>> >> only decreases with at most 400 KB/sec, often less than that.
>> >
>> >I'm guessing the barriers and commits are what is killing your
>> >performance. What happens with barrier=0?
>> 
>> The ext4 filesystem is already mounted with barrier=0. If there
>> is any block-level barriers I also can turn off, what would be
>> the command?
>
>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>then. The disk has write back caching enabled?

Yes, that seems to be the case at least. In fact the box is all fluffy 
when nobody runs sync(1), which is what makes it so strange.

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-05 13:24             ` Jan Engelhardt
@ 2010-02-07 10:50               ` Jan Engelhardt
  2010-02-08  0:07                 ` Jan Engelhardt
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-07 10:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List


On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
>>> The ext4 filesystem is already mounted with barrier=0. If there
>>> is any block-level barriers I also can turn off, what would be
>>> the command?
>>
>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>>then. The disk has write back caching enabled?
>
>Yes, that seems to be the case at least. In fact the box is all fluffy 
>when nobody runs sync(1), which is what makes it so strange.
>
>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
>support DPO or FUA

Here is an alternate trace that just happened.

INFO: task flush-8:0:343 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
Call Trace:
 [00000000005555f4] start_this_handle+0x324/0x4b0
 [0000000000555930] jbd2_journal_start+0x94/0xc0
 [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
 [000000000049bf30] do_writepages+0x28/0x48
 [00000000004e6d58] writeback_single_inode+0xf0/0x330
 [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
 [00000000004e7da4] wb_writeback+0x170/0x1ec
 [00000000004e8074] wb_do_writeback+0x188/0x1a4
 [00000000004e80b8] bdi_writeback_task+0x28/0xa0
 [00000000004a76c8] bdi_start_fn+0x64/0xc4
 [0000000000475f84] kthread+0x58/0x6c
 [000000000042ade0] kernel_thread+0x30/0x48
 [0000000000475ee0] kthreadd+0xb8/0x104

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-07 10:50               ` Jan Engelhardt
@ 2010-02-08  0:07                 ` Jan Engelhardt
  2010-02-08 14:08                   ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-08  0:07 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel Mailing List


On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
>On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
>>>> The ext4 filesystem is already mounted with barrier=0. If there
>>>> is any block-level barriers I also can turn off, what would be
>>>> the command?
>>>
>>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>>>then. The disk has write back caching enabled?
>>
>>Yes, that seems to be the case at least. In fact the box is all fluffy 
>>when nobody runs sync(1), which is what makes it so strange.
>>
>>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
>>support DPO or FUA
>
>Here is an alternate trace that just happened.
>
>INFO: task flush-8:0:343 blocked for more than 120 seconds.
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
>Call Trace:
> [00000000005555f4] start_this_handle+0x324/0x4b0
> [0000000000555930] jbd2_journal_start+0x94/0xc0
> [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> [000000000049bf30] do_writepages+0x28/0x48
> [00000000004e6d58] writeback_single_inode+0xf0/0x330
> [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
> [00000000004e7da4] wb_writeback+0x170/0x1ec
> [00000000004e8074] wb_do_writeback+0x188/0x1a4
> [00000000004e80b8] bdi_writeback_task+0x28/0xa0
> [00000000004a76c8] bdi_start_fn+0x64/0xc4
> [0000000000475f84] kthread+0x58/0x6c
> [000000000042ade0] kernel_thread+0x30/0x48
> [0000000000475ee0] kthreadd+0xb8/0x104

Could it be that there is something synchronize_rcu()-like in the
game that — as a result of how RCU works — just takes ages with 24
VCPUs?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-08  0:07                 ` Jan Engelhardt
@ 2010-02-08 14:08                   ` Jens Axboe
  2010-02-10 15:18                     ` Jan Engelhardt
  2010-02-10 15:23                     ` Jan Engelhardt
  0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2010-02-08 14:08 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

On Mon, Feb 08 2010, Jan Engelhardt wrote:
> 
> On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
> >On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
> >>>> The ext4 filesystem is already mounted with barrier=0. If there
> >>>> is any block-level barriers I also can turn off, what would be
> >>>> the command?
> >>>
> >>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
> >>>then. The disk has write back caching enabled?
> >>
> >>Yes, that seems to be the case at least. In fact the box is all fluffy 
> >>when nobody runs sync(1), which is what makes it so strange.
> >>
> >>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
> >>support DPO or FUA
> >
> >Here is an alternate trace that just happened.
> >
> >INFO: task flush-8:0:343 blocked for more than 120 seconds.
> >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >flush-8:0     D 0000000000555930  6664   343      2 0x18000000000
> >Call Trace:
> > [00000000005555f4] start_this_handle+0x324/0x4b0
> > [0000000000555930] jbd2_journal_start+0x94/0xc0
> > [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> > [000000000049bf30] do_writepages+0x28/0x48
> > [00000000004e6d58] writeback_single_inode+0xf0/0x330
> > [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
> > [00000000004e7da4] wb_writeback+0x170/0x1ec
> > [00000000004e8074] wb_do_writeback+0x188/0x1a4
> > [00000000004e80b8] bdi_writeback_task+0x28/0xa0
> > [00000000004a76c8] bdi_start_fn+0x64/0xc4
> > [0000000000475f84] kthread+0x58/0x6c
> > [000000000042ade0] kernel_thread+0x30/0x48
> > [0000000000475ee0] kthreadd+0xb8/0x104
> 
> Could it be that there is something synchronize_rcu()-like in the
> game that ??? as a result of how RCU works ??? just takes ages with 24
> VCPUs?

The only synchronize_rcu() involved in the writeback code happens when a
bdi exits, so you should not hit that. It'll do call_rcu() for work
completions, but 1) you should not see a lot of work entries, and 2)
lots of other kernel code will do that, too.

Are you seeing a lot of CPU usage? What does eg perf top -a say?

And what setup is this, I didn't realize you were running a virtualized
setup?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-08 14:08                   ` Jens Axboe
@ 2010-02-10 15:18                     ` Jan Engelhardt
  2010-02-10 15:23                     ` Jan Engelhardt
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-10 15:18 UTC (permalink / raw)
  To: jens.axboe; +Cc: Linux Kernel Mailing List

Hi,


the perf top -a dumps:


Light fs activity (/bin/rm old build roots)
------------------------------------------------------------------------------
   PerfTop:    4800 irqs/sec  kernel:94.1% [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------
            12352.00 - 17.7% : __lock_acquire
             4975.00 -  7.1% : lock_release_holdtime
             4793.00 -  6.9% : lock_release
             3586.00 -  5.1% : lock_acquired
             3544.00 -  5.1% : stick_get_tick
             2245.00 -  3.2% : sched_clock
             1938.00 -  2.8% : sched_clock_cpu
             1851.00 -  2.7% : lock_acquire
             1800.00 -  2.6% : tl0_linux64
             1475.00 -  2.1% : ext4_mark_iloc_dirty
             1041.00 -  1.5% : fget_light
             1035.00 -  1.5% : find_busiest_group
              977.00 -  1.4% : getnstimeofday
              939.00 -  1.3% : do_sys_poll
              926.00 -  1.3% : perf_poll

(Starting real work - file creation etc - now)

Calling sync a minute later when Dirty > 200MB in /proc/meminfo:
------------------------------------------------------------------------------
   PerfTop:   28822 irqs/sec  kernel:22.2% [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------
             samples    pcnt   kernel function
             _______   _____   _______________

            22952.00 - 14.9% : __lock_acquire
             9142.00 -  5.9% : lock_release_holdtime
             8881.00 -  5.8% : stick_get_tick
             8375.00 -  5.4% : lock_release
             7401.00 -  4.8% : lock_acquired
             5522.00 -  3.6% : tl0_linux64
             4325.00 -  2.8% : sched_clock
             3634.00 -  2.4% : ext4_rec_len_from_disk
             3600.00 -  2.3% : sched_clock_cpu
             3261.00 -  2.1% : lock_acquire
             3012.00 -  2.0% : NGcopy_to_user
             2988.00 -  1.9% : ext4_check_dir_entry
             2909.00 -  1.9% : getnstimeofday
             2010.00 -  1.3% : __delay
             1915.00 -  1.2% : cpu_clock

Stopping all write processes so that sync can complete:

------------------------------------------------------------------------------
   PerfTop:    2679 irqs/sec  kernel:92.6% [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------
             samples    pcnt   kernel function
             _______   _____   _______________

             6102.00 - 19.7% : __lock_acquire
             2504.00 -  8.1% : lock_release_holdtime
             2346.00 -  7.6% : lock_release
             1685.00 -  5.5% : stick_get_tick
             1532.00 -  5.0% : lock_acquired
             1135.00 -  3.7% : sched_clock
             1068.00 -  3.5% : find_busiest_group
             1034.00 -  3.3% : sched_clock_cpu
             1023.00 -  3.3% : tl0_linux64
              832.00 -  2.7% : lock_acquire
              658.00 -  2.1% : fget_light
              656.00 -  2.1% : perf_poll
              629.00 -  2.0% : journal_clean_one_cp_list
              572.00 -  1.9% : find_next_bit
              525.00 -  1.7% : getnstimeofday
------------------------------------------------------------------------------
   PerfTop:    3182 irqs/sec  kernel:90.5% [100000 cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------
             samples    pcnt   kernel function
             _______   _____   _______________

             9076.00 - 19.9% : __lock_acquire
             3702.00 -  8.1% : lock_release_holdtime
             3420.00 -  7.5% : lock_release
             2522.00 -  5.5% : stick_get_tick
             2198.00 -  4.8% : lock_acquired
             1700.00 -  3.7% : sched_clock
             1539.00 -  3.4% : find_busiest_group
             1507.00 -  3.3% : sched_clock_cpu
             1402.00 -  3.1% : tl0_linux64
             1277.00 -  2.8% : lock_acquire
             1089.00 -  2.4% : journal_clean_one_cp_list
             1085.00 -  2.4% : fget_light
             1004.00 -  2.2% : perf_poll
              860.00 -  1.9% : do_sys_poll
              782.00 -  1.7% : find_next_bit

There's that journal_clean_one_cp_list thing ..


I have a gut feeling that sys_sync just waits until all data has been
flushed. That however won't fly, because there's always new dirty data
arriving when the box is not idle.

Of course I am not calling /bin/sync on purpose, but some software
packages do; rrdtool for example has an autoconf runtime
check for fsync(2), so it's inevitable that sync is called at some
point by something.


I also did

# perf top -a -k vmlinux-... -p $(that pid of jbd2-sda6)
------------------------------------------------------------------------------
   PerfTop:     105 irqs/sec  kernel:100.0% [100000 cycles],  (target_pid: 574)
------------------------------------------------------------------------------
             samples    pcnt   kernel function
             _______   _____   _______________

               42.00 - 12.8% : journal_clean_one_cp_list
               34.00 - 10.4% : __lock_acquire
               33.00 - 10.1% : __jbd2_journal_clean_checkpoint_list
               28.00 -  8.5% : find_busiest_group
               20.00 -  6.1% : clear_bit
               16.00 -  4.9% : lock_release
               16.00 -  4.9% : stick_get_tick
               14.00 -  4.3% : test_and_set_bit
               13.00 -  4.0% : lock_release_holdtime
               12.00 -  3.7% : lock_acquired
                9.00 -  2.7% : find_next_bit
                7.00 -  2.1% : lock_acquire
                7.00 -  2.1% : jbd2_journal_write_revoke_records
                6.00 -  1.8% : jbd2_journal_switch_revoke_table
                6.00 -  1.8% : cpu_clock

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kswapd continuously active
  2010-02-08 14:08                   ` Jens Axboe
  2010-02-10 15:18                     ` Jan Engelhardt
@ 2010-02-10 15:23                     ` Jan Engelhardt
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Engelhardt @ 2010-02-10 15:23 UTC (permalink / raw)
  To: jens.axboe

Hi,


>On Mon, Feb 08 2010, Jan Engelhardt wrote:
>
>> Could it be that there is something synchronize_rcu()-like in the
>> game that-- as a result of how RCU works-- just takes ages with 24
>> VCPUs?
>
>The only synchronize_rcu() involved in the writeback code happens when 
>a bdi exits, so you should not hit that. It'll do call_rcu() for work 
>completions, but 1) you should not see a lot of work entries, and 2) 
>lots of other kernel code will do that, too.
>
>Are you seeing a lot of CPU usage? What does eg perf top -a say?

top - 16:19:42 up  1:02,  3 users,  load average: 3.05, 3.87, 3.51
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 98.2%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8063800k total,  6676472k used,  1387328k free,   767040k buffers
Swap:        0k total,        0k used,        0k free,  4775464k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
26354 root      20   0  3472 1704 1184 R    4  0.0   0:00.33 top                
26198 jengelh   20   0  2864 1368 1152 S    4  0.0   0:03.72 watch              
  574 root      20   0     0    0    0 D    3  0.0   0:52.20 jbd2/sda6-8        
  312 root      20   0     0    0    0 R    1  0.0   0:29.29 flush-8:0          
    1 root      20   0  2408  936  808 S    0  0.0   0:01.13 init               


When it comes down hard, i.e. when there is always new data arriving
while sys_sync is working, the loadavg is usually in the range of
1.0*CPUs up to 2.0*CPUs. The top dump shown here is after quiescing
writers.

>And what setup is this, I didn't realize you were running a virtualized
>setup?

It is real; but I usually use the term VCPU because this iron has
1 CPU with 6 cores with 24 threads, not 24 CPU (packages).


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-02-10 15:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-22 23:52 kswapd continuously active Jan Engelhardt
2010-01-25 13:06 ` Jens Axboe
2010-01-25 13:22   ` Jan Engelhardt
2010-02-05 12:35     ` Jan Engelhardt
2010-02-05 13:00       ` Jens Axboe
2010-02-05 13:09         ` Jan Engelhardt
2010-02-05 13:12           ` Jens Axboe
2010-02-05 13:24             ` Jan Engelhardt
2010-02-07 10:50               ` Jan Engelhardt
2010-02-08  0:07                 ` Jan Engelhardt
2010-02-08 14:08                   ` Jens Axboe
2010-02-10 15:18                     ` Jan Engelhardt
2010-02-10 15:23                     ` Jan Engelhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox