* Re: slow performance on disk/network i/o full speed after drop_caches
From: Pekka Enberg @ 2011-08-24  6:20 UTC
To: Stefan Priebe - Profihost AG
Cc: LKML, linux-mm, Andrew Morton, Mel Gorman, Jens Axboe, Wu Fengguang, Linux Netdev List

On Wed, Aug 24, 2011 at 9:06 AM, Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> I hope this is the correct list to write to; if not, it would be nice to
> give me a hint where I can ask.
>
> Kernel: 2.6.38
>
> I'm seeing some strange problems on some of our servers after upgrading
> to 2.6.38.
>
> I'm copying a 1GB file via scp from machine A to machine B. When B is
> freshly booted, the file transfer runs at about 80 to 85 MB/s. I can
> repeat that various times without performance decrease.
>
> Then, after some days, copying runs at only about 900 kB/s up to 3 MB/s,
> going up and down while transferring the file.
>
> When I then do drop_caches, it works again at 80 MB/s:
>
> sync && echo 3 >/proc/sys/vm/drop_caches && sleep 2 && echo 0 >/proc/sys/vm/drop_caches
>
> Attached is also an output of meminfo before and after drop_caches.
>
> What's going on here? MemFree is pretty high.
>
> Please CC me, I'm not on the list.

Interesting. I can imagine one or more of the following to be involved:
networking, vmscan, block, and writeback. Let's CC all of them!
> # before drop_caches
>
> # cat /proc/meminfo
> MemTotal:        8185544 kB
> MemFree:         6670292 kB
> Buffers:          105164 kB
> Cached:           166672 kB
> SwapCached:            0 kB
> Active:           728308 kB
> Inactive:         567428 kB
> Active(anon):     639204 kB
> Inactive(anon):   394932 kB
> Active(file):      89104 kB
> Inactive(file):   172496 kB
> Unevictable:        2976 kB
> Mlocked:            2992 kB
> SwapTotal:       1464316 kB
> SwapFree:        1464316 kB
> Dirty:                52 kB
> Writeback:             0 kB
> AnonPages:       1026920 kB
> Mapped:            54208 kB
> Shmem:              8380 kB
> Slab:              80724 kB
> SReclaimable:      22844 kB
> SUnreclaim:        57880 kB
> KernelStack:        2872 kB
> PageTables:        35448 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     5557088 kB
> Committed_AS:    6187972 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      292360 kB
> VmallocChunk:   34359425327 kB
> HardwareCorrupted:     0 kB
> DirectMap4k:        5632 kB
> DirectMap2M:     2082816 kB
> DirectMap1G:     6291456 kB
>
> # after drop_caches
>
> # cat /proc/meminfo
> MemTotal:        8185544 kB
> MemFree:         6888060 kB
> Buffers:             372 kB
> Cached:            61492 kB
> SwapCached:            0 kB
> Active:           659156 kB
> Inactive:         426664 kB
> Active(anon):     638892 kB
> Inactive(anon):   395200 kB
> Active(file):      20264 kB
> Inactive(file):    31464 kB
> Unevictable:        2976 kB
> Mlocked:            2992 kB
> SwapTotal:       1464316 kB
> SwapFree:        1464316 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:       1026952 kB
> Mapped:            54236 kB
> Shmem:              8316 kB
> Slab:              70616 kB
> SReclaimable:      12264 kB
> SUnreclaim:        58352 kB
> KernelStack:        2864 kB
> PageTables:        35448 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     5557088 kB
> Committed_AS:    6187932 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      292360 kB
> VmallocChunk:   34359425327 kB
> HardwareCorrupted:     0 kB
> DirectMap4k:        5632 kB
> DirectMap2M:     2082816 kB
> DirectMap1G:     6291456 kB
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
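[Editor's note: the amount of cache that drop_caches released can be read straight off the two meminfo snapshots above. A minimal shell sketch, using the Buffers and Cached values copied verbatim from the dumps (the `cache_kb` helper name is made up for illustration):]

```shell
#!/bin/sh
# Sum Buffers + Cached from a saved /proc/meminfo excerpt and compare
# the "before" and "after" dumps quoted above.
before='Buffers: 105164 kB
Cached: 166672 kB'
after='Buffers: 372 kB
Cached: 61492 kB'
cache_kb() { printf '%s\n' "$1" | awk '/^(Buffers|Cached):/ {s += $2} END {print s}'; }
echo "freed $(( $(cache_kb "$before") - $(cache_kb "$after") )) kB of page cache"
# -> freed 209972 kB of page cache
```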
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-24  9:01 UTC
To: Pekka Enberg
Cc: LKML, linux-mm, Andrew Morton, Mel Gorman, Jens Axboe, Wu Fengguang, Linux Netdev List

>> sync && echo 3 >/proc/sys/vm/drop_caches && sleep 2 && echo 0 >/proc/sys/vm/drop_caches

Another way to get it working again is to stop some processes. It could
be mysql or apache or php fcgi, it doesn't matter. Just free some
memory, although there are already 5 GB free.

Stefan
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-24  9:33 UTC
To: Stefan Priebe - Profihost AG
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

On Wed, Aug 24, 2011 at 05:01:03PM +0800, Stefan Priebe - Profihost AG wrote:
>
> >> sync && echo 3 >/proc/sys/vm/drop_caches && sleep 2 && echo 0 >/proc/sys/vm/drop_caches
>
> Another way to get it working again is to stop some processes. It could
> be mysql or apache or php fcgi, it doesn't matter. Just free some
> memory, although there are already 5 GB free.

Is it a NUMA machine, and does _every_ node have enough free pages?

    grep . /sys/devices/system/node/node*/vmstat

Thanks,
Fengguang
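[Editor's note: the imbalance Wu is probing for can also be seen in the per-node meminfo files. A sketch that parses a captured sample; the MemFree values below are illustrative, not taken from Stefan's machine, and on a live box you would read /sys/devices/system/node/node*/meminfo instead:]

```shell
#!/bin/sh
# Print per-node free memory from a (made-up) per-node meminfo sample.
# The shape -- one node nearly empty, the other holding almost all the
# free pages -- is what Wu suspects here.
sample='Node 0 MemFree:     102400 kB
Node 1 MemFree:    6567892 kB'
printf '%s\n' "$sample" | awk '{printf "node%s free: %d kB\n", $2, $4}'
```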
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-25  9:00 UTC
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

On 24.08.2011 11:33, Wu Fengguang wrote:
> On Wed, Aug 24, 2011 at 05:01:03PM +0800, Stefan Priebe - Profihost AG wrote:
>>
>>>> sync && echo 3 >/proc/sys/vm/drop_caches && sleep 2 && echo 0 >/proc/sys/vm/drop_caches
>>
>> Another way to get it working again is to stop some processes. It could
>> be mysql or apache or php fcgi, it doesn't matter. Just free some
>> memory, although there are already 5 GB free.
>
> Is it a NUMA machine, and does _every_ node have enough free pages?
>
>     grep . /sys/devices/system/node/node*/vmstat
>
> Thanks,
> Fengguang

Hi Fengguang,

thanks for your fast reply. Here is the data you requested:

root@server1015-han:~# grep . /sys/devices/system/node/node*/vmstat
/sys/devices/system/node/node0/vmstat:nr_written 5546561
/sys/devices/system/node/node0/vmstat:nr_dirtied 5572497
/sys/devices/system/node/node1/vmstat:nr_written 3936
/sys/devices/system/node/node1/vmstat:nr_dirtied 4190

I modified it a little bit:

~# while [ true ]; do ps -eo user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd | grep scp | grep -v grep; sleep 1; done

root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 64.0 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 0 67.7 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 70.6 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 76.0 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 78.2 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 80.0 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 80.9 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 2 76.7 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 75.6 42136 1724 0.0 Ds pipe_read scp -t /tmp/
root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 1 75.2 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 1 76.6 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 1 77.9 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 1 79.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 72.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 73.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 73.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 74.3 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.4 42136 1724 0.0 Ss - scp -t /tmp/
root 12409 12409 TS - 0 19 1 71.3 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 71.9 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 0 72.7 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 3 73.5 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 3 74.4 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 3 75.2 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 76.6 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 74.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.2 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 73.9 42136 1724 0.0 Rs poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 0 72.4 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.5 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 8 72.9 42136 1724 0.0 Rs - scp -t /tmp/
root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 0.0 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 23.0 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 49.5 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 2 63.3 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 71.5 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 77.4 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 70.3 42136 1728 0.0 Rs - scp -t /tmp/
root 12566 12566 TS - 0 19 1 73.1 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12566 12566 TS - 0 19 0 65.7 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12566 12566 TS - 0 19 1 61.2 42136 1728 0.0 Ss - scp -t /tmp/
root 12566 12566 TS - 0 19 1 63.7 42136 1728 0.0 Rs - scp -t /tmp/
root 12636 12636 TS - 0 19 8 0.0 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/

Stefan
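[Editor's note: the ps samples above are easier to read when collapsed by wait channel. A sketch over three representative records copied from the listing; in this ps field order, column 14 is the wchan value:]

```shell
#!/bin/sh
# Count how often each wait channel appears in the sampled ps records.
# The three sample lines are copied from the listing above; "-" means
# the task was on-CPU (running) when sampled.
samples='root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
root 12409 12409 TS - 0 19 1 75.6 42136 1724 0.0 Ds pipe_read scp -t /tmp/
root 12409 12409 TS - 0 19 0 64.0 42136 1724 0.0 Rs - scp -t /tmp/'
printf '%s\n' "$samples" | awk '{n[$14]++} END {for (w in n) print n[w], w}'
```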
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26  2:16 UTC
To: Stefan Priebe - Profihost AG
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

Hi Stefan,

> Here is the data you requested:
>
> root@server1015-han:~# grep . /sys/devices/system/node/node*/vmstat
> /sys/devices/system/node/node0/vmstat:nr_written 5546561
> /sys/devices/system/node/node0/vmstat:nr_dirtied 5572497
> /sys/devices/system/node/node1/vmstat:nr_written 3936
> /sys/devices/system/node/node1/vmstat:nr_dirtied 4190

Ah, you are running an older kernel that doesn't show all the vmstat
numbers. But it is still revealing that node 0 is used heavily and
node 1 is almost idle. So I won't be surprised to see most free pages
lie in node 1.

> I modified it a little bit:
>
> ~# while [ true ]; do ps -eo user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd | grep scp | grep -v grep; sleep 1; done
>
> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/

It's mostly doing poll() waits. There must be some dependency on
something else to make progress. Would you post the full ps output for
all tasks, and even better, run

    echo t > /proc/sysrq-trigger

to dump the kernel stacks?

Thanks,
Fengguang

> root 12409 12409 TS - 0 19 0 64.0 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 67.7 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 70.6 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 76.0 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 78.2 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 80.0 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 80.9 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 2 76.7 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 75.6 42136 1724 0.0 Ds pipe_read scp -t /tmp/
> root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 75.2 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 76.6 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 77.9 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 79.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 72.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 73.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 73.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 74.3 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.4 42136 1724 0.0 Ss - scp -t /tmp/
> root 12409 12409 TS - 0 19 1 71.3 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 71.9 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 72.7 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 3 73.5 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 3 74.4 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 3 75.2 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 0 76.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 76.6 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 74.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.2 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 1 73.9 42136 1724 0.0 Rs poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 0 72.4 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.0 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.5 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12409 12409 TS - 0 19 8 72.9 42136 1724 0.0 Rs - scp -t /tmp/
> root 12409 12409 TS - 0 19 8 73.5 42136 1724 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 0.0 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 23.0 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 49.5 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 2 63.3 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 71.5 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 77.4 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 70.3 42136 1728 0.0 Rs - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 73.1 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12566 12566 TS - 0 19 0 65.7 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/
> root 12566 12566 TS - 0 19 1 61.2 42136 1728 0.0 Ss - scp -t /tmp/
> root 12566 12566 TS - 0 19 1 63.7 42136 1728 0.0 Rs - scp -t /tmp/
> root 12636 12636 TS - 0 19 8 0.0 42136 1728 0.0 Ss poll_schedule_timeout scp -t /tmp/
>
> Stefan
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-26  2:54 UTC
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

Hi Wu,

> Ah, you are running an older kernel that doesn't show all the vmstat
> numbers. But it is still revealing that node 0 is used heavily and
> node 1 is almost idle. So I won't be surprised to see most free pages
> lie in node 1.

I'm running a 2.6.38 kernel. There is at least a numastat proc file:

grep . /sys/devices/system/node/node*/numastat
/sys/devices/system/node/node0/numastat:numa_hit 5958586
/sys/devices/system/node/node0/numastat:numa_miss 0
/sys/devices/system/node/node0/numastat:numa_foreign 0
/sys/devices/system/node/node0/numastat:interleave_hit 4191
/sys/devices/system/node/node0/numastat:local_node 5885189
/sys/devices/system/node/node0/numastat:other_node 73397
/sys/devices/system/node/node1/numastat:numa_hit 488922
/sys/devices/system/node/node1/numastat:numa_miss 0
/sys/devices/system/node/node1/numastat:numa_foreign 0
/sys/devices/system/node/node1/numastat:interleave_hit 4187
/sys/devices/system/node/node1/numastat:local_node 386741
/sys/devices/system/node/node1/numastat:other_node 102181

>> I modified it a little bit:
>> ~# while [ true ]; do ps -eo user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd | grep scp | grep -v grep; sleep 1; done
>>
>> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
>
> It's mostly doing poll() waits. There must be some dependency on
> something else to make progress. Would you post the full ps output for
> all tasks, and even better, run

Complete ps output: http://pastebin.com/raw.php?i=b948svzN

> echo t > /proc/sysrq-trigger

Sadly I was only able to grab the output in this crazy format:
http://pastebin.com/raw.php?i=MBXvvyH1

Hope that still helps.

Thanks,
Stefan
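[Editor's note: the numastat dump above quantifies the imbalance Wu pointed out. A sketch computing the node0/node1 allocation ratio from the two numa_hit lines, with the counter values copied verbatim from the output above:]

```shell
#!/bin/sh
# Ratio of page allocations served by node0 vs node1, from the numa_hit
# counters in the grep output above. p[6] is the "nodeN" path component.
awk '/numa_hit/ {split($1, p, "/"); hits[p[6]] = $2}
     END {printf "node0/node1 numa_hit ratio: %.1f\n", hits["node0"] / hits["node1"]}' <<'EOF'
/sys/devices/system/node/node0/numastat:numa_hit 5958586
/sys/devices/system/node/node1/numastat:numa_hit 488922
EOF
```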
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26  3:03 UTC
To: Stefan Priebe - Profihost AG
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

On Fri, Aug 26, 2011 at 10:54:35AM +0800, Stefan Priebe - Profihost AG wrote:
> Hi Wu,
>
> > Ah, you are running an older kernel that doesn't show all the vmstat
> > numbers. But it is still revealing that node 0 is used heavily and
> > node 1 is almost idle. So I won't be surprised to see most free pages
> > lie in node 1.
>
> I'm running a 2.6.38 kernel. There is at least a numastat proc file:

Thanks. This shows that node0 is accessed 10x more than node1.

> grep . /sys/devices/system/node/node*/numastat
> /sys/devices/system/node/node0/numastat:numa_hit 5958586
> /sys/devices/system/node/node0/numastat:numa_miss 0
> /sys/devices/system/node/node0/numastat:numa_foreign 0
> /sys/devices/system/node/node0/numastat:interleave_hit 4191
> /sys/devices/system/node/node0/numastat:local_node 5885189
> /sys/devices/system/node/node0/numastat:other_node 73397
> /sys/devices/system/node/node1/numastat:numa_hit 488922
> /sys/devices/system/node/node1/numastat:numa_miss 0
> /sys/devices/system/node/node1/numastat:numa_foreign 0
> /sys/devices/system/node/node1/numastat:interleave_hit 4187
> /sys/devices/system/node/node1/numastat:local_node 386741
> /sys/devices/system/node/node1/numastat:other_node 102181
>
> >> I modified it a little bit:
> >> ~# while [ true ]; do ps -eo user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd | grep scp | grep -v grep; sleep 1; done
> >>
> >> root 12409 12409 TS - 0 19 0 59.8 42136 1724 0.0 Ss poll_schedule_timeout scp -t /tmp/
> >
> > It's mostly doing poll() waits. There must be some dependency on
> > something else to make progress. Would you post the full ps output for
> > all tasks, and even better, run
>
> Complete ps output: http://pastebin.com/raw.php?i=b948svzN

In that log, scp happens to be in R state, and no other tasks are in D
state. Would you retry, in the hope of catching some stuck state?

> > echo t > /proc/sysrq-trigger
>
> Sadly I was only able to grab the output in this crazy format:
> http://pastebin.com/raw.php?i=MBXvvyH1

It's pretty readable dmesg, except that the data is incomplete and
there is nothing valuable in the uploaded portion...

Thanks,
Fengguang
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe @ 2011-08-26  3:13 UTC
To: Wu Fengguang
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

>> There is at least a numastat proc file.
>
> Thanks. This shows that node0 is accessed 10x more than node1.

What can I do to prevent this? Or isn't this normal when a machine
mostly idles, so processes are mostly handled by CPU 0?

>> Complete ps output:
>> http://pastebin.com/raw.php?i=b948svzN
>
> In that log, scp happens to be in R state, and no other tasks are in D
> state. Would you retry, in the hope of catching some stuck state?

Sadly not, as the sysrq trigger rebooted the machine, and it will now
run fine for 1 or 2 days.

>>> echo t > /proc/sysrq-trigger
>> Sadly I was only able to grab the output in this crazy format:
>> http://pastebin.com/raw.php?i=MBXvvyH1
>
> It's pretty readable dmesg, except that the data is incomplete and
> there is nothing valuable in the uploaded portion...

That was everything I could grab through netconsole. Is there a better way?

Stefan
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Wu Fengguang @ 2011-08-26  3:26 UTC
To: Stefan Priebe
Cc: Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

On Fri, Aug 26, 2011 at 11:13:07AM +0800, Stefan Priebe wrote:
>
> >> There is at least a numastat proc file.
> >
> > Thanks. This shows that node0 is accessed 10x more than node1.
>
> What can I do to prevent this? Or isn't this normal when a machine
> mostly idles, so processes are mostly handled by CPU 0?

Yes, that's normal. However, it should explain why it's slow even when
there are lots of free pages _globally_.

> >> Complete ps output:
> >> http://pastebin.com/raw.php?i=b948svzN
> >
> > In that log, scp happens to be in R state, and no other tasks are in D
> > state. Would you retry, in the hope of catching some stuck state?
>
> Sadly not, as the sysrq trigger rebooted the machine, and it will now
> run fine for 1 or 2 days.

Oops, sorry! It might be possible to reproduce the issue by manually
eating all of the memory with sparse file data:

    truncate -s 1T 1T
    cp 1T /dev/null

> >>> echo t > /proc/sysrq-trigger
> >> Sadly I was only able to grab the output in this crazy format:
> >> http://pastebin.com/raw.php?i=MBXvvyH1
> >
> > It's pretty readable dmesg, except that the data is incomplete and
> > there is nothing valuable in the uploaded portion...
>
> That was everything I could grab through netconsole. Is there a better way?

netconsole is enough. The partial output should be due to the reboot...

Thanks,
Fengguang
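[Editor's note: Wu's sparse-file trick works because truncate only sets the apparent size; reading the file then streams zero-filled pages through the page cache in proportion to that apparent size. A small-scale sketch (1 MB instead of 1 TB, and a temp file instead of the literal name `1T`; `stat -c` is GNU coreutils syntax):]

```shell
#!/bin/sh
# Create a sparse file: large apparent size, (almost) no blocks on disk.
f=$(mktemp)
truncate -s 1M "$f"
echo "apparent size: $(stat -c %s "$f") bytes, allocated blocks: $(stat -c %b "$f")"
# Reading it faults zero pages through the page cache, "eating" free
# memory in proportion to the apparent size -- which at 1T is enough to
# exhaust every node and trigger the reclaim behavior under suspicion.
cp "$f" /dev/null
rm -f "$f"
```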
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Zhu Yanhai @ 2011-08-26  3:30 UTC
To: Wu Fengguang
Cc: Stefan Priebe, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

Fengguang,

Maybe it's because of zone_reclaim_mode? We have often received reports
that scp or something like it is slow for no apparent reason, and mostly
it's because someone enabled zone_reclaim_mode by mistake.

Stefan, is your zone_reclaim_mode enabled? Try
'cat /proc/sys/vm/zone_reclaim_mode', and echo 0 to it to disable it.

Thanks,
Zhu Yanhai

2011/8/26 Wu Fengguang <fengguang.wu@intel.com>:
> On Fri, Aug 26, 2011 at 11:13:07AM +0800, Stefan Priebe wrote:
>>
>> >> There is at least a numastat proc file.
>> >
>> > Thanks. This shows that node0 is accessed 10x more than node1.
>>
>> What can I do to prevent this? Or isn't this normal when a machine
>> mostly idles, so processes are mostly handled by CPU 0?
>
> Yes, that's normal. However, it should explain why it's slow even when
> there are lots of free pages _globally_.
>
>> >> Complete ps output:
>> >> http://pastebin.com/raw.php?i=b948svzN
>> >
>> > In that log, scp happens to be in R state, and no other tasks are in D
>> > state. Would you retry, in the hope of catching some stuck state?
>>
>> Sadly not, as the sysrq trigger rebooted the machine, and it will now
>> run fine for 1 or 2 days.
>
> Oops, sorry! It might be possible to reproduce the issue by manually
> eating all of the memory with sparse file data:
>
>     truncate -s 1T 1T
>     cp 1T /dev/null
>
>> >>> echo t > /proc/sysrq-trigger
>> >> Sadly I was only able to grab the output in this crazy format:
>> >> http://pastebin.com/raw.php?i=MBXvvyH1
>> >
>> > It's pretty readable dmesg, except that the data is incomplete and
>> > there is nothing valuable in the uploaded portion...
>>
>> That was everything I could grab through netconsole. Is there a better way?
>
> netconsole is enough. The partial output should be due to the reboot...
>
> Thanks,
> Fengguang
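[Editor's note: Zhu's check and fix, spelled out as commands. This is a sketch, not part of the thread: the writes need root, and the sysctl.conf line is the usual (but here assumed) way to make the setting survive a reboot.]

```shell
cat /proc/sys/vm/zone_reclaim_mode              # 1 = reclaim in-node before allocating off-node
echo 0 > /proc/sys/vm/zone_reclaim_mode         # disable for the running kernel
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf   # persist across reboots
```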
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-26  6:18 UTC
To: Zhu Yanhai
Cc: Wu Fengguang, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

Yanhai,

> Stefan, is your zone_reclaim_mode enabled? Try
> 'cat /proc/sys/vm/zone_reclaim_mode', and echo 0 to it to disable it.

You're absolutely correct, zone_reclaim_mode is on - but why? There must
be some Linux software which switches it on.

~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
~#

also

~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
~#

tells us nothing.

I've then read this:

"zone_reclaim_mode is set during bootup to 1 if it is determined that
pages from remote zones will cause a measurable performance reduction.
The page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node
pages."

Why does the kernel do that here in our case on these machines?

Stefan
* Re: slow performance on disk/network i/o full speed after drop_caches
From: Stefan Priebe - Profihost AG @ 2011-08-31  7:11 UTC
To: Zhu Yanhai
Cc: Wu Fengguang, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

Hi Fengguang,
hi Yanhai,

> You're absolutely correct, zone_reclaim_mode is on - but why? There
> must be some Linux software which switches it on.
>
> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
> ~#
>
> also
>
> ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
> ~#
>
> tells us nothing.
>
> I've then read this:
>
> "zone_reclaim_mode is set during bootup to 1 if it is determined that
> pages from remote zones will cause a measurable performance reduction.
> The page allocator will then reclaim easily reusable pages (those page
> cache pages that are currently not used) before allocating off node
> pages."
>
> Why does the kernel do that here in our case on these machines?

Can nobody explain why the kernel set it to 1 in this case?

Stefan
* Re: slow performance on disk/network i/o full speed after drop_caches
  2011-08-31  7:11               ` Stefan Priebe - Profihost AG
@ 2011-09-01  4:14                 ` Wu Fengguang
  2011-09-01  5:41                   ` Stefan Priebe - Profihost AG
  2011-09-01 12:57                   ` Mel Gorman
  0 siblings, 2 replies; 16+ messages in thread
From: Wu Fengguang @ 2011-09-01  4:14 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Zhu Yanhai, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton,
      Mel Gorman, Jens Axboe, Linux Netdev List, KOSAKI Motohiro

Hi Stefan,

On Wed, Aug 31, 2011 at 03:11:02PM +0800, Stefan Priebe - Profihost AG wrote:
> Hi Fengguang,
> Hi Yanhai,
>
> > you're absolutely correct - zone_reclaim_mode is on. But why?
> > There must be some Linux software which switches it on.
> >
> > ~# grep 'zone_reclaim_mode' /etc/sysctl.* -r -i
> > ~#
> >
> > tells us nothing.
> >
> > I've then read this:
> >
> > "zone_reclaim_mode is set during bootup to 1 if it is determined that
> > pages from remote zones will cause a measurable performance reduction.
> > The page allocator will then reclaim easily reusable pages (those page
> > cache pages that are currently not used) before allocating off node pages."
> >
> > Why does the kernel do that here, in our case, on these machines?
>
> Can nobody explain why the kernel sets it to 1 in this case?

It's determined by RECLAIM_DISTANCE.

build_zonelists():

	/*
	 * If another node is sufficiently far away then it is better
	 * to reclaim pages in a zone before going off node.
	 */
	if (distance > RECLAIM_DISTANCE)
		zone_reclaim_mode = 1;

Since Linux v3.0, RECLAIM_DISTANCE has been increased from 20 to 30 by the
commit below. It may well help your case, too.
commit 32e45ff43eaf5c17f5a82c9ad358d515622c2562
Author: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date:   Wed Jun 15 15:08:20 2011 -0700

    mm: increase RECLAIM_DISTANCE to 30

    Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
    that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
    Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built
    on a very traditional single-process model.

      * a master process which reads config files and manages the other
        processes
      * multiple imapd processes, one per connection
      * multiple pop3d processes, one per connection
      * multiple lmtpd processes, one per connection
      * periodical "cleanup" processes

    There are thousands of independent processes.  The problem is, recent
    Intel motherboards turn on zone_reclaim_mode by default and traditional
    prefork-model software doesn't work well on it.  Unfortunately, such
    models are still typical even in the 21st century.  We can't ignore them.

    This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
    any specific meaning, but 20 means one-hop QPI/HyperTransport, and such
    relatively cheap 2-4 socket machines are often used for traditional
    servers as above.  The intention is that these machines don't use
    zone_reclaim_mode.

    Note: ia64 and Power have arch-specific RECLAIM_DISTANCE definitions.
    This patch doesn't change such high-end NUMA machine behavior.

    Dave Hansen said:

    : I know specifically of pieces of x86 hardware that set the information
    : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
    : behavior which that implies.
    :
    : They've done performance testing and run very large and scary benchmarks
    : to make sure that they _want_ this turned on.  What this means for them
    : is that they'll probably be de-optimized, at least on newer versions of
    : the kernel.
    :
    : If you want to do this for particular systems, maybe _that_'s what we
    : should do.  Have a list of specific configurations that need the
    : defaults overridden either because they're buggy, or they have an
    : unusual hardware configuration not really reflected in the distance
    : table.

    And later said:

    : The original change in the hardware tables was for the benefit of a
    : benchmark.  Said benchmark isn't going to get run on mainline until the
    : next batch of enterprise distros drops, at which point the hardware where
    : this was done will be irrelevant for the benchmark.  I'm sure any new
    : hardware will just set this distance to another yet arbitrary value to
    : make the kernel do what it wants. :)
    :
    : Also, when the hardware got _set_ to this initially, I complained.  So, I
    : guess I'm getting my way now, with this patch.  I'm cool with it.

diff --git a/include/linux/topology.h b/include/linux/topology.h
index b91a40e..fc839bf 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -60,7 +60,7 @@ int arch_update_cpu_topology(void);
  * (in whatever arch specific measurement units returned by node_distance())
  * then switch on zone reclaim on boot.
  */
-#define RECLAIM_DISTANCE 20
+#define RECLAIM_DISTANCE 30
 #endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS (1)

Thanks,
Fengguang
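[Editor's note: to see which side of the RECLAIM_DISTANCE threshold a given machine falls on, the node-distance (SLIT) values the kernel compares can be read from sysfs. A sketch — on a non-NUMA box you will typically see just node0 with a distance of 10:]

```shell
# Print each NUMA node's distance row.  A remote-node distance greater
# than RECLAIM_DISTANCE (20 before v3.0, 30 since) makes build_zonelists()
# set zone_reclaim_mode = 1 at boot.
for f in /sys/devices/system/node/node*/distance; do
    if [ -r "$f" ]; then
        echo "$f: $(cat "$f")"
    fi
done
```

`numactl --hardware` prints the same matrix in table form, if the numactl package is installed.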
* Re: slow performance on disk/network i/o full speed after drop_caches
  2011-09-01  4:14                 ` Wu Fengguang
@ 2011-09-01  5:41                   ` Stefan Priebe - Profihost AG
  2011-09-01 12:57                   ` Mel Gorman
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-01  5:41 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Zhu Yanhai, Pekka Enberg, LKML, linux-mm@kvack.org, Andrew Morton,
      Mel Gorman, Jens Axboe, Linux Netdev List, KOSAKI Motohiro

Thanks!

Am 01.09.2011 06:14, schrieb Wu Fengguang:
> Hi Stefan,
>
> On Wed, Aug 31, 2011 at 03:11:02PM +0800, Stefan Priebe - Profihost AG wrote:
>> [...]
>>
>> Can nobody explain why the kernel sets it to 1 in this case?
>
> It's determined by RECLAIM_DISTANCE.
>
> build_zonelists():
>
> 	/*
> 	 * If another node is sufficiently far away then it is better
> 	 * to reclaim pages in a zone before going off node.
> 	 */
> 	if (distance > RECLAIM_DISTANCE)
> 		zone_reclaim_mode = 1;
>
> Since Linux v3.0, RECLAIM_DISTANCE has been increased from 20 to 30 by the
> commit below. It may well help your case, too.
>
> [full quote of commit 32e45ff4 ("mm: increase RECLAIM_DISTANCE to 30") snipped]
* Re: slow performance on disk/network i/o full speed after drop_caches
  2011-09-01  4:14                 ` Wu Fengguang
  2011-09-01  5:41                   ` Stefan Priebe - Profihost AG
@ 2011-09-01 12:57                   ` Mel Gorman
  1 sibling, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2011-09-01 12:57 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Stefan Priebe - Profihost AG, Zhu Yanhai, Pekka Enberg, LKML,
      linux-mm@kvack.org, Andrew Morton, Jens Axboe, Linux Netdev List,
      KOSAKI Motohiro

On Thu, Sep 01, 2011 at 12:14:58PM +0800, Wu Fengguang wrote:
> [...]
>
> It's determined by RECLAIM_DISTANCE.
>
> build_zonelists():
>
> 	/*
> 	 * If another node is sufficiently far away then it is better
> 	 * to reclaim pages in a zone before going off node.
> 	 */
> 	if (distance > RECLAIM_DISTANCE)
> 		zone_reclaim_mode = 1;
>
> Since Linux v3.0, RECLAIM_DISTANCE has been increased from 20 to 30 by the
> commit below. It may well help your case, too.

Even with that, it's known that zone_reclaim() can be a disaster when it
runs into problems. This should be fixed in 3.1 by the following commits:

[cd38b115 mm: page allocator: initialise ZLC for first zone eligible for zone_reclaim]
[76d3fbf8 mm: page allocator: reconsider zones for allocation after direct reclaim]

The description in cd38b115 has the interesting details.

-- 
Mel Gorman
SUSE Labs
* Re: slow performance on disk/network i/o full speed after drop_caches
  2011-08-24  6:20 ` slow performance on disk/network i/o full speed after drop_caches Pekka Enberg
  2011-08-24  9:01   ` Stefan Priebe - Profihost AG
@ 2011-08-24  9:32   ` Wu Fengguang
  1 sibling, 0 replies; 16+ messages in thread
From: Wu Fengguang @ 2011-08-24  9:32 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Stefan Priebe - Profihost AG, LKML, linux-mm@kvack.org,
      Andrew Morton, Mel Gorman, Jens Axboe, Linux Netdev List

On Wed, Aug 24, 2011 at 02:20:07PM +0800, Pekka Enberg wrote:
> On Wed, Aug 24, 2011 at 9:06 AM, Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
> > i hope this is the correct list to write to - if not, it would be nice
> > to give me a hint where i can ask.
> >
> > Kernel: 2.6.38
> >
> > I'm seeing some strange problems on some of our servers after upgrading
> > to 2.6.38.
> >
> > I'm copying a 1GB file via scp from machine A to machine B. When B is
> > freshly booted, the file transfer runs at about 80 to 85 MB/s. I can
> > repeat that various times without performance decrease.
> >
> > Then after some days, copying runs at only about 900 kB/s up to 3 MB/s,
> > going up and down while transferring the file.
> >
> > When I then do drop_caches, it works again at 80 MB/s.
> >
> > sync && echo 3 >/proc/sys/vm/drop_caches && sleep 2 && echo 0 >/proc/sys/vm/drop_caches
> >
> > Attached is also an output of meminfo before and after drop_caches.
> >
> > What's going on here? MemFree is pretty high.
> >
> > Please CC me, I'm not on the list.
>
> Interesting. I can imagine one or more of the following to be
> involved: networking, vmscan, block, and writeback. Let's CC all of
> them!
>
> > # before drop_caches
> >
> > # cat /proc/meminfo
> > MemTotal:        8185544 kB
> > MemFree:         6670292 kB
> > Buffers:          105164 kB
> > Cached:           166672 kB
> > SwapCached:            0 kB
> > Active:           728308 kB
> > Inactive:         567428 kB
> > Active(anon):     639204 kB
> > Inactive(anon):   394932 kB
> > Active(file):      89104 kB
> > Inactive(file):   172496 kB
> > Unevictable:        2976 kB
> > Mlocked:            2992 kB
> > SwapTotal:       1464316 kB
> > SwapFree:        1464316 kB
> > Dirty:                52 kB
> > Writeback:             0 kB

Since dirty/writeback pages are low, it seems the transfer is not being
throttled by balance_dirty_pages().

Stefan, would you please run this several times on the server?

ps -eo user,pid,tid,class,rtprio,ni,pri,psr,pcpu,vsz,rss,pmem,stat,wchan:28,cmd | grep scp

It will show where the scp task is blocked (the wchan field).

Hope it helps.

Thanks,
Fengguang
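[Editor's note: Fengguang's ps one-liner can be wrapped in a small loop so the wait channel gets sampled repeatedly while a slow transfer is in progress — a sketch, not part of the original mail:]

```shell
# Sample the scp task's scheduler state and wait channel once per second.
# A task stuck in state 'D' with a stable wchan points at where it blocks.
for i in 1 2 3 4 5; do
    # '[s]cp' keeps grep from matching its own process entry
    ps -eo pid,stat,wchan:28,cmd | grep '[s]cp' || echo "no scp task found"
    sleep 1
done
```

Run it on the receiving server during a transfer; the wchan column is what identifies the kernel function the task sleeps in.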
Thread overview: 16+ messages
[not found] <4E5494D4.1050605@profihost.ag>
2011-08-24 6:20 ` slow performance on disk/network i/o full speed after drop_caches Pekka Enberg
2011-08-24 9:01 ` Stefan Priebe - Profihost AG
2011-08-24 9:33 ` Wu Fengguang
2011-08-25 9:00 ` Stefan Priebe - Profihost AG
2011-08-26 2:16 ` Wu Fengguang
2011-08-26 2:54 ` Stefan Priebe - Profihost AG
2011-08-26 3:03 ` Wu Fengguang
2011-08-26 3:13 ` Stefan Priebe
2011-08-26 3:26 ` Wu Fengguang
2011-08-26 3:30 ` Zhu Yanhai
2011-08-26 6:18 ` Stefan Priebe - Profihost AG
2011-08-31 7:11 ` Stefan Priebe - Profihost AG
2011-09-01 4:14 ` Wu Fengguang
2011-09-01 5:41 ` Stefan Priebe - Profihost AG
2011-09-01 12:57 ` Mel Gorman
2011-08-24 9:32 ` Wu Fengguang