* System freezes when RAM is full (64-bit) @ 2013-04-01 19:14 Ivan Danov 2013-04-03 12:12 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-01 19:14 UTC (permalink / raw) To: linux-mm; +Cc: 1162073 [-- Attachment #1: Type: text/plain, Size: 651 bytes --] The system freezes when RAM gets completely full. Using MATLAB, I can fill all 8GB of my laptop's RAM, and the system immediately freezes, requiring a restart with the hardware button. Other people have reported the bug since 2007. It seems that only the 64-bit version is affected, and people have reported that enabling DMA in the BIOS settings solves the problem. However, my laptop lacks such an option in the BIOS settings, so I am unable to test it. More information about the bug can be found at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. Best Regards, Ivan [-- Attachment #2: Type: text/html, Size: 1214 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-01 19:14 System freezes when RAM is full (64-bit) Ivan Danov @ 2013-04-03 12:12 ` Michal Hocko 2013-04-04 0:27 ` Simon Jeons 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-03 12:12 UTC (permalink / raw) To: Ivan Danov; +Cc: linux-mm, 1162073 On Mon 01-04-13 21:14:40, Ivan Danov wrote: > The system freezes when RAM gets completely full. By using MATLAB, I > can get all 8GB RAM of my laptop full and it immediately freezes, > needing restart using the hardware button. Do you use swap (file/partition)? How big? Could you collect /proc/meminfo and /proc/vmstat (every few seconds)[1]? What does it mean when you say the system freezes? No new processes can be started, or the desktop environment doesn't react to your input? Do you see anything in the kernel log, e.g. the OOM killer? In case no new processes can be started, what does sysrq+m say when the system is frozen? What is your kernel config? > Other people have > reported the bug at since 2007. It seems that only the 64-bit > version is affected and people have reported that enabling DMA in > BIOS settings solve the problem. However, my laptop lacks such an > option in the BIOS settings, so I am unable to test it. More > information about the bug could be found at: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. > > Best Regards, > Ivan > --- [1] E.g. by while true do STAMP=`date +%s` cat /proc/meminfo > meminfo.$STAMP cat /proc/vmscan > meminfo.$STAMP sleep 2s done -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
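For reference, the sysrq+m dump asked about above can be produced as follows - a minimal sketch, assuming the kernel was built with CONFIG_MAGIC_SYSRQ and that the interface is merely restricted, not absent:

echo 1 | sudo tee /proc/sys/kernel/sysrq    # enable all SysRq functions
# press Alt+SysRq+m on the console, or trigger it from a shell:
echo m | sudo tee /proc/sysrq-trigger       # dump current memory info to the kernel log
dmesg | tail -n 60                          # read the resulting report

On a machine that is completely frozen, the keyboard combination on the console is usually the only one of these that still works.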
* Re: System freezes when RAM is full (64-bit) 2013-04-03 12:12 ` Michal Hocko @ 2013-04-04 0:27 ` Simon Jeons 2013-04-04 7:08 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Simon Jeons @ 2013-04-04 0:27 UTC (permalink / raw) To: Michal Hocko; +Cc: Ivan Danov, linux-mm, 1162073 On 04/03/2013 08:12 PM, Michal Hocko wrote: > On Mon 01-04-13 21:14:40, Ivan Danov wrote: >> The system freezes when RAM gets completely full. By using MATLAB, I >> can get all 8GB RAM of my laptop full and it immediately freezes, >> needing restart using the hardware button. > Do you use swap (file/partition)? How big? Could you collect > /proc/meminfo and /proc/vmstat (every few seconds)[1]? > What does it mean when you say the system freezes? No new processes can > be started or desktop environment doesn't react on your input? Do you > see anything in the kernel log? OOM killer e.g. > In case no new processes could be started what does sysrq+m say when the > system is frozen? > > What is your kernel config? > >> Other people have >> reported the bug at since 2007. It seems that only the 64-bit >> version is affected and people have reported that enabling DMA in >> BIOS settings solve the problem. However, my laptop lacks such an >> option in the BIOS settings, so I am unable to test it. More >> information about the bug could be found at: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. >> >> Best Regards, >> Ivan >> > --- > [1] E.g. by > while true > do > STAMP=`date +%s` > cat /proc/meminfo > meminfo.$STAMP > cat /proc/vmscan > meminfo.$STAMP s/vmscan/vmstat > sleep 2s > done -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-04 0:27 ` Simon Jeons @ 2013-04-04 7:08 ` Michal Hocko 2013-04-04 14:10 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-04 7:08 UTC (permalink / raw) To: Simon Jeons; +Cc: Ivan Danov, linux-mm, 1162073 On Thu 04-04-13 08:27:18, Simon Jeons wrote: > On 04/03/2013 08:12 PM, Michal Hocko wrote: > >On Mon 01-04-13 21:14:40, Ivan Danov wrote: > >>The system freezes when RAM gets completely full. By using MATLAB, I > >>can get all 8GB RAM of my laptop full and it immediately freezes, > >>needing restart using the hardware button. > >Do you use swap (file/partition)? How big? Could you collect > >/proc/meminfo and /proc/vmstat (every few seconds)[1]? > >What does it mean when you say the system freezes? No new processes can > >be started or desktop environment doesn't react on your input? Do you > >see anything in the kernel log? OOM killer e.g. > >In case no new processes could be started what does sysrq+m say when the > >system is frozen? > > > >What is your kernel config? > > > >>Other people have > >>reported the bug at since 2007. It seems that only the 64-bit > >>version is affected and people have reported that enabling DMA in > >>BIOS settings solve the problem. However, my laptop lacks such an > >>option in the BIOS settings, so I am unable to test it. More > >>information about the bug could be found at: > >>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and > >>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. > >> > >>Best Regards, > >>Ivan > >> > >--- > >[1] E.g. by > >while true > >do > > STAMP=`date +%s` > > cat /proc/meminfo > meminfo.$STAMP > > cat /proc/vmscan > meminfo.$STAMP > > s/vmscan/vmstat Right. Sorry about the typo, and thanks for pointing it out, Simon. > > > sleep 2s > >done > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
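For anyone reproducing the data collection, note that the original loop has a second problem besides the /proc/vmscan typo: both cat commands redirect to meminfo.$STAMP, which is what later corrupts the first set of logs. A corrected version of the loop (filenames are illustrative) would be:

while true
do
    STAMP=`date +%s`
    cat /proc/meminfo > meminfo.$STAMP
    cat /proc/vmstat > vmstat.$STAMP
    sleep 2s
done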
* Re: System freezes when RAM is full (64-bit) 2013-04-04 7:08 ` Michal Hocko @ 2013-04-04 14:10 ` Ivan Danov 2013-04-04 15:16 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-04 14:10 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 [-- Attachment #1.1: Type: text/plain, Size: 2214 bytes --] Hi Michal, Yes, I use a swap partition (2GB), but I have applied some tweaks to extend the life of the SSD. All the things I have done are under point 3 at http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. By system freezes, I mean that the desktop environment doesn't react to my input. Sometimes the mouse reacts, but very choppily and slowly; most of the time it is not reacting at all. In the attached file, I have the output of the script and the content of dmesg for all levels from warn to emerg, as well as my kernel config. Best, Ivan -- On 04/04/13 09:08, Michal Hocko wrote: > On Thu 04-04-13 08:27:18, Simon Jeons wrote: >> On 04/03/2013 08:12 PM, Michal Hocko wrote: >>> On Mon 01-04-13 21:14:40, Ivan Danov wrote: >>>> The system freezes when RAM gets completely full. By using MATLAB, I >>>> can get all 8GB RAM of my laptop full and it immediately freezes, >>>> needing restart using the hardware button. >>> Do you use swap (file/partition)? How big? Could you collect >>> /proc/meminfo and /proc/vmstat (every few seconds)[1]? >>> What does it mean when you say the system freezes? No new processes can >>> be started or desktop environment doesn't react on your input? Do you >>> see anything in the kernel log? OOM killer e.g. >>> In case no new processes could be started what does sysrq+m say when the >>> system is frozen? >>> >>> What is your kernel config? >>> >>>> Other people have >>>> reported the bug at since 2007. It seems that only the 64-bit >>>> version is affected and people have reported that enabling DMA in >>>> BIOS settings solve the problem. However, my laptop lacks such an >>>> option in the BIOS settings, so I am unable to test it. More >>>> information about the bug could be found at: >>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and >>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. >>>> >>>> Best Regards, >>>> Ivan >>>> >>> --- >>> [1] E.g. by >>> while true >>> do >>> STAMP=`date +%s` >>> cat /proc/meminfo > meminfo.$STAMP >>> cat /proc/vmscan > meminfo.$STAMP >> s/vmscan/vmstat > Right. Sorry about the typo and thanks for pointing out Simon. > >>> sleep 2s >>> done [-- Attachment #1.2: Type: text/html, Size: 3575 bytes --] [-- Attachment #2: bug.tar.gz --] [-- Type: application/x-gzip, Size: 55705 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
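The linked guide is not reproduced in this thread, but SSD-longevity tweaks of that era typically combined a tmpfs mount for /tmp with a sysctl change; a representative sketch (an assumption about what was applied, not necessarily Ivan's exact settings) looks like:

# /etc/fstab: keep /tmp in RAM instead of on the SSD
tmpfs  /tmp  tmpfs  defaults,noatime,mode=1777  0  0

# /etc/sysctl.conf: strongly discourage swapping to the SSD
vm.swappiness = 0

Both settings matter for the diagnosis that follows: the first routes all /tmp writes into memory, and the second tells reclaim to avoid swapping anonymous pages out.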
* Re: System freezes when RAM is full (64-bit) 2013-04-04 14:10 ` Ivan Danov @ 2013-04-04 15:16 ` Michal Hocko 2013-04-05 10:13 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-04 15:16 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073 On Thu 04-04-13 16:10:06, Ivan Danov wrote: > Hi Michal, > > Yes, I use swap partition (2GB), but I have applied some things for > keeping the life of the SSD hard drive longer. All the things I have > done are under point 3. at > http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. OK, I guess I know what's going on here. So you did set vm.swappiness=0, which (for some time now) means that there is almost no swapping going on (although you have plenty of swap, as you mention above). Normally this shouldn't be a big deal, but you are also backing your /tmp with tmpfs, which is an in-memory filesystem. This means that if you write to /tmp a lot, that content fills up your memory and is not swapped out until memory reclaim gets into real trouble - most of the page cache has been dropped by that time, so your system starts thrashing. I would encourage you to set swappiness to a more reasonable value (I would use the default value, which is 60). I understand that you are concerned about your SSD lifetime but your user experience sounds like a bigger priority ;) > By system freezes, I mean that the desktop environment doesn't react > on my input. Just sometimes the mouse is reacting very very choppy > and slowly, but most of the times it is not reacting at all. In the > attached file, I have the output of the script and the content of > dmesg for all levels from warn to emerg, as well as my kernel config. I haven't checked your attached data yet, but you should get an overview from the Shmem line in /proc/meminfo, which tells you how much shmem/tmpfs memory you are using, and grep "^Swap" /proc/meminfo will tell you more about your swap usage. > > Best, > Ivan HTH -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
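To act on the advice above from a shell - a small sketch, with the value 60 taken from Michal's suggestion:

sudo sysctl -w vm.swappiness=60              # takes effect immediately, not persistent
cat /proc/sys/vm/swappiness                  # verify the new value
grep -E '^(Shmem|Swap)' /proc/meminfo        # watch tmpfs and swap usage under load

Making the change permanent means editing the vm.swappiness line in /etc/sysctl.conf (or wherever the SSD tweaks set it to 0).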
* Re: System freezes when RAM is full (64-bit) 2013-04-04 15:16 ` Michal Hocko @ 2013-04-05 10:13 ` Ivan Danov 2013-04-05 11:59 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-05 10:13 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 I tried with vm.swappiness=60, but the only improvement is that the mouse input is now less choppy than before; the problem still remains - the computer is not usable at all, and one cannot even stop the program that is causing the problem. Best, Ivan -- On 04/04/13 17:16, Michal Hocko wrote: > On Thu 04-04-13 16:10:06, Ivan Danov wrote: >> Hi Michal, >> >> Yes, I use swap partition (2GB), but I have applied some things for >> keeping the life of the SSD hard drive longer. All the things I have >> done are under point 3. at >> http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. > OK, I guess I know what's going on here. > So you did set vm.swappiness=0 which (for some time) means that there is > almost no swapping going on (although you have plenty of swap as you are > mentioning above). > This shouldn't be a big deal normally but you are also backing your > /tmp on tmpfs which is in-memory filesystem. This means that if you > are writing to /tmp a lot then this content will fill up your memory > which is not swapped out until the memory reclaim is getting into real > troubles - most of the page cache is dropped by that time so your system > starts trashing. > > I would encourage you to set swappiness to a more reasonable value (I > would use the default value which is 60). I understand that you are > concerned about your SSD lifetime but your user experience sounds like a > bigger priority ;) > >> By system freezes, I mean that the desktop environment doesn't react >> on my input. Just sometimes the mouse is reacting very very choppy >> and slowly, but most of the times it is not reacting at all. In the >> attached file, I have the output of the script and the content of >> dmesg for all levels from warn to emerg, as well as my kernel config. > I haven't checked your attached data but you should get an overview from > Shmem line from /proc/meminfo which tells you how much shmem/tmpfs > memory you are using and grep "^Swap" /proc/meminfo will tell you more > about your swap usage. > >> Best, >> Ivan > HTH -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 10:13 ` Ivan Danov @ 2013-04-05 11:59 ` Michal Hocko 2013-04-05 20:41 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-05 11:59 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073 On Fri 05-04-13 12:13:11, Ivan Danov wrote: > Tried with vm.swappiness=60, but the only improvement is that now > the mouse input is less choppy than before, but still the problem > remains - the computer is not usable at all, one could not even stop > the program, causing the problem. OK, could you collect /proc/vmstat and /proc/meminfo during that load? > Best, > Ivan > -- > On 04/04/13 17:16, Michal Hocko wrote: > >On Thu 04-04-13 16:10:06, Ivan Danov wrote: > >>Hi Michal, > >> > >>Yes, I use swap partition (2GB), but I have applied some things for > >>keeping the life of the SSD hard drive longer. All the things I have > >>done are under point 3. at > >>http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. > >OK, I guess I know what's going on here. > >So you did set vm.swappiness=0 which (for some time) means that there is > >almost no swapping going on (although you have plenty of swap as you are > >mentioning above). > >This shouldn't be a big deal normally but you are also backing your > >/tmp on tmpfs which is in-memory filesystem. This means that if you > >are writing to /tmp a lot then this content will fill up your memory > >which is not swapped out until the memory reclaim is getting into real > >troubles - most of the page cache is dropped by that time so your system > >starts trashing. > > > >I would encourage you to set swappiness to a more reasonable value (I > >would use the default value which is 60). I understand that you are > >concerned about your SSD lifetime but your user experience sounds like a > >bigger priority ;) > > > >>By system freezes, I mean that the desktop environment doesn't react > >>on my input. Just sometimes the mouse is reacting very very choppy > >>and slowly, but most of the times it is not reacting at all. In the > >>attached file, I have the output of the script and the content of > >>dmesg for all levels from warn to emerg, as well as my kernel config. > >I haven't checked your attached data but you should get an overview from > >Shmem line from /proc/meminfo which tells you how much shmem/tmpfs > >memory you are using and grep "^Swap" /proc/meminfo will tell you more > >about your swap usage. > > > >>Best, > >>Ivan > >HTH > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 11:59 ` Michal Hocko @ 2013-04-05 20:41 ` Ivan Danov 2013-04-12 10:20 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-05 20:41 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 [-- Attachment #1: Type: text/plain, Size: 2557 bytes --] Attached you can find the script that collects the logs, and the logs themselves, taken during the described freeze. It turned out that the previous logs were corrupted, because both /proc/vmstat and /proc/meminfo were being logged to the same file. -- On 05/04/13 13:59, Michal Hocko wrote: > On Fri 05-04-13 12:13:11, Ivan Danov wrote: >> Tried with vm.swappiness=60, but the only improvement is that now >> the mouse input is less choppy than before, but still the problem >> remains - the computer is not usable at all, one could not even stop >> the program, causing the problem. > OK, could you collect /proc/vmstat and /proc/meminfo during that load? > >> Best, >> Ivan >> -- >> On 04/04/13 17:16, Michal Hocko wrote: >>> On Thu 04-04-13 16:10:06, Ivan Danov wrote: >>>> Hi Michal, >>>> >>>> Yes, I use swap partition (2GB), but I have applied some things for >>>> keeping the life of the SSD hard drive longer. All the things I have >>>> done are under point 3. at >>>> http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. >>> OK, I guess I know what's going on here. >>> So you did set vm.swappiness=0 which (for some time) means that there is >>> almost no swapping going on (although you have plenty of swap as you are >>> mentioning above). >>> This shouldn't be a big deal normally but you are also backing your >>> /tmp on tmpfs which is in-memory filesystem. This means that if you >>> are writing to /tmp a lot then this content will fill up your memory >>> which is not swapped out until the memory reclaim is getting into real >>> troubles - most of the page cache is dropped by that time so your system >>> starts trashing. >>> >>> I would encourage you to set swappiness to a more reasonable value (I >>> would use the default value which is 60). I understand that you are >>> concerned about your SSD lifetime but your user experience sounds like a >>> bigger priority ;) >>> >>>> By system freezes, I mean that the desktop environment doesn't react >>>> on my input. Just sometimes the mouse is reacting very very choppy >>>> and slowly, but most of the times it is not reacting at all. In the >>>> attached file, I have the output of the script and the content of >>>> dmesg for all levels from warn to emerg, as well as my kernel config. >>> I haven't checked your attached data but you should get an overview from >>> Shmem line from /proc/meminfo which tells you how much shmem/tmpfs >>> memory you are using and grep "^Swap" /proc/meminfo will tell you more >>> about your swap usage. >>> >>>> Best, >>>> Ivan >>> HTH [-- Attachment #2: bug-with-swappiness.tar.gz --] [-- Type: application/x-gzip, Size: 12037 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 20:41 ` Ivan Danov @ 2013-04-12 10:20 ` Michal Hocko 2013-04-12 10:49 ` Ivan Danov 2013-04-15 5:03 ` Simon Jeons 0 siblings, 2 replies; 17+ messages in thread From: Michal Hocko @ 2013-04-12 10:20 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner [CCing Mel and Johannes] On Fri 05-04-13 22:41:37, Ivan Danov wrote: > Here you can find attached the script, collecting the logs and the > logs themselves during the described process of freezing. It > appeared that the previous logs are corrupted, because both > /proc/vmstat and /proc/meminfo have been logging to the same file. Sorry for the late reply: $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' min:165256 max:3254516 avg:1642475.35 So the free memory dropped down to 165M at minimum. This doesn't sound terribly low and the average free memory was even above 1.5G. But maybe the memory consumption peak was very short between 2 measured moments. The peak seems to be around this time: meminfo.1365194083:MemFree: 650792 kB meminfo.1365194085:MemFree: 664920 kB meminfo.1365194087:MemFree: 165256 kB <<< meminfo.1365194089:MemFree: 822968 kB meminfo.1365194094:MemFree: 666940 kB Let's have a look at the memory reclaim activity vmstat.1365194085:pgscan_kswapd_dma32 760 vmstat.1365194085:pgscan_kswapd_normal 10444 vmstat.1365194087:pgscan_kswapd_dma32 760 vmstat.1365194087:pgscan_kswapd_normal 10444 vmstat.1365194089:pgscan_kswapd_dma32 5855 vmstat.1365194089:pgscan_kswapd_normal 80621 vmstat.1365194094:pgscan_kswapd_dma32 54333 vmstat.1365194094:pgscan_kswapd_normal 285562 [...] vmstat.1365194098:pgscan_kswapd_dma32 54333 vmstat.1365194098:pgscan_kswapd_normal 285562 vmstat.1365194100:pgscan_kswapd_dma32 55760 vmstat.1365194100:pgscan_kswapd_normal 289493 vmstat.1365194102:pgscan_kswapd_dma32 55760 vmstat.1365194102:pgscan_kswapd_normal 289493 So the background reclaim was active only twice for a short amount of time: - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone The second one looks sane so we can ignore it for now, but the first one scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd had a hard time finding something to reclaim, or it couldn't cope with the ongoing memory pressure. vmstat.1365194087:pgsteal_kswapd_dma32 373 vmstat.1365194087:pgsteal_kswapd_normal 9057 vmstat.1365194089:pgsteal_kswapd_dma32 3249 vmstat.1365194089:pgsteal_kswapd_normal 56756 vmstat.1365194094:pgsteal_kswapd_dma32 14731 vmstat.1365194094:pgsteal_kswapd_normal 221733 ...087-...089 - dma32 scanned 5095, reclaimed 0 - normal scanned 70177, reclaimed 0 ...089-...094 - dma32 scanned 48478, reclaimed 2876 - normal scanned 204941, reclaimed 164977 This shows that kswapd was not able to reclaim any page at first and then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in dma32 and 80% for normal), although the normal zone seems to be doing much better. 
The direct reclaim was active during that time as well: vmstat.1365194089:pgscan_direct_dma32 0 vmstat.1365194089:pgscan_direct_normal 0 vmstat.1365194094:pgscan_direct_dma32 29339 vmstat.1365194094:pgscan_direct_normal 86869 which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: vmstat.1365194089:pgsteal_direct_dma32 0 vmstat.1365194089:pgsteal_direct_normal 0 vmstat.1365194094:pgsteal_direct_dma32 6137 vmstat.1365194094:pgsteal_direct_normal 57677 225M in the normal zone, but it was still not very effective (~20% for dma32 and 66% for normal). vmstat.1365194087:nr_written 9013 vmstat.1365194089:nr_written 9013 vmstat.1365194094:nr_written 15387 Only around 24M have been written out during the massive scanning. So we have two problems here, I guess. The first is that there is not much reclaimable memory when the peak consumption starts, and the second is that we have a hard time balancing the dma32 zone. vmstat.1365194087:nr_shmem 103548 vmstat.1365194089:nr_shmem 102227 vmstat.1365194094:nr_shmem 100679 This tells us that you didn't have that many shmem pages allocated at the time (only 404M). So the /tmp backed by tmpfs shouldn't be the primary issue here. We still have a lot of anonymous memory though: vmstat.1365194087:nr_anon_pages 1430922 vmstat.1365194089:nr_anon_pages 1317009 vmstat.1365194094:nr_anon_pages 1540460 which is around 5.5G. It is interesting that the number of these pages even drops first and then starts growing again (between 089..094 by 870M while we reclaimed around the same amount). This would suggest that the load started thrashing on swap, but: meminfo.1365194087:SwapFree: 1999868 kB meminfo.1365194089:SwapFree: 1999808 kB meminfo.1365194094:SwapFree: 1784544 kB tells us that we swapped out only 210M after 1365194089. So we had to reclaim a lot of page cache during that time while the anonymous memory pressure was really high. vmstat.1365194087:nr_file_pages 428632 vmstat.1365194089:nr_file_pages 378132 vmstat.1365194094:nr_file_pages 192009 the page cache pages dropped down by 920M which covers the anon increase. This all suggests that the workload is simply too aggressive and the memory reclaim doesn't cope with it. Let's check the active and inactive lists (maybe we are not aging pages properly): meminfo.1365194087:Active(anon): 5613412 kB meminfo.1365194087:Active(file): 261180 kB meminfo.1365194089:Active(anon): 4794472 kB meminfo.1365194089:Active(file): 348396 kB meminfo.1365194094:Active(anon): 5424684 kB meminfo.1365194094:Active(file): 77364 kB meminfo.1365194087:Inactive(anon): 496092 kB meminfo.1365194087:Inactive(file): 1033184 kB meminfo.1365194089:Inactive(anon): 853608 kB meminfo.1365194089:Inactive(file): 749564 kB meminfo.1365194094:Inactive(anon): 1313648 kB meminfo.1365194094:Inactive(file): 82008 kB While the file LRUs look good, the active anon LRU seems to be too big. This either suggests a bug in aging or the working set is really that big. Considering the previous data (increase during the memory pressure) I would be inclined to the second option. Just out of curiosity, what is the vm.swappiness setting? You've said that you changed it from 0, but then I would expect us to swap much more. It almost looks like the swappiness is 0. Could you double check? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
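The per-interval deltas above were worked out by hand from consecutive snapshots; a small helper in the same spirit as the earlier collection loop (snapshot filenames assumed to follow the vmstat.<timestamp> pattern) can automate the subtraction:

#!/bin/sh
# usage: ./deltas.sh pgscan_kswapd_normal
key=$1
prev=""
for f in $(ls vmstat.* | sort -t. -k2 -n)    # sort snapshots by timestamp
do
    cur=$(awk -v k="$key" '$1 == k { print $2 }' "$f")
    if [ -n "$prev" ]; then
        echo "$f: $((cur - prev))"           # pages scanned/reclaimed in this interval
    fi
    prev=$cur
done

Running it once for each pgscan_*/pgsteal_* counter reproduces the scanned/reclaimed pairs quoted in the analysis.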
* Re: System freezes when RAM is full (64-bit) 2013-04-12 10:20 ` Michal Hocko @ 2013-04-12 10:49 ` Ivan Danov 2013-04-12 11:11 ` Michal Hocko 2013-04-15 5:03 ` Simon Jeons 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-12 10:49 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner [-- Attachment #1: Type: text/plain, Size: 7200 bytes --] $ cat /proc/sys/vm/swappiness 60 I have increased my swap partition from nearly 2GB to around 16GB, but the problem remains. Here I attach the logs for the larger swap partition. I use a MATLAB statement to simulate the problem, and it also works in Octave: X = ones(100000,10000); I have also tried to reproduce the problem on a desktop installation with 4GB of RAM and a 10GB swap partition, with Ubuntu Lucid installed and then upgraded to 12.04. The problem isn't there, although the input is still quite choppy during the load; after the script finishes, everything looks fine. The desktop installation does not have an SSD. On 12/04/13 12:20, Michal Hocko wrote: > [CCing Mel and Johannes] > On Fri 05-04-13 22:41:37, Ivan Danov wrote: >> Here you can find attached the script, collecting the logs and the >> logs themselves during the described process of freezing. It >> appeared that the previous logs are corrupted, because both >> /proc/vmstat and /proc/meminfo have been logging to the same file. > Sorry for the late reply: > $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' > min:165256 max:3254516 avg:1642475.35 > > So the free memory dropped down to 165M at minimum. This doesn't sound > terribly low and the average free memory was even above 1.5G. But maybe > the memory consumption peak was very short between 2 measured moments. > > The peak seems to be around this time: > meminfo.1365194083:MemFree: 650792 kB > meminfo.1365194085:MemFree: 664920 kB > meminfo.1365194087:MemFree: 165256 kB <<< > meminfo.1365194089:MemFree: 822968 kB > meminfo.1365194094:MemFree: 666940 kB > > Let's have a look at the memory reclaim activity > vmstat.1365194085:pgscan_kswapd_dma32 760 > vmstat.1365194085:pgscan_kswapd_normal 10444 > > vmstat.1365194087:pgscan_kswapd_dma32 760 > vmstat.1365194087:pgscan_kswapd_normal 10444 > > vmstat.1365194089:pgscan_kswapd_dma32 5855 > vmstat.1365194089:pgscan_kswapd_normal 80621 > > vmstat.1365194094:pgscan_kswapd_dma32 54333 > vmstat.1365194094:pgscan_kswapd_normal 285562 > > [...] > vmstat.1365194098:pgscan_kswapd_dma32 54333 > vmstat.1365194098:pgscan_kswapd_normal 285562 > > vmstat.1365194100:pgscan_kswapd_dma32 55760 > vmstat.1365194100:pgscan_kswapd_normal 289493 > > vmstat.1365194102:pgscan_kswapd_dma32 55760 > vmstat.1365194102:pgscan_kswapd_normal 289493 > > So the background reclaim was active only twice for a short amount of > time: > - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone > - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone > > The second one looks sane so we can ignore it for now but the first one > scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd > had hard time to find something to reclaim or it couldn't cope with the > ongoing memory pressure. 
> > vmstat.1365194087:pgsteal_kswapd_dma32 373 > vmstat.1365194087:pgsteal_kswapd_normal 9057 > > vmstat.1365194089:pgsteal_kswapd_dma32 3249 > vmstat.1365194089:pgsteal_kswapd_normal 56756 > > vmstat.1365194094:pgsteal_kswapd_dma32 14731 > vmstat.1365194094:pgsteal_kswapd_normal 221733 > > ...087-...089 > - dma32 scanned 5095, reclaimed 0 > - normal scanned 70177, reclaimed 0 > ...089-...094 > -dma32 scanned 48478, reclaimed 2876 > - normal scanned 204941, reclaimed 164977 > > This shows that kswapd was not able to reclaim any page at first and > then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in > dma32 and 80% for normal) although normal zone seems to be doing much > better. > > The direct reclaim was active during that time as well: > vmstat.1365194089:pgscan_direct_dma32 0 > vmstat.1365194089:pgscan_direct_normal 0 > > vmstat.1365194094:pgscan_direct_dma32 29339 > vmstat.1365194094:pgscan_direct_normal 86869 > > which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: > > vmstat.1365194089:pgsteal_direct_dma32 0 > vmstat.1365194089:pgsteal_direct_normal 0 > > vmstat.1365194094:pgsteal_direct_dma32 6137 > vmstat.1365194094:pgsteal_direct_normal 57677 > > 225M in the normal zone but it was still not effective very much (~20% > for dma32 and 66% for normal). > > vmstat.1365194087:nr_written 9013 > vmstat.1365194089:nr_written 9013 > vmstat.1365194094:nr_written 15387 > > Only around 24M have been written out during the massive scanning. > > So we have two problems here I guess. First is that there is not much > reclaimable memory when the peak consumption starts and then we have > hard times to balance dma32 zone. > > vmstat.1365194087:nr_shmem 103548 > vmstat.1365194089:nr_shmem 102227 > vmstat.1365194094:nr_shmem 100679 > > This tells us that you didn't have that many shmem pages allocated at > the time (only 404M). So the /tmp backed by tmpfs shouldn't be the > primary issue here. > > We still have a lot of anonymous memory though: > vmstat.1365194087:nr_anon_pages 1430922 > vmstat.1365194089:nr_anon_pages 1317009 > vmstat.1365194094:nr_anon_pages 1540460 > > which is around 5.5G. It is interesting that the number of these pages > even drops first and then starts growing again (between 089..094 by 870M > while we reclaimed around the same amount). This would suggest that the > load started trashing on swap but: > > meminfo.1365194087:SwapFree: 1999868 kB > meminfo.1365194089:SwapFree: 1999808 kB > meminfo.1365194094:SwapFree: 1784544 kB > > tells us that we swapped out only 210M after 1365194089. So we had to > reclaim a lot of page cache during that time while the anonymous memory > pressure was really high. > > vmstat.1365194087:nr_file_pages 428632 > vmstat.1365194089:nr_file_pages 378132 > vmstat.1365194094:nr_file_pages 192009 > > the page cache pages dropped down by 920M which covers the anon increase. > > This all suggests that the workload is simply too aggressive and the > memory reclaim doesn't cope with it. 
> > Let's check the active and inactive lists (maybe we are not aging pages properly): > meminfo.1365194087:Active(anon): 5613412 kB > meminfo.1365194087:Active(file): 261180 kB > > meminfo.1365194089:Active(anon): 4794472 kB > meminfo.1365194089:Active(file): 348396 kB > > meminfo.1365194094:Active(anon): 5424684 kB > meminfo.1365194094:Active(file): 77364 kB > > meminfo.1365194087:Inactive(anon): 496092 kB > meminfo.1365194087:Inactive(file): 1033184 kB > > meminfo.1365194089:Inactive(anon): 853608 kB > meminfo.1365194089:Inactive(file): 749564 kB > > meminfo.1365194094:Inactive(anon): 1313648 kB > meminfo.1365194094:Inactive(file): 82008 kB > > While the file LRUs looks good active Anon LRUs seem to be too big. This > either suggests a bug in aging or the working set is really that big. > Considering the previous data (increase during the memory pressure) I > would be inclined to the second option. > > Just out of curiosity what is the vm_swappiness setting? You've said > that you have changed that from 0 but it seems like we would swap much > more. It almost looks like the swappiness is 0. Could you double check? [-- Attachment #2: bug.tar.gz --] [-- Type: application/x-gzip, Size: 17647 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
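As a rough stand-in for the MATLAB/Octave allocation when neither program is available - an approximation under stated assumptions (GNU coreutils, and definitely not the reporter's exact workload) - a comparable anonymous-memory spike can be produced from a shell; save your work first, since the machine may lock up exactly as described:

# /dev/zero contains no newlines, so tail must buffer the entire stream;
# this therefore holds roughly 7 GiB of anonymous memory until killed:
head -c 7G /dev/zero | tail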
* Re: System freezes when RAM is full (64-bit) 2013-04-12 10:49 ` Ivan Danov @ 2013-04-12 11:11 ` Michal Hocko 2013-04-12 12:38 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-12 11:11 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On Fri 12-04-13 12:49:30, Ivan Danov wrote: > $ cat /proc/sys/vm/swappiness > 60 OK, thanks for confirming this. It is really strange that we hardly swap at all, then. > I have increased my swap partition from nearly 2GB to around 16GB, > but the problem remains. Increasing the swap partition will not help much, as it was almost unused with 2G already (at least the last data showed that). > Here I attach the logs for the larger swap partition. I use a MATLAB > script to simulate the problem, but it also works in Octave: > X = ones(100000,10000); AFAIU this will create a matrix with 10^9 elements and initialize them to 1. I am not familiar with octave, but do you happen to know what data type is used for the elements? 8B? It would also be interesting to know how the matrix is organized and initialized. Does it fit into memory at all? > I have tried to simulate the problem on a desktop installation with > 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then > upgraded to 12.04, the problem isn't there, but the input is still > quite choppy during the load. After the script finishes, everything > looks fine. For the desktop installation the hard drive is not an > SSD hard drive. What is the kernel version used here? > On 12/04/13 12:20, Michal Hocko wrote: > >[CCing Mel and Johannes] > >On Fri 05-04-13 22:41:37, Ivan Danov wrote: > >>Here you can find attached the script, collecting the logs and the > >>logs themselves during the described process of freezing. It > >>appeared that the previous logs are corrupted, because both > >>/proc/vmstat and /proc/meminfo have been logging to the same file. > >Sorry for the late reply: > >$ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' > >min:165256 max:3254516 avg:1642475.35 > > > >So the free memory dropped down to 165M at minimum. This doesn't sound > >terribly low and the average free memory was even above 1.5G. But maybe > >the memory consumption peak was very short between 2 measured moments. > > > >The peak seems to be around this time: > >meminfo.1365194083:MemFree: 650792 kB > >meminfo.1365194085:MemFree: 664920 kB > >meminfo.1365194087:MemFree: 165256 kB <<< > >meminfo.1365194089:MemFree: 822968 kB > >meminfo.1365194094:MemFree: 666940 kB > > > >Let's have a look at the memory reclaim activity > >vmstat.1365194085:pgscan_kswapd_dma32 760 > >vmstat.1365194085:pgscan_kswapd_normal 10444 > > > >vmstat.1365194087:pgscan_kswapd_dma32 760 > >vmstat.1365194087:pgscan_kswapd_normal 10444 > > > >vmstat.1365194089:pgscan_kswapd_dma32 5855 > >vmstat.1365194089:pgscan_kswapd_normal 80621 > > > >vmstat.1365194094:pgscan_kswapd_dma32 54333 > >vmstat.1365194094:pgscan_kswapd_normal 285562 > > > >[...] 
> >vmstat.1365194098:pgscan_kswapd_dma32 54333 > >vmstat.1365194098:pgscan_kswapd_normal 285562 > > > >vmstat.1365194100:pgscan_kswapd_dma32 55760 > >vmstat.1365194100:pgscan_kswapd_normal 289493 > > > >vmstat.1365194102:pgscan_kswapd_dma32 55760 > >vmstat.1365194102:pgscan_kswapd_normal 289493 > > > >So the background reclaim was active only twice for a short amount of > >time: > >- 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone > >- 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone > > > >The second one looks sane so we can ignore it for now but the first one > >scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd > >had hard time to find something to reclaim or it couldn't cope with the > >ongoing memory pressure. > > > >vmstat.1365194087:pgsteal_kswapd_dma32 373 > >vmstat.1365194087:pgsteal_kswapd_normal 9057 > > > >vmstat.1365194089:pgsteal_kswapd_dma32 3249 > >vmstat.1365194089:pgsteal_kswapd_normal 56756 > > > >vmstat.1365194094:pgsteal_kswapd_dma32 14731 > >vmstat.1365194094:pgsteal_kswapd_normal 221733 > > > >...087-...089 > > - dma32 scanned 5095, reclaimed 0 > > - normal scanned 70177, reclaimed 0 > >...089-...094 > > -dma32 scanned 48478, reclaimed 2876 > > - normal scanned 204941, reclaimed 164977 > > > >This shows that kswapd was not able to reclaim any page at first and > >then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in > >dma32 and 80% for normal) although normal zone seems to be doing much > >better. > > > >The direct reclaim was active during that time as well: > >vmstat.1365194089:pgscan_direct_dma32 0 > >vmstat.1365194089:pgscan_direct_normal 0 > > > >vmstat.1365194094:pgscan_direct_dma32 29339 > >vmstat.1365194094:pgscan_direct_normal 86869 > > > >which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: > > > >vmstat.1365194089:pgsteal_direct_dma32 0 > >vmstat.1365194089:pgsteal_direct_normal 0 > > > >vmstat.1365194094:pgsteal_direct_dma32 6137 > >vmstat.1365194094:pgsteal_direct_normal 57677 > > > >225M in the normal zone but it was still not effective very much (~20% > >for dma32 and 66% for normal). > > > >vmstat.1365194087:nr_written 9013 > >vmstat.1365194089:nr_written 9013 > >vmstat.1365194094:nr_written 15387 > > > >Only around 24M have been written out during the massive scanning. > > > >So we have two problems here I guess. First is that there is not much > >reclaimable memory when the peak consumption starts and then we have > >hard times to balance dma32 zone. > > > >vmstat.1365194087:nr_shmem 103548 > >vmstat.1365194089:nr_shmem 102227 > >vmstat.1365194094:nr_shmem 100679 > > > >This tells us that you didn't have that many shmem pages allocated at > >the time (only 404M). So the /tmp backed by tmpfs shouldn't be the > >primary issue here. > > > >We still have a lot of anonymous memory though: > >vmstat.1365194087:nr_anon_pages 1430922 > >vmstat.1365194089:nr_anon_pages 1317009 > >vmstat.1365194094:nr_anon_pages 1540460 > > > >which is around 5.5G. It is interesting that the number of these pages > >even drops first and then starts growing again (between 089..094 by 870M > >while we reclaimed around the same amount). This would suggest that the > >load started trashing on swap but: > > > >meminfo.1365194087:SwapFree: 1999868 kB > >meminfo.1365194089:SwapFree: 1999808 kB > >meminfo.1365194094:SwapFree: 1784544 kB > > > >tells us that we swapped out only 210M after 1365194089. 
So we had to > >reclaim a lot of page cache during that time while the anonymous memory > >pressure was really high. > > > >vmstat.1365194087:nr_file_pages 428632 > >vmstat.1365194089:nr_file_pages 378132 > >vmstat.1365194094:nr_file_pages 192009 > > > >the page cache pages dropped down by 920M which covers the anon increase. > > > >This all suggests that the workload is simply too aggressive and the > >memory reclaim doesn't cope with it. > > > >Let's check the active and inactive lists (maybe we are not aging pages properly): > >meminfo.1365194087:Active(anon): 5613412 kB > >meminfo.1365194087:Active(file): 261180 kB > > > >meminfo.1365194089:Active(anon): 4794472 kB > >meminfo.1365194089:Active(file): 348396 kB > > > >meminfo.1365194094:Active(anon): 5424684 kB > >meminfo.1365194094:Active(file): 77364 kB > > > >meminfo.1365194087:Inactive(anon): 496092 kB > >meminfo.1365194087:Inactive(file): 1033184 kB > > > >meminfo.1365194089:Inactive(anon): 853608 kB > >meminfo.1365194089:Inactive(file): 749564 kB > > > >meminfo.1365194094:Inactive(anon): 1313648 kB > >meminfo.1365194094:Inactive(file): 82008 kB > > > >While the file LRUs looks good active Anon LRUs seem to be too big. This > >either suggests a bug in aging or the working set is really that big. > >Considering the previous data (increase during the memory pressure) I > >would be inclined to the second option. > > > >Just out of curiosity what is the vm_swappiness setting? You've said > >that you have changed that from 0 but it seems like we would swap much > >more. It almost looks like the swappiness is 0. Could you double check? > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
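For reference, the arithmetic behind the "does it fit" question: with Octave/MATLAB's default double-precision elements (8 bytes each), the test matrix needs

100000 * 10000 * 8 = 8000000000 bytes, i.e. about 7.45 GiB

which on an 8 GB machine leaves well under 1 GB for the kernel, the desktop session and everything else.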
* Re: System freezes when RAM is full (64-bit) 2013-04-12 11:11 ` Michal Hocko @ 2013-04-12 12:38 ` Ivan Danov 2013-04-14 14:58 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-12 12:38 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On 12/04/13 13:11, Michal Hocko wrote: > On Fri 12-04-13 12:49:30, Ivan Danov wrote: >> $ cat /proc/sys/vm/swappiness >> 60 > OK, thanks for confirming this. It is really strange that we do not swap > almost at all, then. > >> I have increased my swap partition from nearly 2GB to around 16GB, >> but the problem remains. > Increasing the swap partition will not help much as it almost unused > with 2G already (at least last data shown that). > >> Here I attach the logs for the larger swap partition. I use a MATLAB >> script to simulate the problem, but it also works in Octave: >> X = ones(100000,10000); > AFAIU this will create a matrix with 10^9 elements and initialize them > to 1. I am not familiar with octave but do you happen to know what is > the data type used for the element? 8B? It would be also interesting to > know how is the matrix organized and initialized. Does it fit into > memory at all? Yes, 8B each, so it will be almost 8GB and it should fit into memory. I don't know the details of how it actually works, but if it cannot create the matrix, MATLAB complains about it. Since it starts complaining if I add even 2000 more in the second dimension, maybe it needs the RAM to create it all. However, on the desktop machine both RAM and swap are being used (quite a lot of both). > >> I have tried to simulate the problem on a desktop installation with >> 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then >> upgraded to 12.04, the problem isn't there, but the input is still >> quite choppy during the load. After the script finishes, everything >> looks fine. For the desktop installation the hard drive is not an >> SSD hard drive. > What is the kernel version used here? $ uname -a Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux > >> On 12/04/13 12:20, Michal Hocko wrote: >>> [CCing Mel and Johannes] >>> On Fri 05-04-13 22:41:37, Ivan Danov wrote: >>>> Here you can find attached the script, collecting the logs and the >>>> logs themselves during the described process of freezing. It >>>> appeared that the previous logs are corrupted, because both >>>> /proc/vmstat and /proc/meminfo have been logging to the same file. >>> Sorry for the late reply: >>> $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' >>> min:165256 max:3254516 avg:1642475.35 >>> >>> So the free memory dropped down to 165M at minimum. This doesn't sound >>> terribly low and the average free memory was even above 1.5G. But maybe >>> the memory consumption peak was very short between 2 measured moments. 
>>> >>> The peak seems to be around this time: >>> meminfo.1365194083:MemFree: 650792 kB >>> meminfo.1365194085:MemFree: 664920 kB >>> meminfo.1365194087:MemFree: 165256 kB <<< >>> meminfo.1365194089:MemFree: 822968 kB >>> meminfo.1365194094:MemFree: 666940 kB >>> >>> Let's have a look at the memory reclaim activity >>> vmstat.1365194085:pgscan_kswapd_dma32 760 >>> vmstat.1365194085:pgscan_kswapd_normal 10444 >>> >>> vmstat.1365194087:pgscan_kswapd_dma32 760 >>> vmstat.1365194087:pgscan_kswapd_normal 10444 >>> >>> vmstat.1365194089:pgscan_kswapd_dma32 5855 >>> vmstat.1365194089:pgscan_kswapd_normal 80621 >>> >>> vmstat.1365194094:pgscan_kswapd_dma32 54333 >>> vmstat.1365194094:pgscan_kswapd_normal 285562 >>> >>> [...] >>> vmstat.1365194098:pgscan_kswapd_dma32 54333 >>> vmstat.1365194098:pgscan_kswapd_normal 285562 >>> >>> vmstat.1365194100:pgscan_kswapd_dma32 55760 >>> vmstat.1365194100:pgscan_kswapd_normal 289493 >>> >>> vmstat.1365194102:pgscan_kswapd_dma32 55760 >>> vmstat.1365194102:pgscan_kswapd_normal 289493 >>> >>> So the background reclaim was active only twice for a short amount of >>> time: >>> - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone >>> - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone >>> >>> The second one looks sane so we can ignore it for now but the first one >>> scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd >>> had hard time to find something to reclaim or it couldn't cope with the >>> ongoing memory pressure. >>> >>> vmstat.1365194087:pgsteal_kswapd_dma32 373 >>> vmstat.1365194087:pgsteal_kswapd_normal 9057 >>> >>> vmstat.1365194089:pgsteal_kswapd_dma32 3249 >>> vmstat.1365194089:pgsteal_kswapd_normal 56756 >>> >>> vmstat.1365194094:pgsteal_kswapd_dma32 14731 >>> vmstat.1365194094:pgsteal_kswapd_normal 221733 >>> >>> ...087-...089 >>> - dma32 scanned 5095, reclaimed 0 >>> - normal scanned 70177, reclaimed 0 >>> ...089-...094 >>> -dma32 scanned 48478, reclaimed 2876 >>> - normal scanned 204941, reclaimed 164977 >>> >>> This shows that kswapd was not able to reclaim any page at first and >>> then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in >>> dma32 and 80% for normal) although normal zone seems to be doing much >>> better. >>> >>> The direct reclaim was active during that time as well: >>> vmstat.1365194089:pgscan_direct_dma32 0 >>> vmstat.1365194089:pgscan_direct_normal 0 >>> >>> vmstat.1365194094:pgscan_direct_dma32 29339 >>> vmstat.1365194094:pgscan_direct_normal 86869 >>> >>> which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: >>> >>> vmstat.1365194089:pgsteal_direct_dma32 0 >>> vmstat.1365194089:pgsteal_direct_normal 0 >>> >>> vmstat.1365194094:pgsteal_direct_dma32 6137 >>> vmstat.1365194094:pgsteal_direct_normal 57677 >>> >>> 225M in the normal zone but it was still not effective very much (~20% >>> for dma32 and 66% for normal). >>> >>> vmstat.1365194087:nr_written 9013 >>> vmstat.1365194089:nr_written 9013 >>> vmstat.1365194094:nr_written 15387 >>> >>> Only around 24M have been written out during the massive scanning. >>> >>> So we have two problems here I guess. First is that there is not much >>> reclaimable memory when the peak consumption starts and then we have >>> hard times to balance dma32 zone. >>> >>> vmstat.1365194087:nr_shmem 103548 >>> vmstat.1365194089:nr_shmem 102227 >>> vmstat.1365194094:nr_shmem 100679 >>> >>> This tells us that you didn't have that many shmem pages allocated at >>> the time (only 404M). 
So the /tmp backed by tmpfs shouldn't be the >>> primary issue here. >>> >>> We still have a lot of anonymous memory though: >>> vmstat.1365194087:nr_anon_pages 1430922 >>> vmstat.1365194089:nr_anon_pages 1317009 >>> vmstat.1365194094:nr_anon_pages 1540460 >>> >>> which is around 5.5G. It is interesting that the number of these pages >>> even drops first and then starts growing again (between 089..094 by 870M >>> while we reclaimed around the same amount). This would suggest that the >>> load started trashing on swap but: >>> >>> meminfo.1365194087:SwapFree: 1999868 kB >>> meminfo.1365194089:SwapFree: 1999808 kB >>> meminfo.1365194094:SwapFree: 1784544 kB >>> >>> tells us that we swapped out only 210M after 1365194089. So we had to >>> reclaim a lot of page cache during that time while the anonymous memory >>> pressure was really high. >>> >>> vmstat.1365194087:nr_file_pages 428632 >>> vmstat.1365194089:nr_file_pages 378132 >>> vmstat.1365194094:nr_file_pages 192009 >>> >>> the page cache pages dropped down by 920M which covers the anon increase. >>> >>> This all suggests that the workload is simply too aggressive and the >>> memory reclaim doesn't cope with it. >>> >>> Let's check the active and inactive lists (maybe we are not aging pages properly): >>> meminfo.1365194087:Active(anon): 5613412 kB >>> meminfo.1365194087:Active(file): 261180 kB >>> >>> meminfo.1365194089:Active(anon): 4794472 kB >>> meminfo.1365194089:Active(file): 348396 kB >>> >>> meminfo.1365194094:Active(anon): 5424684 kB >>> meminfo.1365194094:Active(file): 77364 kB >>> >>> meminfo.1365194087:Inactive(anon): 496092 kB >>> meminfo.1365194087:Inactive(file): 1033184 kB >>> >>> meminfo.1365194089:Inactive(anon): 853608 kB >>> meminfo.1365194089:Inactive(file): 749564 kB >>> >>> meminfo.1365194094:Inactive(anon): 1313648 kB >>> meminfo.1365194094:Inactive(file): 82008 kB >>> >>> While the file LRUs looks good active Anon LRUs seem to be too big. This >>> either suggests a bug in aging or the working set is really that big. >>> Considering the previous data (increase during the memory pressure) I >>> would be inclined to the second option. >>> >>> Just out of curiosity what is the vm_swappiness setting? You've said >>> that you have changed that from 0 but it seems like we would swap much >>> more. It almost looks like the swappiness is 0. Could you double check? > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
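One quick way to see the difference between the two machines - assuming procps' vmstat tool is available - is to watch swap traffic while the matrix is being allocated:

# memory, swap-in (si) and swap-out (so), in MiB, sampled every 2 seconds:
vmstat -S M 2

On the desktop the si/so columns should be busy during the allocation; on the frozen laptop they stay near zero, matching the SwapFree numbers in the analysis above.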
* Re: System freezes when RAM is full (64-bit) 2013-04-12 12:38 ` Ivan Danov @ 2013-04-14 14:58 ` Michal Hocko 2013-05-14 15:25 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-14 14:58 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On Fri 12-04-13 14:38:00, Ivan Danov wrote: > On 12/04/13 13:11, Michal Hocko wrote: > >On Fri 12-04-13 12:49:30, Ivan Danov wrote: > >>$ cat /proc/sys/vm/swappiness > >>60 > >OK, thanks for confirming this. It is really strange that we do not swap > >almost at all, then. > >>I have increased my swap partition from nearly 2GB to around 16GB, > >>but the problem remains. > >Increasing the swap partition will not help much as it almost unused > >with 2G already (at least last data shown that). > > > >>Here I attach the logs for the larger swap partition. I use a MATLAB > >>script to simulate the problem, but it also works in Octave: > >>X = ones(100000,10000); > >AFAIU this will create a matrix with 10^9 elements and initialize them > >to 1. I am not familiar with octave but do you happen to know what is > >the data type used for the element? 8B? It would be also interesting to > >know how is the matrix organized and initialized. Does it fit into > >memory at all? > Yes, 8B each, so it will be almost 8GB and it should fit into the > memory. It won't fit in, because the kernel and other processes consume some memory as well. So you have to swap. > I don't know details how it actually works, but if it cannot > create the matrix, MATLAB complains about that. Since it starts > complaining even after 2000 more in the second dimension, maybe it > needs the RAM to create it all. However on the desktop machine, both > RAM and swap are being used (quite a lot of them both). How much you swap depends on vm.swappiness. I would suggest increasing the value if your workload really is so anonymous-memory based. Otherwise a lot of file pages are reclaimed, which can lead to the problems you are seeing. > >>I have tried to simulate the problem on a desktop installation with > >>4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then > >>upgraded to 12.04, the problem isn't there, but the input is still > >>quite choppy during the load. After the script finishes, everything > >>looks fine. For the desktop installation the hard drive is not an > >>SSD hard drive. > >What is the kernel version used here? > $ uname -a > Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC > 2013 x86_64 x86_64 x86_64 GNU/Linux Is there any chance you could test with the latest vanilla kernel and Mel's patches from https://lkml.org/lkml/2013/4/11/516 on top? [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
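A rough outline of what testing that suggestion involves - the steps are generic and the patch directory name is illustrative, not exact instructions for this particular series:

# fetch the mainline kernel
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux

# apply the posted series, previously saved as mbox/patch files
git am ../mel-series/*.patch

# reuse the running kernel's configuration, then build and install
cp /boot/config-$(uname -r) .config
make olddefconfig
make -j$(nproc)
sudo make modules_install install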
* Re: System freezes when RAM is full (64-bit) 2013-04-14 14:58 ` Michal Hocko @ 2013-05-14 15:25 ` Ivan Danov 0 siblings, 0 replies; 17+ messages in thread From: Ivan Danov @ 2013-05-14 15:25 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 14/04/13 16:58, Michal Hocko wrote: > On Fri 12-04-13 14:38:00, Ivan Danov wrote: >> On 12/04/13 13:11, Michal Hocko wrote: >>> On Fri 12-04-13 12:49:30, Ivan Danov wrote: >>>> $ cat /proc/sys/vm/swappiness >>>> 60 >>> OK, thanks for confirming this. It is really strange that we do not swap >>> almost at all, then. >>>> I have increased my swap partition from nearly 2GB to around 16GB, >>>> but the problem remains. >>> Increasing the swap partition will not help much as it almost unused >>> with 2G already (at least last data shown that). >>> >>>> Here I attach the logs for the larger swap partition. I use a MATLAB >>>> script to simulate the problem, but it also works in Octave: >>>> X = ones(100000,10000); >>> AFAIU this will create a matrix with 10^9 elements and initialize them >>> to 1. I am not familiar with octave but do you happen to know what is >>> the data type used for the element? 8B? It would be also interesting to >>> know how is the matrix organized and initialized. Does it fit into >>> memory at all? >> Yes, 8B each, so it will be almost 8GB and it should fit into the >> memory. > > It won't fit in because kernel and other processes consume some memory > as well. So you have to swap. > >> I don't know details how it actually works, but if it cannot >> create the matrix, MATLAB complains about that. Since it starts >> complaining even after 2000 more in the second dimension, maybe it >> needs the RAM to create it all. However on the desktop machine, both >> RAM and swap are being used (quite a lot of them both). > > How much you swap depends on vm.swappiness. I would suggest increasing > the value if your workload is really so anononymous memory based. > Otherwise a lot of file pages are reclaimed which can lead to problems > you are seeing. > >>>> I have tried to simulate the problem on a desktop installation with >>>> 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then >>>> upgraded to 12.04, the problem isn't there, but the input is still >>>> quite choppy during the load. After the script finishes, everything >>>> looks fine. For the desktop installation the hard drive is not an >>>> SSD hard drive. >>> What is the kernel version used here? >> $ uname -a >> Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC >> 2013 x86_64 x86_64 x86_64 GNU/Linux > > Is there any chance you could test with the latest vanilla kernel and > Mel's patches from https://lkml.org/lkml/2013/4/11/516 on top? Sorry for the late reply, I've been quite busy these days. I will try to compile the kernel over the weekend. However, this will be my first time compiling a kernel, so it may take me some time. By the way, I have upgraded to Ubuntu 13.04, so my kernel now is: $ uname -a Linux ivan 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux The problem is still there. > > [...] 
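Since building a vanilla kernel with a patch series on top comes up here, the
following is a rough sketch of the usual steps, assuming an Ubuntu build host
and that Mel's series has been saved from the list as an mbox file; the file
name, directory layout, and -j value are illustrative:

    # build prerequisites on Ubuntu
    sudo apt-get install build-essential libncurses5-dev bc git
    # fetch the vanilla tree
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    cd linux
    # apply the saved patch series on top
    git am ../mels-series.mbox
    # start from the running kernel's config, taking defaults for new options
    cp /boot/config-$(uname -r) .config
    make olddefconfig
    # build and install
    make -j4
    sudo make modules_install install

After rebooting into the new kernel, `uname -r` should report the vanilla
version string instead of the distribution one.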
* Re: System freezes when RAM is full (64-bit)
  2013-04-12 10:20 ` Michal Hocko
  2013-04-12 10:49   ` Ivan Danov
@ 2013-04-15  5:03   ` Simon Jeons
  2013-04-15 14:12     ` Michal Hocko
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Jeons @ 2013-04-15 5:03 UTC (permalink / raw)
To: Michal Hocko; +Cc: Ivan Danov, linux-mm, 1162073, Mel Gorman, Johannes Weiner

Hi Michal,

On 04/12/2013 06:20 PM, Michal Hocko wrote:
> [CCing Mel and Johannes]
> On Fri 05-04-13 22:41:37, Ivan Danov wrote:
> > Here you can find attached the script collecting the logs, and the
> > logs themselves, taken during the described freeze. It appeared that
> > the previous logs were corrupted, because both /proc/vmstat and
> > /proc/meminfo had been logging to the same file.
> Sorry for the late reply:
> $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}'
> min:165256 max:3254516 avg:1642475.35
>
> So free memory dropped to 165M at its minimum. This doesn't sound
> terribly low, and the average free memory was even above 1.5G. But
> maybe the consumption peak was very short, falling between two
> measured moments.
>
> The peak seems to be around this time:
> meminfo.1365194083:MemFree: 650792 kB
> meminfo.1365194085:MemFree: 664920 kB
> meminfo.1365194087:MemFree: 165256 kB  <<<
> meminfo.1365194089:MemFree: 822968 kB
> meminfo.1365194094:MemFree: 666940 kB
>
> Let's have a look at the memory reclaim activity:
> vmstat.1365194085:pgscan_kswapd_dma32 760
> vmstat.1365194085:pgscan_kswapd_normal 10444
>
> vmstat.1365194087:pgscan_kswapd_dma32 760
> vmstat.1365194087:pgscan_kswapd_normal 10444
>
> vmstat.1365194089:pgscan_kswapd_dma32 5855
> vmstat.1365194089:pgscan_kswapd_normal 80621
>
> vmstat.1365194094:pgscan_kswapd_dma32 54333
> vmstat.1365194094:pgscan_kswapd_normal 285562
>
> [...]
> vmstat.1365194098:pgscan_kswapd_dma32 54333
> vmstat.1365194098:pgscan_kswapd_normal 285562
>
> vmstat.1365194100:pgscan_kswapd_dma32 55760
> vmstat.1365194100:pgscan_kswapd_normal 289493
>
> vmstat.1365194102:pgscan_kswapd_dma32 55760
> vmstat.1365194102:pgscan_kswapd_normal 289493
>
> So background reclaim was active only twice, each time for a short
> period:
> - 1365194087 - 1365194094: 53573 pages in dma32 and 275118 in the normal zone
> - 1365194098 - 1365194100: 1427 pages in dma32 and 3931 in the normal zone
>
> The second one looks sane, so we can ignore it for now, but the first
> one scanned 1074M in the normal zone and 209M in the dma32 zone.
> Either kswapd had a hard time finding something to reclaim, or it
> couldn't cope with the ongoing memory pressure.
>
> vmstat.1365194087:pgsteal_kswapd_dma32 373
> vmstat.1365194087:pgsteal_kswapd_normal 9057
>
> vmstat.1365194089:pgsteal_kswapd_dma32 3249
> vmstat.1365194089:pgsteal_kswapd_normal 56756
>
> vmstat.1365194094:pgsteal_kswapd_dma32 14731
> vmstat.1365194094:pgsteal_kswapd_normal 221733
>
> ...087-...089
>  - dma32 scanned 5095, reclaimed 0
>  - normal scanned 70177, reclaimed 0

This is not correct.
 - dma32 scanned 5095, reclaimed 2876, effective = 56%
 - normal scanned 70177, reclaimed 47699, effective = 68%

> ...089-...094
>  - dma32 scanned 48478, reclaimed 2876
>  - normal scanned 204941, reclaimed 164977

 - dma32 scanned 48478, reclaimed 11482, effective = 23%
 - normal scanned 204941, reclaimed 164977, effective = 80%

> This shows that kswapd was not able to reclaim any page at first, and
> then reclaimed a lot (644M in 5s) but still very ineffectively (5% in
> dma32 and 80% for normal), although the normal zone seems to be doing
> much better.
>
> Direct reclaim was active during that time as well:
> vmstat.1365194089:pgscan_direct_dma32 0
> vmstat.1365194089:pgscan_direct_normal 0
>
> vmstat.1365194094:pgscan_direct_dma32 29339
> vmstat.1365194094:pgscan_direct_normal 86869
>
> It scanned 29339 pages in dma32 and 86869 in the normal zone, while it
> reclaimed:
>
> vmstat.1365194089:pgsteal_direct_dma32 0
> vmstat.1365194089:pgsteal_direct_normal 0
>
> vmstat.1365194094:pgsteal_direct_dma32 6137
> vmstat.1365194094:pgsteal_direct_normal 57677
>
> 225M in the normal zone, but it was still not very effective either
> (~20% for dma32 and 66% for normal).
>
> vmstat.1365194087:nr_written 9013
> vmstat.1365194089:nr_written 9013
> vmstat.1365194094:nr_written 15387
>
> Only around 24M were written out during the massive scanning.
>
> So I guess we have two problems here. First, there is not much
> reclaimable memory when the peak consumption starts, and then we have
> a hard time balancing the dma32 zone.
>
> vmstat.1365194087:nr_shmem 103548
> vmstat.1365194089:nr_shmem 102227
> vmstat.1365194094:nr_shmem 100679
>
> This tells us that you didn't have that many shmem pages allocated at
> the time (only 404M), so the tmpfs-backed /tmp shouldn't be the
> primary issue here.
>
> We still have a lot of anonymous memory, though:
> vmstat.1365194087:nr_anon_pages 1430922
> vmstat.1365194089:nr_anon_pages 1317009
> vmstat.1365194094:nr_anon_pages 1540460
>
> which is around 5.5G. It is interesting that the number of these pages
> first drops and then starts growing again (between 089..094 by 870M,
> while we reclaimed around the same amount). This would suggest that
> the load started thrashing on swap, but:
>
> meminfo.1365194087:SwapFree: 1999868 kB
> meminfo.1365194089:SwapFree: 1999808 kB
> meminfo.1365194094:SwapFree: 1784544 kB
>
> tells us that we swapped out only 210M after 1365194089. So we had to

How about setting vm.swappiness to 200?

> reclaim a lot of page cache during that time, while the anonymous
> memory pressure was really high.
>
> vmstat.1365194087:nr_file_pages 428632
> vmstat.1365194089:nr_file_pages 378132
> vmstat.1365194094:nr_file_pages 192009
>
> The page cache dropped by 920M, which covers the anon increase.
>
> This all suggests that the workload is simply too aggressive and the
> memory reclaim doesn't cope with it.
>
> Let's check the active and inactive lists (maybe we are not aging
> pages properly):
> meminfo.1365194087:Active(anon):   5613412 kB
> meminfo.1365194087:Active(file):    261180 kB
>
> meminfo.1365194089:Active(anon):   4794472 kB
> meminfo.1365194089:Active(file):    348396 kB
>
> meminfo.1365194094:Active(anon):   5424684 kB
> meminfo.1365194094:Active(file):     77364 kB
>
> meminfo.1365194087:Inactive(anon):  496092 kB
> meminfo.1365194087:Inactive(file): 1033184 kB
>
> meminfo.1365194089:Inactive(anon):  853608 kB
> meminfo.1365194089:Inactive(file):  749564 kB
>
> meminfo.1365194094:Inactive(anon): 1313648 kB
> meminfo.1365194094:Inactive(file):   82008 kB
>
> While the file LRUs look good, the active anon LRU seems too big. This
> suggests either a bug in aging or a working set that really is that
> big. Considering the previous data (the increase during the memory
> pressure), I would lean towards the second option.
>
> Just out of curiosity, what is the vm.swappiness setting? You've said
> that you changed it from 0, but then it seems like we should swap much
> more. It almost looks like swappiness is 0. Could you double-check?
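The scanned/reclaimed arithmetic above is just the delta of the matching
pgscan_*/pgsteal_* counters between two vmstat snapshots. A small sketch of
how it can be computed mechanically, assuming snapshot files named
vmstat.<timestamp> in raw /proc/vmstat format ("name value" per line), as the
logging loop used in this thread produces; the two timestamps are the ones
analysed above:

    awk 'FNR==NR { a[$1]=$2; next }
         /^pg(scan|steal)_/ { d[$1] = $2 - a[$1] }
         END {
           for (k in d)
             if (k ~ /^pgscan_/ && d[k] > 0) {
               s = k
               sub(/^pgscan_/, "pgsteal_", s)
               printf "%s: scanned %d, reclaimed %d, effective = %d%%\n",
                      k, d[k], d[s], 100 * d[s] / d[k]
             }
         }' vmstat.1365194089 vmstat.1365194094

The first file is read into an array, the second yields the deltas, and the
END block pairs each scan counter with its steal counter to print the
effectiveness percentage, in the same form as the corrections above.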
* Re: System freezes when RAM is full (64-bit)
  2013-04-15  5:03   ` Simon Jeons
@ 2013-04-15 14:12     ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2013-04-15 14:12 UTC (permalink / raw)
To: Simon Jeons; +Cc: Ivan Danov, linux-mm, 1162073, Mel Gorman, Johannes Weiner

On Mon 15-04-13 13:03:34, Simon Jeons wrote:
> Hi Michal,
> On 04/12/2013 06:20 PM, Michal Hocko wrote:
[...]
> > ...087-...089
> >  - dma32 scanned 5095, reclaimed 0
> >  - normal scanned 70177, reclaimed 0
>
> This is not correct.
>  - dma32 scanned 5095, reclaimed 2876, effective = 56%
>  - normal scanned 70177, reclaimed 47699, effective = 68%

Right you are! I made a mistake and compared the wrong timestamps.

> > ...089-...094
> >  - dma32 scanned 48478, reclaimed 2876
> >  - normal scanned 204941, reclaimed 164977
>
>  - dma32 scanned 48478, reclaimed 11482, effective = 23%
>  - normal scanned 204941, reclaimed 164977, effective = 80%

Same here.

[...]
> > tells us that we swapped out only 210M after 1365194089. So we had to
>
> How about setting vm.swappiness to 200?

swappiness is limited to the range 0-100, and at 100 it already treats
the anon and file LRUs equally.
--
Michal Hocko
SUSE Labs
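As a side note, the 0..100 limit is enforced by the sysctl handler itself
(proc_dointvec_minmax with a 0/100 min/max in the 3.x kernel/sysctl.c table),
so an out-of-range write is rejected rather than silently clamped. A quick
illustrative check, sketched from those semantics rather than taken from the
thread:

    # expected to fail with an invalid-argument error: 200 is outside 0..100
    sysctl -w vm.swappiness=200
    # the highest accepted value, treating anon and file LRUs equally
    sysctl -w vm.swappiness=100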
end of thread, newest: ~2013-05-14 15:25 UTC

Thread overview: 17+ messages
2013-04-01 19:14 System freezes when RAM is full (64-bit) Ivan Danov
2013-04-03 12:12 ` Michal Hocko
2013-04-04  0:27 ` Simon Jeons
2013-04-04  7:08 ` Michal Hocko
2013-04-04 14:10 ` Ivan Danov
2013-04-04 15:16 ` Michal Hocko
2013-04-05 10:13 ` Ivan Danov
2013-04-05 11:59 ` Michal Hocko
2013-04-05 20:41 ` Ivan Danov
2013-04-12 10:20 ` Michal Hocko
2013-04-12 10:49 ` Ivan Danov
2013-04-12 11:11 ` Michal Hocko
2013-04-12 12:38 ` Ivan Danov
2013-04-14 14:58 ` Michal Hocko
2013-05-14 15:25 ` Ivan Danov
2013-04-15  5:03 ` Simon Jeons
2013-04-15 14:12 ` Michal Hocko