* System freezes when RAM is full (64-bit) @ 2013-04-01 19:14 Ivan Danov 2013-04-03 12:12 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-01 19:14 UTC (permalink / raw) To: linux-mm; +Cc: 1162073 [-- Attachment #1: Type: text/plain, Size: 651 bytes --] The system freezes when RAM gets completely full. Using MATLAB, I can fill all 8GB of my laptop's RAM, and the system immediately freezes, requiring a restart with the hardware button. Other people have reported the bug since 2007. It seems that only the 64-bit version is affected, and people have reported that enabling DMA in the BIOS settings solves the problem. However, my laptop lacks such an option in the BIOS settings, so I am unable to test it. More information about the bug can be found at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. Best Regards, Ivan [-- Attachment #2: Type: text/html, Size: 1214 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-01 19:14 System freezes when RAM is full (64-bit) Ivan Danov @ 2013-04-03 12:12 ` Michal Hocko 2013-04-04 0:27 ` Simon Jeons 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-03 12:12 UTC (permalink / raw) To: Ivan Danov; +Cc: linux-mm, 1162073 On Mon 01-04-13 21:14:40, Ivan Danov wrote: > The system freezes when RAM gets completely full. By using MATLAB, I > can get all 8GB RAM of my laptop full and it immediately freezes, > needing restart using the hardware button. Do you use swap (file/partition)? How big? Could you collect /proc/meminfo and /proc/vmstat (every few seconds)[1]? What does it mean when you say the system freezes? No new processes can be started, or the desktop environment doesn't react to your input? Do you see anything in the kernel log, e.g. the OOM killer? In case no new processes can be started, what does sysrq+m say when the system is frozen? What is your kernel config? > Other people have > reported the bug at since 2007. It seems that only the 64-bit > version is affected and people have reported that enabling DMA in > BIOS settings solve the problem. However, my laptop lacks such an > option in the BIOS settings, so I am unable to test it. More > information about the bug could be found at: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. > > Best Regards, > Ivan > --- [1] E.g. by while true do STAMP=`date +%s` cat /proc/meminfo > meminfo.$STAMP cat /proc/vmscan > meminfo.$STAMP sleep 2s done -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
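For reference, the sysrq+m dump asked about above can be produced as follows - a minimal sketch, assuming the kernel was built with CONFIG_MAGIC_SYSRQ and that the interface is merely restricted, not absent:

echo 1 | sudo tee /proc/sys/kernel/sysrq    # enable all SysRq functions
# press Alt+SysRq+m on the console, or trigger it from a shell:
echo m | sudo tee /proc/sysrq-trigger       # dump current memory info to the kernel log
dmesg | tail -n 60                          # read the resulting report

On a machine that is completely frozen, the keyboard combination on the console is usually the only one of these that still works.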
* Re: System freezes when RAM is full (64-bit) 2013-04-03 12:12 ` Michal Hocko @ 2013-04-04 0:27 ` Simon Jeons 2013-04-04 7:08 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Simon Jeons @ 2013-04-04 0:27 UTC (permalink / raw) To: Michal Hocko; +Cc: Ivan Danov, linux-mm, 1162073 On 04/03/2013 08:12 PM, Michal Hocko wrote: > On Mon 01-04-13 21:14:40, Ivan Danov wrote: >> The system freezes when RAM gets completely full. By using MATLAB, I >> can get all 8GB RAM of my laptop full and it immediately freezes, >> needing restart using the hardware button. > Do you use swap (file/partition)? How big? Could you collect > /proc/meminfo and /proc/vmstat (every few seconds)[1]? > What does it mean when you say the system freezes? No new processes can > be started or desktop environment doesn't react on your input? Do you > see anything in the kernel log? OOM killer e.g. > In case no new processes could be started what does sysrq+m say when the > system is frozen? > > What is your kernel config? > >> Other people have >> reported the bug at since 2007. It seems that only the 64-bit >> version is affected and people have reported that enabling DMA in >> BIOS settings solve the problem. However, my laptop lacks such an >> option in the BIOS settings, so I am unable to test it. More >> information about the bug could be found at: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. >> >> Best Regards, >> Ivan >> > --- > [1] E.g. by > while true > do > STAMP=`date +%s` > cat /proc/meminfo > meminfo.$STAMP > cat /proc/vmscan > meminfo.$STAMP s/vmscan/vmstat > sleep 2s > done -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-04 0:27 ` Simon Jeons @ 2013-04-04 7:08 ` Michal Hocko 2013-04-04 14:10 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-04 7:08 UTC (permalink / raw) To: Simon Jeons; +Cc: Ivan Danov, linux-mm, 1162073 On Thu 04-04-13 08:27:18, Simon Jeons wrote: > On 04/03/2013 08:12 PM, Michal Hocko wrote: > >On Mon 01-04-13 21:14:40, Ivan Danov wrote: > >>The system freezes when RAM gets completely full. By using MATLAB, I > >>can get all 8GB RAM of my laptop full and it immediately freezes, > >>needing restart using the hardware button. > >Do you use swap (file/partition)? How big? Could you collect > >/proc/meminfo and /proc/vmstat (every few seconds)[1]? > >What does it mean when you say the system freezes? No new processes can > >be started or desktop environment doesn't react on your input? Do you > >see anything in the kernel log? OOM killer e.g. > >In case no new processes could be started what does sysrq+m say when the > >system is frozen? > > > >What is your kernel config? > > > >>Other people have > >>reported the bug at since 2007. It seems that only the 64-bit > >>version is affected and people have reported that enabling DMA in > >>BIOS settings solve the problem. However, my laptop lacks such an > >>option in the BIOS settings, so I am unable to test it. More > >>information about the bug could be found at: > >>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and > >>https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. > >> > >>Best Regards, > >>Ivan > >> > >--- > >[1] E.g. by > >while true > >do > > STAMP=`date +%s` > > cat /proc/meminfo > meminfo.$STAMP > > cat /proc/vmscan > meminfo.$STAMP > > s/vmscan/vmstat Right. Sorry about the typo, and thanks for pointing it out, Simon. > > > sleep 2s > >done > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
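For anyone reproducing the data collection, note that the original loop has a second problem besides the /proc/vmscan typo: both cat commands redirect to meminfo.$STAMP, which is what later corrupts the first set of logs. A corrected version of the loop (filenames are illustrative) would be:

while true
do
    STAMP=`date +%s`
    cat /proc/meminfo > meminfo.$STAMP
    cat /proc/vmstat > vmstat.$STAMP
    sleep 2s
done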
* Re: System freezes when RAM is full (64-bit) 2013-04-04 7:08 ` Michal Hocko @ 2013-04-04 14:10 ` Ivan Danov 2013-04-04 15:16 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-04 14:10 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 [-- Attachment #1.1: Type: text/plain, Size: 2214 bytes --] Hi Michal, Yes, I use a swap partition (2GB), but I have applied some tweaks to extend the life of the SSD. All the things I have done are under point 3 at http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. By system freezes, I mean that the desktop environment doesn't react to my input. Sometimes the mouse reacts, but very choppily and slowly; most of the time it is not reacting at all. In the attached file, I have the output of the script and the content of dmesg for all levels from warn to emerg, as well as my kernel config. Best, Ivan -- On 04/04/13 09:08, Michal Hocko wrote: > On Thu 04-04-13 08:27:18, Simon Jeons wrote: >> On 04/03/2013 08:12 PM, Michal Hocko wrote: >>> On Mon 01-04-13 21:14:40, Ivan Danov wrote: >>>> The system freezes when RAM gets completely full. By using MATLAB, I >>>> can get all 8GB RAM of my laptop full and it immediately freezes, >>>> needing restart using the hardware button. >>> Do you use swap (file/partition)? How big? Could you collect >>> /proc/meminfo and /proc/vmstat (every few seconds)[1]? >>> What does it mean when you say the system freezes? No new processes can >>> be started or desktop environment doesn't react on your input? Do you >>> see anything in the kernel log? OOM killer e.g. >>> In case no new processes could be started what does sysrq+m say when the >>> system is frozen? >>> >>> What is your kernel config? >>> >>>> Other people have >>>> reported the bug at since 2007. It seems that only the 64-bit >>>> version is affected and people have reported that enabling DMA in >>>> BIOS settings solve the problem. However, my laptop lacks such an >>>> option in the BIOS settings, so I am unable to test it. More >>>> information about the bug could be found at: >>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1162073 and >>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356. >>>> >>>> Best Regards, >>>> Ivan >>>> >>> --- >>> [1] E.g. by >>> while true >>> do >>> STAMP=`date +%s` >>> cat /proc/meminfo > meminfo.$STAMP >>> cat /proc/vmscan > meminfo.$STAMP >> s/vmscan/vmstat > Right. Sorry about the typo and thanks for pointing out Simon. > >>> sleep 2s >>> done [-- Attachment #1.2: Type: text/html, Size: 3575 bytes --] [-- Attachment #2: bug.tar.gz --] [-- Type: application/x-gzip, Size: 55705 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
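The linked guide is not reproduced in this thread, but SSD-longevity tweaks of that era typically combined a tmpfs mount for /tmp with a sysctl change; a representative sketch (an assumption about what was applied, not necessarily Ivan's exact settings) looks like:

# /etc/fstab: keep /tmp in RAM instead of on the SSD
tmpfs  /tmp  tmpfs  defaults,noatime,mode=1777  0  0

# /etc/sysctl.conf: strongly discourage swapping to the SSD
vm.swappiness = 0

Both settings matter for the diagnosis that follows: the first routes all /tmp writes into memory, and the second tells reclaim to avoid swapping anonymous pages out.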
* Re: System freezes when RAM is full (64-bit) 2013-04-04 14:10 ` Ivan Danov @ 2013-04-04 15:16 ` Michal Hocko 2013-04-05 10:13 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-04 15:16 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073 On Thu 04-04-13 16:10:06, Ivan Danov wrote: > Hi Michal, > > Yes, I use swap partition (2GB), but I have applied some things for > keeping the life of the SSD hard drive longer. All the things I have > done are under point 3. at > http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. OK, I guess I know what's going on here. So you did set vm.swappiness=0, which (for some time now) means that there is almost no swapping going on (although you have plenty of swap, as you mention above). Normally this shouldn't be a big deal, but you are also backing your /tmp with tmpfs, which is an in-memory filesystem. This means that if you write to /tmp a lot, that content fills up your memory and is not swapped out until memory reclaim gets into real trouble - most of the page cache has been dropped by that time, so your system starts thrashing. I would encourage you to set swappiness to a more reasonable value (I would use the default value, which is 60). I understand that you are concerned about your SSD lifetime but your user experience sounds like a bigger priority ;) > By system freezes, I mean that the desktop environment doesn't react > on my input. Just sometimes the mouse is reacting very very choppy > and slowly, but most of the times it is not reacting at all. In the > attached file, I have the output of the script and the content of > dmesg for all levels from warn to emerg, as well as my kernel config. I haven't checked your attached data yet, but you should get an overview from the Shmem line in /proc/meminfo, which tells you how much shmem/tmpfs memory you are using, and grep "^Swap" /proc/meminfo will tell you more about your swap usage. > > Best, > Ivan HTH -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
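To act on the advice above from a shell - a small sketch, with the value 60 taken from Michal's suggestion:

sudo sysctl -w vm.swappiness=60              # takes effect immediately, not persistent
cat /proc/sys/vm/swappiness                  # verify the new value
grep -E '^(Shmem|Swap)' /proc/meminfo        # watch tmpfs and swap usage under load

Making the change permanent means editing the vm.swappiness line in /etc/sysctl.conf (or wherever the SSD tweaks set it to 0).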
* Re: System freezes when RAM is full (64-bit) 2013-04-04 15:16 ` Michal Hocko @ 2013-04-05 10:13 ` Ivan Danov 2013-04-05 11:59 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-05 10:13 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 I tried with vm.swappiness=60, but the only improvement is that the mouse input is now less choppy than before; the problem still remains - the computer is not usable at all, and one cannot even stop the program that is causing the problem. Best, Ivan -- On 04/04/13 17:16, Michal Hocko wrote: > On Thu 04-04-13 16:10:06, Ivan Danov wrote: >> Hi Michal, >> >> Yes, I use swap partition (2GB), but I have applied some things for >> keeping the life of the SSD hard drive longer. All the things I have >> done are under point 3. at >> http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. > OK, I guess I know what's going on here. > So you did set vm.swappiness=0 which (for some time) means that there is > almost no swapping going on (although you have plenty of swap as you are > mentioning above). > This shouldn't be a big deal normally but you are also backing your > /tmp on tmpfs which is in-memory filesystem. This means that if you > are writing to /tmp a lot then this content will fill up your memory > which is not swapped out until the memory reclaim is getting into real > troubles - most of the page cache is dropped by that time so your system > starts trashing. > > I would encourage you to set swappiness to a more reasonable value (I > would use the default value which is 60). I understand that you are > concerned about your SSD lifetime but your user experience sounds like a > bigger priority ;) > >> By system freezes, I mean that the desktop environment doesn't react >> on my input. Just sometimes the mouse is reacting very very choppy >> and slowly, but most of the times it is not reacting at all. In the >> attached file, I have the output of the script and the content of >> dmesg for all levels from warn to emerg, as well as my kernel config. > I haven't checked your attached data but you should get an overview from > Shmem line from /proc/meminfo which tells you how much shmem/tmpfs > memory you are using and grep "^Swap" /proc/meminfo will tell you more > about your swap usage. > >> Best, >> Ivan > HTH -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 10:13 ` Ivan Danov @ 2013-04-05 11:59 ` Michal Hocko 2013-04-05 20:41 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-05 11:59 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073 On Fri 05-04-13 12:13:11, Ivan Danov wrote: > Tried with vm.swappiness=60, but the only improvement is that now > the mouse input is less choppy than before, but still the problem > remains - the computer is not usable at all, one could not even stop > the program, causing the problem. OK, could you collect /proc/vmstat and /proc/meminfo during that load? > Best, > Ivan > -- > On 04/04/13 17:16, Michal Hocko wrote: > >On Thu 04-04-13 16:10:06, Ivan Danov wrote: > >>Hi Michal, > >> > >>Yes, I use swap partition (2GB), but I have applied some things for > >>keeping the life of the SSD hard drive longer. All the things I have > >>done are under point 3. at > >>http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. > >OK, I guess I know what's going on here. > >So you did set vm.swappiness=0 which (for some time) means that there is > >almost no swapping going on (although you have plenty of swap as you are > >mentioning above). > >This shouldn't be a big deal normally but you are also backing your > >/tmp on tmpfs which is in-memory filesystem. This means that if you > >are writing to /tmp a lot then this content will fill up your memory > >which is not swapped out until the memory reclaim is getting into real > >troubles - most of the page cache is dropped by that time so your system > >starts trashing. > > > >I would encourage you to set swappiness to a more reasonable value (I > >would use the default value which is 60). I understand that you are > >concerned about your SSD lifetime but your user experience sounds like a > >bigger priority ;) > > > >>By system freezes, I mean that the desktop environment doesn't react > >>on my input. Just sometimes the mouse is reacting very very choppy > >>and slowly, but most of the times it is not reacting at all. In the > >>attached file, I have the output of the script and the content of > >>dmesg for all levels from warn to emerg, as well as my kernel config. > >I haven't checked your attached data but you should get an overview from > >Shmem line from /proc/meminfo which tells you how much shmem/tmpfs > >memory you are using and grep "^Swap" /proc/meminfo will tell you more > >about your swap usage. > > > >>Best, > >>Ivan > >HTH > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 11:59 ` Michal Hocko @ 2013-04-05 20:41 ` Ivan Danov 2013-04-12 10:20 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-05 20:41 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073 [-- Attachment #1: Type: text/plain, Size: 2557 bytes --] Attached you can find the script that collects the logs, and the logs themselves, taken during the described freeze. It turned out that the previous logs were corrupted, because both /proc/vmstat and /proc/meminfo were being logged to the same file. -- On 05/04/13 13:59, Michal Hocko wrote: > On Fri 05-04-13 12:13:11, Ivan Danov wrote: >> Tried with vm.swappiness=60, but the only improvement is that now >> the mouse input is less choppy than before, but still the problem >> remains - the computer is not usable at all, one could not even stop >> the program, causing the problem. > OK, could you collect /proc/vmstat and /proc/meminfo during that load? > >> Best, >> Ivan >> -- >> On 04/04/13 17:16, Michal Hocko wrote: >>> On Thu 04-04-13 16:10:06, Ivan Danov wrote: >>>> Hi Michal, >>>> >>>> Yes, I use swap partition (2GB), but I have applied some things for >>>> keeping the life of the SSD hard drive longer. All the things I have >>>> done are under point 3. at >>>> http://www.rileybrandt.com/2012/11/18/linux-ultrabook/. >>> OK, I guess I know what's going on here. >>> So you did set vm.swappiness=0 which (for some time) means that there is >>> almost no swapping going on (although you have plenty of swap as you are >>> mentioning above). >>> This shouldn't be a big deal normally but you are also backing your >>> /tmp on tmpfs which is in-memory filesystem. This means that if you >>> are writing to /tmp a lot then this content will fill up your memory >>> which is not swapped out until the memory reclaim is getting into real >>> troubles - most of the page cache is dropped by that time so your system >>> starts trashing. >>> >>> I would encourage you to set swappiness to a more reasonable value (I >>> would use the default value which is 60). I understand that you are >>> concerned about your SSD lifetime but your user experience sounds like a >>> bigger priority ;) >>> >>>> By system freezes, I mean that the desktop environment doesn't react >>>> on my input. Just sometimes the mouse is reacting very very choppy >>>> and slowly, but most of the times it is not reacting at all. In the >>>> attached file, I have the output of the script and the content of >>>> dmesg for all levels from warn to emerg, as well as my kernel config. >>> I haven't checked your attached data but you should get an overview from >>> Shmem line from /proc/meminfo which tells you how much shmem/tmpfs >>> memory you are using and grep "^Swap" /proc/meminfo will tell you more >>> about your swap usage. >>> >>>> Best, >>>> Ivan >>> HTH [-- Attachment #2: bug-with-swappiness.tar.gz --] [-- Type: application/x-gzip, Size: 12037 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: System freezes when RAM is full (64-bit) 2013-04-05 20:41 ` Ivan Danov @ 2013-04-12 10:20 ` Michal Hocko 2013-04-12 10:49 ` Ivan Danov 2013-04-15 5:03 ` Simon Jeons 0 siblings, 2 replies; 17+ messages in thread From: Michal Hocko @ 2013-04-12 10:20 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner [CCing Mel and Johannes] On Fri 05-04-13 22:41:37, Ivan Danov wrote: > Here you can find attached the script, collecting the logs and the > logs themselves during the described process of freezing. It > appeared that the previous logs are corrupted, because both > /proc/vmstat and /proc/meminfo have been logging to the same file. Sorry for the late reply: $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' min:165256 max:3254516 avg:1642475.35 So the free memory dropped down to 165M at minimum. This doesn't sound terribly low and the average free memory was even above 1.5G. But maybe the memory consumption peak was very short between 2 measured moments. The peak seems to be around this time: meminfo.1365194083:MemFree: 650792 kB meminfo.1365194085:MemFree: 664920 kB meminfo.1365194087:MemFree: 165256 kB <<< meminfo.1365194089:MemFree: 822968 kB meminfo.1365194094:MemFree: 666940 kB Let's have a look at the memory reclaim activity vmstat.1365194085:pgscan_kswapd_dma32 760 vmstat.1365194085:pgscan_kswapd_normal 10444 vmstat.1365194087:pgscan_kswapd_dma32 760 vmstat.1365194087:pgscan_kswapd_normal 10444 vmstat.1365194089:pgscan_kswapd_dma32 5855 vmstat.1365194089:pgscan_kswapd_normal 80621 vmstat.1365194094:pgscan_kswapd_dma32 54333 vmstat.1365194094:pgscan_kswapd_normal 285562 [...] vmstat.1365194098:pgscan_kswapd_dma32 54333 vmstat.1365194098:pgscan_kswapd_normal 285562 vmstat.1365194100:pgscan_kswapd_dma32 55760 vmstat.1365194100:pgscan_kswapd_normal 289493 vmstat.1365194102:pgscan_kswapd_dma32 55760 vmstat.1365194102:pgscan_kswapd_normal 289493 So the background reclaim was active only twice for a short amount of time: - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone The second one looks sane so we can ignore it for now, but the first one scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd had a hard time finding something to reclaim, or it couldn't cope with the ongoing memory pressure. vmstat.1365194087:pgsteal_kswapd_dma32 373 vmstat.1365194087:pgsteal_kswapd_normal 9057 vmstat.1365194089:pgsteal_kswapd_dma32 3249 vmstat.1365194089:pgsteal_kswapd_normal 56756 vmstat.1365194094:pgsteal_kswapd_dma32 14731 vmstat.1365194094:pgsteal_kswapd_normal 221733 ...087-...089 - dma32 scanned 5095, reclaimed 0 - normal scanned 70177, reclaimed 0 ...089-...094 - dma32 scanned 48478, reclaimed 2876 - normal scanned 204941, reclaimed 164977 This shows that kswapd was not able to reclaim any page at first and then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in dma32 and 80% for normal), although the normal zone seems to be doing much better. 
The direct reclaim was active during that time as well: vmstat.1365194089:pgscan_direct_dma32 0 vmstat.1365194089:pgscan_direct_normal 0 vmstat.1365194094:pgscan_direct_dma32 29339 vmstat.1365194094:pgscan_direct_normal 86869 which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: vmstat.1365194089:pgsteal_direct_dma32 0 vmstat.1365194089:pgsteal_direct_normal 0 vmstat.1365194094:pgsteal_direct_dma32 6137 vmstat.1365194094:pgsteal_direct_normal 57677 225M in the normal zone, but it was still not very effective (~20% for dma32 and 66% for normal). vmstat.1365194087:nr_written 9013 vmstat.1365194089:nr_written 9013 vmstat.1365194094:nr_written 15387 Only around 24M have been written out during the massive scanning. So we have two problems here, I guess. The first is that there is not much reclaimable memory when the peak consumption starts, and the second is that we have a hard time balancing the dma32 zone. vmstat.1365194087:nr_shmem 103548 vmstat.1365194089:nr_shmem 102227 vmstat.1365194094:nr_shmem 100679 This tells us that you didn't have that many shmem pages allocated at the time (only 404M). So the /tmp backed by tmpfs shouldn't be the primary issue here. We still have a lot of anonymous memory though: vmstat.1365194087:nr_anon_pages 1430922 vmstat.1365194089:nr_anon_pages 1317009 vmstat.1365194094:nr_anon_pages 1540460 which is around 5.5G. It is interesting that the number of these pages even drops first and then starts growing again (between 089..094 by 870M while we reclaimed around the same amount). This would suggest that the load started thrashing on swap, but: meminfo.1365194087:SwapFree: 1999868 kB meminfo.1365194089:SwapFree: 1999808 kB meminfo.1365194094:SwapFree: 1784544 kB tells us that we swapped out only 210M after 1365194089. So we had to reclaim a lot of page cache during that time while the anonymous memory pressure was really high. vmstat.1365194087:nr_file_pages 428632 vmstat.1365194089:nr_file_pages 378132 vmstat.1365194094:nr_file_pages 192009 the page cache pages dropped down by 920M which covers the anon increase. This all suggests that the workload is simply too aggressive and the memory reclaim doesn't cope with it. Let's check the active and inactive lists (maybe we are not aging pages properly): meminfo.1365194087:Active(anon): 5613412 kB meminfo.1365194087:Active(file): 261180 kB meminfo.1365194089:Active(anon): 4794472 kB meminfo.1365194089:Active(file): 348396 kB meminfo.1365194094:Active(anon): 5424684 kB meminfo.1365194094:Active(file): 77364 kB meminfo.1365194087:Inactive(anon): 496092 kB meminfo.1365194087:Inactive(file): 1033184 kB meminfo.1365194089:Inactive(anon): 853608 kB meminfo.1365194089:Inactive(file): 749564 kB meminfo.1365194094:Inactive(anon): 1313648 kB meminfo.1365194094:Inactive(file): 82008 kB While the file LRUs look good, the active anon LRU seems to be too big. This either suggests a bug in aging or the working set is really that big. Considering the previous data (increase during the memory pressure) I would be inclined to the second option. Just out of curiosity, what is the vm.swappiness setting? You've said that you changed it from 0, but then I would expect us to swap much more. It almost looks like the swappiness is 0. Could you double check? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
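The per-interval deltas above were worked out by hand from consecutive snapshots; a small helper in the same spirit as the earlier collection loop (snapshot filenames assumed to follow the vmstat.<timestamp> pattern) can automate the subtraction:

#!/bin/sh
# usage: ./deltas.sh pgscan_kswapd_normal
key=$1
prev=""
for f in $(ls vmstat.* | sort -t. -k2 -n)    # sort snapshots by timestamp
do
    cur=$(awk -v k="$key" '$1 == k { print $2 }' "$f")
    if [ -n "$prev" ]; then
        echo "$f: $((cur - prev))"           # pages scanned/reclaimed in this interval
    fi
    prev=$cur
done

Running it once for each pgscan_*/pgsteal_* counter reproduces the scanned/reclaimed pairs quoted in the analysis.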
* Re: System freezes when RAM is full (64-bit) 2013-04-12 10:20 ` Michal Hocko @ 2013-04-12 10:49 ` Ivan Danov 2013-04-12 11:11 ` Michal Hocko 2013-04-15 5:03 ` Simon Jeons 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-12 10:49 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner [-- Attachment #1: Type: text/plain, Size: 7200 bytes --] $ cat /proc/sys/vm/swappiness 60 I have increased my swap partition from nearly 2GB to around 16GB, but the problem remains. Here I attach the logs for the larger swap partition. I use a MATLAB statement to simulate the problem, and it also works in Octave: X = ones(100000,10000); I have also tried to reproduce the problem on a desktop installation with 4GB of RAM and a 10GB swap partition, with Ubuntu Lucid installed and then upgraded to 12.04. The problem isn't there, although the input is still quite choppy during the load; after the script finishes, everything looks fine. The desktop installation does not have an SSD. On 12/04/13 12:20, Michal Hocko wrote: > [CCing Mel and Johannes] > On Fri 05-04-13 22:41:37, Ivan Danov wrote: >> Here you can find attached the script, collecting the logs and the >> logs themselves during the described process of freezing. It >> appeared that the previous logs are corrupted, because both >> /proc/vmstat and /proc/meminfo have been logging to the same file. > Sorry for the late reply: > $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' > min:165256 max:3254516 avg:1642475.35 > > So the free memory dropped down to 165M at minimum. This doesn't sound > terribly low and the average free memory was even above 1.5G. But maybe > the memory consumption peak was very short between 2 measured moments. > > The peak seems to be around this time: > meminfo.1365194083:MemFree: 650792 kB > meminfo.1365194085:MemFree: 664920 kB > meminfo.1365194087:MemFree: 165256 kB <<< > meminfo.1365194089:MemFree: 822968 kB > meminfo.1365194094:MemFree: 666940 kB > > Let's have a look at the memory reclaim activity > vmstat.1365194085:pgscan_kswapd_dma32 760 > vmstat.1365194085:pgscan_kswapd_normal 10444 > > vmstat.1365194087:pgscan_kswapd_dma32 760 > vmstat.1365194087:pgscan_kswapd_normal 10444 > > vmstat.1365194089:pgscan_kswapd_dma32 5855 > vmstat.1365194089:pgscan_kswapd_normal 80621 > > vmstat.1365194094:pgscan_kswapd_dma32 54333 > vmstat.1365194094:pgscan_kswapd_normal 285562 > > [...] > vmstat.1365194098:pgscan_kswapd_dma32 54333 > vmstat.1365194098:pgscan_kswapd_normal 285562 > > vmstat.1365194100:pgscan_kswapd_dma32 55760 > vmstat.1365194100:pgscan_kswapd_normal 289493 > > vmstat.1365194102:pgscan_kswapd_dma32 55760 > vmstat.1365194102:pgscan_kswapd_normal 289493 > > So the background reclaim was active only twice for a short amount of > time: > - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone > - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone > > The second one looks sane so we can ignore it for now but the first one > scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd > had hard time to find something to reclaim or it couldn't cope with the > ongoing memory pressure. 
> > vmstat.1365194087:pgsteal_kswapd_dma32 373 > vmstat.1365194087:pgsteal_kswapd_normal 9057 > > vmstat.1365194089:pgsteal_kswapd_dma32 3249 > vmstat.1365194089:pgsteal_kswapd_normal 56756 > > vmstat.1365194094:pgsteal_kswapd_dma32 14731 > vmstat.1365194094:pgsteal_kswapd_normal 221733 > > ...087-...089 > - dma32 scanned 5095, reclaimed 0 > - normal scanned 70177, reclaimed 0 > ...089-...094 > -dma32 scanned 48478, reclaimed 2876 > - normal scanned 204941, reclaimed 164977 > > This shows that kswapd was not able to reclaim any page at first and > then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in > dma32 and 80% for normal) although normal zone seems to be doing much > better. > > The direct reclaim was active during that time as well: > vmstat.1365194089:pgscan_direct_dma32 0 > vmstat.1365194089:pgscan_direct_normal 0 > > vmstat.1365194094:pgscan_direct_dma32 29339 > vmstat.1365194094:pgscan_direct_normal 86869 > > which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: > > vmstat.1365194089:pgsteal_direct_dma32 0 > vmstat.1365194089:pgsteal_direct_normal 0 > > vmstat.1365194094:pgsteal_direct_dma32 6137 > vmstat.1365194094:pgsteal_direct_normal 57677 > > 225M in the normal zone but it was still not effective very much (~20% > for dma32 and 66% for normal). > > vmstat.1365194087:nr_written 9013 > vmstat.1365194089:nr_written 9013 > vmstat.1365194094:nr_written 15387 > > Only around 24M have been written out during the massive scanning. > > So we have two problems here I guess. First is that there is not much > reclaimable memory when the peak consumption starts and then we have > hard times to balance dma32 zone. > > vmstat.1365194087:nr_shmem 103548 > vmstat.1365194089:nr_shmem 102227 > vmstat.1365194094:nr_shmem 100679 > > This tells us that you didn't have that many shmem pages allocated at > the time (only 404M). So the /tmp backed by tmpfs shouldn't be the > primary issue here. > > We still have a lot of anonymous memory though: > vmstat.1365194087:nr_anon_pages 1430922 > vmstat.1365194089:nr_anon_pages 1317009 > vmstat.1365194094:nr_anon_pages 1540460 > > which is around 5.5G. It is interesting that the number of these pages > even drops first and then starts growing again (between 089..094 by 870M > while we reclaimed around the same amount). This would suggest that the > load started trashing on swap but: > > meminfo.1365194087:SwapFree: 1999868 kB > meminfo.1365194089:SwapFree: 1999808 kB > meminfo.1365194094:SwapFree: 1784544 kB > > tells us that we swapped out only 210M after 1365194089. So we had to > reclaim a lot of page cache during that time while the anonymous memory > pressure was really high. > > vmstat.1365194087:nr_file_pages 428632 > vmstat.1365194089:nr_file_pages 378132 > vmstat.1365194094:nr_file_pages 192009 > > the page cache pages dropped down by 920M which covers the anon increase. > > This all suggests that the workload is simply too aggressive and the > memory reclaim doesn't cope with it. 
> > Let's check the active and inactive lists (maybe we are not aging pages properly): > meminfo.1365194087:Active(anon): 5613412 kB > meminfo.1365194087:Active(file): 261180 kB > > meminfo.1365194089:Active(anon): 4794472 kB > meminfo.1365194089:Active(file): 348396 kB > > meminfo.1365194094:Active(anon): 5424684 kB > meminfo.1365194094:Active(file): 77364 kB > > meminfo.1365194087:Inactive(anon): 496092 kB > meminfo.1365194087:Inactive(file): 1033184 kB > > meminfo.1365194089:Inactive(anon): 853608 kB > meminfo.1365194089:Inactive(file): 749564 kB > > meminfo.1365194094:Inactive(anon): 1313648 kB > meminfo.1365194094:Inactive(file): 82008 kB > > While the file LRUs looks good active Anon LRUs seem to be too big. This > either suggests a bug in aging or the working set is really that big. > Considering the previous data (increase during the memory pressure) I > would be inclined to the second option. > > Just out of curiosity what is the vm_swappiness setting? You've said > that you have changed that from 0 but it seems like we would swap much > more. It almost looks like the swappiness is 0. Could you double check? [-- Attachment #2: bug.tar.gz --] [-- Type: application/x-gzip, Size: 17647 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
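As a rough stand-in for the MATLAB/Octave allocation when neither program is available - an approximation under stated assumptions (GNU coreutils, and definitely not the reporter's exact workload) - a comparable anonymous-memory spike can be produced from a shell; save your work first, since the machine may lock up exactly as described:

# /dev/zero contains no newlines, so tail must buffer the entire stream;
# this therefore holds roughly 7 GiB of anonymous memory until killed:
head -c 7G /dev/zero | tail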
* Re: System freezes when RAM is full (64-bit) 2013-04-12 10:49 ` Ivan Danov @ 2013-04-12 11:11 ` Michal Hocko 2013-04-12 12:38 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-12 11:11 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On Fri 12-04-13 12:49:30, Ivan Danov wrote: > $ cat /proc/sys/vm/swappiness > 60 OK, thanks for confirming this. It is really strange that we hardly swap at all, then. > I have increased my swap partition from nearly 2GB to around 16GB, > but the problem remains. Increasing the swap partition will not help much, as it was almost unused with 2G already (at least the last data showed that). > Here I attach the logs for the larger swap partition. I use a MATLAB > script to simulate the problem, but it also works in Octave: > X = ones(100000,10000); AFAIU this will create a matrix with 10^9 elements and initialize them to 1. I am not familiar with octave, but do you happen to know what data type is used for the elements? 8B? It would also be interesting to know how the matrix is organized and initialized. Does it fit into memory at all? > I have tried to simulate the problem on a desktop installation with > 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then > upgraded to 12.04, the problem isn't there, but the input is still > quite choppy during the load. After the script finishes, everything > looks fine. For the desktop installation the hard drive is not an > SSD hard drive. What is the kernel version used here? > On 12/04/13 12:20, Michal Hocko wrote: > >[CCing Mel and Johannes] > >On Fri 05-04-13 22:41:37, Ivan Danov wrote: > >>Here you can find attached the script, collecting the logs and the > >>logs themselves during the described process of freezing. It > >>appeared that the previous logs are corrupted, because both > >>/proc/vmstat and /proc/meminfo have been logging to the same file. > >Sorry for the late reply: > >$ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' > >min:165256 max:3254516 avg:1642475.35 > > > >So the free memory dropped down to 165M at minimum. This doesn't sound > >terribly low and the average free memory was even above 1.5G. But maybe > >the memory consumption peak was very short between 2 measured moments. > > > >The peak seems to be around this time: > >meminfo.1365194083:MemFree: 650792 kB > >meminfo.1365194085:MemFree: 664920 kB > >meminfo.1365194087:MemFree: 165256 kB <<< > >meminfo.1365194089:MemFree: 822968 kB > >meminfo.1365194094:MemFree: 666940 kB > > > >Let's have a look at the memory reclaim activity > >vmstat.1365194085:pgscan_kswapd_dma32 760 > >vmstat.1365194085:pgscan_kswapd_normal 10444 > > > >vmstat.1365194087:pgscan_kswapd_dma32 760 > >vmstat.1365194087:pgscan_kswapd_normal 10444 > > > >vmstat.1365194089:pgscan_kswapd_dma32 5855 > >vmstat.1365194089:pgscan_kswapd_normal 80621 > > > >vmstat.1365194094:pgscan_kswapd_dma32 54333 > >vmstat.1365194094:pgscan_kswapd_normal 285562 > > > >[...] 
> >vmstat.1365194098:pgscan_kswapd_dma32 54333 > >vmstat.1365194098:pgscan_kswapd_normal 285562 > > > >vmstat.1365194100:pgscan_kswapd_dma32 55760 > >vmstat.1365194100:pgscan_kswapd_normal 289493 > > > >vmstat.1365194102:pgscan_kswapd_dma32 55760 > >vmstat.1365194102:pgscan_kswapd_normal 289493 > > > >So the background reclaim was active only twice for a short amount of > >time: > >- 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone > >- 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone > > > >The second one looks sane so we can ignore it for now but the first one > >scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd > >had hard time to find something to reclaim or it couldn't cope with the > >ongoing memory pressure. > > > >vmstat.1365194087:pgsteal_kswapd_dma32 373 > >vmstat.1365194087:pgsteal_kswapd_normal 9057 > > > >vmstat.1365194089:pgsteal_kswapd_dma32 3249 > >vmstat.1365194089:pgsteal_kswapd_normal 56756 > > > >vmstat.1365194094:pgsteal_kswapd_dma32 14731 > >vmstat.1365194094:pgsteal_kswapd_normal 221733 > > > >...087-...089 > > - dma32 scanned 5095, reclaimed 0 > > - normal scanned 70177, reclaimed 0 > >...089-...094 > > -dma32 scanned 48478, reclaimed 2876 > > - normal scanned 204941, reclaimed 164977 > > > >This shows that kswapd was not able to reclaim any page at first and > >then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in > >dma32 and 80% for normal) although normal zone seems to be doing much > >better. > > > >The direct reclaim was active during that time as well: > >vmstat.1365194089:pgscan_direct_dma32 0 > >vmstat.1365194089:pgscan_direct_normal 0 > > > >vmstat.1365194094:pgscan_direct_dma32 29339 > >vmstat.1365194094:pgscan_direct_normal 86869 > > > >which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: > > > >vmstat.1365194089:pgsteal_direct_dma32 0 > >vmstat.1365194089:pgsteal_direct_normal 0 > > > >vmstat.1365194094:pgsteal_direct_dma32 6137 > >vmstat.1365194094:pgsteal_direct_normal 57677 > > > >225M in the normal zone but it was still not effective very much (~20% > >for dma32 and 66% for normal). > > > >vmstat.1365194087:nr_written 9013 > >vmstat.1365194089:nr_written 9013 > >vmstat.1365194094:nr_written 15387 > > > >Only around 24M have been written out during the massive scanning. > > > >So we have two problems here I guess. First is that there is not much > >reclaimable memory when the peak consumption starts and then we have > >hard times to balance dma32 zone. > > > >vmstat.1365194087:nr_shmem 103548 > >vmstat.1365194089:nr_shmem 102227 > >vmstat.1365194094:nr_shmem 100679 > > > >This tells us that you didn't have that many shmem pages allocated at > >the time (only 404M). So the /tmp backed by tmpfs shouldn't be the > >primary issue here. > > > >We still have a lot of anonymous memory though: > >vmstat.1365194087:nr_anon_pages 1430922 > >vmstat.1365194089:nr_anon_pages 1317009 > >vmstat.1365194094:nr_anon_pages 1540460 > > > >which is around 5.5G. It is interesting that the number of these pages > >even drops first and then starts growing again (between 089..094 by 870M > >while we reclaimed around the same amount). This would suggest that the > >load started trashing on swap but: > > > >meminfo.1365194087:SwapFree: 1999868 kB > >meminfo.1365194089:SwapFree: 1999808 kB > >meminfo.1365194094:SwapFree: 1784544 kB > > > >tells us that we swapped out only 210M after 1365194089. 
So we had to > >reclaim a lot of page cache during that time while the anonymous memory > >pressure was really high. > > > >vmstat.1365194087:nr_file_pages 428632 > >vmstat.1365194089:nr_file_pages 378132 > >vmstat.1365194094:nr_file_pages 192009 > > > >the page cache pages dropped down by 920M which covers the anon increase. > > > >This all suggests that the workload is simply too aggressive and the > >memory reclaim doesn't cope with it. > > > >Let's check the active and inactive lists (maybe we are not aging pages properly): > >meminfo.1365194087:Active(anon): 5613412 kB > >meminfo.1365194087:Active(file): 261180 kB > > > >meminfo.1365194089:Active(anon): 4794472 kB > >meminfo.1365194089:Active(file): 348396 kB > > > >meminfo.1365194094:Active(anon): 5424684 kB > >meminfo.1365194094:Active(file): 77364 kB > > > >meminfo.1365194087:Inactive(anon): 496092 kB > >meminfo.1365194087:Inactive(file): 1033184 kB > > > >meminfo.1365194089:Inactive(anon): 853608 kB > >meminfo.1365194089:Inactive(file): 749564 kB > > > >meminfo.1365194094:Inactive(anon): 1313648 kB > >meminfo.1365194094:Inactive(file): 82008 kB > > > >While the file LRUs looks good active Anon LRUs seem to be too big. This > >either suggests a bug in aging or the working set is really that big. > >Considering the previous data (increase during the memory pressure) I > >would be inclined to the second option. > > > >Just out of curiosity what is the vm_swappiness setting? You've said > >that you have changed that from 0 but it seems like we would swap much > >more. It almost looks like the swappiness is 0. Could you double check? > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
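For reference, the arithmetic behind the "does it fit" question: with Octave/MATLAB's default double-precision elements (8 bytes each), the test matrix needs

100000 * 10000 * 8 = 8000000000 bytes, i.e. about 7.45 GiB

which on an 8 GB machine leaves well under 1 GB for the kernel, the desktop session and everything else.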
* Re: System freezes when RAM is full (64-bit) 2013-04-12 11:11 ` Michal Hocko @ 2013-04-12 12:38 ` Ivan Danov 2013-04-14 14:58 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Ivan Danov @ 2013-04-12 12:38 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On 12/04/13 13:11, Michal Hocko wrote: > On Fri 12-04-13 12:49:30, Ivan Danov wrote: >> $ cat /proc/sys/vm/swappiness >> 60 > OK, thanks for confirming this. It is really strange that we do not swap > almost at all, then. > >> I have increased my swap partition from nearly 2GB to around 16GB, >> but the problem remains. > Increasing the swap partition will not help much as it almost unused > with 2G already (at least last data shown that). > >> Here I attach the logs for the larger swap partition. I use a MATLAB >> script to simulate the problem, but it also works in Octave: >> X = ones(100000,10000); > AFAIU this will create a matrix with 10^9 elements and initialize them > to 1. I am not familiar with octave but do you happen to know what is > the data type used for the element? 8B? It would be also interesting to > know how is the matrix organized and initialized. Does it fit into > memory at all? Yes, 8B each, so it will be almost 8GB and it should fit into memory. I don't know the details of how it actually works, but if it cannot create the matrix, MATLAB complains about it. Since it starts complaining if I add even 2000 more in the second dimension, maybe it needs the RAM to create it all. However, on the desktop machine both RAM and swap are being used (quite a lot of both). > >> I have tried to simulate the problem on a desktop installation with >> 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then >> upgraded to 12.04, the problem isn't there, but the input is still >> quite choppy during the load. After the script finishes, everything >> looks fine. For the desktop installation the hard drive is not an >> SSD hard drive. > What is the kernel version used here? $ uname -a Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux > >> On 12/04/13 12:20, Michal Hocko wrote: >>> [CCing Mel and Johannes] >>> On Fri 05-04-13 22:41:37, Ivan Danov wrote: >>>> Here you can find attached the script, collecting the logs and the >>>> logs themselves during the described process of freezing. It >>>> appeared that the previous logs are corrupted, because both >>>> /proc/vmstat and /proc/meminfo have been logging to the same file. >>> Sorry for the late reply: >>> $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}' >>> min:165256 max:3254516 avg:1642475.35 >>> >>> So the free memory dropped down to 165M at minimum. This doesn't sound >>> terribly low and the average free memory was even above 1.5G. But maybe >>> the memory consumption peak was very short between 2 measured moments. 
>>> >>> The peak seems to be around this time: >>> meminfo.1365194083:MemFree: 650792 kB >>> meminfo.1365194085:MemFree: 664920 kB >>> meminfo.1365194087:MemFree: 165256 kB <<< >>> meminfo.1365194089:MemFree: 822968 kB >>> meminfo.1365194094:MemFree: 666940 kB >>> >>> Let's have a look at the memory reclaim activity >>> vmstat.1365194085:pgscan_kswapd_dma32 760 >>> vmstat.1365194085:pgscan_kswapd_normal 10444 >>> >>> vmstat.1365194087:pgscan_kswapd_dma32 760 >>> vmstat.1365194087:pgscan_kswapd_normal 10444 >>> >>> vmstat.1365194089:pgscan_kswapd_dma32 5855 >>> vmstat.1365194089:pgscan_kswapd_normal 80621 >>> >>> vmstat.1365194094:pgscan_kswapd_dma32 54333 >>> vmstat.1365194094:pgscan_kswapd_normal 285562 >>> >>> [...] >>> vmstat.1365194098:pgscan_kswapd_dma32 54333 >>> vmstat.1365194098:pgscan_kswapd_normal 285562 >>> >>> vmstat.1365194100:pgscan_kswapd_dma32 55760 >>> vmstat.1365194100:pgscan_kswapd_normal 289493 >>> >>> vmstat.1365194102:pgscan_kswapd_dma32 55760 >>> vmstat.1365194102:pgscan_kswapd_normal 289493 >>> >>> So the background reclaim was active only twice for a short amount of >>> time: >>> - 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone >>> - 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone >>> >>> The second one looks sane so we can ignore it for now but the first one >>> scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd >>> had hard time to find something to reclaim or it couldn't cope with the >>> ongoing memory pressure. >>> >>> vmstat.1365194087:pgsteal_kswapd_dma32 373 >>> vmstat.1365194087:pgsteal_kswapd_normal 9057 >>> >>> vmstat.1365194089:pgsteal_kswapd_dma32 3249 >>> vmstat.1365194089:pgsteal_kswapd_normal 56756 >>> >>> vmstat.1365194094:pgsteal_kswapd_dma32 14731 >>> vmstat.1365194094:pgsteal_kswapd_normal 221733 >>> >>> ...087-...089 >>> - dma32 scanned 5095, reclaimed 0 >>> - normal scanned 70177, reclaimed 0 >>> ...089-...094 >>> -dma32 scanned 48478, reclaimed 2876 >>> - normal scanned 204941, reclaimed 164977 >>> >>> This shows that kswapd was not able to reclaim any page at first and >>> then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in >>> dma32 and 80% for normal) although normal zone seems to be doing much >>> better. >>> >>> The direct reclaim was active during that time as well: >>> vmstat.1365194089:pgscan_direct_dma32 0 >>> vmstat.1365194089:pgscan_direct_normal 0 >>> >>> vmstat.1365194094:pgscan_direct_dma32 29339 >>> vmstat.1365194094:pgscan_direct_normal 86869 >>> >>> which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed: >>> >>> vmstat.1365194089:pgsteal_direct_dma32 0 >>> vmstat.1365194089:pgsteal_direct_normal 0 >>> >>> vmstat.1365194094:pgsteal_direct_dma32 6137 >>> vmstat.1365194094:pgsteal_direct_normal 57677 >>> >>> 225M in the normal zone but it was still not effective very much (~20% >>> for dma32 and 66% for normal). >>> >>> vmstat.1365194087:nr_written 9013 >>> vmstat.1365194089:nr_written 9013 >>> vmstat.1365194094:nr_written 15387 >>> >>> Only around 24M have been written out during the massive scanning. >>> >>> So we have two problems here I guess. First is that there is not much >>> reclaimable memory when the peak consumption starts and then we have >>> hard times to balance dma32 zone. >>> >>> vmstat.1365194087:nr_shmem 103548 >>> vmstat.1365194089:nr_shmem 102227 >>> vmstat.1365194094:nr_shmem 100679 >>> >>> This tells us that you didn't have that many shmem pages allocated at >>> the time (only 404M). 
So the /tmp backed by tmpfs shouldn't be the >>> primary issue here. >>> >>> We still have a lot of anonymous memory though: >>> vmstat.1365194087:nr_anon_pages 1430922 >>> vmstat.1365194089:nr_anon_pages 1317009 >>> vmstat.1365194094:nr_anon_pages 1540460 >>> >>> which is around 5.5G. It is interesting that the number of these pages >>> even drops first and then starts growing again (between 089..094 by 870M >>> while we reclaimed around the same amount). This would suggest that the >>> load started trashing on swap but: >>> >>> meminfo.1365194087:SwapFree: 1999868 kB >>> meminfo.1365194089:SwapFree: 1999808 kB >>> meminfo.1365194094:SwapFree: 1784544 kB >>> >>> tells us that we swapped out only 210M after 1365194089. So we had to >>> reclaim a lot of page cache during that time while the anonymous memory >>> pressure was really high. >>> >>> vmstat.1365194087:nr_file_pages 428632 >>> vmstat.1365194089:nr_file_pages 378132 >>> vmstat.1365194094:nr_file_pages 192009 >>> >>> the page cache pages dropped down by 920M which covers the anon increase. >>> >>> This all suggests that the workload is simply too aggressive and the >>> memory reclaim doesn't cope with it. >>> >>> Let's check the active and inactive lists (maybe we are not aging pages properly): >>> meminfo.1365194087:Active(anon): 5613412 kB >>> meminfo.1365194087:Active(file): 261180 kB >>> >>> meminfo.1365194089:Active(anon): 4794472 kB >>> meminfo.1365194089:Active(file): 348396 kB >>> >>> meminfo.1365194094:Active(anon): 5424684 kB >>> meminfo.1365194094:Active(file): 77364 kB >>> >>> meminfo.1365194087:Inactive(anon): 496092 kB >>> meminfo.1365194087:Inactive(file): 1033184 kB >>> >>> meminfo.1365194089:Inactive(anon): 853608 kB >>> meminfo.1365194089:Inactive(file): 749564 kB >>> >>> meminfo.1365194094:Inactive(anon): 1313648 kB >>> meminfo.1365194094:Inactive(file): 82008 kB >>> >>> While the file LRUs looks good active Anon LRUs seem to be too big. This >>> either suggests a bug in aging or the working set is really that big. >>> Considering the previous data (increase during the memory pressure) I >>> would be inclined to the second option. >>> >>> Just out of curiosity what is the vm_swappiness setting? You've said >>> that you have changed that from 0 but it seems like we would swap much >>> more. It almost looks like the swappiness is 0. Could you double check? > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
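One quick way to see the difference between the two machines - assuming procps' vmstat tool is available - is to watch swap traffic while the matrix is being allocated:

# memory, swap-in (si) and swap-out (so), in MiB, sampled every 2 seconds:
vmstat -S M 2

On the desktop the si/so columns should be busy during the allocation; on the frozen laptop they stay near zero, matching the SwapFree numbers in the analysis above.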
* Re: System freezes when RAM is full (64-bit) 2013-04-12 12:38 ` Ivan Danov @ 2013-04-14 14:58 ` Michal Hocko 2013-05-14 15:25 ` Ivan Danov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2013-04-14 14:58 UTC (permalink / raw) To: Ivan Danov; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner On Fri 12-04-13 14:38:00, Ivan Danov wrote: > On 12/04/13 13:11, Michal Hocko wrote: > >On Fri 12-04-13 12:49:30, Ivan Danov wrote: > >>$ cat /proc/sys/vm/swappiness > >>60 > >OK, thanks for confirming this. It is really strange that we do not swap > >almost at all, then. > >>I have increased my swap partition from nearly 2GB to around 16GB, > >>but the problem remains. > >Increasing the swap partition will not help much as it almost unused > >with 2G already (at least last data shown that). > > > >>Here I attach the logs for the larger swap partition. I use a MATLAB > >>script to simulate the problem, but it also works in Octave: > >>X = ones(100000,10000); > >AFAIU this will create a matrix with 10^9 elements and initialize them > >to 1. I am not familiar with octave but do you happen to know what is > >the data type used for the element? 8B? It would be also interesting to > >know how is the matrix organized and initialized. Does it fit into > >memory at all? > Yes, 8B each, so it will be almost 8GB and it should fit into the > memory. It won't fit in, because the kernel and other processes consume some memory as well. So you have to swap. > I don't know details how it actually works, but if it cannot > create the matrix, MATLAB complains about that. Since it starts > complaining even after 2000 more in the second dimension, maybe it > needs the RAM to create it all. However on the desktop machine, both > RAM and swap are being used (quite a lot of them both). How much you swap depends on vm.swappiness. I would suggest increasing the value if your workload really is so anonymous-memory based. Otherwise a lot of file pages are reclaimed, which can lead to the problems you are seeing. > >>I have tried to simulate the problem on a desktop installation with > >>4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then > >>upgraded to 12.04, the problem isn't there, but the input is still > >>quite choppy during the load. After the script finishes, everything > >>looks fine. For the desktop installation the hard drive is not an > >>SSD hard drive. > >What is the kernel version used here? > $ uname -a > Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC > 2013 x86_64 x86_64 x86_64 GNU/Linux Is there any chance you could test with the latest vanilla kernel and Mel's patches from https://lkml.org/lkml/2013/4/11/516 on top? [...] -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
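A rough outline of what testing that suggestion involves - the steps are generic and the patch directory name is illustrative, not exact instructions for this particular series:

# fetch the mainline kernel
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux

# apply the posted series, previously saved as mbox/patch files
git am ../mel-series/*.patch

# reuse the running kernel's configuration, then build and install
cp /boot/config-$(uname -r) .config
make olddefconfig
make -j$(nproc)
sudo make modules_install install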
* Re: System freezes when RAM is full (64-bit) 2013-04-14 14:58 ` Michal Hocko @ 2013-05-14 15:25 ` Ivan Danov 0 siblings, 0 replies; 17+ messages in thread From: Ivan Danov @ 2013-05-14 15:25 UTC (permalink / raw) To: Michal Hocko; +Cc: Simon Jeons, linux-mm, 1162073, Mel Gorman, Johannes Weiner -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 14/04/13 16:58, Michal Hocko wrote: > On Fri 12-04-13 14:38:00, Ivan Danov wrote: >> On 12/04/13 13:11, Michal Hocko wrote: >>> On Fri 12-04-13 12:49:30, Ivan Danov wrote: >>>> $ cat /proc/sys/vm/swappiness >>>> 60 >>> OK, thanks for confirming this. It is really strange that we do not swap >>> almost at all, then. >>>> I have increased my swap partition from nearly 2GB to around 16GB, >>>> but the problem remains. >>> Increasing the swap partition will not help much as it almost unused >>> with 2G already (at least last data shown that). >>> >>>> Here I attach the logs for the larger swap partition. I use a MATLAB >>>> script to simulate the problem, but it also works in Octave: >>>> X = ones(100000,10000); >>> AFAIU this will create a matrix with 10^9 elements and initialize them >>> to 1. I am not familiar with octave but do you happen to know what is >>> the data type used for the element? 8B? It would be also interesting to >>> know how is the matrix organized and initialized. Does it fit into >>> memory at all? >> Yes, 8B each, so it will be almost 8GB and it should fit into the >> memory. > > It won't fit in because kernel and other processes consume some memory > as well. So you have to swap. > >> I don't know details how it actually works, but if it cannot >> create the matrix, MATLAB complains about that. Since it starts >> complaining even after 2000 more in the second dimension, maybe it >> needs the RAM to create it all. However on the desktop machine, both >> RAM and swap are being used (quite a lot of them both). > > How much you swap depends on vm.swappiness. I would suggest increasing > the value if your workload is really so anononymous memory based. > Otherwise a lot of file pages are reclaimed which can lead to problems > you are seeing. > >>>> I have tried to simulate the problem on a desktop installation with >>>> 4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then >>>> upgraded to 12.04, the problem isn't there, but the input is still >>>> quite choppy during the load. After the script finishes, everything >>>> looks fine. For the desktop installation the hard drive is not an >>>> SSD hard drive. >>> What is the kernel version used here? >> $ uname -a >> Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC >> 2013 x86_64 x86_64 x86_64 GNU/Linux > > Is there any chance you could test with the latest vanilla kernel and > Mel's patches from https://lkml.org/lkml/2013/4/11/516 on top? Sorry for the late reply, I've been quite busy these days. I will try to compile the kernel over the weekend. However, this will be my first time compiling a kernel, so it may take me some time. By the way, I have upgraded to Ubuntu 13.04, so my kernel now is: $ uname -a Linux ivan 3.8.0-19-generic #30-Ubuntu SMP Wed May 1 16:35:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux The problem is still there. > > [...] 
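Since building a vanilla kernel with a patch series on top comes up here, the
following is a rough sketch of the usual steps, assuming an Ubuntu build host
and that Mel's series has been saved from the list as an mbox file; the file
name, directory layout, and -j value are illustrative:

    # build prerequisites on Ubuntu
    sudo apt-get install build-essential libncurses5-dev bc git
    # fetch the vanilla tree
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    cd linux
    # apply the saved patch series on top
    git am ../mels-series.mbox
    # start from the running kernel's config, taking defaults for new options
    cp /boot/config-$(uname -r) .config
    make olddefconfig
    # build and install
    make -j4
    sudo make modules_install install

After rebooting into the new kernel, `uname -r` should report the vanilla
version string instead of the distribution one.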
* Re: System freezes when RAM is full (64-bit)
  2013-04-12 10:20 ` Michal Hocko
  2013-04-12 10:49   ` Ivan Danov
@ 2013-04-15  5:03   ` Simon Jeons
  2013-04-15 14:12     ` Michal Hocko
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Jeons @ 2013-04-15 5:03 UTC (permalink / raw)
To: Michal Hocko; +Cc: Ivan Danov, linux-mm, 1162073, Mel Gorman, Johannes Weiner

Hi Michal,

On 04/12/2013 06:20 PM, Michal Hocko wrote:
> [CCing Mel and Johannes]
> On Fri 05-04-13 22:41:37, Ivan Danov wrote:
> > Here you can find attached the script collecting the logs, and the
> > logs themselves, taken during the described freeze. It appeared that
> > the previous logs were corrupted, because both /proc/vmstat and
> > /proc/meminfo had been logging to the same file.
> Sorry for the late reply:
> $ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}'
> min:165256 max:3254516 avg:1642475.35
>
> So free memory dropped to 165M at its minimum. This doesn't sound
> terribly low, and the average free memory was even above 1.5G. But
> maybe the consumption peak was very short, falling between two
> measured moments.
>
> The peak seems to be around this time:
> meminfo.1365194083:MemFree: 650792 kB
> meminfo.1365194085:MemFree: 664920 kB
> meminfo.1365194087:MemFree: 165256 kB  <<<
> meminfo.1365194089:MemFree: 822968 kB
> meminfo.1365194094:MemFree: 666940 kB
>
> Let's have a look at the memory reclaim activity:
> vmstat.1365194085:pgscan_kswapd_dma32 760
> vmstat.1365194085:pgscan_kswapd_normal 10444
>
> vmstat.1365194087:pgscan_kswapd_dma32 760
> vmstat.1365194087:pgscan_kswapd_normal 10444
>
> vmstat.1365194089:pgscan_kswapd_dma32 5855
> vmstat.1365194089:pgscan_kswapd_normal 80621
>
> vmstat.1365194094:pgscan_kswapd_dma32 54333
> vmstat.1365194094:pgscan_kswapd_normal 285562
>
> [...]
> vmstat.1365194098:pgscan_kswapd_dma32 54333
> vmstat.1365194098:pgscan_kswapd_normal 285562
>
> vmstat.1365194100:pgscan_kswapd_dma32 55760
> vmstat.1365194100:pgscan_kswapd_normal 289493
>
> vmstat.1365194102:pgscan_kswapd_dma32 55760
> vmstat.1365194102:pgscan_kswapd_normal 289493
>
> So background reclaim was active only twice, each time for a short
> period:
> - 1365194087 - 1365194094: 53573 pages in dma32 and 275118 in the normal zone
> - 1365194098 - 1365194100: 1427 pages in dma32 and 3931 in the normal zone
>
> The second one looks sane, so we can ignore it for now, but the first
> one scanned 1074M in the normal zone and 209M in the dma32 zone.
> Either kswapd had a hard time finding something to reclaim, or it
> couldn't cope with the ongoing memory pressure.
>
> vmstat.1365194087:pgsteal_kswapd_dma32 373
> vmstat.1365194087:pgsteal_kswapd_normal 9057
>
> vmstat.1365194089:pgsteal_kswapd_dma32 3249
> vmstat.1365194089:pgsteal_kswapd_normal 56756
>
> vmstat.1365194094:pgsteal_kswapd_dma32 14731
> vmstat.1365194094:pgsteal_kswapd_normal 221733
>
> ...087-...089
>  - dma32 scanned 5095, reclaimed 0
>  - normal scanned 70177, reclaimed 0

This is not correct.
 - dma32 scanned 5095, reclaimed 2876, effective = 56%
 - normal scanned 70177, reclaimed 47699, effective = 68%

> ...089-...094
>  - dma32 scanned 48478, reclaimed 2876
>  - normal scanned 204941, reclaimed 164977

 - dma32 scanned 48478, reclaimed 11482, effective = 23%
 - normal scanned 204941, reclaimed 164977, effective = 80%

> This shows that kswapd was not able to reclaim any page at first, and
> then reclaimed a lot (644M in 5s) but still very ineffectively (5% in
> dma32 and 80% for normal), although the normal zone seems to be doing
> much better.
>
> Direct reclaim was active during that time as well:
> vmstat.1365194089:pgscan_direct_dma32 0
> vmstat.1365194089:pgscan_direct_normal 0
>
> vmstat.1365194094:pgscan_direct_dma32 29339
> vmstat.1365194094:pgscan_direct_normal 86869
>
> It scanned 29339 pages in dma32 and 86869 in the normal zone, while it
> reclaimed:
>
> vmstat.1365194089:pgsteal_direct_dma32 0
> vmstat.1365194089:pgsteal_direct_normal 0
>
> vmstat.1365194094:pgsteal_direct_dma32 6137
> vmstat.1365194094:pgsteal_direct_normal 57677
>
> 225M in the normal zone, but it was still not very effective either
> (~20% for dma32 and 66% for normal).
>
> vmstat.1365194087:nr_written 9013
> vmstat.1365194089:nr_written 9013
> vmstat.1365194094:nr_written 15387
>
> Only around 24M were written out during the massive scanning.
>
> So I guess we have two problems here. First, there is not much
> reclaimable memory when the peak consumption starts, and then we have
> a hard time balancing the dma32 zone.
>
> vmstat.1365194087:nr_shmem 103548
> vmstat.1365194089:nr_shmem 102227
> vmstat.1365194094:nr_shmem 100679
>
> This tells us that you didn't have that many shmem pages allocated at
> the time (only 404M), so the tmpfs-backed /tmp shouldn't be the
> primary issue here.
>
> We still have a lot of anonymous memory, though:
> vmstat.1365194087:nr_anon_pages 1430922
> vmstat.1365194089:nr_anon_pages 1317009
> vmstat.1365194094:nr_anon_pages 1540460
>
> which is around 5.5G. It is interesting that the number of these pages
> first drops and then starts growing again (between 089..094 by 870M,
> while we reclaimed around the same amount). This would suggest that
> the load started thrashing on swap, but:
>
> meminfo.1365194087:SwapFree: 1999868 kB
> meminfo.1365194089:SwapFree: 1999808 kB
> meminfo.1365194094:SwapFree: 1784544 kB
>
> tells us that we swapped out only 210M after 1365194089. So we had to

How about setting vm.swappiness to 200?

> reclaim a lot of page cache during that time, while the anonymous
> memory pressure was really high.
>
> vmstat.1365194087:nr_file_pages 428632
> vmstat.1365194089:nr_file_pages 378132
> vmstat.1365194094:nr_file_pages 192009
>
> The page cache dropped by 920M, which covers the anon increase.
>
> This all suggests that the workload is simply too aggressive and the
> memory reclaim doesn't cope with it.
>
> Let's check the active and inactive lists (maybe we are not aging
> pages properly):
> meminfo.1365194087:Active(anon):   5613412 kB
> meminfo.1365194087:Active(file):    261180 kB
>
> meminfo.1365194089:Active(anon):   4794472 kB
> meminfo.1365194089:Active(file):    348396 kB
>
> meminfo.1365194094:Active(anon):   5424684 kB
> meminfo.1365194094:Active(file):     77364 kB
>
> meminfo.1365194087:Inactive(anon):  496092 kB
> meminfo.1365194087:Inactive(file): 1033184 kB
>
> meminfo.1365194089:Inactive(anon):  853608 kB
> meminfo.1365194089:Inactive(file):  749564 kB
>
> meminfo.1365194094:Inactive(anon): 1313648 kB
> meminfo.1365194094:Inactive(file):   82008 kB
>
> While the file LRUs look good, the active anon LRU seems too big. This
> suggests either a bug in aging or a working set that really is that
> big. Considering the previous data (the increase during the memory
> pressure), I would lean towards the second option.
>
> Just out of curiosity, what is the vm.swappiness setting? You've said
> that you changed it from 0, but then it seems like we should swap much
> more. It almost looks like swappiness is 0. Could you double-check?
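The scanned/reclaimed arithmetic above is just the delta of the matching
pgscan_*/pgsteal_* counters between two vmstat snapshots. A small sketch of
how it can be computed mechanically, assuming snapshot files named
vmstat.<timestamp> in raw /proc/vmstat format ("name value" per line), as the
logging loop used in this thread produces; the two timestamps are the ones
analysed above:

    awk 'FNR==NR { a[$1]=$2; next }
         /^pg(scan|steal)_/ { d[$1] = $2 - a[$1] }
         END {
           for (k in d)
             if (k ~ /^pgscan_/ && d[k] > 0) {
               s = k
               sub(/^pgscan_/, "pgsteal_", s)
               printf "%s: scanned %d, reclaimed %d, effective = %d%%\n",
                      k, d[k], d[s], 100 * d[s] / d[k]
             }
         }' vmstat.1365194089 vmstat.1365194094

The first file is read into an array, the second yields the deltas, and the
END block pairs each scan counter with its steal counter to print the
effectiveness percentage, in the same form as the corrections above.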
* Re: System freezes when RAM is full (64-bit)
  2013-04-15  5:03   ` Simon Jeons
@ 2013-04-15 14:12     ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2013-04-15 14:12 UTC (permalink / raw)
To: Simon Jeons; +Cc: Ivan Danov, linux-mm, 1162073, Mel Gorman, Johannes Weiner

On Mon 15-04-13 13:03:34, Simon Jeons wrote:
> Hi Michal,
> On 04/12/2013 06:20 PM, Michal Hocko wrote:
[...]
> > ...087-...089
> >  - dma32 scanned 5095, reclaimed 0
> >  - normal scanned 70177, reclaimed 0
>
> This is not correct.
>  - dma32 scanned 5095, reclaimed 2876, effective = 56%
>  - normal scanned 70177, reclaimed 47699, effective = 68%

Right you are! I made a mistake and compared the wrong timestamps.

> > ...089-...094
> >  - dma32 scanned 48478, reclaimed 2876
> >  - normal scanned 204941, reclaimed 164977
>
>  - dma32 scanned 48478, reclaimed 11482, effective = 23%
>  - normal scanned 204941, reclaimed 164977, effective = 80%

Same here.

[...]
> > tells us that we swapped out only 210M after 1365194089. So we had to
>
> How about setting vm.swappiness to 200?

swappiness is limited to the range 0-100, and at 100 it already treats
the anon and file LRUs equally.
--
Michal Hocko
SUSE Labs
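As a side note, the 0..100 limit is enforced by the sysctl handler itself
(proc_dointvec_minmax with a 0/100 min/max in the 3.x kernel/sysctl.c table),
so an out-of-range write is rejected rather than silently clamped. A quick
illustrative check, sketched from those semantics rather than taken from the
thread:

    # expected to fail with an invalid-argument error: 200 is outside 0..100
    sysctl -w vm.swappiness=200
    # the highest accepted value, treating anon and file LRUs equally
    sysctl -w vm.swappiness=100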
end of thread, newest: ~2013-05-14 15:25 UTC

Thread overview: 17+ messages
2013-04-01 19:14 System freezes when RAM is full (64-bit) Ivan Danov
2013-04-03 12:12 ` Michal Hocko
2013-04-04  0:27 ` Simon Jeons
2013-04-04  7:08 ` Michal Hocko
2013-04-04 14:10 ` Ivan Danov
2013-04-04 15:16 ` Michal Hocko
2013-04-05 10:13 ` Ivan Danov
2013-04-05 11:59 ` Michal Hocko
2013-04-05 20:41 ` Ivan Danov
2013-04-12 10:20 ` Michal Hocko
2013-04-12 10:49 ` Ivan Danov
2013-04-12 11:11 ` Michal Hocko
2013-04-12 12:38 ` Ivan Danov
2013-04-14 14:58 ` Michal Hocko
2013-05-14 15:25 ` Ivan Danov
2013-04-15  5:03 ` Simon Jeons
2013-04-15 14:12 ` Michal Hocko