* Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 14:39 UTC
  To: linux-kernel

I think I've come across a memory leak in 2.6.20. I've upgraded to the
latest 2.6.20.17, but it didn't seem to help.

A little background: I saw something exactly like this many months ago
with a 2.6.12 kernel. However, by 2.6.16.x the leak had apparently
been fixed, so I didn't pursue it. But either it remains in 2.6.20 or
else a new leak has appeared.

FWIW, this is an x86_64 machine, but I also saw nearly the same
behavior on an i386 machine running 2.6.12. (Links to graphs showing
long-term memory usage are at the bottom of this email if you want to
skip all the text stats in the middle.)

Immediately after booting the system, I shut down all services to get
a baseline for comparison. Here is the output of top, free, and vmstat
with virtually nothing running:

=========== top =============

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  754 root      15   0 16948 2368 1732 R  0.0  0.3   0:00.01 sshd
  757 root      15   0  5620 1440 1124 S  0.0  0.2   0:00.00 bash
 1195 root      15   0  6300 1116  880 R  0.3  0.1   0:00.02 top
 1196 root      18   0  3880  628  516 S  0.0  0.1   0:00.00 agetty
    1 root      18   0  3888  516  412 S  0.0  0.1   0:00.26 init
  741 root      18   0  3880  508  412 S  0.0  0.1   0:00.00 agetty
  742 root      18   0  3876  504  412 S  0.0  0.1   0:00.00 agetty
  743 root      18   0  3876  504  412 S  0.0  0.1   0:00.00 agetty
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    5 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   57 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kblockd/0
   58 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
   59 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
   62 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
   64 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  121 root      25   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  122 root      15   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  123 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  124 root      19  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  220 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  221 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  245 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  246 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 usb-storage
  256 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 reiserfs/0

=========== free ============

             total       used       free     shared    buffers     cached
Mem:        899408      96824     802584          0      12604      70064
-/+ buffers/cache:      14156     885252
Swap:        65528          0      65528


=========== vmstat ==============

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 802152  13008  70096    0    0   402   184  282   87  2  1 88  9


=========== vmstat -s ===============

       899408  total memory
        97248  used memory
        50352  active memory
        34368  inactive memory
       802160  free memory
        13080  buffer memory
        70104  swap cache
        65528  total swap
            0  used swap
        65528  free swap
          349 non-nice user cpu ticks
            0 nice user cpu ticks
          172 system cpu ticks
        15743 idle cpu ticks
         1522 IO-wait cpu ticks
            0 IRQ cpu ticks
           10 softirq cpu ticks
            0 stolen cpu ticks
        69682 pages paged in
        32228 pages paged out
            0 pages swapped in
            0 pages swapped out
        50228 interrupts
        15207 CPU context switches
   1188132534 boot time
         1213 forks

==================================



OK, now I restart all services and let the system run for about
12 hours. At the end of this time, I shut down all services again so
that virtually nothing is running. Here are the stats:



=========== top ================

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17250 root      15   0 16952 2372 1732 R  0.0  0.3   0:00.09 sshd
17253 root      15   0  5624 1448 1124 S  0.0  0.2   0:00.01 bash
23409 root      15   0  6304 1124  884 R  0.0  0.1   0:00.00 top
23410 root      18   0  3880  628  516 S  0.0  0.1   0:00.00 agetty
    1 root      18   0  3884  516  412 S  0.0  0.1   0:00.56 init
  750 root      18   0  3880  508  412 S  0.0  0.1   0:00.00 agetty
  751 root      18   0  3880  508  412 S  0.0  0.1   0:00.00 agetty
  749 root      18   0  3876  504  412 S  0.0  0.1   0:00.00 agetty
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    5 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   57 root      10  -5     0    0    0 S  0.0  0.0   0:00.31 kblockd/0
   58 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
   59 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
   62 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
   64 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  121 root      15   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  123 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  124 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  220 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  221 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  245 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_2
  246 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 usb-storage
  256 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 reiserfs/0
17277 root      15   0     0    0    0 S  0.0  0.0   0:00.16 pdflush

============= free ===========

             total       used       free     shared    buffers     cached
Mem:        899408     747128     152280          0     166228     444540
-/+ buffers/cache:     136360     763048
Swap:        65528          0      65528


============= vmstat ==============

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 152288 166228 444564    0    0    10    30  255   29  0  0 99  0

============ vmstat -s ==============

       899408  total memory
       747248  used memory
       338700  active memory
       273736  inactive memory
       152160  free memory
       166228  buffer memory
       444660  swap cache
        65528  total swap
            0  used swap
        65528  free swap
         7522 non-nice user cpu ticks
          300 nice user cpu ticks
         4120 system cpu ticks
      3699397 idle cpu ticks
        13963 IO-wait cpu ticks
           49 IRQ cpu ticks
          146 softirq cpu ticks
            0 stolen cpu ticks
       355378 pages paged in
      1108508 pages paged out
            0 pages swapped in
            0 pages swapped out
      9505965 interrupts
      1095062 CPU context switches
   1188095217 boot time
        23440 forks

======================================


After 12 hours, you can see that a lot of memory is still in use even
with all of the services shut down. But where is it going?
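The "-/+ buffers/cache" row of free is what separates cache growth from memory that is actually held. As a rough check, it can be recomputed from the "Mem:" row. A minimal Python sketch, assuming the procps free layout shown above; the helper name buffers_cache_row is hypothetical:

```python
def buffers_cache_row(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) as shown on free's '-/+ buffers/cache' row.

    Buffers and page cache are treated as reclaimable, so they are
    subtracted from 'used' and added to 'free'.
    """
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable


# Figures (kB) copied from the two 'free' outputs above.
baseline = buffers_cache_row(96824, 802584, 12604, 70064)
after_12h = buffers_cache_row(747128, 152280, 166228, 444540)

print("baseline :", baseline)
print("after 12h:", after_12h)
print("growth in non-cache 'used':", after_12h[0] - baseline[0], "kB")
```

With these figures, memory in use after subtracting buffers and cache grows from about 14 MB to about 136 MB (122204 kB); the rest of the "used" column is cache. Slab memory is not part of buffers or cached, so in this version of free it is counted in that remainder.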

I have compared this to a machine running i386 2.6.16.2x, and when I
stop everything except ssh, only a tiny amount of RAM is in use, as
expected.

I can verify that this memory loss never stops: the lost memory keeps
increasing until the machine goes into swap, and it will eventually
crash if left to its own devices. However, on machines with a lot of
RAM, this process can take a month or more.

Here are links to three cacti graphs where you can see the effect over
the long term:

This graph is from a machine running 2.6.16.27/i386, which does not
have any memory loss. You can see the long-term memory line is flat:

    http://i239.photobucket.com/albums/ff117/fredty8/memory-a4.png

Now here is a graph from a machine running 2.6.12/i386, which clearly
shows a long-term memory loss. The points where the memory shoots back
up to its full level are when the machine had to be rebooted because
it was going into swap:

    http://i239.photobucket.com/albums/ff117/fredty8/memory-a2.png

And finally, here is a graph from a machine running 2.6.20.15/x86_64,
which shows a very similar memory loss as the 2.6.12 machine. (This
machine has only been up for a few weeks, which is why the graph is so
short. But it is clear that the graph is doing the same thing as
2.6.12):

    http://i239.photobucket.com/albums/ff117/fredty8/memory-b1.png


If you need any more information from me, I'll be happy to provide it.
Please CC me on replies.

* Re: Slow, persistent memory leak in 2.6.20
From: Alexey Dobriyan @ 2007-08-26 15:32 UTC
  To: Fred Tyler; +Cc: linux-kernel

On Sun, Aug 26, 2007 at 10:39:11AM -0400, Fred Tyler wrote:
> I think I've come across a memory leak in 2.6.20. I've upgraded to the
> latest 2.6.20.17, but it didn't seem to help.
> 
> A little background: I saw something exactly like this many months ago
> with a 2.6.12 kernel. However, by 2.6.16.x the leak had apparently
> been fixed, so I didn't pursue it. I just assumed it had been fixed.
> But either it remains in 2.6.20 or else a new leak has appeared.
> 
> FWIW, this is an x86_64 machine, but I also saw nearly the same
> behavior on a i386 machine running 2.6.12. (Links to graphs showing
> long-term memory usage are at the bottom of this email if you want to
> skip all the text stats in the middle.)
> 
> Immediately after booting the system, I shut down all services to get
> a baseline for comparison. Here is the output of top and vmstat with
> virtually nothing running:

You can try "Kernel Hacking" => "Debug slab memory allocations" =>
"Memory leak debugging". Once you think it has leaked a fair amount,
post the output of

	sort -n -k2 /proc/slab_allocators
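If it helps to post-process that file, the ordering asked for here can be reproduced in Python. A minimal sketch; the function names are hypothetical, and /proc/slab_allocators only exists on a kernel built with the slab leak-debugging option above:

```python
import re


def sort_by_second_field(lines):
    """Roughly mimic `sort -n -k2`: order lines by the number at the
    start of their second whitespace-separated field (0 if none)."""
    def key(line):
        fields = line.split()
        if len(fields) < 2:
            return 0.0
        match = re.match(r"-?\d+(?:\.\d+)?", fields[1])
        return float(match.group()) if match else 0.0

    return sorted(lines, key=key)


def top_slab_allocators(path="/proc/slab_allocators", n=20):
    """Return the n entries with the largest counts (largest last)."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return sort_by_second_field(lines)[-n:]
```

Comparing two such snapshots taken some hours apart shows which allocation sites keep growing.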

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 15:40 UTC
  To: linux-kernel

On 8/26/07, Fred Tyler <fredty8@gmail.com> wrote:
> I think I've come across a memory leak in 2.6.20. I've upgraded to the
> latest 2.6.20.17, but it didn't seem to help.

One more thing, I just found this message from July from someone
seeing a similar problem:

    http://lkml.org/lkml/2007/7/27/305

I am also running reiserfs, so I wonder if that has something to do
with this. Unlike the other poster, though, I am running an unmodified
kernel and have not seen the error he saw in the system logs.

Here's my output from /proc/meminfo in case it helps:

$ cat /proc/meminfo
MemTotal:      4053564 kB
MemFree:        144344 kB
Buffers:        310824 kB
Cached:        2684244 kB
SwapCached:         64 kB
Active:        1858644 kB
Inactive:      1510808 kB
SwapTotal:       65528 kB
SwapFree:        65316 kB
Dirty:            1772 kB
Writeback:           0 kB
AnonPages:      363844 kB
Mapped:          46924 kB
Slab:           509276 kB
SReclaimable:   467220 kB
SUnreclaim:      42056 kB
PageTables:       9660 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   2092308 kB
Committed_AS:   854520 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      4936 kB
VmallocChunk: 34359733423 kB
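The Slab line is worth breaking down, since slab memory is not included in Buffers or Cached. This kernel splits it into SReclaimable (caches such as dentries and inodes that the kernel can shrink under pressure) and SUnreclaim. A minimal Python sketch using the figures above as sample input; parse_meminfo is a hypothetical helper:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


# Excerpt of the output above.
SAMPLE = """\
MemTotal:      4053564 kB
MemFree:        144344 kB
Buffers:        310824 kB
Cached:        2684244 kB
Slab:           509276 kB
SReclaimable:   467220 kB
SUnreclaim:      42056 kB
"""

mi = parse_meminfo(SAMPLE)
# Slab is the sum of its reclaimable and unreclaimable parts.
assert mi["Slab"] == mi["SReclaimable"] + mi["SUnreclaim"]
print("slab total   :", mi["Slab"], "kB")
print("reclaimable  : %.1f%%" % (100.0 * mi["SReclaimable"] / mi["Slab"]))
print("unreclaimable:", mi["SUnreclaim"], "kB")
```

Here roughly 456 MB of the 497 MB of slab is reported as reclaimable, much of which drop_caches (mode 2 or 3) should be able to release.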

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 15:51 UTC
  To: linux-kernel

On 8/26/07, Fred Tyler <fredty8@gmail.com> wrote:
> I think I've come across a memory leak in 2.6.20. I've upgraded to the
> latest 2.6.20.17, but it didn't seem to help.

Sorry to keep replying to my own post, but further investigation
suggests that the memory losses may be occurring at times of heavy
filesystem access. The machines in question run rsyncs of hundreds of
thousands of files every few hours, and I'm starting to think that the
memory loss occurs during these times. I don't know how I'd go about
proving this though...
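One way to test the rsync theory is to snapshot a few /proc/meminfo fields immediately before and after each run. A minimal Python sketch; the helper names and the choice of fields are assumptions, and cmd would be the actual rsync command line:

```python
import subprocess

FIELDS = ("MemFree", "Buffers", "Cached", "Slab")


def read_meminfo(path="/proc/meminfo"):
    """Return {field: kB} for the fields listed in FIELDS."""
    info = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name in FIELDS and rest.split():
                info[name] = int(rest.split()[0])
    return info


def meminfo_delta(cmd, path="/proc/meminfo"):
    """Run cmd (a list of arguments) and return the change in kB of
    each field between just before and just after the run."""
    before = read_meminfo(path)
    subprocess.run(cmd, check=True)
    after = read_meminfo(path)
    return {k: after[k] - before[k] for k in before if k in after}
```

For example, meminfo_delta(["rsync", "-a", src, dst]) with src and dst as placeholders. Slab that keeps growing across runs and is not given back after dropping caches would be more suspicious than growth in Cached.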

* Re: Slow, persistent memory leak in 2.6.20
From: Jan Engelhardt @ 2007-08-26 15:52 UTC
  To: Fred Tyler; +Cc: linux-kernel


On Aug 26 2007 11:51, Fred Tyler wrote:
>On 8/26/07, Fred Tyler <fredty8@gmail.com> wrote:
>> I think I've come across a memory leak in 2.6.20. I've upgraded to the
>> latest 2.6.20.17, but it didn't seem to help.
>
>Sorry to keep replying to my own post, but further investigation
>suggests that the memory losses may be occurring at times of heavy
>filesystem access. The machines in question run rsyncs of hundreds of
>thousands of files every few hours, and I'm starting to think that the
>memory loss occurs during these times. I don't know how I'd go about
>proving this though...

Please rule out filesystem caches by issuing
	sync;
	echo 3 >/proc/sys/vm/drop_caches;
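For reference, the same trigger from a script. A minimal Python sketch, assuming Python 3.3+ for os.sync(); the function and constant names are illustrative, while the values 1, 2 and 3 are the ones the kernel documents for this file:

```python
import os

# Values accepted by /proc/sys/vm/drop_caches:
DROP_PAGECACHE = 1        # free page cache
DROP_DENTRIES_INODES = 2  # free reclaimable slab objects (dentries, inodes)
DROP_BOTH = DROP_PAGECACHE | DROP_DENTRIES_INODES  # 3, as in the command above


def drop_caches(mode=DROP_BOTH, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop clean caches.

    Writing the value is a one-shot request, not a persistent setting.
    Writing the real /proc file requires root.
    """
    if mode not in (DROP_PAGECACHE, DROP_DENTRIES_INODES, DROP_BOTH):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # dirty pages are not dropped, so write them back first
    with open(path, "w") as f:
        f.write("%d\n" % mode)
```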



	Jan
-- 

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 16:16 UTC
  To: linux-kernel

On 8/26/07, Jan Engelhardt <jengelh@computergmbh.de> wrote:
>
> On Aug 26 2007 11:51, Fred Tyler wrote:
> >On 8/26/07, Fred Tyler <fredty8@gmail.com> wrote:
> >> I think I've come across a memory leak in 2.6.20. I've upgraded to the
> >> latest 2.6.20.17, but it didn't seem to help.
> >
> >Sorry to keep replying to my own post, but further investigation
> >suggests that the memory losses may be occurring at times of heavy
> >filesystem access. The machines in question run rsyncs of hundreds of
> >thousands of files every few hours, and I'm starting to think that the
> >memory loss occurs during these times. I don't know how I'd go about
> >proving this though...
>
> Please rule out filesystem caches by issuing
>         sync;
>         echo 3 >/proc/sys/vm/drop_caches;

(Sorry if this goes to the list twice... Mailer problems.)

Ok, I did this on a non-production machine that has only been up for a
few hours, and here's what happened:

======== Before =========

$ free -m
            total       used       free     shared    buffers     cached
Mem:           878        824         54          0        111        422
-/+ buffers/cache:        290        587
Swap:           63          0         63


======== After ========

root@b0$ free -m
            total       used       free     shared    buffers     cached
Mem:           878         47        830          0          6          4
-/+ buffers/cache:         36        841
Swap:           63          0         63

======================

So, I guess it worked? (I don't know what was supposed to happen, but
memory usage dropped significantly when I did this.)
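The before/after figures can be split further. A small worked example in Python; reading the remainder as slab is an assumption, not something these numbers prove:

```python
# 'free -m' figures (MB) reported above, before and after drop_caches.
before = {"used": 824, "buffers": 111, "cached": 422}
after = {"used": 47, "buffers": 6, "cached": 4}

cache_released = (before["buffers"] + before["cached"]) - (after["buffers"] + after["cached"])
used_released = before["used"] - after["used"]
# Whatever 'used' dropped beyond buffers+cached was counted elsewhere,
# e.g. the dentry/inode slab that mode 3 also frees.
other_released = used_released - cache_released

print("buffers+cached released:", cache_released, "MB")
print("total 'used' released  :", used_released, "MB")
print("other memory released  :", other_released, "MB")
```

The remaining 254 MB matches the drop in the "-/+ buffers/cache" used figure (290 to 36 MB). That is consistent with reclaimable slab such as dentries and inodes, since free counts slab as "used" rather than as cache.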

However, I'm not sure this staging machine has been up long enough or
doing enough to exhibit the problem. I can try this on my production
servers (the ones I provided graphs for) late tonight, but how safe is
running this command? Does it permanently disable file caching? Do I
need to reset it afterwards? If I stop all services (databases,
logging, etc) first, am I protected against data loss?

* Re: Slow, persistent memory leak in 2.6.20
From: Jan Engelhardt @ 2007-08-26 16:30 UTC
  To: Fred Tyler; +Cc: linux-kernel


On Aug 26 2007 12:16, Fred Tyler wrote:
>> Please rule out filesystem caches by issuing
>>         sync;
>>         echo 3 >/proc/sys/vm/drop_caches;
>
>(Sorry if this goes to the list twice... Mailer problems.)
alright..

>Ok, I did this on a non-production machine that has only been up for a
>few hours, and here's what happened:
>
>======== Before =========
>
>$ free -m
>            total       used       free     shared    buffers     cached
>Mem:           878        824         54          0        111        422
>-/+ buffers/cache:        290        587
>Swap:           63          0         63
>
>
>======== After ========
>
>root@b0$ free -m
>            total       used       free     shared    buffers     cached
>Mem:           878         47        830          0          6          4
>-/+ buffers/cache:         36        841
>Swap:           63          0         63
>
>======================
>
>So, I guess it worked? (I don't know what was supposed to happen, but
>memory usage dropped significantly when I did this.)

So I guess you are not seeing any memory leak at all, but just the regular
caching?

>However, I'm not sure this staging machine has been up long enough or
>doing enough to exhibit the problem. I can try this on my production
>servers (the ones I provided graphs for) late tonight, but how safe is
>running this command? Does it permanently disable file caching? Do I
>need to reset it afterwards? If I stop all services (databases,

drop_caches is a trigger, not a setting. Hence your RAM will be used
for caching again after you have used drop_caches.

>logging, etc) first, am I protected against data loss?


	Jan
-- 

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 16:49 UTC
  To: jengelh, linux-kernel

On 8/26/07, Jan Engelhardt <jengelh@computergmbh.de> wrote:
>
> On Aug 26 2007 12:16, Fred Tyler wrote:
> >> Please rule out filesystem caches by issuing
> >>         sync;
> >>         echo 3 >/proc/sys/vm/drop_caches;
> >
>
> >Ok, I did this on a non-production machine that has only been up for a
> >few hours, and here's what happened:
> > ...
> >So, I guess it worked? (I don't know what was supposed to happen, but
> >memory usage dropped significantly when I did this.)
>
> So I guess you are not seeing any memory leak at all, but just the regular
> caching?

I certainly hope that is the case, but until I try it on the
production machine tonight I won't know for sure. If this is indeed a
leak, it's pretty slow, and it takes a week or so before you can even
start noticing it on the graphs.

I can say with absolute certainty that something very similar was
happening in 2.6.12 (compare the graphs in my original email), and in
2.6.12 it would inevitably lead to the server running entirely out of
memory, to the point where applications could no longer allocate
memory and the server would have to be rebooted.

The symptoms were almost identical in that case: I'd shut down
virtually every application on the server, but the memory would still
be almost entirely in use. I understand there's kernel caching, but if
the kernel caching occurs at the expense of any other applications
being able to access memory, then there's a real problem. (I actually
still have one 2.6.12 machine running, but drop_caches doesn't appear
to exist on it so I can't test it there. Is there an analogue?)

Anyway, I'll post the results from the 2.6.20 server as soon as I have
them. Should be late tonight.

Thanks.

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 16:58 UTC
  To: Jan Engelhardt; +Cc: linux-kernel

On 8/26/07, Jan Engelhardt <jengelh@computergmbh.de> wrote:
>
> On Aug 26 2007 12:16, Fred Tyler wrote:
> >> Please rule out filesystem caches by issuing
> >>         sync;
> >>         echo 3 >/proc/sys/vm/drop_caches;
> >
> >
> >So, I guess it worked? (I don't know what was supposed to happen, but
> >memory usage dropped significantly when I did this.)
>
> So I guess you are not seeing any memory leak at all, but just the regular
> caching?

Also, how can you explain the differences between the graphs of
long-term memory usage? This first graph is from a server running
2.6.16 that never has memory problems:

    http://i239.photobucket.com/albums/ff117/fredty8/memory-a4.png

And here's a graph of a server running 2.6.12 that has to be rebooted
every month or two because it runs out of memory:

    http://i239.photobucket.com/albums/ff117/fredty8/memory-a2.png

Now, admittedly, the 2.6.20 server has not been running long enough to
know whether or not it's going to start starving applications of
memory, but the graph here looks a whole lot more like 2.6.12 than
2.6.16, wouldn't you agree:

    http://i239.photobucket.com/albums/ff117/fredty8/memory-b1.png


Those 2.6.12 servers caused me a ton of stress because I let the
problem go too long before I did anything. In the event that 2.6.20 is
doing the same thing, I'm trying to fix it before things get out of
control.

* Re: Slow, persistent memory leak in 2.6.20
From: Jan Engelhardt @ 2007-08-26 16:58 UTC
  To: Fred Tyler; +Cc: linux-kernel


On Aug 26 2007 12:49, Fred Tyler wrote:
>>
>> So I guess you are not seeing any memory leak at all, but just the regular
>> caching?
>
>I certainly hope that is the case, but until I try it on the
>production machine tonight I won't know for sure.

Note that not all kernels have the 'drop_caches' control file (it
first appeared in 2.6.16), so there is not much you can do there.

>If this is indeed a
>leak, it's pretty slow, and it takes a week or so before you can even
>start noticing it on the graphs

Well, if it helps, you can accelerate the 'problem' by issuing, for example:

(1)
	dd_rescue /dev/sda /dev/null -m $((4*1048576*1024))

to read 4 GB straight from the disk and populate 'buffers'.

(2)
	cat /some/big/big/big/file >/dev/null

to read X GB from disk and populate 'cache'.

and then you'll see.  Also note that a kernel leak will eventually lead 
to very low buffers/cached values (the ones to the far right) even when 
large amounts of data are read (using either dd_rescue/cat as mentioned 
above), because, of course, the leak clogs up memory.

             total       used       free     shared    buffers     cached
Mem:        775792     493724     282068          0          8     308416
-/+ buffers/cache:     185300     590492
Swap:       795136         60     795076
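The same cache-filling read can be scripted. A minimal Python sketch with a hypothetical read_through helper; reading a block device such as /dev/sda needs root, and FOUR_GB reproduces the dd_rescue -m argument:

```python
def read_through(path, max_bytes=None, chunk=1 << 20):
    """Read a file or block device and discard the data, like
    `cat file >/dev/null`. Returns the number of bytes read.

    The reads go through the kernel's cache, so the data shows up
    under 'cached' (regular files) or 'buffers' (block devices).
    """
    total = 0
    with open(path, "rb") as f:
        while max_bytes is None or total < max_bytes:
            want = chunk if max_bytes is None else min(chunk, max_bytes - total)
            data = f.read(want)
            if not data:  # end of file or device
                break
            total += len(data)
    return total


FOUR_GB = 4 * 1048576 * 1024  # same limit as the dd_rescue -m example
```

For example, read_through("/dev/sda", FOUR_GB) followed by another look at free. On a healthy kernel, buffers or cached should grow to absorb the read; a leak would show up as those values staying low.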


>I can say with absolute certainty that something very similar was
>happening in 2.6.12 (compare the graphs in my original email), and in
>2.6.12 it would inevitably lead to the server running entirely out of
>memory, to the point where applications could no longer allocate
>memory and the server would have to be rebooted.

>The symptoms were almost identical in that case: I'd shut down
>virtually every application on the server, but the memory would still
>be almost entirely in use. I understand there's kernel caching, but if
>the kernel caching occurs at the expense of any other applications
>being able to access memory, then there's a real problem. (I actually
>still have one 2.6.12 machine running, but drop_caches doesn't appear
>to exist on it so I can't test it there. Is there an analogue?)

If you see an Out Of Memory notice in dmesg while everything is shut
down, you'll know there is a leak.
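Checking for such notices can be automated. A minimal Python sketch; the search pattern is an approximation, since the exact OOM-killer wording varies between kernel versions, and the function names are hypothetical:

```python
import re
import subprocess

# Approximate pattern for OOM-killer reports in the kernel log.
OOM_PATTERN = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def oom_lines(log_text):
    """Return the log lines that look like OOM-killer reports."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def oom_lines_from_dmesg():
    """Run dmesg and return any OOM-looking lines from the ring buffer."""
    result = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return oom_lines(result.stdout)
```

The kernel ring buffer has a fixed size, so older messages may have been overwritten; the syslog files are worth searching too.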

>
>Anyway, I'll post the results from the 2.6.20 server as soon as I have
>them. Should be late tonight.
>
>Thanks.
>

	Jan
-- 

* Re: Slow, persistent memory leak in 2.6.20
From: Denys Vlasenko @ 2007-08-26 17:03 UTC
  To: Fred Tyler; +Cc: linux-kernel

On Sunday 26 August 2007 17:16, Fred Tyler wrote:
> So, I guess it worked? (I don't know what was supposed to happen, but
> memory usage dropped significantly when I did this.)

If you can reclaim "leaked" memory this way, it means that
you found a bug where cached data is incorrectly kept
in RAM in preference to other data.
(I'm assuming that you do have real problems after some time
of "leaking" memory - you mention that you get swap storms
and eventually machine is dead.)

> However, I'm not sure this staging machine has been up long enough or
> doing enough to exhibit the problem. I can try this on my production
> servers (the ones I provided graphs for) late tonight, but how safe is
> running this command? Does it permanently disable file caching? Do I

Yes, it's safe to do, anytime.

It's just a command telling the kernel to drop as much of the
currently accumulated filesystem cache as it can. It is strictly
a debugging/benchmarking aid.

If you end up needing to do it once in a while to keep your machine
alive, something is definitely wrong.
--
vda

* Re: Slow, persistent memory leak in 2.6.20
From: Fred Tyler @ 2007-08-26 17:41 UTC
  To: vda.linux, linux-kernel

On 8/26/07, Denys Vlasenko <vda.linux@googlemail.com> wrote:
> On Sunday 26 August 2007 17:16, Fred Tyler wrote:
> > So, I guess it worked? (I don't know what was supposed to happen, but
> > memory usage dropped significantly when I did this.)
>
> If you can reclaim "leaked" memory this way, it means that
> you found a bug where cached data is incorrectly kept
> in RAM in preference of other data.
> (I'm assuming that you do have real problems after some time
> of "leaking" memory - you mention that you get swap storms
> and eventually machine is dead.)

This was exactly what happened with 2.6.12 -- more and more memory
used until there was a swap storm and a dead machine.

The 2.6.20 machines haven't been up long enough to know if they're
going to be hit by the same problem, but it seems peculiar to me that
the 2.6.16 machine does not do anything remotely like this. As you can
see in the graphs, the 2.6.16 memory use levels off very quickly, but
on 2.6.12 free memory keeps dropping until the machine bombs.

The 2.6.20 graph looks like it's heading the same direction as 2.6.12.

I'm going to run drop_caches on the 2.6.20 machines tonight and see
what happens...

* Re: Slow, persistent memory leak in 2.6.20
From: Jan Engelhardt @ 2007-08-26 17:42 UTC
  To: Fred Tyler; +Cc: linux-kernel


On Aug 26 2007 12:58, Fred Tyler wrote:
>> So I guess you are not seeing any memory leak at all, but just the regular
>> caching?
>
>Also, how can you explain the differences between the graphs of
>long-term memory usage? This first graph is from a server running
>2.6.16 that never has memory problems:
>
>    http://i239.photobucket.com/albums/ff117/fredty8/memory-a4.png
>
>And here's a graph of a server running 2.6.12 that has to be rebooted
>every month or two because it runs out of memory:
>
>    http://i239.photobucket.com/albums/ff117/fredty8/memory-a2.png

Indeed that looks like a leak. But perhaps it would be helpful to graph
not only MemFree, Buffers and Cached (from /proc/meminfo) but also Slab.

>Now, admittedly, the 2.6.20 server has not been running long enough to
>know whether or not it's going to start starving applications of
>memory, but the graph here looks a whole lot more like 2.6.12 than
>2.6.16, wouldn't you agree:
>
>    http://i239.photobucket.com/albums/ff117/fredty8/memory-b1.png

It does look like it.

>Those 2.6.12 servers caused me a ton of stress because I let the
>problem go too long before I did anything. In the event that 2.6.20 is
>doing the same thing, I'm trying to fix it before things get out of
>control.
>

	Jan
-- 

* Re: Slow, persistent memory leak in 2.6.20
From: Jan Engelhardt @ 2007-08-26 17:44 UTC
  To: Fred Tyler; +Cc: vda.linux, linux-kernel


On Aug 26 2007 13:41, Fred Tyler wrote:
>
>I'm going to run drop_caches on the 2.6.20 machines tonight and see
>what happens...

Better add "Slab" to your graphs; that looks like it's the amount of
non-cache kernel memory in use.
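If the existing graphing setup cannot easily pick up a new field, a stand-in is to log the relevant /proc/meminfo lines directly. A minimal Python sketch; the CSV layout, the default interval and the function names are assumptions:

```python
import time

GRAPH_FIELDS = ("MemFree", "Buffers", "Cached", "Slab")


def sample_line(meminfo_text, now=None):
    """Format one CSV row: unix time, then each GRAPH_FIELDS value in kB
    (empty if the field is missing)."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in GRAPH_FIELDS and rest.split():
            values[name] = rest.split()[0]
    ts = int(time.time() if now is None else now)
    return ",".join([str(ts)] + [values.get(f, "") for f in GRAPH_FIELDS])


def log_forever(out_path, interval=300, meminfo="/proc/meminfo"):
    """Append one row every `interval` seconds for later graphing."""
    while True:
        with open(meminfo) as f:
            row = sample_line(f.read())
        with open(out_path, "a") as out:
            out.write(row + "\n")
        time.sleep(interval)
```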

	Jan
-- 

* Re: Slow, persistent memory leak in 2.6.20
From: Jesper Juhl @ 2007-08-26 22:16 UTC
  To: Fred Tyler; +Cc: linux-kernel

On 26/08/07, Fred Tyler <fredty8@gmail.com> wrote:
> I think I've come across a memory leak in 2.6.20. I've upgraded to the
> latest 2.6.20.17, but it didn't seem to help.
>
Have you tried the latest 2.6.22.5?
A lot of memory leaks have been fixed between 2.6.20 and the latest
stable kernel - could be that yours is amongst the ones fixed :-)


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html
