public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: bgagnon@coradiant.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets
Date: Fri, 15 Oct 2004 15:23:52 -0300	[thread overview]
Message-ID: <20041015182352.GA4937@logos.cnet> (raw)
In-Reply-To: <OFDDC77A23.4D34536B-ON85256F2D.00514F15-85256F2D.00517F52@coradiant.com>

On Thu, Oct 14, 2004 at 10:50:14AM -0400, bgagnon@coradiant.com wrote:
> (Please cc me directly, since I am not a subscriber to this list)
> 
> Hi,
> 
> I discovered a memory leak in the mmap raw packet socket implementation, 
> specifically when the client application core dumps (elf format). More 
> specifically, the entire ring buffer memory leaks under such 
> circumstances. I have reproduced the problem with our custom app, which 
> links with Phil Wood's mmap version of libpcap (
> http://plublic.lanl.gov/cpw) as well as with tcpdump binding against the 
> same library. I have asserted the problem presence in kernels 2.4.26 and 
> 2.4.27. It has been resolved in kernel 2.6.8.
> 
> In a nutshell, the reference count of the ring buffer page frame is not 
> handled properly by the core dump function elf_core_dump() in 
> fs/binfmt_elf.c. After exiting this function, all ring buffer page frames 
> reference counts have been increased by one, which impedes their release 
> when the socket is closed. 
> 
> The reference count mishap is related to the fact that the ring buffer 
> page frames have their PG_reserved bit set when they are created by the 
> raw packet socket. According to the comments in mm.h, this is to ensure 
> the pages don't get swapped out to disk. elf_core_dump() calls 
> get_user_pages(), which increments the reference count, irrespective of 
> the PG_reserved bit, while the subsequent put_page() call only decrements 
> it if the PG_reserved bit is clear.
> 
> According to a similar kernel mailing list thread (
> http://www.ussg.iu.edu/hypermail/linux/kernel/0311.3/0495.html), the page 
> reference count field is not significant if the PG_reserved bit is set. 
> Applied to our particular case, the raw packet socket implementation 
> should unconditionally reset the reference count when the socket is 
> closed, so that the memory gets released. However, I fear this would cause 
> user space applications to crash: according to the mmap man page, the 
> memory mapping should survive the socket closure:
> 
> "The  munmap  system call deletes the mappings for the specified address
> range, and causes further references to addresses within the  range  to
> generate  invalid  memory references.  The region is also automatically
> unmapped when the process is terminated.  On the  other  hand,  closing
> the file descriptor does not unmap the region."
> 
> So, on 2.4 kernels, we are facing the following situation:
> 
> 1- The reference count information IS important to control the ring buffer 
> lifetime
> 2- The PG_reserved bit causes the reference count not to be handled 
> properly by different kernel subsystems, the core dump functions being one 
> of them
> 
> >From what I could understand, 2.6 developer fixed the get_user_pages() 
> code so it does not increment to reference count if the page has the 
> PG_reserved bit set. I don't know if this is readily applicable to 2.4 
> kernels, since it may break some other subsystem that rely on its current 
> behavior.
> 
> So, unless a better solution is found, I would propose a localized fix for 
> the leak, in function elf_core_dump(), starting at line 1275  (comments 
> are my own) :
> 
> if (get_user_pages(current, current->mm, addr, 1, 0, 1,   /*BG: this 
> ALWAYS will increment the ref count*/
>                    &page, &vma) <= 0) {
>        DUMP_SEEK (file->f_pos + PAGE_SIZE);
> } else {
>        if (page == ZERO_PAGE(addr)) {
>               DUMP_SEEK (file->f_pos + PAGE_SIZE);
>     } else {
>             void *kaddr;
>             flush_cache_page(vma, addr);
>             kaddr = kmap(page);
>             DUMP_WRITE(kaddr, PAGE_SIZE);
>             flush_page_to_ram(page);
>             kunmap(page);
>     }
>     /*BG: Start of fix*/ 
>     if (PageReserved(page))
>             /* BG: just undo the ref count increase done in 
> get_user_pages*/
>             put_page_testzero(page);
>     else
>             put_page(page);
>     /*BG: end of fix */
> }

Hi Bernand, 

I prefer doing the "if (PageReserved(page)) put_page_testzero(page)" as
you propose instead of changing get_user_pages(), as there are several
users which rely on its behaviour.

I have applied your fix to the 2.4 BK tree.

Thanks.

  reply	other threads:[~2004-10-15 20:11 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-14 14:50 Memory leak in 2.4.27 kernel, using mmap raw packet sockets bgagnon
2004-10-15 18:23 ` Marcelo Tosatti [this message]
2004-10-17  2:39   ` Alan Cox
2004-10-19 14:35     ` Marcelo Tosatti
2004-10-20 18:43       ` Alan Cox
2004-10-20 23:24         ` Andrea Arcangeli
2004-10-23 14:17           ` Marcelo Tosatti
2004-11-25 15:02     ` Marcelo Tosatti
2004-11-25 20:32       ` Andrea Arcangeli
2004-11-25 17:12         ` Marcelo Tosatti
2004-11-25 23:13           ` Andrea Arcangeli
2004-11-25 19:45             ` Marcelo Tosatti
2004-11-26  1:04               ` Andrea Arcangeli
2004-11-30  4:03                 ` David S. Miller
2004-11-30  4:16                   ` Andrea Arcangeli
2004-11-30  6:11                     ` David S. Miller
2004-11-30  6:19                     ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2004-10-21 13:39 O.Sezer
2004-10-21 14:26 ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041015182352.GA4937@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=bgagnon@coradiant.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox