All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	bgagnon@coradiant.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <andrea@novell.com>,
	davem@redhat.com
Subject: Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets
Date: Thu, 25 Nov 2004 21:32:48 +0100	[thread overview]
Message-ID: <20041125203248.GD5904@dualathlon.random> (raw)
In-Reply-To: <20041125150206.GF16633@logos.cnet>

On Thu, Nov 25, 2004 at 01:02:06PM -0200, Marcelo Tosatti wrote:
> On Sun, Oct 17, 2004 at 03:39:26AM +0100, Alan Cox wrote:
> > On Gwe, 2004-10-15 at 19:23, Marcelo Tosatti wrote:
> > > I prefer doing the "if (PageReserved(page)) put_page_testzero(page)" as
> > > you propose instead of changing get_user_pages(), as there are several
> > > users which rely on its behaviour.
> > > 
> > > I have applied your fix to the 2.4 BK tree.
> > 
> > That isnt sufficient. Consider anything else taking a reference to the
> > page and the refcount going negative. And yes 2.6.x has this problem and
> > far worse in some ways, but it also has the mechanism to fix it.
> > 
> > 2.6.x uses VM_IO as a VMA flag which tells the kernel two things
> > a) get_user_pages fails on it
> > b) core dumping of it is forbidden
> > 
> > 2.6.x is missing a whole pile of these (fixed in the 2.6.9-ac tree I'm
> > putting together). I *think* remap_page_range() in 2.6.x can just set
> > VM_IO, but older kernels didn't pass the vma so all the users would need
> > fixing (OSS audio, media/video, usb audio, usb video, frame buffer
> > etc).
> 
> I dont see any practical problem with 2.4.x right now.
> 
> get_user_pages() wont be called on driver created VMA's with PageReserved 
> pages because of the VM_IO bit which is set at remap_page_range(). 
> 
> Its not possible to have any vma mapped by a driver without VM_IO set.
> 
> But the network packet mmap was an isolated case, so I'll apply Andrea's 
> fix just for safety, although I can't find any offender in the tree.
> 
> 
> --- memory.c    2004-10-22 15:58:28.000000000 -0200
> +++ memory.c  2004-10-28 14:32:26.585813200 -0200
> @@ -499,7 +499,7 @@
>                                 /* FIXME: call the correct function,
>                                  * depending on the type of the found page
>                                  */
> -                               if (!pages[i])
> +                               if (!pages[i] || PageReserved(pages[i]))
>                                         goto bad_page;
>                                 page_cache_get(pages[i]);
>                         }

this needs to be modified to take the ZERO_PAGE(start) into account.
It's a minor detail, but we should allow that. (my 2.4-aa tree was
already checking the zeropage for other reasons, so it didn't need any
change)

in short the above fix should be modified like this:

	if (!pages[i] || PageReserved(pages[i])) {
		if (pages[i] != ZERO_PAGE(start))
			goto bad_page;
	} else
		page_cache_get(pages[i]);

the zero page is guaranteed to remain reserved. the major bug is that
get_user_pages was allowing _temporarily_ reserved pages to be pinned.
__free_pages isn't checking the VM_IO, and as such get_user_pages should
be robust against its __free_pages counterpart, this is why I believe
the above fix is the right fix.

ZEROPAGE is special because:

1) it's guaranteed to never be unpinned
2) it's the only reserved page that handle_mm_fault is allowed to
istantiate for a filesystem data mapping

The VM_IO enforcment is a nice improvement on top of the above.

Checking the PageReserved bitflag is a good thing at least for the
zeropage, so we don't overflow the zeropage count, which isn't nice.

If you really want to fix it only using VM_IO, I still recommend to
apply the above patch, and to turn 'goto bad_page' into BUG().
It'd be a bad idea not to at least add the above code as a robustness
check.

Comments welcome. thanks.

  reply	other threads:[~2004-11-26 23:39 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-14 14:50 Memory leak in 2.4.27 kernel, using mmap raw packet sockets bgagnon
2004-10-15 18:23 ` Marcelo Tosatti
2004-10-17  2:39   ` Alan Cox
2004-10-19 14:35     ` Marcelo Tosatti
2004-10-20 18:43       ` Alan Cox
2004-10-20 23:24         ` Andrea Arcangeli
2004-10-23 14:17           ` Marcelo Tosatti
2004-11-25 15:02     ` Marcelo Tosatti
2004-11-25 20:32       ` Andrea Arcangeli [this message]
2004-11-25 17:12         ` Marcelo Tosatti
2004-11-25 23:13           ` Andrea Arcangeli
2004-11-25 19:45             ` Marcelo Tosatti
2004-11-26  1:04               ` Andrea Arcangeli
2004-11-30  4:03                 ` David S. Miller
2004-11-30  4:16                   ` Andrea Arcangeli
2004-11-30  6:11                     ` David S. Miller
2004-11-30  6:19                     ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2004-10-21 13:39 O.Sezer
2004-10-21 14:26 ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041125203248.GD5904@dualathlon.random \
    --to=andrea@suse.de \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=andrea@novell.com \
    --cc=bgagnon@coradiant.com \
    --cc=davem@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo.tosatti@cyclades.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.