From: Andrea Arcangeli <andrea@suse.de>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
bgagnon@coradiant.com,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <andrea@novell.com>,
davem@redhat.com
Subject: Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets
Date: Thu, 25 Nov 2004 21:32:48 +0100 [thread overview]
Message-ID: <20041125203248.GD5904@dualathlon.random> (raw)
In-Reply-To: <20041125150206.GF16633@logos.cnet>
On Thu, Nov 25, 2004 at 01:02:06PM -0200, Marcelo Tosatti wrote:
> On Sun, Oct 17, 2004 at 03:39:26AM +0100, Alan Cox wrote:
> > On Gwe, 2004-10-15 at 19:23, Marcelo Tosatti wrote:
> > > I prefer doing the "if (PageReserved(page)) put_page_testzero(page)" as
> > > you propose instead of changing get_user_pages(), as there are several
> > > users which rely on its behaviour.
> > >
> > > I have applied your fix to the 2.4 BK tree.
> >
> > That isnt sufficient. Consider anything else taking a reference to the
> > page and the refcount going negative. And yes 2.6.x has this problem and
> > far worse in some ways, but it also has the mechanism to fix it.
> >
> > 2.6.x uses VM_IO as a VMA flag which tells the kernel two things
> > a) get_user_pages fails on it
> > b) core dumping of it is forbidden
> >
> > 2.6.x is missing a whole pile of these (fixed in the 2.6.9-ac tree I'm
> > putting together). I *think* remap_page_range() in 2.6.x can just set
> > VM_IO, but older kernels didn't pass the vma so all the users would need
> > fixing (OSS audio, media/video, usb audio, usb video, frame buffer
> > etc).
>
> I dont see any practical problem with 2.4.x right now.
>
> get_user_pages() wont be called on driver created VMA's with PageReserved
> pages because of the VM_IO bit which is set at remap_page_range().
>
> Its not possible to have any vma mapped by a driver without VM_IO set.
>
> But the network packet mmap was an isolated case, so I'll apply Andrea's
> fix just for safety, although I can't find any offender in the tree.
>
>
> --- memory.c 2004-10-22 15:58:28.000000000 -0200
> +++ memory.c 2004-10-28 14:32:26.585813200 -0200
> @@ -499,7 +499,7 @@
> /* FIXME: call the correct function,
> * depending on the type of the found page
> */
> - if (!pages[i])
> + if (!pages[i] || PageReserved(pages[i]))
> goto bad_page;
> page_cache_get(pages[i]);
> }
this needs to be modified to take the ZERO_PAGE(start) into account.
It's a minor detail, but we should allow that. (my 2.4-aa tree was
already checking the zeropage for other reasons, so it didn't need any
change)
in short the above fix should be modified like this:
if (!pages[i] || PageReserved(pages[i])) {
if (pages[i] != ZERO_PAGE(start))
goto bad_page;
} else
page_cache_get(pages[i]);
the zero page is guaranteed to remain reserved. the major bug is that
get_user_pages was allowing _temporarily_ reserved pages to be pinned.
__free_pages isn't checking the VM_IO, and as such get_user_pages should
be robust against its __free_pages counterpart, this is why I believe
the above fix is the right fix.
ZEROPAGE is special because:
1) it's guaranteed to never be unpinned
2) it's the only reserved page that handle_mm_fault is allowed to
istantiate for a filesystem data mapping
The VM_IO enforcment is a nice improvement on top of the above.
Checking the PageReserved bitflag is a good thing at least for the
zeropage, so we don't overflow the zeropage count, which isn't nice.
If you really want to fix it only using VM_IO, I still recommend to
apply the above patch, and to turn 'goto bad_page' into BUG().
It'd be a bad idea not to at least add the above code as a robustness
check.
Comments welcome. thanks.
next prev parent reply other threads:[~2004-11-26 23:39 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-10-14 14:50 Memory leak in 2.4.27 kernel, using mmap raw packet sockets bgagnon
2004-10-15 18:23 ` Marcelo Tosatti
2004-10-17 2:39 ` Alan Cox
2004-10-19 14:35 ` Marcelo Tosatti
2004-10-20 18:43 ` Alan Cox
2004-10-20 23:24 ` Andrea Arcangeli
2004-10-23 14:17 ` Marcelo Tosatti
2004-11-25 15:02 ` Marcelo Tosatti
2004-11-25 20:32 ` Andrea Arcangeli [this message]
2004-11-25 17:12 ` Marcelo Tosatti
2004-11-25 23:13 ` Andrea Arcangeli
2004-11-25 19:45 ` Marcelo Tosatti
2004-11-26 1:04 ` Andrea Arcangeli
2004-11-30 4:03 ` David S. Miller
2004-11-30 4:16 ` Andrea Arcangeli
2004-11-30 6:11 ` David S. Miller
2004-11-30 6:19 ` David S. Miller
-- strict thread matches above, loose matches on Subject: below --
2004-10-21 13:39 O.Sezer
2004-10-21 14:26 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20041125203248.GD5904@dualathlon.random \
--to=andrea@suse.de \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=andrea@novell.com \
--cc=bgagnon@coradiant.com \
--cc=davem@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo.tosatti@cyclades.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox