public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: Andrew Morton <akpm@osdl.org>
Cc: davidm@snapgear.com, gerg%snapgear.com.wli@holomorphy.com,
	linux-kernel@vger.kernel.org, uclinux-dev@uclinux.org
Subject: Re: [PATCH 2/5] NOMMU: High-order page management overhaul
Date: Mon, 13 Dec 2004 16:32:06 +0000	[thread overview]
Message-ID: <13399.1102955526@redhat.com> (raw)
In-Reply-To: <20041210130137.432edacb.akpm@osdl.org>


Andrew Morton <akpm@osdl.org> wrote:

> I think I was the original "use compound pages" culprit.

You were, but several other people have chimed in since.

> But when I realised that nommu needs access to fields in the sub-pages which
> are currently used for compound page metadata I withdrew into the "if what's
> there now works, stick with it" camp.

The nommu stuff only needs access to a flag or two (PG_compound or
PG_compound_slave) and the refcount. I don't believe that any of the stuff
that pins secondary pages for userspace's benefit cares about anything else.

And, apart from that, as far as kernel side code is concerned, high-order
pages should be dealt with as high-order pages, or they should be properly
split and used as arrays of pages.

> >  (2) Splitting high-order pages has to be done differently on MMU vs
> >      NOMMU.
> 
> Oh.  Why?

There are three cases of splitting that I can think of:

 (1) Split down to zero-order pages. I think this can be handled the same in
     both cases, since _every_ secondaty page needs reinitialisation.

     Note that I'm ignoring the case of a secondary page already being
     pinned. That is one case where the old way is superior _ASSUMING_ the
     counts on the secondary pages are incremented, not just set to 1.

     However, if a high-order page is being split after being exposed to
     userspace, the driver writer probably deserves everything they get:-)

 (2) Split down to smaller high-order pages. If a driver doing this just
     reinitialises the first page of every chunk, it'll probably be okay,
     _provided_ it doesn't touch the secondary pages. If it does do that - say
     by initialising the size to zero, the whole thing is likely to explode.

 (3) Splitting compound pages. Obviously, if a driver requests a compound
     page, it should be able to handle dissociation into lower-order compound
     pages or zero-order pages. I'd argue that the core kernel should provide
     a function to do this.

So, case (2) is potentially problematical.

> The current code (which pins each subpage individually) seems robust
> enough.

Maybe.

> I assume that nommu will thenceforth simply treat the region as an
> array of zero-order pages.

That depends what you mean by "nommu". It's actually the common bits that
thenceforth treat high-order pages as individual pages, be they compound pages
from hugetlbfs, single pages from the page cache or high-order pages from the
slab allocator or alloc_pages().

> >  (5) Abstraction of some compound page related functions, including a way to
> >      make it more efficient to access the first page (PG_compound_slave).
> 
> If there is any way at all in which we can avoid consuming another page
> flag then we should do so.  There are various concepts (many zones,
> advanced page aging algorithms) which would be unfeasible if there are not
> several more bits available in ->flags.   And they continue to dribble away.

There is. We can move the current occupant of the compound-second struct
page's mapping into page[1].lru and stick a unique magic value in there.

	[mm/page_alloc.c]
	const char compound_page_slave_magic[4];

	[include/linux/mm.h]
	extern const char compound_page_slave_magic[];
	#define COMPOUND_PAGE_SLAVE_MAGIC \
		((struct address space *) &compound_page_slave[3])

	#define PageCompoundSlave(page) \
		((page)->mapping == COMPOUND_PAGE_SLAVE_MAGIC)

	#define SetPageCompoundSlave(page) \
	do { \
		BUG_ON((page)->mapping); \
		(page)->mapping = COMPOUND_PAGE_SLAVE_MAGIC; \
	} while(0)

	#define ClearPageCompoundSlave(page) \
	do { \
		BUG_ON(!PageCompoundSlave(page)); \
		(page)->mapping = NULL; \
	} while(0)

This would have a useful property of causing a misalignment exception
(assuming it's not the i386 arch) if someone tries to access the mapping.

Andrew Morton <akpm@osdl.org> wrote:

> But there's nothing actually *essential* here, is there?  No bugs are
> fixed?

Well, I feel it's more robust. I can't say that it _definitely_ fixes any
bugs, but I can see how they could happen.

> > I think the drivers need a good auditing too. A lot of them allocate
> > high-order pages for various uses, some for use as single units, and some
> > for use as arrays of pages.
> 
> I think an ARM driver is freeing zero-order pages within a higher-order
> page.  But as long as the driver didn't set __GFP_COMP then the higher
> order page is not compound, and that splitting treatment is appropriate.

I'd changed my patch to honour __GFP_COMP. However, such driver should
probably be changed to call a splitting function in mm/page_alloc.c. This sort
of thing is definitely the territory of the master mm routines.

It might be worth adding a new allocator routine that takes arguments along
the lines of calloc() - so that you ask for 2^N pages of 2^M size. This would
allow the allocator to initialise everything correctly up front.

David

  reply	other threads:[~2004-12-13 16:33 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <7ad0b24c-4955-11d9-8e19-0002b3163499@redhat.com>
     [not found] ` <200412082012.iB8KCTBK010123@warthog.cambridge.redhat.com>
     [not found]   ` <20041209141718.6acec9ee.akpm@osdl.org>
2004-12-10 15:45     ` [PATCH 2/5] NOMMU: High-order page management overhaul David Howells
2004-12-10 21:01       ` Andrew Morton
2004-12-13 16:32         ` David Howells [this message]
2004-12-09 15:08 [PATCH 1/5] NOMMU: MM cleanups dhowells
2004-12-09 15:08 ` [PATCH 2/5] NOMMU: High-order page management overhaul dhowells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=13399.1102955526@redhat.com \
    --to=dhowells@redhat.com \
    --cc=akpm@osdl.org \
    --cc=davidm@snapgear.com \
    --cc=gerg%snapgear.com.wli@holomorphy.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=uclinux-dev@uclinux.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox