Re: [PATCH] Compound page overhaul

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: David Howells <dhowells@redhat.com>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: torvalds@osdl.org, akpm@osdl.org, hch@infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Compound page overhaul
Date: Mon, 22 Nov 2004 16:07:06 +0000	[thread overview]
Message-ID: <14391.1101139626@redhat.com> (raw)
In-Reply-To: <20041122144127.GE2714@holomorphy.com>

William Lee Irwin III <wli@holomorphy.com> wrote:

> >  (1) A new bit flag PG_compound_slave has been added. This is used to mark
> > ...
> There are a lot of ways to do these things. Most of it is bitpacking
> and dodging assumptions in other code about various fields always being
> something or other they expect (e.g. bh's vs. page->private).

I want to avoid putting magic numbers in page->private. What goes in there
could be anything, as it's up to the filesystem.

Do you have any preferences? I'd prefer to use page->mapping, I think, except
that's used for the destructor.

I should probably make a set-destructor function for hugetlbfs to call.

> A generally innocuous rearrangement. Some explanation of the advantage
> of these new bitpacking and field arrangements over the current
> arrangement may be good to have.

The only differences are:

 (1) PG_compound_slave now exists.

 (2) I'm permitting the owner of the page to do do what it will with
     page->private on the first page.

> >  (3) __page_first() is now provided to find the first page of any page set
> >      (even single page sets).
> 
> I have to question the underscores.

The underscores can be dropped if they're not wanted.

> Also, there's a commonly-used term in the superpage literature, ``head of
> the superpage'', that may be more easily recognizable for readers familiar
> with that but not Linux specifics, but that's just nomenclature and not
> particularly pressing or any kind of requirement, just a non-Linux
> precedent.

I couldn't think of a good name for it, so I settled on it being the first
page.

How about page_head()?

> __GFP_COMP was introduced because several unusual drivers allocate
> higher-order pages and then move on to free fragments of them. There's
> a small danger some others may allocate higher-order pages and then
> treat each piece as a separate entity (particularly in the freeing pass).

I wasn't aware of that. Looking at the mm code, doing a fragmentary release
would cause bad_page() to be invoked. Presumably these drivers modify the
various struct pages involved directly to keep the allocator happy.

It would be better, I think, to provide a page splitter function. Thus
allowing pages to be cut in half, and then have the two halves made into the
equivalent allocated pages.

> Sweeping affected drivers to use a fragmenting primitive may help here.

Do you know which drivers?

> >  (6) compound_page_order() is now available. This will indicate the order
> ...
> Possible, but it's likely a micro-optimization to cache the order in
> registers across function calls. The allocator is something of a ``hot
> path'' and small alterations can have noticeable effects.

Yes... but the order gets examined anyway in the free page checker, and the
second plus page structs get modified too, so I don't think it'll make much of
a difference. Plus the filesystem or driver that owned the page won't need to
keep track of the size, nor will it need to calculate it.

> >  (7) Trailing spaces have been cleaned up on lines in page_alloc.c.
> 
> I like this quite a bit. =)

	(defun trim ()
	  "Delete trailing whitespace in buffer"
	  (interactive)
	  (save-excursion
	    (goto-char (point-min))
	    (replace-regexp "[ \t\r]+$" "")
	    (goto-char (point-max))
	    (skip-chars-backward "\n")
	    (if (not (eobp))
		(delete-region (1+ (point)) (point-max)))))

> >  (8) bootmem.c now has a separate path to release pages to the main
> >      allocator
> ...
> Clearly it could merely scan the bitmap for the largest properly-sized,
> properly-aligned leading run of free bits beyond even that, though I
> wouldn't expect you to pursue that as it's far beyond the scope of the
> patch. I was hit up to deal with bootmem.c issues, and will be looking
> into that and more after the current set of bootmem changes has settled
> down and ia64 bootstrap has been stable for a while.

I may look at doing this after this patch (or similar) goes in. If so, I'll
send you the patch.

> (2) The physaddr alignment comment in bootmem.c is mangled. It's not
> 	O(LOG2(BITS_PER_LONG)) -aligned, it's exactly LOG2(BITS_PER_LONG)
> 	aligned. But we don't have a LOG2(...) macro, we have fls()/ffs().

I suspect that's meant to be mathematical notation, not strictly compilable
code, though I think there may be a missing "if" in it.

> (3) page_count() probably deserves the %0*lx treatment in __bad_page().
> 	Conserving screenspace when possible helps some, though that's
> 	offset a bit against predictable field alignment. Maybe putting
> 	variable-length fields at the end of the line would help.

I don't think that matters too much. This message should never be seen, after
all...

That said, I think I should probably provide a 64-bit version too... some of
the fields will have 16-char widths there.

>       Also, the pfn would be great to have there, too, while you're at it.

Okay.

> (4) I wonder if anyone's run with CONFIG_DEBUG_PAGEALLOC recently.
> 	bootmem.c seems a bit early for kernel_map_pages() et al.
> 	It could be okay depending.

I can try it. BTW, should the free page checking be contingent on this option?
Or maybe it should have its own option.

> (5) This patch does a fair number of different things and it takes a
> 	bit of effort to wade through some of the longer rearrangements
> 	as they overflow 80x24. It would be helpful for reviewers if you
> 	could break this down into a somewhat more easily-digestible
> 	series of smaller patches.

It's a little tricky to break it up logically since it's mostly incredibly
interrelated.

I could separate out some of the cleanups: rearrangement between files,
trailing space splatting.

David

next prev parent reply	other threads:[~2004-11-22 16:40 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-22 13:27 [PATCH] Compound page overhaul David Howells
2004-11-22 14:41 ` William Lee Irwin III
2004-11-22 16:07   ` David Howells [this message]
2004-11-22 16:34     ` William Lee Irwin III
2004-11-22 23:54 ` Andrew Morton
2004-11-23  9:18   ` David Howells
2004-11-23 16:11     ` Andrew Morton
2004-11-23 16:48       ` David Howells
2004-11-23 16:56         ` Andrew Morton
2004-11-23 17:48           ` David Howells
2004-11-23 17:10       ` William Lee Irwin III
2004-11-23 17:24         ` David Howells
2004-11-23 17:46           ` William Lee Irwin III
2004-11-23 17:51             ` David Howells
2004-11-24 14:22         ` Greg Ungerer
2004-11-24 18:03           ` David Howells
2004-11-25  3:37             ` Greg Ungerer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14391.1101139626@redhat.com \
    --to=dhowells@redhat.com \
    --cc=akpm@osdl.org \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox