All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] 18pre7aa1 mm init
Date: Thu, 31 Jan 2002 18:51:44 +0100	[thread overview]
Message-ID: <20020131185144.H1309@athlon.random> (raw)
In-Reply-To: <20020131023912.U1309@athlon.random> <Pine.LNX.4.21.0201311209460.1021-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.21.0201311209460.1021-100000@localhost.localdomain>; from hugh@veritas.com on Thu, Jan 31, 2002 at 02:15:41PM +0000

On Thu, Jan 31, 2002 at 02:15:41PM +0000, Hugh Dickins wrote:
> On Thu, 31 Jan 2002, Andrea Arcangeli wrote:
> > On Wed, Jan 30, 2002 at 09:16:05PM +0000, Hugh Dickins wrote:
> > 
> > > 2. You calculate nr_pte one too few since start is not PMD_SIZE
> > >    aligned, so final page table still discontiguous in some cases
> > >    e.g. when it finds it does need to allocate another pmd.
> > 
> > I don't see this one. nr_pte = (end - start + PMD_SIZE - 1) >>
> > PMD_SHIFT. This is the max number of pagetables necessary for the whole
> > array pointed by pkmap_page_table. I don't see any problem in the nr_pte
> > calculation.
> 
> Let's do it with some actual numbers: 1GB uvirtual 2GB physical PAE SMP,
> VMALLOC_START 0xc0800000, PKMAP_BASE 0xff335000, FIXADDR_START 0xfff36000,
> PMD_SIZE 0x00200000, LAST_PKMAP 1024, KM_NR_SERIES 3.  The contiguous call
> to fixrange_init then has start 0xff335000 and end 0xfff35000, calculates
> nr_pte as 6, is given contiguous physical pages 5,6,7,8,9,10; page 11 is
> allocated to topmost pmd in pgd[3]; and page 12 is allocated to topmost
> page table in your next fixrange_init on FIXADDR_START.  pkmap_page_table
> array begins at 0x400059a8 and has size 0x6000 so last byte at 0x4000b9a7:
> so kmap_pagetable2() will in due course corrupt the topmost pmd (page 11).
> 
> The (end - start) to fixrange_init was a multiple of PMD_SIZE, but start
> was _not_ a multiple of PMD_SIZE, so nr_pte should have been 7 not 6.
> 
> That's easy enough to correct, but it seems so misguided to keep adding

this should correct it:

--- 2.4.18pre7aa2/arch/i386/mm/init.c.~1~	Thu Jan 31 04:16:52 2002
+++ 2.4.18pre7aa2/arch/i386/mm/init.c	Thu Jan 31 18:29:23 2002
@@ -179,6 +179,8 @@
 	if (start & ~PAGE_MASK)
 		BUG();
 
+	start &= PMD_MASK;
+
 	i = __pgd_offset(start);
 	j = __pmd_offset(start);
 	pgd = pgd_base + i;

> further refinements (uglinesses!) to fixrange_init, when it can all be
> done so much more simply and safely - fewer lines of code, less room
> for bugs (this is all __init code, so I can't argue by saving memory).
> Allocate init_mm's pmds upfront at the beginning of pagetable_init,
> allocate directmap pagetables in the middle of pagetable_init (just
> as before), allocate the high kmap and fixmap pagetables at the end.

and if you change some #define it will waste ram etc.. it's less
generic, less optimized and it makes more assumptions without checking
those assumption can really be trusted.

> 
> > If feel much safer with lots of bugchecks there, they're zero cost and
> > in case somebody changes some #define, he will get a not booting kernel
> > and he will be able to debug it with an early printk patch (in any case
> > he won't ship patches that later breaks at runtime). I fall into the
> > "somebody" category as well of course :).
> 
> I'm not generally against putting in BUG()s (the ones in free_pages_ok,
> for example, are very very useful); but the ones you have here failed
> to catch this error, even if they caught it you'd have to go back with
> early printks and other stuff to work it out, and most of the BUG()s
> in this module smell of "I'm not sure what I'm doing here, but if I
> throw in enough BUG()s and scrape through, then it must be alright":
> leftovers from initial development, now guards against falling into
> a parallel universe, rather than checks against plausible errors.

Ok.

> 
> > > 3. If 1GB uvirtual 1GB physical HIGHMEM64G then pgd[2]
> > >    will remain empty_zero_page, and vmallocs will fail.
> > 
> > I don't see this one as well. At least with the current definition of
> > VMALLOC_SIZE. the vmalloc cames befor the PKMAP_START and vmalloc start
> > will remain above 3G, no matter where the PAGE_OFFSET is. So it will
> > always be in the pgd[3] that is always allocated by the fixrange_init
> > (as soon as I fix the pgd_none check for PAE at least :).
> > 
> > But yes, I'd also prefer if we could set VMALLOC_SIZE to 2G when 3G are
> > available, so that's a possible improvement.
> 
> I can't find definition of VMALLOC_SIZE, but never mind.  1GB uvirtual
> 1GB physical PAE SMP, VMALLOC_START 0x80800000, PKMAP_BASE 0xff335000,
> FIXADDR_START 0xfff36000.  pagetable_init initializes pgd_base[0,1,2,3]
> with the empty_zero_page.  Setting directmap allocs pmd for pgd_base[1]
> only; first fixrange_init allocs pmd for pgd_base[3]; pgd_base[0] rightly
> remains empty_zero_page; pgd_base[2] wrongly remains empty_zero_pageC.

correct. I remebered wrong that vmalloc_size is fixed to some houndred
mbytes, while it extends after the direct mapping, all right.

> 
> vmalloc uses a virtual address space across pgd_base[2] and pgd_base[3].
> But I was wrong to say it would fail: now you've forced me to try it out
> in practice, I find that vmalloc is successfully using empty_zero_page
> as its pmd.  So, for example, even /dev/zero can no longer be relied
> upon to give you a clean sheet, there's a blot near the beginning of
> the page.  Hmm, time for the reset button!
> 
> > > Imagine my distress if next time around I discover you've
> > > fixed those three with even more code in fixrange_init!
> > 
> > don't worry, you will certainly find parts of it merged :). See below
> > for some other comment.
> 
> Aaargh!  Forget the parts, grab the whole, really it's painless!
> We have been using that code (well, the version before I minimized
> changes in deference to you) on many machines for almost a year.

I'm not very interested in making those changes, and I don't feel doing
the right thing by replacing the generic and pedantic code, with the
hardcoded assumptions version even if simpler.  Anyways we're really
only arguing about a few lines of code, the rest of the fixes for 1G/2G
is just merged of course.

Andrea

      reply	other threads:[~2002-01-31 17:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-01-30 21:16 [PATCH] 18pre7aa1 mm init Hugh Dickins
2002-01-31  1:39 ` Andrea Arcangeli
2002-01-31 14:15   ` Hugh Dickins
2002-01-31 17:51     ` Andrea Arcangeli [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020131185144.H1309@athlon.random \
    --to=andrea@suse.de \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.