public inbox for linux-kernel@vger.kernel.org
* The IBM order relaxation patch
@ 2002-02-06 19:13 Pete Zaitcev
  0 siblings, 0 replies; 16+ messages in thread
From: Pete Zaitcev @ 2002-02-06 19:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: zaitcev

Hi,

I had a look at an IBM patch, which is described thus:

  - Order 2 allocation relief

  Symptom:  Under stress and after long uptimes of a 64 bit system
            the error message "__alloc_pages: 2-order allocation failed."
            appears and either the fork of a new process fails or an
            active process dies.

  Problem:  The order 2 allocation problem stems from the size of the
            region and segment tables as defined by the zSeries
            architecture. A full region or segment table in 64 bit mode
            takes 16 KB of contiguous real memory. The page allocation
            routines do not guarantee that a higher order allocation
            will succeed due to memory fragmentation.

  Solution: The order 2 allocation fix is supposed to reduce the number
            of order 2 allocations for the region and segment tables to
            a minimum. To do so it uses a feature of the architecture
            that allows incomplete region and segment tables to be created.
            In almost all cases a process does not need full region or
            segment tables. If a full region or segment table is needed
            it is reallocated to the full size.

  This patch is very s/390 specific and breaks all other architectures.
  <<they meant "zSeries specific", surely --zaitcev>>
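
As a rough illustration of why a 16 KB table is an "order 2" request: with 4 KB pages, the buddy allocator hands out blocks of (page size << order) bytes, so 16 KB needs four contiguous pages. The helper below is a userspace sketch, not code from the patch; the name order_for_size is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest buddy-allocator order whose block size (page_size << order)
 * covers `size` bytes. page_size is assumed to be a power of two.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t block = page_size;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    while (block < size) {
        block <<= 1;   /* each order doubles the block size */
        order++;
    }
    return order;
}
```

Calling order_for_size(16384, 4096) gives 2, which matches the order named in the error message quoted above.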

It's a stupid question, but: why can we not simply
wait until a desired unfragmented memory area is available,
with a GFP flag? What they describe does not happen in an
interrupt context, so we can sleep.

And another one: why not increase the kernel-visible or "soft"
page size to 16KB for zSeries? It's a 64-bit platform. There
will be some increase in fragmentation, but nobody measured it.
Perhaps it's not going to be severe. It may even improve paging
efficiency.
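
To make the fragmentation concern concrete, here is a small sketch of the internal (tail) waste when an object is rounded up to a whole number of pages. It only does the arithmetic; it says nothing about how severe the effect would be on a real workload, which, as noted, nobody has measured. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round `len` up to a multiple of page_size (a power of two). */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Bytes left unused in the last page backing an object of `len` bytes. */
static size_t tail_waste(size_t len, size_t page_size)
{
    return round_up_to_page(len, page_size) - len;
}
```

For a 5000-byte object, tail_waste() reports 3192 bytes with 4 KB pages and 11384 bytes with 16 KB pages; an object that is an exact multiple of 16 KB wastes nothing at either size.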

-- Pete

P.S. The patch itself is at:
 http://www10.software.ibm.com/developerworks/opensource/linux390/alpha_src/linux-2.4.7-order2-3.tar.gz

* Re: The IBM order relaxation patch
@ 2002-02-06 21:50 Ulrich Weigand
  2002-02-07  0:18 ` Alan Cox
  2002-02-07 14:12 ` Daniel Phillips
  0 siblings, 2 replies; 16+ messages in thread
From: Ulrich Weigand @ 2002-02-06 21:50 UTC (permalink / raw)
  To: zaitcev; +Cc: linux-kernel

Pete Zaitcev wrote:

> This patch is very s/390 specific and breaks all other architectures.
>  <<they meant "zSeries specific", surely --zaitcev>>

By the way, Martin found a way to make the patch less intrusive so
that it won't break other archs any more ...

>It's a stupid question, but: why can we not simply
>wait until a desired unfragmented memory area is available,
>with a GFP flag? What they describe does not happen in an
>interrupt context, so we can sleep.

Because nobody even *tries* to free adjacent pages to build up
a free order-2 area.  You could wait really long ...
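
A toy userspace model of the situation, assuming a simple free-frame bitmap rather than the kernel's real buddy free lists: plenty of frames can be free while no naturally aligned group of four is, so an order-2 request keeps failing until someone deliberately frees the right neighbour. The names count_free and find_free_block are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* free_map[i] is true when page frame i is free. */
static size_t count_free(const bool *free_map, size_t nframes)
{
    size_t n = 0;

    for (size_t i = 0; i < nframes; i++)
        if (free_map[i])
            n++;
    return n;
}

/*
 * Return the first frame index of a naturally aligned run of
 * (1 << order) free frames, or -1 if no such run exists.
 */
static long find_free_block(const bool *free_map, size_t nframes,
                            unsigned int order)
{
    size_t span = (size_t)1 << order;

    for (size_t base = 0; base + span <= nframes; base += span) {
        size_t i;

        for (i = 0; i < span; i++)
            if (!free_map[base + i])
                break;
        if (i == span)
            return (long)base;
    }
    return -1;
}
```

With 16 frames where only every fourth frame is in use, 12 frames are free yet find_free_block(map, 16, 2) returns -1. Freeing one specific in-use frame is enough to make an order-2 block appear, but nothing in the allocator model above knows which frame to pick; that is the gap reverse mappings might help close.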

This looks hard to fix with the current mm layer.  Maybe Rik's
rmap method could help here, because with reverse mappings we
can at least try to free adjacent areas (because we then at least
*know* who's using the pages).

>And another one: why not increase the kernel-visible or "soft"
>page size to 16KB for zSeries? It's a 64-bit platform. There
>will be some increase in fragmentation, but nobody measured it.
>Perhaps it's not going to be severe. It may even improve paging
>efficiency.

Because then we can mmap() to user space only on 16KB boundaries.
This is a problem in particular for the 31-bit emulation layer,
as 31-bit binaries are laid out on 4KB boundaries by the linker,
so you really need to be able to mmap() on 4KB boundaries.
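
A minimal sketch of the alignment constraint, assuming the mapping granularity is a power of two. The helper name can_map_at and the example address are made up for illustration; they are not taken from any real binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when `addr` is a valid start address for a mapping at the
 * given granularity (a power of two), i.e. addr is a multiple of it.
 */
static bool can_map_at(uint64_t addr, uint64_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (addr & (granularity - 1)) == 0;
}
```

A segment the linker placed at, say, 0x401000 passes can_map_at(0x401000, 4096) but fails can_map_at(0x401000, 16384), so it could not be mapped directly if the mapping granularity were 16KB.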

One way to fix this could be to allow user space mappings on a
different granularity than the 'page size' for the allocator.
(Is this what PAGE_SIZE vs. PAGE_CACHE_SIZE had been intended
for, maybe?  It doesn't work at the moment in any case.)


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com


* Re: The IBM order relaxation patch
@ 2002-02-07 15:05 Ulrich Weigand
  2002-02-07 15:13 ` Rik van Riel
  2002-02-07 17:57 ` Daniel Phillips
  0 siblings, 2 replies; 16+ messages in thread
From: Ulrich Weigand @ 2002-02-07 15:05 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Daniel Phillips, zaitcev, linux-kernel


Rik van Riel wrote:

>On Thu, 7 Feb 2002, Daniel Phillips wrote:
>
>> Yes, that's one of the leading reasons for wanting rmap.  (Number one and
>> two reasons are: allow forcible unmapping of multiply referenced pages
>> for swapout; get more reliable hardware ref bit readings.)
>
>It's still on my TODO list.  Patches are very much welcome
>though ;)

On s390 we have per physical page hardware referenced / changed bits.
In the rmap framework, it should also be possible to make more efficient
use of these ...


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand




end of thread, other threads:[~2002-02-09 20:21 UTC | newest]

Thread overview: 16+ messages
2002-02-06 19:13 The IBM order relaxation patch Pete Zaitcev
  -- strict thread matches above, loose matches on Subject: below --
2002-02-06 21:50 Ulrich Weigand
2002-02-07  0:18 ` Alan Cox
2002-02-07  4:01   ` David S. Miller
2002-02-07 12:16     ` Rik van Riel
2002-02-07 12:29       ` David S. Miller
2002-02-07 12:42         ` Rik van Riel
2002-02-07 12:58         ` Hugh Dickins
2002-02-07 14:12 ` Daniel Phillips
2002-02-07 14:55   ` Rik van Riel
2002-02-07 15:07     ` Daniel Phillips
2002-02-07 15:10       ` David S. Miller
2002-02-09 20:21   ` Alex Bligh - linux-kernel
2002-02-07 15:05 Ulrich Weigand
2002-02-07 15:13 ` Rik van Riel
2002-02-07 17:57 ` Daniel Phillips
