* Re: [PATCH] Physical Memory Management [0/1]
From: Andrew Morton @ 2009-05-13 22:11 UTC
To: Michał Nazarewicz
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

(cc linux-mm)

(please keep the emails to under 80 columns)

On Wed, 13 May 2009 11:26:31 +0200 Michał Nazarewicz
<m.nazarewicz@samsung.com> wrote:

> In the next message a patch will be sent which allows allocation of
> large contiguous blocks of physical memory.  This functionality makes
> it similar to bigphysarea, however PMM has many more features:
>
> 1. Each allocated block of memory has a reference counter, so different
>    kernel modules may share the same buffer with the well-known get/put
>    semantics.
>
> 2. It aggregates the physical memory allocation and management API in
>    one place.  This is good because there is a single place to debug
>    and test for all devices.  Moreover, because each device does not
>    need to reserve its own area of physical memory, the total size of
>    reserved memory is smaller.  Say we have 3 accelerators.  Each of
>    them can operate on 1 MiB blocks, so each of them would have to
>    reserve 1 MiB for itself (a total of 3 MiB of reserved memory).
>    However, if at most two of those devices can be used at once, we
>    could reserve 2 MiB, saving 1 MiB.
>
> 3. PMM has its own allocator which runs in O(log n) bounded time, where
>    n is the total number of areas and free spaces between them -- the
>    upper time limit may be important when working on data sent in real
>    time (for instance the output of a camera).  Currently a best-fit
>    algorithm is used, but you can easily replace it if it does not meet
>    your needs.
>
> 4. Via a misc char device, the module allows allocation of contiguous
>    blocks from user space.  Such a solution has several advantages.  In
>    particular, the other option would be to add allocation calls to
>    each individual device (think hardware accelerators) -- this would
>    duplicate the same code in several drivers and lead to inconsistent
>    APIs for doing the very same thing.  Moreover, when creating
>    pipelines (e.g. encoded image --[decoder]--> decoded image
>    --[scaler]--> scaled image) devices would have to develop a method
>    of sharing buffers.  With PMM, a user space program allocates a
>    block and passes it as an output buffer for the first device and as
>    an input buffer for the other.
>
> 5. PMM is integrated with System V IPC, so that user space programs may
>    "convert" an allocated block into a segment of System V shared
>    memory.  This makes it possible to pass PMM buffers to PMM-unaware
>    but SysV-aware applications.  A notable example is X11.  This makes
>    it possible to deploy a zero-copy scheme when communicating with
>    X11.  For instance, the image scaled in the previous example could
>    be passed directly to the X server without the need to copy it into
>    a newly created System V shared memory segment.
>
> 6. PMM has a notion of memory types.  In the attached patch only a
>    general memory type is defined, but you can easily add more types
>    for a given platform.  To understand what a memory type is in PMM
>    terms, consider an example: general memory may be main RAM, which is
>    plentiful but quite slow, while another type may be a portion of L2
>    cache configured to act as fast memory.  Because PMM may be aware of
>    those, allocation of the different kinds of memory again has a
>    common, consistent API.

OK, let's pretend we didn't see an implementation.

What are you trying to do here?  What problem(s) are being solved?
What are the requirements and the use cases?

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14  9:00 UTC
To: Andrew Morton
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 00:11:42 +0200, Andrew Morton wrote:
> (please keep the emails to under 80 columns)

Yes, sorry about that.  Apparently "Automatically wrap outgoing
messages" doesn't mean what I thought it does.

> OK, let's pretend we didn't see an implementation.

:] I've never said it's perfect.  I'll welcome any constructive
comments.

> What are you trying to do here?  What problem(s) are being solved?
> What are the requirements and the use cases?

Overall situation: a UMA embedded system and many hardware accelerators
(DMA capable, no scatter-gather).

Three use cases:
1. We have a hardware JPEG decoder and we want to decode an image.
2. As above, plus we have an image scaler and want to scale the decoded
   image.
3. As above, plus we want to pass the scaled image to the X server.

Neither the decoder nor the scaler may operate on malloc(3)ed areas, as
those are not contiguous in physical memory.  A scattered buffer would
have to be copied, which is a performance cost and also doubles memory
usage.  PMM solves this, as it lets user space allocate contiguous
buffers which the devices may use directly.

It could also be solved by letting each driver allocate its own buffers
at boot time and then letting user space mmap(2) them.  However, with 10
hardware accelerators each needing a 1 MiB buffer, we would need to
reserve 10 MiB of memory.  If we know that at most 5 devices will be
used at the same time, we could have reserved 5 MiB instead of 10 MiB.
PMM solves this problem since the buffers are allocated only when they
are needed.

It could also be solved by letting each driver allocate buffers on
request (using bigphysarea for instance).  That has some minor issues,
like having to implement the mmap file operation in every driver and an
inconsistent user space API, but the most significant one is that it is
not clear how to implement the 2nd use case.  If drivers expect to work
on their own buffers, the decoder's output must be copied into the
scaler's input buffer.  With PMM, drivers simply expect contiguous
buffers and do not care where they came from or whether other drivers
use them as well.

Now, as to the 3rd use case: X may work with System V shared memory,
however, since shared memory segments (created via shmget(2)) are not
physically contiguous, we cannot pass one to the scaler as an output
buffer.  PMM solves this, since it allows converting an area allocated
via PMM into a System V shared memory segment.

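To make the pipeline in use case 2 concrete, here is a minimal
user-space sketch of sharing one PMM buffer between two devices.  The
device node /dev/pmm, the PMM_ALLOC ioctl, its request layout, and the
decoder/scaler ioctls are all assumptions for illustration only; they
are not taken from the posted patch.

    /* Hypothetical sketch: /dev/pmm, PMM_ALLOC and the device ioctls
     * are assumptions for illustration, not the patch's real API. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct pmm_alloc {            /* assumed request layout             */
    	unsigned long size;       /* in:  requested size in bytes       */
    	unsigned long paddr;      /* out: physical address of the block */
    };

    #define PMM_ALLOC _IOWR('p', 0, struct pmm_alloc)  /* assumed ioctl */

    int main(void)
    {
    	struct pmm_alloc req = { .size = 1 << 20, .paddr = 0 }; /* 1 MiB */
    	int pmm = open("/dev/pmm", O_RDWR);                     /* assumed */

    	if (pmm < 0 || ioctl(pmm, PMM_ALLOC, &req) < 0)
    		return 1;

    	/* Map the block so user space can fill or inspect it directly. */
    	void *buf = mmap(NULL, req.size, PROT_READ | PROT_WRITE,
    	                 MAP_SHARED, pmm, 0);
    	if (buf == MAP_FAILED)
    		return 1;

    	/* The same physical block serves as the decoder's output buffer
    	 * and the scaler's input buffer -- no copy between the stages:
    	 *
    	 *   ioctl(decoder_fd, DEC_SET_OUTPUT, req.paddr);   (assumed)
    	 *   ioctl(scaler_fd,  SCL_SET_INPUT,  req.paddr);   (assumed)
    	 */

    	munmap(buf, req.size);
    	close(pmm);
    	return 0;
    }

The only point of the sketch is the flow: one block is allocated once
and its address handed to both devices, so nothing is copied between
the two pipeline stages.
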
* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 11:20 UTC
To: Michał Nazarewicz
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 11:00 +0200, Michał Nazarewicz wrote:
> PMM solves this problem since the buffers are allocated only when they
> are needed.

Ha - only when you actually manage to allocate things.  Physically
contiguous allocations are exceedingly hard once the machine has been
running for a while.

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14 11:48 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Thu, 2009-05-14 at 11:00 +0200, Michał Nazarewicz wrote:
>> PMM solves this problem since the buffers are allocated only when
>> they are needed.

On Thu, 14 May 2009 13:20:02 +0200, Peter Zijlstra wrote:
> Ha - only when you actually manage to allocate things.  Physically
> contiguous allocations are exceedingly hard once the machine has been
> running for a while.

PMM reserves memory at boot time using alloc_bootmem_low_pages().
After this is done, it can allocate buffers from the reserved pool.

The idea here is that there are n hardware accelerators, each of which
can operate on 1 MiB blocks (to simplify, assume that's the case).
However, we know that at most m < n devices will be used at the same
time, so instead of reserving n MiB of memory we reserve only m MiB.

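A minimal sketch of that boot-time reservation, assuming the pool is
carved out once during platform setup and later handed to PMM's own
allocator; only alloc_bootmem_low_pages() is taken from the mail, the
pmm_* names and the pool size are assumptions.

    /* Sketch only: pmm_* names and the 2 MiB pool size are assumptions;
     * alloc_bootmem_low_pages() is the call named in the mail. */
    #include <linux/bootmem.h>
    #include <linux/init.h>
    #include <linux/kernel.h>

    #define PMM_POOL_SIZE (2UL * 1024 * 1024)  /* m = 2 concurrent 1 MiB users */

    static void *pmm_pool_base;

    /* Called from platform setup, while the boot memory allocator is live. */
    void __init pmm_reserve_pool(void)
    {
    	/* One physically contiguous, low-memory, page-aligned chunk...  */
    	pmm_pool_base = alloc_bootmem_low_pages(PMM_POOL_SIZE);

    	/* ...from which PMM's own best-fit allocator later carves buffers. */
    	printk(KERN_INFO "pmm: reserved %lu bytes at %p\n",
    	       PMM_POOL_SIZE, pmm_pool_base);
    }
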
* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 12:05 UTC
To: Michał Nazarewicz
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 13:48 +0200, Michał Nazarewicz wrote:
> PMM reserves memory at boot time using alloc_bootmem_low_pages().
> After this is done, it can allocate buffers from the reserved pool.
>
> The idea here is that there are n hardware accelerators, each of which
> can operate on 1 MiB blocks (to simplify, assume that's the case).
> However, we know that at most m < n devices will be used at the same
> time, so instead of reserving n MiB of memory we reserve only m MiB.

And who says your pre-allocated pool won't fragment with repeated PMM
use?

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14 13:04 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 14:05:02 +0200, Peter Zijlstra wrote:
> And who says your pre-allocated pool won't fragment with repeated PMM
> use?

Yes, this is a good question.  What's more, there's no good answer. ;)

There is no guarantee and it depends on the use cases.  The biggest
problem is a lot of small buffers allocated by different applications
which get freed at different times.  However, if in most cases one or
two applications use PMM, we can assume that buffers are allocated and
freed in groups.  If that's the case, fragmentation is less likely to
occur.

I'm not claiming that PMM is a panacea for all the problems present on
systems with no scatter-gather capability -- it is an attempt to gather
different functionality and existing solutions in one place, which is
easier to manage and improve if needed.  The problem of allocating
contiguous blocks has no universal solution -- you can increase the
reserved area, but then the overall performance of the system will
decrease.  PMM is trying to find a compromise between the two.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andrew Morton @ 2009-05-14 17:07 UTC
To: Michał Nazarewicz
Cc: peterz, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 15:04:55 +0200 Michał Nazarewicz
<m.nazarewicz@samsung.com> wrote:

> On Thu, 14 May 2009 14:05:02 +0200, Peter Zijlstra wrote:
> > And who says your pre-allocated pool won't fragment with repeated
> > PMM use?
>
> Yes, this is a good question.  What's more, there's no good answer. ;)

We do have the capability in page reclaim to deliberately free up
physically contiguous pages (known as "lumpy reclaim").

It would be interesting were someone to have a go at making that
available to userspace: ask the kernel to give you 1 MB of physically
contiguous memory.  There are reasons why this can fail, but migrating
pages can be used to improve the success rate, and userspace can be
careful not to go nuts using mlock(), etc.

The returned memory would of course need to be protected from other
reclaim/migration/etc activity.

* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 17:10 UTC
To: Andrew Morton
Cc: Michał Nazarewicz, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 10:07 -0700, Andrew Morton wrote:
> We do have the capability in page reclaim to deliberately free up
> physically contiguous pages (known as "lumpy reclaim").
>
> It would be interesting were someone to have a go at making that
> available to userspace: ask the kernel to give you 1 MB of physically
> contiguous memory.  There are reasons why this can fail, but migrating
> pages can be used to improve the success rate, and userspace can be
> careful not to go nuts using mlock(), etc.
>
> The returned memory would of course need to be protected from other
> reclaim/migration/etc activity.

I thought we already exposed this, it's called hugetlbfs ;-)

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 10:06 UTC
To: Peter Zijlstra, Andrew Morton, Andi Kleen
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Thu, 2009-05-14 at 10:07 -0700, Andrew Morton wrote:
>> We do have the capability in page reclaim to deliberately free up
>> physically contiguous pages (known as "lumpy reclaim").

Doesn't this require swap?

>> It would be interesting were someone to have a go at making that
>> available to userspace: ask the kernel to give you 1 MB of physically
>> contiguous memory.  There are reasons why this can fail, but
>> migrating pages can be used to improve the success rate, and
>> userspace can be careful not to go nuts using mlock(), etc.

On Thu, 14 May 2009 19:10:00 +0200, Peter Zijlstra wrote:
> I thought we already exposed this, it's called hugetlbfs ;-)

On Thu, 14 May 2009 21:33:11 +0200, Andi Kleen wrote:
> You could just define a hugepage size for that and use hugetlbfs
> with a few changes to map in pages with multiple PTEs.
> It supports boot time reservation and is a well established
> interface.
>
> On x86 that would give 2 MB units, on other architectures whatever
> you prefer.

Correct me if I'm wrong, but if I understand correctly, currently only
one size of huge page may be defined, even if the underlying
architecture supports many different sizes.  So there are two cases:
either (i) define the huge page size to the largest block that may ever
be requested and then waste a lot of memory when small buffers are
requested, or (ii) define a smaller huge page size but then implement
special handling of large regions.

The first solution is not acceptable, as a lot of memory may be wasted.
If, for example, you have a 4-megapixel camera, you'd have to configure
4 MiB huge pages, but in most cases you won't be needing that much.
Often you will work with, say, 320x200x2 images (125 KiB) and more than
3 MiB will be wasted!  In the latter case, with (say) 128 KiB huge
pages, no (or little) space will be wasted when working with 320x200x2
images, but then when someone really needs 4 MiB to take a photo, the
very same problem we started with will occur -- we will have to find 32
contiguous pages.

So to sum up, if I understand everything correctly, hugetlb would be a
great solution when working with buffers of similar sizes.  However,
it's not so good when the size of the requested buffer may vary
greatly.

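The numbers behind that example: 320 x 200 x 2 B = 128 000 B, i.e.
exactly 125 KiB per image, so holding one such image in a single 4 MiB
(4096 KiB) huge page leaves 4096 - 125 = 3971 KiB of the page unused,
while a full 4 MiB photo buffer built out of 128 KiB pages spans
4 MiB / 128 KiB = 32 pages that all have to be physically contiguous.
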
* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-15 10:18 UTC
To: Michał Nazarewicz
Cc: Peter Zijlstra, Andrew Morton, Andi Kleen, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> Correct me if I'm wrong, but if I understand correctly, currently only
> one size of huge page may be defined, even if the underlying
> architecture supports many different sizes.

That's not correct, support for multiple huge page sizes was recently
added.  The interface is a bit clumsy admittedly, but it's there.

However, for avoiding fragmentation you probably don't want too many
different sizes anyway: the more sizes, the worse the fragmentation.
Ideal is only a single size.

> ... either (i) define the huge page size to the largest block that may
> ever be requested and then waste a lot of memory when small buffers
> are requested, or (ii) define a smaller huge page size but then
> implement special handling of large regions.

If you don't do that, then long-term fragmentation will kill you
anyway; it's easy to show that pre-allocation with lots of different
sizes is about equivalent to what the main page allocator does anyway.

> So to sum up, if I understand everything correctly, hugetlb would be a
> great solution when working with buffers of similar sizes.  However,
> it's not so good when the size of the requested buffer may vary
> greatly.

As Peter et al. explained earlier, varying buffer sizes don't work
anyway.

-Andi

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 10:47 UTC
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

>> Correct me if I'm wrong, but if I understand correctly, currently
>> only one size of huge page may be defined, even if the underlying
>> architecture supports many different sizes.

On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
> That's not correct, support for multiple huge page sizes was recently
> added.  The interface is a bit clumsy admittedly, but it's there.

I'll have to look into that further then.  Having said that, I cannot
create a huge-page SysV shared memory segment with pages of a specified
size, can I?

> However, for avoiding fragmentation you probably don't want too many
> different sizes anyway: the more sizes, the worse the fragmentation.
> Ideal is only a single size.

Unfortunately, sizes may vary from several KiB to a few MiB.

On the other hand, only a handful of apps will use PMM in our system
and at most two or three will run at the same time, so hopefully
fragmentation won't be so bad.  But yes, I admit it is a concern.

>> ... either (i) define the huge page size to the largest block that
>> may ever be requested and then waste a lot of memory when small
>> buffers are requested, or (ii) define a smaller huge page size but
>> then implement special handling of large regions.
>
> If you don't do that, then long-term fragmentation will kill you
> anyway; it's easy to show that pre-allocation with lots of different
> sizes is about equivalent to what the main page allocator does anyway.

However, with the allocator in PMM used by only a handful of apps, an
architect may specify the use cases that need to be supported, and PMM
may then be reimplemented to guarantee that those cases are handled.

> As Peter et al. explained earlier, varying buffer sizes don't work
> anyway.

Either I missed something, or Peter and Andrew only pointed out the
problem we all seem to agree exists: the problem of fragmentation.

* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-15 11:03 UTC
To: Michał Nazarewicz
Cc: Andi Kleen, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Fri, 2009-05-15 at 12:47 +0200, Michał Nazarewicz wrote:
> I'll have to look into that further then.  Having said that, I cannot
> create a huge-page SysV shared memory segment with pages of a
> specified size, can I?

Well, hugetlbfs is a fs, so you can simply create a file on there and
map that shared -- a much saner interface than sysvshm if you ask me.

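A minimal sketch of what Peter describes, assuming hugetlbfs is already
mounted at /mnt/huge (the mount point, file name and 2 MB huge page
size are assumptions; the rest is the ordinary file API):

    /* Sketch: a file on hugetlbfs, mapped shared; assumes hugetlbfs is
     * mounted at /mnt/huge and 2 MB huge pages. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HUGEFILE "/mnt/huge/frame-buf"
    #define LENGTH   (2UL * 1024 * 1024)

    int main(void)
    {
    	int fd = open(HUGEFILE, O_CREAT | O_RDWR, 0600);
    	if (fd < 0)
    		return 1;

    	/* Size the file; backing huge pages are allocated on first touch. */
    	if (ftruncate(fd, LENGTH) < 0)
    		return 1;

    	/* Every process that maps this file MAP_SHARED sees the same
    	 * huge-page-backed (per-page physically contiguous) memory. */
    	char *buf = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
    	                 MAP_SHARED, fd, 0);
    	if (buf == MAP_FAILED)
    		return 1;

    	buf[0] = 42;            /* touch it to instantiate the huge page */

    	munmap(buf, LENGTH);
    	close(fd);
    	unlink(HUGEFILE);
    	return 0;
    }

Any process that opens and maps the same hugetlbfs file shares the same
buffer, which is what makes it a substitute for a SysV segment.
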
* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 11:11 UTC
To: Peter Zijlstra
Cc: Andi Kleen, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Fri, 2009-05-15 at 12:47 +0200, Michał Nazarewicz wrote:
>> I cannot create a huge-page SysV shared memory segment with pages of
>> a specified size, can I?

On Fri, 15 May 2009 13:03:34 +0200, Peter Zijlstra wrote:
> Well, hugetlbfs is a fs, so you can simply create a file on there and
> map that shared -- a much saner interface than sysvshm if you ask me.

It's not a question of being sane or not, it's a question of whether
the X server supports it, and it doesn't.  X can read data from Sys V
shm to avoid needless copying of pixmaps (or sending them via a unix
socket or whatever), and so PMM lets it read from contiguous blocks
without knowing or caring about it.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-15 11:26 UTC
To: Michał Nazarewicz
Cc: Andi Kleen, Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Fri, May 15, 2009 at 12:47:23PM +0200, Michał Nazarewicz wrote:
> On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
> > That's not correct, support for multiple huge page sizes was
> > recently added.  The interface is a bit clumsy admittedly, but it's
> > there.
>
> I'll have to look into that further then.  Having said that, I cannot
> create a huge-page SysV shared memory segment with pages of a
> specified size, can I?

SysV shared memory supports huge pages, but there is currently no
interface to specify the intended page size; you always get the
default.

> > However, for avoiding fragmentation you probably don't want too many
> > different sizes anyway: the more sizes, the worse the fragmentation.
> > Ideal is only a single size.
>
> Unfortunately, sizes may vary from several KiB to a few MiB.

Then your approach will likely not be reliable.

> On the other hand, only a handful of apps will use PMM in our system
> and at most two or three will run at the same time, so hopefully
> fragmentation won't be so bad.  But yes, I admit it is a concern.

Such tight restrictions might work for you, but for mainline Linux the
quality standards are higher.

> > As Peter et al. explained earlier, varying buffer sizes don't work
> > anyway.
>
> Either I missed something, or Peter and Andrew only pointed out the
> problem we all seem to agree exists: the problem of fragmentation.

Multiple buffer sizes lead to fragmentation.

-Andi

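For reference, the SysV path Andi describes looks roughly like the
following; SHM_HUGETLB selects huge pages, but only in the system
default huge page size, so there is no per-segment page-size parameter
(the segment size and permissions below are arbitrary examples):

    /* Sketch: SysV shared memory backed by huge pages of the default size. */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000          /* value from <linux/shm.h>          */
    #endif

    #define SEG_SIZE (2UL * 1024 * 1024)  /* a multiple of the huge page size */

    int main(void)
    {
    	/* SHM_HUGETLB requests huge pages; there is no flag to pick which
    	 * huge page size -- the system default is always used. */
    	int id = shmget(IPC_PRIVATE, SEG_SIZE,
    			SHM_HUGETLB | IPC_CREAT | 0600);
    	if (id < 0) {
    		perror("shmget");
    		return 1;
    	}

    	char *p = shmat(id, NULL, 0);          /* attach the segment      */
    	if (p == (void *)-1) {
    		perror("shmat");
    		return 1;
    	}
    	p[0] = 42;                             /* touch it                */

    	shmdt(p);                              /* detach...               */
    	shmctl(id, IPC_RMID, NULL);            /* ...and mark for removal */
    	return 0;
    }
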
* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 12:05 UTC
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

>> On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
>>> However, for avoiding fragmentation you probably don't want too many
>>> different sizes anyway: the more sizes, the worse the fragmentation.
>>> Ideal is only a single size.

> On Fri, May 15, 2009 at 12:47:23PM +0200, Michał Nazarewicz wrote:
>> Unfortunately, sizes may vary from several KiB to a few MiB.

On Fri, 15 May 2009 13:26:56 +0200, Andi Kleen <andi@firstfloor.org> wrote:
> Then your approach will likely not be reliable.

>> On the other hand, only a handful of apps will use PMM in our system
>> and at most two or three will run at the same time, so hopefully
>> fragmentation won't be so bad.  But yes, I admit it is a concern.
>
> Such tight restrictions might work for you, but for mainline Linux the
> quality standards are higher.

I understand that PMM in its current form may be unacceptable.
However, hear me out, and please do correct me if I'm wrong at any
point, as I would love to use an existing solution if one fulfilling my
needs exists:

When different sizes of buffers are needed, fragmentation is an even
bigger problem with hugetlb (as pages must be aligned) than with PMM.
If a buffer that does not match the page size is needed, then with
hugetlb either a bigger page needs to be allocated (and memory wasted)
or a few smaller ones need to be merged (and the same problem as in PMM
exists -- finding contiguous pages).

Reclaiming is not really an option, since a situation where there is no
sane bound on allocation time is not acceptable -- you don't want to
wait 10 seconds for an application to start on your cell phone. ;)

Also, I need the ability to convert any buffer to a Sys V shm segment,
so as to be able to pass it to the X server.  Currently no such API
exists, does it?

PMM, with its notion of memory types, can support different allocators
and/or memory pools, etc.  Allocators could even be loaded dynamically
as modules if one desires that.

My point is that PMM should be considered a framework for situations
similar to the one I described throughout all of my mails, rather than
a universal solution.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-14 19:33 UTC
To: Michał Nazarewicz
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

Michał Nazarewicz <m.nazarewicz@samsung.com> writes:
>
> The idea here is that there are n hardware accelerators, each of which
> can operate on 1 MiB blocks (to simplify, assume that's the case).

You could just define a hugepage size for that and use hugetlbfs with a
few changes to map in pages with multiple PTEs.  It supports boot time
reservation and is a well established interface.

On x86 that would give 2 MB units, on other architectures whatever you
prefer.

-Andi

Thread overview: 16+ messages
[not found] <op.utu26hq77p4s8u@amdc030>
2009-05-13 22:11 ` [PATCH] Physical Memory Management [0/1] Andrew Morton
2009-05-14 9:00 ` Michał Nazarewicz
2009-05-14 11:20 ` Peter Zijlstra
2009-05-14 11:48 ` Michał Nazarewicz
2009-05-14 12:05 ` Peter Zijlstra
2009-05-14 13:04 ` Michał Nazarewicz
2009-05-14 17:07 ` Andrew Morton
2009-05-14 17:10 ` Peter Zijlstra
2009-05-15 10:06 ` Michał Nazarewicz
2009-05-15 10:18 ` Andi Kleen
2009-05-15 10:47 ` Michał Nazarewicz
2009-05-15 11:03 ` Peter Zijlstra
2009-05-15 11:11 ` Michał Nazarewicz
2009-05-15 11:26 ` Andi Kleen
2009-05-15 12:05 ` Michał Nazarewicz
2009-05-14 19:33 ` Andi Kleen