* Re: [PATCH] Physical Memory Management [0/1]
From: Andrew Morton @ 2009-05-13 22:11 UTC
To: Michał Nazarewicz
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

(cc linux-mm)

(please keep the emails to under 80 columns)

On Wed, 13 May 2009 11:26:31 +0200 Michał Nazarewicz
<m.nazarewicz@samsung.com> wrote:

> In the next message a patch will be sent which allows allocation of
> large contiguous blocks of physical memory.  This functionality makes
> it similar to bigphysarea, however PMM has many more features:
>
> 1. Each allocated block of memory has a reference counter, so different
>    kernel modules may share the same buffer with the well-known get/put
>    semantics.
>
> 2. It aggregates the physical memory allocation and management API in
>    one place.  This is good because there is a single place to debug
>    and test for all devices.  Moreover, because each device does not
>    need to reserve its own area of physical memory, the total size of
>    reserved memory is smaller.  Say we have 3 accelerators.  Each of
>    them can operate on 1 MiB blocks, so each of them would have to
>    reserve 1 MiB for itself (a total of 3 MiB of reserved memory).
>    However, if at most two of those devices can be used at once, we
>    could reserve 2 MiB, saving 1 MiB.
>
> 3. PMM has its own allocator which runs in O(log n) bounded time, where
>    n is the total number of areas and free spaces between them -- the
>    upper time limit may be important when working on data sent in real
>    time (for instance the output of a camera).  Currently a best-fit
>    algorithm is used, but you can easily replace it if it does not meet
>    your needs.
>
> 4. Via a misc char device, the module allows allocation of contiguous
>    blocks from user space.  Such a solution has several advantages.  In
>    particular, the other option would be to add allocation calls to
>    each individual device (think hardware accelerators) -- this would
>    duplicate the same code in several drivers and lead to inconsistent
>    APIs for doing the very same thing.  Moreover, when creating
>    pipelines (e.g. encoded image --[decoder]--> decoded image
>    --[scaler]--> scaled image) devices would have to develop a method
>    of sharing buffers.  With PMM, a user space program allocates a
>    block and passes it as an output buffer for the first device and as
>    an input buffer for the other.
>
> 5. PMM is integrated with System V IPC, so that user space programs may
>    "convert" an allocated block into a segment of System V shared
>    memory.  This makes it possible to pass PMM buffers to PMM-unaware
>    but SysV-aware applications.  A notable example is X11.  This makes
>    it possible to deploy a zero-copy scheme when communicating with
>    X11.  For instance, the image scaled in the previous example could
>    be passed directly to the X server without the need to copy it into
>    a newly created System V shared memory segment.
>
> 6. PMM has a notion of memory types.  In the attached patch only a
>    general memory type is defined, but you can easily add more types
>    for a given platform.  To understand what a memory type is in PMM
>    terms, consider an example: general memory may be main RAM, which is
>    plentiful but quite slow, while another type may be a portion of L2
>    cache configured to act as fast memory.  Because PMM may be aware of
>    those, allocation of the different kinds of memory again has a
>    common, consistent API.

OK, let's pretend we didn't see an implementation.

What are you trying to do here?  What problem(s) are being solved?
What are the requirements and the use cases?

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14  9:00 UTC
To: Andrew Morton
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 00:11:42 +0200, Andrew Morton wrote:
> (please keep the emails to under 80 columns)

Yes, sorry about that.  Apparently "Automatically wrap outgoing
messages" doesn't mean what I thought it does.

> OK, let's pretend we didn't see an implementation.

:] I've never said it's perfect.  I'll welcome any constructive
comments.

> What are you trying to do here?  What problem(s) are being solved?
> What are the requirements and the use cases?

Overall situation: a UMA embedded system and many hardware accelerators
(DMA capable, no scatter-gather).

Three use cases:
1. We have a hardware JPEG decoder and we want to decode an image.
2. As above, plus we have an image scaler and want to scale the decoded
   image.
3. As above, plus we want to pass the scaled image to the X server.

Neither the decoder nor the scaler may operate on malloc(3)ed areas, as
those are not contiguous in physical memory.  A scattered buffer would
have to be copied, which is a performance cost and also doubles memory
usage.  PMM solves this, as it lets user space allocate contiguous
buffers which the devices may use directly.

It could also be solved by letting each driver allocate its own buffers
at boot time and then letting user space mmap(2) them.  However, with 10
hardware accelerators each needing a 1 MiB buffer, we would need to
reserve 10 MiB of memory.  If we know that at most 5 devices will be
used at the same time, we could have reserved 5 MiB instead of 10 MiB.
PMM solves this problem since the buffers are allocated only when they
are needed.

It could also be solved by letting each driver allocate buffers on
request (using bigphysarea for instance).  That has some minor issues,
like having to implement the mmap file operation in every driver and an
inconsistent user space API, but the most significant one is that it is
not clear how to implement the 2nd use case.  If drivers expect to work
on their own buffers, the decoder's output must be copied into the
scaler's input buffer.  With PMM, drivers simply expect contiguous
buffers and do not care where they came from or whether other drivers
use them as well.

Now, as to the 3rd use case: X may work with System V shared memory,
however, since shared memory segments (created via shmget(2)) are not
physically contiguous, we cannot pass one to the scaler as an output
buffer.  PMM solves this, since it allows converting an area allocated
via PMM into a System V shared memory segment.

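To make the pipeline in use case 2 concrete, here is a minimal
user-space sketch of sharing one PMM buffer between two devices.  The
device node /dev/pmm, the PMM_ALLOC ioctl, its request layout, and the
decoder/scaler ioctls are all assumptions for illustration only; they
are not taken from the posted patch.

    /* Hypothetical sketch: /dev/pmm, PMM_ALLOC and the device ioctls
     * are assumptions for illustration, not the patch's real API. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct pmm_alloc {            /* assumed request layout             */
    	unsigned long size;       /* in:  requested size in bytes       */
    	unsigned long paddr;      /* out: physical address of the block */
    };

    #define PMM_ALLOC _IOWR('p', 0, struct pmm_alloc)  /* assumed ioctl */

    int main(void)
    {
    	struct pmm_alloc req = { .size = 1 << 20, .paddr = 0 }; /* 1 MiB */
    	int pmm = open("/dev/pmm", O_RDWR);                     /* assumed */

    	if (pmm < 0 || ioctl(pmm, PMM_ALLOC, &req) < 0)
    		return 1;

    	/* Map the block so user space can fill or inspect it directly. */
    	void *buf = mmap(NULL, req.size, PROT_READ | PROT_WRITE,
    	                 MAP_SHARED, pmm, 0);
    	if (buf == MAP_FAILED)
    		return 1;

    	/* The same physical block serves as the decoder's output buffer
    	 * and the scaler's input buffer -- no copy between the stages:
    	 *
    	 *   ioctl(decoder_fd, DEC_SET_OUTPUT, req.paddr);   (assumed)
    	 *   ioctl(scaler_fd,  SCL_SET_INPUT,  req.paddr);   (assumed)
    	 */

    	munmap(buf, req.size);
    	close(pmm);
    	return 0;
    }

The only point of the sketch is the flow: one block is allocated once
and its address handed to both devices, so nothing is copied between
the two pipeline stages.
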
* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 11:20 UTC
To: Michał Nazarewicz
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 11:00 +0200, Michał Nazarewicz wrote:
> PMM solves this problem since the buffers are allocated only when they
> are needed.

Ha - only when you actually manage to allocate things.  Physically
contiguous allocations are exceedingly hard once the machine has been
running for a while.

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14 11:48 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Thu, 2009-05-14 at 11:00 +0200, Michał Nazarewicz wrote:
>> PMM solves this problem since the buffers are allocated only when
>> they are needed.

On Thu, 14 May 2009 13:20:02 +0200, Peter Zijlstra wrote:
> Ha - only when you actually manage to allocate things.  Physically
> contiguous allocations are exceedingly hard once the machine has been
> running for a while.

PMM reserves memory at boot time using alloc_bootmem_low_pages().
After this is done, it can allocate buffers from the reserved pool.

The idea here is that there are n hardware accelerators, each of which
can operate on 1 MiB blocks (to simplify, assume that's the case).
However, we know that at most m < n devices will be used at the same
time, so instead of reserving n MiB of memory we reserve only m MiB.

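A minimal sketch of that boot-time reservation, assuming the pool is
carved out once during platform setup and later handed to PMM's own
allocator; only alloc_bootmem_low_pages() is taken from the mail, the
pmm_* names and the pool size are assumptions.

    /* Sketch only: pmm_* names and the 2 MiB pool size are assumptions;
     * alloc_bootmem_low_pages() is the call named in the mail. */
    #include <linux/bootmem.h>
    #include <linux/init.h>
    #include <linux/kernel.h>

    #define PMM_POOL_SIZE (2UL * 1024 * 1024)  /* m = 2 concurrent 1 MiB users */

    static void *pmm_pool_base;

    /* Called from platform setup, while the boot memory allocator is live. */
    void __init pmm_reserve_pool(void)
    {
    	/* One physically contiguous, low-memory, page-aligned chunk...  */
    	pmm_pool_base = alloc_bootmem_low_pages(PMM_POOL_SIZE);

    	/* ...from which PMM's own best-fit allocator later carves buffers. */
    	printk(KERN_INFO "pmm: reserved %lu bytes at %p\n",
    	       PMM_POOL_SIZE, pmm_pool_base);
    }
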
* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 12:05 UTC
To: Michał Nazarewicz
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 13:48 +0200, Michał Nazarewicz wrote:
> PMM reserves memory at boot time using alloc_bootmem_low_pages().
> After this is done, it can allocate buffers from the reserved pool.
>
> The idea here is that there are n hardware accelerators, each of which
> can operate on 1 MiB blocks (to simplify, assume that's the case).
> However, we know that at most m < n devices will be used at the same
> time, so instead of reserving n MiB of memory we reserve only m MiB.

And who says your pre-allocated pool won't fragment with repeated PMM
use?

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-14 13:04 UTC
To: Peter Zijlstra
Cc: Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 14:05:02 +0200, Peter Zijlstra wrote:
> And who says your pre-allocated pool won't fragment with repeated PMM
> use?

Yes, this is a good question.  What's more, there's no good answer. ;)

There is no guarantee and it depends on the use cases.  The biggest
problem is a lot of small buffers allocated by different applications
which get freed at different times.  However, if in most cases one or
two applications use PMM, we can assume that buffers are allocated and
freed in groups.  If that's the case, fragmentation is less likely to
occur.

I'm not claiming that PMM is a panacea for all the problems present on
systems with no scatter-gather capability -- it is an attempt to gather
different functionality and existing solutions in one place, which is
easier to manage and improve if needed.  The problem of allocating
contiguous blocks has no universal solution -- you can increase the
reserved area, but then the overall performance of the system will
decrease.  PMM is trying to find a compromise between the two.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andrew Morton @ 2009-05-14 17:07 UTC
To: Michał Nazarewicz
Cc: peterz, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 14 May 2009 15:04:55 +0200 Michał Nazarewicz
<m.nazarewicz@samsung.com> wrote:

> On Thu, 14 May 2009 14:05:02 +0200, Peter Zijlstra wrote:
> > And who says your pre-allocated pool won't fragment with repeated
> > PMM use?
>
> Yes, this is a good question.  What's more, there's no good answer. ;)

We do have the capability in page reclaim to deliberately free up
physically contiguous pages (known as "lumpy reclaim").

It would be interesting were someone to have a go at making that
available to userspace: ask the kernel to give you 1 MB of physically
contiguous memory.  There are reasons why this can fail, but migrating
pages can be used to improve the success rate, and userspace can be
careful not to go nuts using mlock(), etc.

The returned memory would of course need to be protected from other
reclaim/migration/etc activity.

* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-14 17:10 UTC
To: Andrew Morton
Cc: Michał Nazarewicz, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Thu, 2009-05-14 at 10:07 -0700, Andrew Morton wrote:
> We do have the capability in page reclaim to deliberately free up
> physically contiguous pages (known as "lumpy reclaim").
>
> It would be interesting were someone to have a go at making that
> available to userspace: ask the kernel to give you 1 MB of physically
> contiguous memory.  There are reasons why this can fail, but migrating
> pages can be used to improve the success rate, and userspace can be
> careful not to go nuts using mlock(), etc.
>
> The returned memory would of course need to be protected from other
> reclaim/migration/etc activity.

I thought we already exposed this, it's called hugetlbfs ;-)

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 10:06 UTC
To: Peter Zijlstra, Andrew Morton, Andi Kleen
Cc: linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Thu, 2009-05-14 at 10:07 -0700, Andrew Morton wrote:
>> We do have the capability in page reclaim to deliberately free up
>> physically contiguous pages (known as "lumpy reclaim").

Doesn't this require swap?

>> It would be interesting were someone to have a go at making that
>> available to userspace: ask the kernel to give you 1 MB of physically
>> contiguous memory.  There are reasons why this can fail, but
>> migrating pages can be used to improve the success rate, and
>> userspace can be careful not to go nuts using mlock(), etc.

On Thu, 14 May 2009 19:10:00 +0200, Peter Zijlstra wrote:
> I thought we already exposed this, it's called hugetlbfs ;-)

On Thu, 14 May 2009 21:33:11 +0200, Andi Kleen wrote:
> You could just define a hugepage size for that and use hugetlbfs
> with a few changes to map in pages with multiple PTEs.
> It supports boot time reservation and is a well established
> interface.
>
> On x86 that would give 2 MB units, on other architectures whatever
> you prefer.

Correct me if I'm wrong, but if I understand correctly, currently only
one size of huge page may be defined, even if the underlying
architecture supports many different sizes.  So there are two cases:
either (i) define the huge page size to the largest block that may ever
be requested and then waste a lot of memory when small buffers are
requested, or (ii) define a smaller huge page size but then implement
special handling of large regions.

The first solution is not acceptable, as a lot of memory may be wasted.
If, for example, you have a 4-megapixel camera, you'd have to configure
4 MiB huge pages, but in most cases you won't be needing that much.
Often you will work with, say, 320x200x2 images (125 KiB) and more than
3 MiB will be wasted!  In the latter case, with (say) 128 KiB huge
pages, no (or little) space will be wasted when working with 320x200x2
images, but then when someone really needs 4 MiB to take a photo, the
very same problem we started with will occur -- we will have to find 32
contiguous pages.

So to sum up, if I understand everything correctly, hugetlb would be a
great solution when working with buffers of similar sizes.  However,
it's not so good when the size of the requested buffer may vary
greatly.

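The numbers behind that example: 320 x 200 x 2 B = 128 000 B, i.e.
exactly 125 KiB per image, so holding one such image in a single 4 MiB
(4096 KiB) huge page leaves 4096 - 125 = 3971 KiB of the page unused,
while a full 4 MiB photo buffer built out of 128 KiB pages spans
4 MiB / 128 KiB = 32 pages that all have to be physically contiguous.
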
* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-15 10:18 UTC
To: Michał Nazarewicz
Cc: Peter Zijlstra, Andrew Morton, Andi Kleen, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> Correct me if I'm wrong, but if I understand correctly, currently only
> one size of huge page may be defined, even if the underlying
> architecture supports many different sizes.

That's not correct, support for multiple huge page sizes was recently
added.  The interface is a bit clumsy admittedly, but it's there.

However, for avoiding fragmentation you probably don't want too many
different sizes anyway: the more sizes, the worse the fragmentation.
Ideal is only a single size.

> ... either (i) define the huge page size to the largest block that may
> ever be requested and then waste a lot of memory when small buffers
> are requested, or (ii) define a smaller huge page size but then
> implement special handling of large regions.

If you don't do that, then long-term fragmentation will kill you
anyway; it's easy to show that pre-allocation with lots of different
sizes is about equivalent to what the main page allocator does anyway.

> So to sum up, if I understand everything correctly, hugetlb would be a
> great solution when working with buffers of similar sizes.  However,
> it's not so good when the size of the requested buffer may vary
> greatly.

As Peter et al. explained earlier, varying buffer sizes don't work
anyway.

-Andi

* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 10:47 UTC
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

>> Correct me if I'm wrong, but if I understand correctly, currently
>> only one size of huge page may be defined, even if the underlying
>> architecture supports many different sizes.

On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
> That's not correct, support for multiple huge page sizes was recently
> added.  The interface is a bit clumsy admittedly, but it's there.

I'll have to look into that further then.  Having said that, I cannot
create a huge-page SysV shared memory segment with pages of a specified
size, can I?

> However, for avoiding fragmentation you probably don't want too many
> different sizes anyway: the more sizes, the worse the fragmentation.
> Ideal is only a single size.

Unfortunately, sizes may vary from several KiB to a few MiB.

On the other hand, only a handful of apps will use PMM in our system
and at most two or three will run at the same time, so hopefully
fragmentation won't be so bad.  But yes, I admit it is a concern.

>> ... either (i) define the huge page size to the largest block that
>> may ever be requested and then waste a lot of memory when small
>> buffers are requested, or (ii) define a smaller huge page size but
>> then implement special handling of large regions.
>
> If you don't do that, then long-term fragmentation will kill you
> anyway; it's easy to show that pre-allocation with lots of different
> sizes is about equivalent to what the main page allocator does anyway.

However, with the allocator in PMM used by only a handful of apps, an
architect may specify the use cases that need to be supported, and PMM
may then be reimplemented to guarantee that those cases are handled.

> As Peter et al. explained earlier, varying buffer sizes don't work
> anyway.

Either I missed something, or Peter and Andrew only pointed out the
problem we all seem to agree exists: the problem of fragmentation.

* Re: [PATCH] Physical Memory Management [0/1]
From: Peter Zijlstra @ 2009-05-15 11:03 UTC
To: Michał Nazarewicz
Cc: Andi Kleen, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Fri, 2009-05-15 at 12:47 +0200, Michał Nazarewicz wrote:
> I'll have to look into that further then.  Having said that, I cannot
> create a huge-page SysV shared memory segment with pages of a
> specified size, can I?

Well, hugetlbfs is a fs, so you can simply create a file on there and
map that shared -- a much saner interface than sysvshm if you ask me.

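A minimal sketch of what Peter describes, assuming hugetlbfs is already
mounted at /mnt/huge (the mount point, file name and 2 MB huge page
size are assumptions; the rest is the ordinary file API):

    /* Sketch: a file on hugetlbfs, mapped shared; assumes hugetlbfs is
     * mounted at /mnt/huge and 2 MB huge pages. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HUGEFILE "/mnt/huge/frame-buf"
    #define LENGTH   (2UL * 1024 * 1024)

    int main(void)
    {
    	int fd = open(HUGEFILE, O_CREAT | O_RDWR, 0600);
    	if (fd < 0)
    		return 1;

    	/* Size the file; backing huge pages are allocated on first touch. */
    	if (ftruncate(fd, LENGTH) < 0)
    		return 1;

    	/* Every process that maps this file MAP_SHARED sees the same
    	 * huge-page-backed (per-page physically contiguous) memory. */
    	char *buf = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
    	                 MAP_SHARED, fd, 0);
    	if (buf == MAP_FAILED)
    		return 1;

    	buf[0] = 42;            /* touch it to instantiate the huge page */

    	munmap(buf, LENGTH);
    	close(fd);
    	unlink(HUGEFILE);
    	return 0;
    }

Any process that opens and maps the same hugetlbfs file shares the same
buffer, which is what makes it a substitute for a SysV segment.
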
* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 11:11 UTC
To: Peter Zijlstra
Cc: Andi Kleen, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

> On Fri, 2009-05-15 at 12:47 +0200, Michał Nazarewicz wrote:
>> I cannot create a huge-page SysV shared memory segment with pages of
>> a specified size, can I?

On Fri, 15 May 2009 13:03:34 +0200, Peter Zijlstra wrote:
> Well, hugetlbfs is a fs, so you can simply create a file on there and
> map that shared -- a much saner interface than sysvshm if you ask me.

It's not a question of being sane or not, it's a question of whether
the X server supports it, and it doesn't.  X can read data from Sys V
shm to avoid needless copying of pixmaps (or sending them via a unix
socket or whatever), and so PMM lets it read from contiguous blocks
without knowing or caring about it.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-15 11:26 UTC
To: Michał Nazarewicz
Cc: Andi Kleen, Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

On Fri, May 15, 2009 at 12:47:23PM +0200, Michał Nazarewicz wrote:
> On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
> > That's not correct, support for multiple huge page sizes was
> > recently added.  The interface is a bit clumsy admittedly, but it's
> > there.
>
> I'll have to look into that further then.  Having said that, I cannot
> create a huge-page SysV shared memory segment with pages of a
> specified size, can I?

SysV shared memory supports huge pages, but there is currently no
interface to specify the intended page size; you always get the
default.

> > However, for avoiding fragmentation you probably don't want too many
> > different sizes anyway: the more sizes, the worse the fragmentation.
> > Ideal is only a single size.
>
> Unfortunately, sizes may vary from several KiB to a few MiB.

Then your approach will likely not be reliable.

> On the other hand, only a handful of apps will use PMM in our system
> and at most two or three will run at the same time, so hopefully
> fragmentation won't be so bad.  But yes, I admit it is a concern.

Such tight restrictions might work for you, but for mainline Linux the
quality standards are higher.

> > As Peter et al. explained earlier, varying buffer sizes don't work
> > anyway.
>
> Either I missed something, or Peter and Andrew only pointed out the
> problem we all seem to agree exists: the problem of fragmentation.

Multiple buffer sizes lead to fragmentation.

-Andi

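For reference, the SysV path Andi describes looks roughly like the
following; SHM_HUGETLB selects huge pages, but only in the system
default huge page size, so there is no per-segment page-size parameter
(the segment size and permissions below are arbitrary examples):

    /* Sketch: SysV shared memory backed by huge pages of the default size. */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000          /* value from <linux/shm.h>          */
    #endif

    #define SEG_SIZE (2UL * 1024 * 1024)  /* a multiple of the huge page size */

    int main(void)
    {
    	/* SHM_HUGETLB requests huge pages; there is no flag to pick which
    	 * huge page size -- the system default is always used. */
    	int id = shmget(IPC_PRIVATE, SEG_SIZE,
    			SHM_HUGETLB | IPC_CREAT | 0600);
    	if (id < 0) {
    		perror("shmget");
    		return 1;
    	}

    	char *p = shmat(id, NULL, 0);          /* attach the segment      */
    	if (p == (void *)-1) {
    		perror("shmat");
    		return 1;
    	}
    	p[0] = 42;                             /* touch it                */

    	shmdt(p);                              /* detach...               */
    	shmctl(id, IPC_RMID, NULL);            /* ...and mark for removal */
    	return 0;
    }
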
* Re: [PATCH] Physical Memory Management [0/1]
From: Michał Nazarewicz @ 2009-05-15 12:05 UTC
To: Andi Kleen
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

>> On Fri, 15 May 2009 12:18:11 +0200, Andi Kleen wrote:
>>> However, for avoiding fragmentation you probably don't want too many
>>> different sizes anyway: the more sizes, the worse the fragmentation.
>>> Ideal is only a single size.

> On Fri, May 15, 2009 at 12:47:23PM +0200, Michał Nazarewicz wrote:
>> Unfortunately, sizes may vary from several KiB to a few MiB.

On Fri, 15 May 2009 13:26:56 +0200, Andi Kleen <andi@firstfloor.org> wrote:
> Then your approach will likely not be reliable.

>> On the other hand, only a handful of apps will use PMM in our system
>> and at most two or three will run at the same time, so hopefully
>> fragmentation won't be so bad.  But yes, I admit it is a concern.
>
> Such tight restrictions might work for you, but for mainline Linux the
> quality standards are higher.

I understand that PMM in its current form may be unacceptable.
However, hear me out, and please do correct me if I'm wrong at any
point, as I would love to use an existing solution if one fulfilling my
needs exists:

When different sizes of buffers are needed, fragmentation is an even
bigger problem with hugetlb (as pages must be aligned) than with PMM.
If a buffer that does not match the page size is needed, then with
hugetlb either a bigger page needs to be allocated (and memory wasted)
or a few smaller ones need to be merged (and the same problem as in PMM
exists -- finding contiguous pages).

Reclaiming is not really an option, since a situation where there is no
sane bound on allocation time is not acceptable -- you don't want to
wait 10 seconds for an application to start on your cell phone. ;)

Also, I need the ability to convert any buffer to a Sys V shm segment,
so as to be able to pass it to the X server.  Currently no such API
exists, does it?

PMM, with its notion of memory types, can support different allocators
and/or memory pools, etc.  Allocators could even be loaded dynamically
as modules if one desires that.

My point is that PMM should be considered a framework for situations
similar to the one I described throughout all of my mails, rather than
a universal solution.

* Re: [PATCH] Physical Memory Management [0/1]
From: Andi Kleen @ 2009-05-14 19:33 UTC
To: Michał Nazarewicz
Cc: Peter Zijlstra, Andrew Morton, linux-kernel, m.szyprowski, kyungmin.park, linux-mm

Michał Nazarewicz <m.nazarewicz@samsung.com> writes:
>
> The idea here is that there are n hardware accelerators, each of which
> can operate on 1 MiB blocks (to simplify, assume that's the case).

You could just define a hugepage size for that and use hugetlbfs with a
few changes to map in pages with multiple PTEs.  It supports boot time
reservation and is a well established interface.

On x86 that would give 2 MB units, on other architectures whatever you
prefer.

-Andi

Thread overview: 16+ messages
[not found] <op.utu26hq77p4s8u@amdc030>
2009-05-13 22:11 ` [PATCH] Physical Memory Management [0/1] Andrew Morton
2009-05-14 9:00 ` Michał Nazarewicz
2009-05-14 11:20 ` Peter Zijlstra
2009-05-14 11:48 ` Michał Nazarewicz
2009-05-14 12:05 ` Peter Zijlstra
2009-05-14 13:04 ` Michał Nazarewicz
2009-05-14 17:07 ` Andrew Morton
2009-05-14 17:10 ` Peter Zijlstra
2009-05-15 10:06 ` Michał Nazarewicz
2009-05-15 10:18 ` Andi Kleen
2009-05-15 10:47 ` Michał Nazarewicz
2009-05-15 11:03 ` Peter Zijlstra
2009-05-15 11:11 ` Michał Nazarewicz
2009-05-15 11:26 ` Andi Kleen
2009-05-15 12:05 ` Michał Nazarewicz
2009-05-14 19:33 ` Andi Kleen