linux-mm.kvack.org archive mirror
* [PATCH 0/8] zcache: page cache compression support
@ 2010-07-16 12:37 Nitin Gupta
  2010-07-17 21:13 ` Ed Tomlinson
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Nitin Gupta @ 2010-07-16 12:37 UTC (permalink / raw)
  To: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH,
	Dan Magenheimer, Rik van Riel, Avi Kivity, Christoph Hellwig,
	Minchan Kim, Konrad Rzeszutek Wilk
  Cc: linux-mm, linux-kernel

Frequently accessed filesystem data is stored in memory to reduce access to
(much) slower backing disks. Under memory pressure, these pages are freed and,
when needed again, have to be read back from disk. When the combined working
set of all running applications exceeds the amount of physical RAM, we get an
extreme slowdown, as reading a page from disk can take on the order of
milliseconds.

Memory compression increases the effective memory size and allows more pages to
stay in RAM. Since de/compressing memory pages is several orders of magnitude
faster than disk I/O, this can provide significant performance gains for many
workloads. Also, with multi-core systems becoming common, the benefits of reduced
disk I/O should easily outweigh the cost of increased CPU usage.

It is implemented as a "backend" for cleancache_ops [1], which provides
callbacks for events such as when a page is to be removed from the page cache
and when it is required again. We use these callbacks to implement a 'second
chance' cache for evicted page cache pages by compressing and storing them in
memory itself.
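
For readers unfamiliar with that interface, the hook-up looks roughly like the
sketch below. This is illustrative only: the exact cleancache_ops member names
and signatures are defined by the cleancache series, and the zcache_* handlers
named here are placeholders for the handlers this series adds.

#include <linux/types.h>
#include <linux/mm_types.h>

/*
 * Illustrative sketch only -- the real struct cleancache_ops comes from the
 * cleancache series; member names and signatures may differ slightly.
 */
struct cleancache_ops {
	int  (*init_fs)(size_t pagesize);	/* new pool for each mount */
	int  (*get_page)(int pool_id, ino_t inode_no, pgoff_t index,
			 struct page *page);	/* evicted page wanted again */
	void (*put_page)(int pool_id, ino_t inode_no, pgoff_t index,
			 struct page *page);	/* clean page being evicted */
	void (*flush_page)(int pool_id, ino_t inode_no, pgoff_t index);
	void (*flush_inode)(int pool_id, ino_t inode_no);
	void (*flush_fs)(int pool_id);
};

/* zcache fills this in with its handlers and hands it to the cleancache core. */
static struct cleancache_ops zcache_ops = {
	.init_fs	= zcache_init_fs,
	.get_page	= zcache_get_page,
	.put_page	= zcache_put_page,
	.flush_page	= zcache_flush_page,
	.flush_inode	= zcache_flush_inode,
	.flush_fs	= zcache_flush_fs,
};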

We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
stored using the xvmalloc memory allocator, which is already used by the zram
driver for the same purpose. Zero-filled pages are detected and no memory is
allocated for them.
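
The put path is then roughly the following (illustrative sketch; these helper
names are placeholders, not the actual zcache_drv.c symbols):

/* Illustrative sketch of the put path; all names are placeholders. */
static bool page_zero_filled(const void *mem)
{
	const unsigned long *p = mem;
	unsigned int i;

	for (i = 0; i < PAGE_SIZE / sizeof(*p); i++)
		if (p[i])
			return false;
	return true;
}

static int zcache_store_page(const void *src)
{
	size_t clen;

	if (page_zero_filled(src))
		return store_zero_page_marker();	/* no memory allocated */

	if (compress_page(src, &clen))			/* LZO, via percpu buffers */
		return -EINVAL;
	if (clen > PAGE_SIZE / 2)			/* poorly compressible */
		return -EINVAL;

	return store_compressed_chunk(src, clen);	/* xv_malloc() + copy */
}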

A separate "pool" is created for each mount instance of a cleancache-aware
filesystem. Each incoming page is identified by <pool_id, inode_no, index>,
where inode_no identifies the file within the filesystem corresponding to pool_id
and index is the offset of the page within this inode. Within a pool, inodes are
maintained in an rb-tree and each of its nodes points to a separate radix tree
which maintains the list of pages within that inode.
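
In terms of data structures, this translates roughly into the following
(illustrative sketch, not the exact definitions from zcache_drv.h):

#include <linux/rbtree.h>
#include <linux/radix-tree.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Illustrative sketch of the per-pool lookup structures. */
struct zcache_inode_node {
	unsigned long inode_no;			/* key in the pool's rb-tree */
	struct rb_node rb_node;
	struct radix_tree_root page_tree;	/* index -> compressed chunk */
};

struct zcache_pool {
	struct rb_root inode_tree;		/* all inodes with cached pages */
	spinlock_t tree_lock;
	u64 memlimit;				/* per-pool sysfs-settable limit */
};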

While compression reduces disk I/O, it also reduces the space available for the
normal (uncompressed) page cache. This can result in more frequent page cache
reclaim and thus higher CPU overhead. It is therefore important to maintain a
good hit rate for the compressed cache, or the increased CPU overhead can nullify
any other benefits. This requires adaptive (compressed) cache resizing and page
replacement policies that can maintain an optimal cache size and quickly reclaim
unused compressed chunks. This work is yet to be done. However, in the current
state, the cache can be resized manually using the (per-pool) sysfs node
'memlimit', which in turn frees any excess pages *sigh* randomly.

Finally, it uses percpu stats and compression buffers for better performance
on multi-core systems. Still, there are known bottlenecks, such as the single
xvmalloc mempool per zcache pool, and a few others. I will work on these when I
start profiling.
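
The percpu compression buffers follow the usual pattern of giving each CPU its
own scratch space so that concurrent stores do not contend on a shared buffer.
Roughly (illustrative sketch, assuming the kernel LZO interface used elsewhere
in this series; buffer allocation at init time is not shown):

#include <linux/percpu.h>
#include <linux/lzo.h>
#include <linux/string.h>

/* One scratch-buffer pair per CPU, allocated at driver init (not shown). */
static DEFINE_PER_CPU(unsigned char *, zcache_workmem);	/* LZO1X_MEM_COMPRESS bytes */
static DEFINE_PER_CPU(unsigned char *, zcache_dstmem);	/* 2 * PAGE_SIZE bytes */

static int zcache_compress(const void *src, unsigned char *copy_to, size_t *clen)
{
	unsigned char *wmem = get_cpu_var(zcache_workmem);
	unsigned char *dst = get_cpu_var(zcache_dstmem);
	int ret;

	ret = lzo1x_1_compress(src, PAGE_SIZE, dst, clen, wmem);
	if (!ret)
		memcpy(copy_to, dst, *clen);	/* copy out before releasing buffers */

	put_cpu_var(zcache_dstmem);
	put_cpu_var(zcache_workmem);
	return ret;
}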

 * Performance numbers:
   - Tested using iozone filesystem benchmark
   - 4 CPUs, 1G RAM
   - Read performance gain: ~2.5X
   - Random read performance gain: ~3X
   - In general, performance gains for every kind of I/O

Test details with graphs can be found here:
http://code.google.com/p/compcache/wiki/zcacheIOzone

If I can get some help with testing, it would be interesting to find its
effect on more real-life workloads. In particular, I'm interested in finding
out its effect in the KVM virtualization case, where it can potentially allow
running more VMs per host for a given amount of RAM. With zcache enabled, VMs
can be assigned a much smaller amount of memory, since the host can now hold
the bulk of the page-cache pages, allowing VMs to maintain a similar level of
performance while a greater number of them can be hosted.

 * How to test:
All patches are against 2.6.35-rc5:

 - First, apply all prerequisite patches here:
http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches

 - Then apply this patch series; also uploaded here:
http://compcache.googlecode.com/hg/sub-projects/zcache_patches


Nitin Gupta (8):
  Allow sharing xvmalloc for zram and zcache
  Basic zcache functionality
  Create sysfs nodes and export basic statistics
  Shrink zcache based on memlimit
  Eliminate zero-filled pages
  Compress pages using LZO
  Use xvmalloc to store compressed chunks
  Document sysfs entries

 Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
 drivers/staging/Makefile                         |    2 +
 drivers/staging/zram/Kconfig                     |   22 +
 drivers/staging/zram/Makefile                    |    5 +-
 drivers/staging/zram/xvmalloc.c                  |    8 +
 drivers/staging/zram/zcache_drv.c                | 1312 ++++++++++++++++++++++
 drivers/staging/zram/zcache_drv.h                |   90 ++
 7 files changed, 1491 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-zcache
 create mode 100644 drivers/staging/zram/zcache_drv.c
 create mode 100644 drivers/staging/zram/zcache_drv.h


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-16 12:37 Nitin Gupta
@ 2010-07-17 21:13 ` Ed Tomlinson
  2010-07-18  2:23   ` Nitin Gupta
  2010-07-18  7:50 ` Pekka Enberg
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Ed Tomlinson @ 2010-07-17 21:13 UTC (permalink / raw)
  To: Nitin Gupta; +Cc: linux-mm, linux-kernel

Nitin,

Would you have all this in a git tree somewhere?

Considering that getting this working requires 24 patches, it would really help with testing.

TIA
Ed Tomlinson

On Friday 16 July 2010 08:37:42 you wrote:
> Frequently accessed filesystem data is stored in memory to reduce access to
> (much) slower backing disks. Under memory pressure, these pages are freed and
> when needed again, they have to be read from disks again. When combined working
> set of all running application exceeds amount of physical RAM, we get extereme
> slowdown as reading a page from disk can take time in order of milliseconds.
> 
> Memory compression increases effective memory size and allows more pages to
> stay in RAM. Since de/compressing memory pages is several orders of magnitude
> faster than disk I/O, this can provide signifant performance gains for many
> workloads. Also, with multi-cores becoming common, benefits of reduced disk I/O
> should easily outweigh the problem of increased CPU usage.
> 
> It is implemented as a "backend" for cleancache_ops [1] which provides
> callbacks for events such as when a page is to be removed from the page cache
> and when it is required again. We use them to implement a 'second chance' cache
> for these evicted page cache pages by compressing and storing them in memory
> itself.
> 
> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
> stored using xvmalloc memory allocator which is already being used by zram
> driver for the same purpose. Zero-filled pages are checked and no memory is
> allocated for them.
> 
> A separate "pool" is created for each mount instance for a cleancache-aware
> filesystem. Each incoming page is identified with <pool_id, inode_no, index>
> where inode_no identifies file within the filesystem corresponding to pool_id
> and index is offset of the page within this inode. Within a pool, inodes are
> maintained in an rb-tree and each of its nodes points to a separate radix-tree
> which maintains list of pages within that inode.
> 
> While compression reduces disk I/O, it also reduces the space available for
> normal (uncompressed) page cache. This can result in more frequent page cache
> reclaim and thus higher CPU overhead. Thus, it's important to maintain good hit
> rate for compressed cache or increased CPU overhead can nullify any other
> benefits. This requires adaptive (compressed) cache resizing and page
> replacement policies that can maintain optimal cache size and quickly reclaim
> unused compressed chunks. This work is yet to be done. However, in the current
> state, it allows manually resizing cache size using (per-pool) sysfs node
> 'memlimit' which in turn frees any excess pages *sigh* randomly.
> 
> Finally, it uses percpu stats and compression buffers to allow better
> performance on multi-cores. Still, there are known bottlenecks like a single
> xvmalloc mempool per zcache pool and few others. I will work on this when I
> start with profiling.
> 
>  * Performance numbers:
>    - Tested using iozone filesystem benchmark
>    - 4 CPUs, 1G RAM
>    - Read performance gain: ~2.5X
>    - Random read performance gain: ~3X
>    - In general, performance gains for every kind of I/O
> 
> Test details with graphs can be found here:
> http://code.google.com/p/compcache/wiki/zcacheIOzone
> 
> If I can get some help with testing, it would be intersting to find its
> effect in more real-life workloads. In particular, I'm intersted in finding
> out its effect in KVM virtualization case where it can potentially allow
> running more number of VMs per-host for a given amount of RAM. With zcache
> enabled, VMs can be assigned much smaller amount of memory since host can now
> hold bulk of page-cache pages, allowing VMs to maintain similar level of
> performance while a greater number of them can be hosted.
> 
>  * How to test:
> All patches are against 2.6.35-rc5:
> 
>  - First, apply all prerequisite patches here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
> 
>  - Then apply this patch series; also uploaded here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_patches
> 
> 
> Nitin Gupta (8):
>   Allow sharing xvmalloc for zram and zcache
>   Basic zcache functionality
>   Create sysfs nodes and export basic statistics
>   Shrink zcache based on memlimit
>   Eliminate zero-filled pages
>   Compress pages using LZO
>   Use xvmalloc to store compressed chunks
>   Document sysfs entries
> 
>  Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
>  drivers/staging/Makefile                         |    2 +
>  drivers/staging/zram/Kconfig                     |   22 +
>  drivers/staging/zram/Makefile                    |    5 +-
>  drivers/staging/zram/xvmalloc.c                  |    8 +
>  drivers/staging/zram/zcache_drv.c                | 1312 ++++++++++++++++++++++
>  drivers/staging/zram/zcache_drv.h                |   90 ++
>  7 files changed, 1491 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-zcache
>  create mode 100644 drivers/staging/zram/zcache_drv.c
>  create mode 100644 drivers/staging/zram/zcache_drv.h


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-17 21:13 ` Ed Tomlinson
@ 2010-07-18  2:23   ` Nitin Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Nitin Gupta @ 2010-07-18  2:23 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-mm, linux-kernel

Hi Ed,

On 07/18/2010 02:43 AM, Ed Tomlinson wrote:
> 
> Would you have all this in a git tree somewhere?
> 
> Considering getting this working requires 24 patches it would really help with testing.
> 

Unfortunately, a git tree for this is not hosted anywhere.

Anyway, I just uploaded a monolithic zcache patch containing all its dependencies:
http://compcache.googlecode.com/hg/sub-projects/mainline/zcache_v1_2.6.35-rc5.patch

It applies on top of 2.6.35-rc5.

Thanks for trying it out.
Nitin


> On Friday 16 July 2010 08:37:42 you wrote:
>> Frequently accessed filesystem data is stored in memory to reduce access to
>> (much) slower backing disks. Under memory pressure, these pages are freed and
>> when needed again, they have to be read from disks again. When combined working
>> set of all running application exceeds amount of physical RAM, we get extereme
>> slowdown as reading a page from disk can take time in order of milliseconds.
>>
>> Memory compression increases effective memory size and allows more pages to
>> stay in RAM. Since de/compressing memory pages is several orders of magnitude
>> faster than disk I/O, this can provide signifant performance gains for many
>> workloads. Also, with multi-cores becoming common, benefits of reduced disk I/O
>> should easily outweigh the problem of increased CPU usage.
>>
>> It is implemented as a "backend" for cleancache_ops [1] which provides
>> callbacks for events such as when a page is to be removed from the page cache
>> and when it is required again. We use them to implement a 'second chance' cache
>> for these evicted page cache pages by compressing and storing them in memory
>> itself.
>>
>> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
>> stored using xvmalloc memory allocator which is already being used by zram
>> driver for the same purpose. Zero-filled pages are checked and no memory is
>> allocated for them.
>>
>> A separate "pool" is created for each mount instance for a cleancache-aware
>> filesystem. Each incoming page is identified with <pool_id, inode_no, index>
>> where inode_no identifies file within the filesystem corresponding to pool_id
>> and index is offset of the page within this inode. Within a pool, inodes are
>> maintained in an rb-tree and each of its nodes points to a separate radix-tree
>> which maintains list of pages within that inode.
>>
>> While compression reduces disk I/O, it also reduces the space available for
>> normal (uncompressed) page cache. This can result in more frequent page cache
>> reclaim and thus higher CPU overhead. Thus, it's important to maintain good hit
>> rate for compressed cache or increased CPU overhead can nullify any other
>> benefits. This requires adaptive (compressed) cache resizing and page
>> replacement policies that can maintain optimal cache size and quickly reclaim
>> unused compressed chunks. This work is yet to be done. However, in the current
>> state, it allows manually resizing cache size using (per-pool) sysfs node
>> 'memlimit' which in turn frees any excess pages *sigh* randomly.
>>
>> Finally, it uses percpu stats and compression buffers to allow better
>> performance on multi-cores. Still, there are known bottlenecks like a single
>> xvmalloc mempool per zcache pool and few others. I will work on this when I
>> start with profiling.
>>
>>  * Performance numbers:
>>    - Tested using iozone filesystem benchmark
>>    - 4 CPUs, 1G RAM
>>    - Read performance gain: ~2.5X
>>    - Random read performance gain: ~3X
>>    - In general, performance gains for every kind of I/O
>>
>> Test details with graphs can be found here:
>> http://code.google.com/p/compcache/wiki/zcacheIOzone
>>
>> If I can get some help with testing, it would be intersting to find its
>> effect in more real-life workloads. In particular, I'm intersted in finding
>> out its effect in KVM virtualization case where it can potentially allow
>> running more number of VMs per-host for a given amount of RAM. With zcache
>> enabled, VMs can be assigned much smaller amount of memory since host can now
>> hold bulk of page-cache pages, allowing VMs to maintain similar level of
>> performance while a greater number of them can be hosted.
>>
>>  * How to test:
>> All patches are against 2.6.35-rc5:
>>
>>  - First, apply all prerequisite patches here:
>> http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
>>
>>  - Then apply this patch series; also uploaded here:
>> http://compcache.googlecode.com/hg/sub-projects/zcache_patches
>>
>>
>> Nitin Gupta (8):
>>   Allow sharing xvmalloc for zram and zcache
>>   Basic zcache functionality
>>   Create sysfs nodes and export basic statistics
>>   Shrink zcache based on memlimit
>>   Eliminate zero-filled pages
>>   Compress pages using LZO
>>   Use xvmalloc to store compressed chunks
>>   Document sysfs entries
>>
>>  Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
>>  drivers/staging/Makefile                         |    2 +
>>  drivers/staging/zram/Kconfig                     |   22 +
>>  drivers/staging/zram/Makefile                    |    5 +-
>>  drivers/staging/zram/xvmalloc.c                  |    8 +
>>  drivers/staging/zram/zcache_drv.c                | 1312 ++++++++++++++++++++++
>>  drivers/staging/zram/zcache_drv.h                |   90 ++
>>  7 files changed, 1491 insertions(+), 1 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-zcache
>>  create mode 100644 drivers/staging/zram/zcache_drv.c
>>  create mode 100644 drivers/staging/zram/zcache_drv.h
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-16 12:37 Nitin Gupta
  2010-07-17 21:13 ` Ed Tomlinson
@ 2010-07-18  7:50 ` Pekka Enberg
  2010-07-18  8:12   ` Nitin Gupta
  2010-07-19 19:57 ` Dan Magenheimer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Pekka Enberg @ 2010-07-18  7:50 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Hugh Dickins, Andrew Morton, Greg KH, Dan Magenheimer,
	Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Rzeszutek Wilk, linux-mm, linux-kernel

Nitin Gupta wrote:
> Frequently accessed filesystem data is stored in memory to reduce access to
> (much) slower backing disks. Under memory pressure, these pages are freed and
> when needed again, they have to be read from disks again. When combined working
> set of all running application exceeds amount of physical RAM, we get extereme
> slowdown as reading a page from disk can take time in order of milliseconds.
> 
> Memory compression increases effective memory size and allows more pages to
> stay in RAM. Since de/compressing memory pages is several orders of magnitude
> faster than disk I/O, this can provide signifant performance gains for many
> workloads. Also, with multi-cores becoming common, benefits of reduced disk I/O
> should easily outweigh the problem of increased CPU usage.
> 
> It is implemented as a "backend" for cleancache_ops [1] which provides
> callbacks for events such as when a page is to be removed from the page cache
> and when it is required again. We use them to implement a 'second chance' cache
> for these evicted page cache pages by compressing and storing them in memory
> itself.
> 
> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
> stored using xvmalloc memory allocator which is already being used by zram
> driver for the same purpose. Zero-filled pages are checked and no memory is
> allocated for them.
> 
> A separate "pool" is created for each mount instance for a cleancache-aware
> filesystem. Each incoming page is identified with <pool_id, inode_no, index>
> where inode_no identifies file within the filesystem corresponding to pool_id
> and index is offset of the page within this inode. Within a pool, inodes are
> maintained in an rb-tree and each of its nodes points to a separate radix-tree
> which maintains list of pages within that inode.
> 
> While compression reduces disk I/O, it also reduces the space available for
> normal (uncompressed) page cache. This can result in more frequent page cache
> reclaim and thus higher CPU overhead. Thus, it's important to maintain good hit
> rate for compressed cache or increased CPU overhead can nullify any other
> benefits. This requires adaptive (compressed) cache resizing and page
> replacement policies that can maintain optimal cache size and quickly reclaim
> unused compressed chunks. This work is yet to be done. However, in the current
> state, it allows manually resizing cache size using (per-pool) sysfs node
> 'memlimit' which in turn frees any excess pages *sigh* randomly.
> 
> Finally, it uses percpu stats and compression buffers to allow better
> performance on multi-cores. Still, there are known bottlenecks like a single
> xvmalloc mempool per zcache pool and few others. I will work on this when I
> start with profiling.
> 
>  * Performance numbers:
>    - Tested using iozone filesystem benchmark
>    - 4 CPUs, 1G RAM
>    - Read performance gain: ~2.5X
>    - Random read performance gain: ~3X
>    - In general, performance gains for every kind of I/O
> 
> Test details with graphs can be found here:
> http://code.google.com/p/compcache/wiki/zcacheIOzone
> 
> If I can get some help with testing, it would be intersting to find its
> effect in more real-life workloads. In particular, I'm intersted in finding
> out its effect in KVM virtualization case where it can potentially allow
> running more number of VMs per-host for a given amount of RAM. With zcache
> enabled, VMs can be assigned much smaller amount of memory since host can now
> hold bulk of page-cache pages, allowing VMs to maintain similar level of
> performance while a greater number of them can be hosted.

So why would someone want to use zram if they have transparent page 
cache compression with zcache? That is, why is this not a replacement 
for zram?

			Pekka


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-18  7:50 ` Pekka Enberg
@ 2010-07-18  8:12   ` Nitin Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Nitin Gupta @ 2010-07-18  8:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Hugh Dickins, Andrew Morton, Greg KH, Dan Magenheimer,
	Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Rzeszutek Wilk, linux-mm, linux-kernel

On 07/18/2010 01:20 PM, Pekka Enberg wrote:
> Nitin Gupta wrote:
>> Frequently accessed filesystem data is stored in memory to reduce access to
>> (much) slower backing disks. Under memory pressure, these pages are freed and
>> when needed again, they have to be read from disks again. When combined working
>> set of all running application exceeds amount of physical RAM, we get extereme
>> slowdown as reading a page from disk can take time in order of milliseconds.
>>
>> Memory compression increases effective memory size and allows more pages to
>> stay in RAM. Since de/compressing memory pages is several orders of magnitude
>> faster than disk I/O, this can provide signifant performance gains for many
>> workloads. Also, with multi-cores becoming common, benefits of reduced disk I/O
>> should easily outweigh the problem of increased CPU usage.
>>
>> It is implemented as a "backend" for cleancache_ops [1] which provides
>> callbacks for events such as when a page is to be removed from the page cache
>> and when it is required again. We use them to implement a 'second chance' cache
>> for these evicted page cache pages by compressing and storing them in memory
>> itself.
>>
<snip>

> 
> So why would someone want to use zram if they have transparent page cache compression with zcache? That is, why is this not a replacement for zram?
> 

zcache complements zram; it's not a replacement:

 - zram compresses anonymous pages while zcache is for page cache compression.
So, workloads which depend heavily on "heap memory" usage will tend to prefer
zram and those which are I/O intensive will prefer zcache. Though I have not
yet experimented much, most workloads may want a mix of the two.

 - zram is not just for swap. /dev/zram<id> devices are generic in-memory
compressed block devices which can be used for, say, /tmp, /var/... etc.
temporary storage.

 - /dev/zram<id>, being a generic block device, can also be used as a raw disk
in other OSes (using virtualization). For example:
http://www.vflare.org/2010/05/compressed-ram-disk-for-windows-virtual.html

Thanks,
Nitin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 0/8] zcache: page cache compression support
  2010-07-16 12:37 Nitin Gupta
  2010-07-17 21:13 ` Ed Tomlinson
  2010-07-18  7:50 ` Pekka Enberg
@ 2010-07-19 19:57 ` Dan Magenheimer
  2010-07-20 13:50   ` Nitin Gupta
  2010-07-22 19:14 ` Greg KH
  2011-01-10 13:16 ` Kirill A. Shutemov
  4 siblings, 1 reply; 23+ messages in thread
From: Dan Magenheimer @ 2010-07-19 19:57 UTC (permalink / raw)
  To: Nitin Gupta, Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH,
	Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Wilk
  Cc: linux-mm, linux-kernel

> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
> chunks are
> stored using xvmalloc memory allocator which is already being used by
> zram
> driver for the same purpose. Zero-filled pages are checked and no
> memory is
> allocated for them.

I'm curious about this policy choice.  I can see why one
would want to ensure that the average page is compressed
to less than PAGE_SIZE/2, and preferably PAGE_SIZE/2
minus the overhead of the data structures necessary to
track the page.  And I see that this makes no difference
when the reclamation algorithm is random (as it is for
now).  But once there is some better reclamation logic,
I'd hope that this compression factor restriction would
be lifted and replaced with something much higher.  IIRC,
compression is much more expensive than decompression
so there's no CPU-overhead argument here either,
correct?

Thanks,
Dan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-19 19:57 ` Dan Magenheimer
@ 2010-07-20 13:50   ` Nitin Gupta
  2010-07-20 14:28     ` Dan Magenheimer
  0 siblings, 1 reply; 23+ messages in thread
From: Nitin Gupta @ 2010-07-20 13:50 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

On 07/20/2010 01:27 AM, Dan Magenheimer wrote:
>> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
>> chunks are
>> stored using xvmalloc memory allocator which is already being used by
>> zram
>> driver for the same purpose. Zero-filled pages are checked and no
>> memory is
>> allocated for them.
> 
> I'm curious about this policy choice.  I can see why one
> would want to ensure that the average page is compressed
> to less than PAGE_SIZE/2, and preferably PAGE_SIZE/2
> minus the overhead of the data structures necessary to
> track the page.  And I see that this makes no difference
> when the reclamation algorithm is random (as it is for
> now).  But once there is some better reclamation logic,
> I'd hope that this compression factor restriction would
> be lifted and replaced with something much higher.  IIRC,
> compression is much more expensive than decompression
> so there's no CPU-overhead argument here either,
> correct?
> 
>

It's true that we waste CPU cycles for every incompressible page
encountered, but we still can't keep such pages in RAM: they are what the
host wanted to reclaim, and we can't help that compression failed. Compressed
caching makes sense only when we keep highly compressible pages in RAM,
regardless of the reclaim scheme.

Keeping (nearly) incompressible pages in RAM probably makes sense for Xen's
case, where the cleancache provider runs *inside* a VM, sending pages to the
host. So, if a VM is limited to, say, 512M and the host has 64G RAM, caching
guest pages, with or without compression, will help.

Thanks,
Nitin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 0/8] zcache: page cache compression support
  2010-07-20 13:50   ` Nitin Gupta
@ 2010-07-20 14:28     ` Dan Magenheimer
  2010-07-21  4:27       ` Nitin Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Magenheimer @ 2010-07-20 14:28 UTC (permalink / raw)
  To: ngupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

> On 07/20/2010 01:27 AM, Dan Magenheimer wrote:
> >> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
> >> chunks are
> >> stored using xvmalloc memory allocator which is already being used
> by
> >> zram
> >> driver for the same purpose. Zero-filled pages are checked and no
> >> memory is
> >> allocated for them.
> >
> > I'm curious about this policy choice.  I can see why one
> > would want to ensure that the average page is compressed
> > to less than PAGE_SIZE/2, and preferably PAGE_SIZE/2
> > minus the overhead of the data structures necessary to
> > track the page.  And I see that this makes no difference
> > when the reclamation algorithm is random (as it is for
> > now).  But once there is some better reclamation logic,
> > I'd hope that this compression factor restriction would
> > be lifted and replaced with something much higher.  IIRC,
> > compression is much more expensive than decompression
> > so there's no CPU-overhead argument here either,
> > correct?
> 
> Its true that we waste CPU cycles for every incompressible page
> encountered but still we can't keep such pages in RAM since this
> is what host wanted to reclaim and we can't help since compression
> failed. Compressed caching makes sense only when we keep highly
> compressible pages in RAM, regardless of reclaim scheme.
> 
> Keeping (nearly) incompressible pages in RAM probably makes sense
> for Xen's case where cleancache provider runs *inside* a VM, sending
> pages to host. So, if VM is limited to say 512M and host has 64G RAM,
> caching guest pages, with or without compression, will help.

I agree that the use model is a bit different, but PAGE_SIZE/2
still seems like an unnecessarily strict threshold.  For
example, saving 3000 clean pages in 2000*PAGE_SIZE of RAM
still seems like a considerable space savings.  And as
long as the _average_ is less than some threshold, saving
a few slightly-less-than-ideally-compressible pages doesn't
seem like it would be a problem.  For example, IMHO, saving two
pages when one compresses to 2047 bytes and the other compresses
to 2049 bytes seems just as reasonable as saving two pages that
both compress to 2048 bytes.

Maybe the best solution is to make the threshold a sysfs
settable?  Or maybe BOTH the single-page threshold and
the average threshold as two different sysfs settables?
E.g. throw away a put page if either it compresses poorly
or adding it to the pool would push the average over.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-20 14:28     ` Dan Magenheimer
@ 2010-07-21  4:27       ` Nitin Gupta
  2010-07-21 17:37         ` Dan Magenheimer
  0 siblings, 1 reply; 23+ messages in thread
From: Nitin Gupta @ 2010-07-21  4:27 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

On 07/20/2010 07:58 PM, Dan Magenheimer wrote:
>> On 07/20/2010 01:27 AM, Dan Magenheimer wrote:
>>>> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
>>>> chunks are
>>>> stored using xvmalloc memory allocator which is already being used
>> by
>>>> zram
>>>> driver for the same purpose. Zero-filled pages are checked and no
>>>> memory is
>>>> allocated for them.
>>>
>>> I'm curious about this policy choice.  I can see why one
>>> would want to ensure that the average page is compressed
>>> to less than PAGE_SIZE/2, and preferably PAGE_SIZE/2
>>> minus the overhead of the data structures necessary to
>>> track the page.  And I see that this makes no difference
>>> when the reclamation algorithm is random (as it is for
>>> now).  But once there is some better reclamation logic,
>>> I'd hope that this compression factor restriction would
>>> be lifted and replaced with something much higher.  IIRC,
>>> compression is much more expensive than decompression
>>> so there's no CPU-overhead argument here either,
>>> correct?
>>
>> Its true that we waste CPU cycles for every incompressible page
>> encountered but still we can't keep such pages in RAM since this
>> is what host wanted to reclaim and we can't help since compression
>> failed. Compressed caching makes sense only when we keep highly
>> compressible pages in RAM, regardless of reclaim scheme.
>>
>> Keeping (nearly) incompressible pages in RAM probably makes sense
>> for Xen's case where cleancache provider runs *inside* a VM, sending
>> pages to host. So, if VM is limited to say 512M and host has 64G RAM,
>> caching guest pages, with or without compression, will help.
> 
> I agree that the use model is a bit different, but PAGE_SIZE/2
> still seems like an unnecessarily strict threshold.  For
> example, saving 3000 clean pages in 2000*PAGE_SIZE of RAM
> still seems like a considerable space savings.  And as
> long as the _average_ is less than some threshold, saving
> a few slightly-less-than-ideally-compressible pages doesn't
> seem like it would be a problem.  For example, IMHO, saving two
> pages when one compresses to 2047 bytes and the other compresses
> to 2049 bytes seems just as reasonable as saving two pages that
> both compress to 2048 bytes.
> 
> Maybe the best solution is to make the threshold a sysfs
> settable?  Or maybe BOTH the single-page threshold and
> the average threshold as two different sysfs settables?
> E.g. throw away a put page if either it compresses poorly
> or adding it to the pool would push the average over.
> 

Considering the overall compression average instead of worrying about
individual page compressibility seems like a good point. Still, I think
storing completely incompressible pages isn't desirable.

So, I agree with the idea of separate sysfs tunables for the average and
single-page compression thresholds, with defaults conservatively set to 50% and
PAGE_SIZE/2 respectively. I will include these in the "v2" patches.
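
Concretely, the put-side check could look something like this (sketch only; the
struct and field names below are placeholders, not the final sysfs node names):

#include <linux/types.h>

/*
 * Sketch: reject a page if it is individually too large, or if admitting it
 * would push the pool's average compressed size over the average threshold.
 * All names are placeholders.
 */
struct zcache_pool_stats {
	u64 compr_bytes;	/* total compressed bytes stored */
	u64 nr_pages;		/* number of pages stored */
};

static bool zcache_accept_page(struct zcache_pool_stats *s, size_t clen,
			       size_t page_thresh, size_t avg_thresh)
{
	u64 new_bytes = s->compr_bytes + clen;
	u64 new_pages = s->nr_pages + 1;

	if (clen > page_thresh)				/* e.g. 5 * PAGE_SIZE / 8 */
		return false;
	if (new_bytes > (u64)avg_thresh * new_pages)	/* average check */
		return false;

	s->compr_bytes = new_bytes;
	s->nr_pages = new_pages;
	return true;
}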

Thanks,
Nitin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 0/8] zcache: page cache compression support
  2010-07-21  4:27       ` Nitin Gupta
@ 2010-07-21 17:37         ` Dan Magenheimer
  0 siblings, 0 replies; 23+ messages in thread
From: Dan Magenheimer @ 2010-07-21 17:37 UTC (permalink / raw)
  To: ngupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

> > Maybe the best solution is to make the threshold a sysfs
> > settable?  Or maybe BOTH the single-page threshold and
> > the average threshold as two different sysfs settables?
> > E.g. throw away a put page if either it compresses poorly
> > or adding it to the pool would push the average over.
> 
> Considering overall compression average instead of bothering about
> individual page compressibility seems like a good point. Still, I think
> storing completely incompressible pages isn't desirable.
> 
> So, I agree with the idea of separate sysfs tunables for average and
> single-page
> compression thresholds with defaults conservatively set to 50% and
> PAGE_SIZE/2
> respectively. I will include these in "v2" patches.

Unless the single-page compression threshold is higher than the
average, the average is useless.  IMHO I'd suggest at least
5*PAGE_SIZE/8 as the single-page threshold, possibly higher.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-16 12:37 Nitin Gupta
                   ` (2 preceding siblings ...)
  2010-07-19 19:57 ` Dan Magenheimer
@ 2010-07-22 19:14 ` Greg KH
  2010-07-22 19:54   ` Dan Magenheimer
  2011-01-10 13:16 ` Kirill A. Shutemov
  4 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2010-07-22 19:14 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Dan Magenheimer,
	Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Rzeszutek Wilk, linux-mm, linux-kernel

On Fri, Jul 16, 2010 at 06:07:42PM +0530, Nitin Gupta wrote:
> Frequently accessed filesystem data is stored in memory to reduce access to
> (much) slower backing disks. Under memory pressure, these pages are freed and
> when needed again, they have to be read from disks again. When combined working
> set of all running application exceeds amount of physical RAM, we get extereme
> slowdown as reading a page from disk can take time in order of milliseconds.

<snip>

Given that there were a lot of comments and changes for this series, can
you resend them with your updates so I can then apply them if they are
acceptable to everyone?

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 0/8] zcache: page cache compression support
  2010-07-22 19:14 ` Greg KH
@ 2010-07-22 19:54   ` Dan Magenheimer
  2010-07-22 21:00     ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Magenheimer @ 2010-07-22 19:54 UTC (permalink / raw)
  To: Greg KH, Nitin Gupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

> From: Greg KH [mailto:greg@kroah.com]
> Sent: Thursday, July 22, 2010 1:15 PM
> To: Nitin Gupta
> Cc: Pekka Enberg; Hugh Dickins; Andrew Morton; Dan Magenheimer; Rik van
> Riel; Avi Kivity; Christoph Hellwig; Minchan Kim; Konrad Rzeszutek
> Wilk; linux-mm; linux-kernel
> Subject: Re: [PATCH 0/8] zcache: page cache compression support
> 
> On Fri, Jul 16, 2010 at 06:07:42PM +0530, Nitin Gupta wrote:
> > Frequently accessed filesystem data is stored in memory to reduce
> access to
> > (much) slower backing disks. Under memory pressure, these pages are
> freed and
> > when needed again, they have to be read from disks again. When
> combined working
> > set of all running application exceeds amount of physical RAM, we get
> extereme
> > slowdown as reading a page from disk can take time in order of
> milliseconds.
> 
> <snip>
> 
> Given that there were a lot of comments and changes for this series,
> can
> you resend them with your updates so I can then apply them if they are
> acceptable to everyone?
> 
> thanks,
> greg k-h

Hi Greg --

Nitin's zcache code is dependent on the cleancache series:
http://lkml.org/lkml/2010/6/21/411 

The cleancache series has not changed since V3 (other than
fixing a couple of documentation typos) and didn't receive any
comments other than Christoph's concern that there weren't
any users... which I think has since been addressed with the
posting of the Xen tmem driver code and Nitin's zcache.

If you are ready to apply the cleancache series, great!
If not, please let me know next steps so cleancache isn't
an impediment for applying the zcache series.

Thanks,
Dan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-22 19:54   ` Dan Magenheimer
@ 2010-07-22 21:00     ` Greg KH
  0 siblings, 0 replies; 23+ messages in thread
From: Greg KH @ 2010-07-22 21:00 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Nitin Gupta, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Wilk, linux-mm, linux-kernel

On Thu, Jul 22, 2010 at 12:54:57PM -0700, Dan Magenheimer wrote:
> > From: Greg KH [mailto:greg@kroah.com]
> > Sent: Thursday, July 22, 2010 1:15 PM
> > To: Nitin Gupta
> > Cc: Pekka Enberg; Hugh Dickins; Andrew Morton; Dan Magenheimer; Rik van
> > Riel; Avi Kivity; Christoph Hellwig; Minchan Kim; Konrad Rzeszutek
> > Wilk; linux-mm; linux-kernel
> > Subject: Re: [PATCH 0/8] zcache: page cache compression support
> > 
> > On Fri, Jul 16, 2010 at 06:07:42PM +0530, Nitin Gupta wrote:
> > > Frequently accessed filesystem data is stored in memory to reduce
> > access to
> > > (much) slower backing disks. Under memory pressure, these pages are
> > freed and
> > > when needed again, they have to be read from disks again. When
> > combined working
> > > set of all running application exceeds amount of physical RAM, we get
> > extereme
> > > slowdown as reading a page from disk can take time in order of
> > milliseconds.
> > 
> > <snip>
> > 
> > Given that there were a lot of comments and changes for this series,
> > can
> > you resend them with your updates so I can then apply them if they are
> > acceptable to everyone?
> > 
> > thanks,
> > greg k-h
> 
> Hi Greg --
> 
> Nitin's zcache code is dependent on the cleancache series:
> http://lkml.org/lkml/2010/6/21/411 

Ah, I didn't realize that.  Hm, that makes it something that I can't
take until that code is upstream, sorry.

> The cleancache series has not changed since V3 (other than
> fixing a couple of documentation typos) and didn't receive any
> comments other than Christoph's concern that there weren't
> any users... which I think has been since addressed with the
> posting of the Xen tmem driver code and Nitin's zcache.
> 
> If you are ready to apply the cleancache series, great!
> If not, please let me know next steps so cleancache isn't
> an impediment for applying the zcache series.

I don't know, work with the kernel developers to resolve the issues they
pointed out in the cleancache code and when it goes in, then I can take
these patches.

good luck,

greg k-h


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
       [not found] <575348163.1113381279906498028.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
@ 2010-07-23 17:36 ` caiqian
  2010-07-23 17:41   ` CAI Qian
  0 siblings, 1 reply; 23+ messages in thread
From: caiqian @ 2010-07-23 17:36 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: linux-mm, linux-kernel, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Greg KH, Dan Magenheimer, Rik van Riel, Avi Kivity


----- "Nitin Gupta" <ngupta@vflare.org> wrote:

> Frequently accessed filesystem data is stored in memory to reduce
> access to
> (much) slower backing disks. Under memory pressure, these pages are
> freed and
> when needed again, they have to be read from disks again. When
> combined working
> set of all running application exceeds amount of physical RAM, we get
> extereme
> slowdown as reading a page from disk can take time in order of
> milliseconds.
> 
> Memory compression increases effective memory size and allows more
> pages to
> stay in RAM. Since de/compressing memory pages is several orders of
> magnitude
> faster than disk I/O, this can provide signifant performance gains for
> many
> workloads. Also, with multi-cores becoming common, benefits of reduced
> disk I/O
> should easily outweigh the problem of increased CPU usage.
> 
> It is implemented as a "backend" for cleancache_ops [1] which
> provides
> callbacks for events such as when a page is to be removed from the
> page cache
> and when it is required again. We use them to implement a 'second
> chance' cache
> for these evicted page cache pages by compressing and storing them in
> memory
> itself.
> 
> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
> chunks are
> stored using xvmalloc memory allocator which is already being used by
> zram
> driver for the same purpose. Zero-filled pages are checked and no
> memory is
> allocated for them.
> 
> A separate "pool" is created for each mount instance for a
> cleancache-aware
> filesystem. Each incoming page is identified with <pool_id, inode_no,
> index>
> where inode_no identifies file within the filesystem corresponding to
> pool_id
> and index is offset of the page within this inode. Within a pool,
> inodes are
> maintained in an rb-tree and each of its nodes points to a separate
> radix-tree
> which maintains list of pages within that inode.
> 
> While compression reduces disk I/O, it also reduces the space
> available for
> normal (uncompressed) page cache. This can result in more frequent
> page cache
> reclaim and thus higher CPU overhead. Thus, it's important to maintain
> good hit
> rate for compressed cache or increased CPU overhead can nullify any
> other
> benefits. This requires adaptive (compressed) cache resizing and page
> replacement policies that can maintain optimal cache size and quickly
> reclaim
> unused compressed chunks. This work is yet to be done. However, in the
> current
> state, it allows manually resizing cache size using (per-pool) sysfs
> node
> 'memlimit' which in turn frees any excess pages *sigh* randomly.
> 
> Finally, it uses percpu stats and compression buffers to allow better
> performance on multi-cores. Still, there are known bottlenecks like a
> single
> xvmalloc mempool per zcache pool and few others. I will work on this
> when I
> start with profiling.
> 
>  * Performance numbers:
>    - Tested using iozone filesystem benchmark
>    - 4 CPUs, 1G RAM
>    - Read performance gain: ~2.5X
>    - Random read performance gain: ~3X
>    - In general, performance gains for every kind of I/O
> 
> Test details with graphs can be found here:
> http://code.google.com/p/compcache/wiki/zcacheIOzone
> 
> If I can get some help with testing, it would be intersting to find
> its
> effect in more real-life workloads. In particular, I'm intersted in
> finding
> out its effect in KVM virtualization case where it can potentially
> allow
> running more number of VMs per-host for a given amount of RAM. With
> zcache
> enabled, VMs can be assigned much smaller amount of memory since host
> can now
> hold bulk of page-cache pages, allowing VMs to maintain similar level
> of
> performance while a greater number of them can be hosted.
> 
>  * How to test:
> All patches are against 2.6.35-rc5:
> 
>  - First, apply all prerequisite patches here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
> 
>  - Then apply this patch series; also uploaded here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_patches
> 
> 
> Nitin Gupta (8):
>   Allow sharing xvmalloc for zram and zcache
>   Basic zcache functionality
>   Create sysfs nodes and export basic statistics
>   Shrink zcache based on memlimit
>   Eliminate zero-filled pages
>   Compress pages using LZO
>   Use xvmalloc to store compressed chunks
>   Document sysfs entries
> 
>  Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
>  drivers/staging/Makefile                         |    2 +
>  drivers/staging/zram/Kconfig                     |   22 +
>  drivers/staging/zram/Makefile                    |    5 +-
>  drivers/staging/zram/xvmalloc.c                  |    8 +
>  drivers/staging/zram/zcache_drv.c                | 1312
> ++++++++++++++++++++++
>  drivers/staging/zram/zcache_drv.h                |   90 ++
>  7 files changed, 1491 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-zcache
>  create mode 100644 drivers/staging/zram/zcache_drv.c
>  create mode 100644 drivers/staging/zram/zcache_drv.h
Testing those patches on top of the Linus tree at commit d0c6f6258478e1dba532bf7c28e2cd6e1047d3a4, the OOM killer was triggered even though there still appeared to be plenty of swap.

# free -m
             total       used       free     shared    buffers     cached
Mem:           852        379        473          0          3         15
-/+ buffers/cache:        359        492
Swap:         2015         14       2001

# ./usemem 1024
0: Mallocing 32 megabytes
1: Mallocing 32 megabytes
2: Mallocing 32 megabytes
3: Mallocing 32 megabytes
4: Mallocing 32 megabytes
5: Mallocing 32 megabytes
6: Mallocing 32 megabytes
7: Mallocing 32 megabytes
8: Mallocing 32 megabytes
9: Mallocing 32 megabytes
10: Mallocing 32 megabytes
11: Mallocing 32 megabytes
12: Mallocing 32 megabytes
13: Mallocing 32 megabytes
14: Mallocing 32 megabytes
15: Mallocing 32 megabytes
Connection to 192.168.122.193 closed.
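
(The usemem source is not part of this thread; a minimal program consistent
with the output above and with the mlock frames in the traces below might look
like the following hypothetical reconstruction, which maps the requested amount
of memory in 32 MB locked chunks and touches every page.)

/* Hypothetical reconstruction of a usemem-style test -- not the actual tool. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define CHUNK_MB 32UL

int main(int argc, char **argv)
{
	unsigned long total_mb = argc > 1 ? strtoul(argv[1], NULL, 0) : 1024;
	unsigned long i, nr_chunks = total_mb / CHUNK_MB;

	for (i = 0; i < nr_chunks; i++) {
		void *p;

		printf("%lu: Mallocing %lu megabytes\n", i, CHUNK_MB);
		p = mmap(NULL, CHUNK_MB << 20, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		memset(p, 0xaa, CHUNK_MB << 20);	/* touch every page */
	}
	pause();	/* hold the memory until killed */
	return 0;
}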

usemem invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0
usemem cpuset=/ mems_allowed=0
Pid: 1829, comm: usemem Not tainted 2.6.35-rc5+ #5
Call Trace:
 [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff81108520>] dump_header+0x70/0x190
 [<ffffffff811086c1>] oom_kill_process+0x81/0x180
 [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
 [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
 [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
 [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
 [<ffffffff81140a69>] alloc_page_vma+0x89/0x140
 [<ffffffff81125f76>] handle_mm_fault+0x6d6/0x990
 [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff81121afd>] ? follow_page+0x19d/0x350
 [<ffffffff8112639c>] __get_user_pages+0x16c/0x480
 [<ffffffff810127c9>] ? sched_clock+0x9/0x10
 [<ffffffff811276ef>] __mlock_vma_pages_range+0xef/0x1f0
 [<ffffffff81127f01>] mlock_vma_pages_range+0x91/0xa0
 [<ffffffff8112ad57>] mmap_region+0x307/0x5b0
 [<ffffffff8112b354>] do_mmap_pgoff+0x354/0x3a0
 [<ffffffff8112b3fc>] ? sys_mmap_pgoff+0x5c/0x200
 [<ffffffff8112b41a>] sys_mmap_pgoff+0x7a/0x200
 [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100fa09>] sys_mmap+0x29/0x30
 [<ffffffff8100b032>] system_call_fastpath+0x16/0x1b
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 140
CPU    1: hi:  186, btch:  31 usd:  47
active_anon:128 inactive_anon:140 isolated_anon:0
 active_file:0 inactive_file:9 isolated_file:0
 unevictable:126855 dirty:0 writeback:125 unstable:0
 free:1996 slab_reclaimable:4445 slab_unreclaimable:23646
 mapped:923 shmem:7 pagetables:778 bounce:0
Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:11896kB isolated(anon):0kB isolated(file):0kB present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 994 994 994
Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB active_anon:512kB inactive_anon:560kB active_file:0kB inactive_file:36kB unevictable:495524kB isolated(anon):0kB isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB writeback:500kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 4032kB
Node 0 DMA32: 476*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
1146 total pagecache pages
215 pages in swap cache
Swap cache stats: add 19633, delete 19418, find 941/1333
Free swap  = 2051080kB
Total swap = 2064380kB
262138 pages RAM
43914 pages reserved
4832 pages shared
155665 pages non-shared
Out of memory: kill process 1727 (console-kit-dae) score 1027939 or a child
Killed process 1727 (console-kit-dae) vsz:4111756kB, anon-rss:0kB, file-rss:600kB
console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
console-kit-dae cpuset=/ mems_allowed=0
Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
Call Trace:
 [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff81108520>] dump_header+0x70/0x190
 [<ffffffff811086c1>] oom_kill_process+0x81/0x180
 [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
 [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
 [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
 [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
 [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
 [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
 [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
 [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
 [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
 [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
 [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
 [<ffffffff81063a4b>] do_exit+0x12b/0x890
 [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
 [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
 [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
 [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
 [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
 [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
 [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
 [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
 [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100b381>] int_signal+0x12/0x17
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 151
CPU    1: hi:  186, btch:  31 usd:  61
active_anon:128 inactive_anon:165 isolated_anon:0
 active_file:0 inactive_file:9 isolated_file:0
 unevictable:126855 dirty:0 writeback:25 unstable:0
 free:1965 slab_reclaimable:4445 slab_unreclaimable:23646
 mapped:923 shmem:7 pagetables:778 bounce:0
Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:11896kB isolated(anon):0kB isolated(file):0kB present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 994 994 994
Node 0 DMA32 free:3828kB min:4000kB low:5000kB high:6000kB active_anon:512kB inactive_anon:660kB active_file:0kB inactive_file:36kB unevictable:495524kB isolated(anon):0kB isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 4032kB
Node 0 DMA32: 445*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3828kB
1146 total pagecache pages
230 pages in swap cache
Swap cache stats: add 19649, delete 19419, find 942/1336
Free swap  = 2051084kB
Total swap = 2064380kB
262138 pages RAM
43914 pages reserved
4818 pages shared
155685 pages non-shared
Out of memory: kill process 1806 (sshd) score 9474 or a child
Killed process 1810 (bash) vsz:108384kB, anon-rss:0kB, file-rss:656kB
console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
console-kit-dae cpuset=/ mems_allowed=0
Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
Call Trace:
 [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff81108520>] dump_header+0x70/0x190
 [<ffffffff811086c1>] oom_kill_process+0x81/0x180
 [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
 [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
 [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
 [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
 [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
 [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
 [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
 [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
 [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
 [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
 [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
 [<ffffffff81063a4b>] do_exit+0x12b/0x890
 [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
 [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
 [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
 [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
 [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
 [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
 [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
 [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
 [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
 [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100b381>] int_signal+0x12/0x17
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 119
CPU    1: hi:  186, btch:  31 usd:  73
active_anon:50 inactive_anon:175 isolated_anon:0
 active_file:0 inactive_file:9 isolated_file:0
 unevictable:126855 dirty:0 writeback:25 unstable:0
 free:1996 slab_reclaimable:4445 slab_unreclaimable:23663
 mapped:923 shmem:7 pagetables:778 bounce:0
Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:11896kB isolated(anon):0kB isolated(file):0kB present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 994 994 994
Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB active_anon:200kB inactive_anon:700kB active_file:0kB inactive_file:36kB unevictable:495524kB isolated(anon):0kB isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB slab_unreclaimable:94652kB kernel_stack:1296kB pagetables:3088kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1536 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 4032kB
Node 0 DMA32: 470*4kB 3*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
1146 total pagecache pages
221 pages in swap cache
Swap cache stats: add 19848, delete 19627, find 970/1386
Free swap  = 2051428kB
Total swap = 2064380kB
262138 pages RAM
43914 pages reserved
4669 pages shared
155659 pages non-shared
Out of memory: kill process 1829 (usemem) score 8253 or a child
Killed process 1829 (usemem) vsz:528224kB, anon-rss:502468kB, file-rss:376kB

# cat usemem.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#define CHUNKS 32

int 
main(int argc, char *argv[])
{
	mlockall(MCL_FUTURE);

	unsigned long mb;
	char *buf[CHUNKS];
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: usemem megabytes\n");
		exit(1);
	}
	mb = strtoul(argv[1], NULL, 0);

	for (i = 0; i < CHUNKS; i++) {
		fprintf(stderr, "%d: Mallocing %lu megabytes\n", i, mb/CHUNKS);
		buf[i] = (char *)malloc(mb/CHUNKS * 1024L * 1024L);
		if (!buf[i]) {
			fprintf(stderr, "malloc failure\n");
			exit(1);
		}
	}

	for (i = 0; i < CHUNKS; i++) {
		fprintf(stderr, "%d: Zeroing %lu megabytes at %p\n", 
				i, mb/CHUNKS, buf[i]);
		memset(buf[i], 0, mb/CHUNKS * 1024L * 1024L);
	}


	exit(0);
}
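
For what it's worth, the mlockall(MCL_FUTURE) call at the top of main()
locks every subsequent mapping into RAM, so nothing usemem touches can be
reclaimed or swapped. That matches the OOM reports above: unevictable:126855
pages is 507420kB, exactly the mlocked totals of the two zones (11896kB +
495524kB), which is why the killer fires despite ~2GB of free swap. A
minimal variant of the test without the locking, which should spill to swap
instead of triggering the OOM killer, might look like this (hypothetical
rewrite, not part of the original posting):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNKS 32

int
main(int argc, char *argv[])
{
	unsigned long mb;
	char *buf[CHUNKS];
	int i;

	if (argc < 2) {
		fprintf(stderr, "usage: usemem megabytes\n");
		exit(1);
	}
	mb = strtoul(argv[1], NULL, 0);

	/* No mlockall(): the zeroed chunks stay evictable, so under
	 * memory pressure they can be pushed out to swap instead of
	 * forcing the kernel to pick an OOM victim. */
	for (i = 0; i < CHUNKS; i++) {
		fprintf(stderr, "%d: Allocating %lu megabytes\n", i, mb/CHUNKS);
		buf[i] = malloc(mb/CHUNKS * 1024L * 1024L);
		if (!buf[i]) {
			fprintf(stderr, "malloc failure\n");
			exit(1);
		}
		memset(buf[i], 0, mb/CHUNKS * 1024L * 1024L);
	}

	return 0;
}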


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-23 17:36 ` [PATCH 0/8] zcache: page cache compression support caiqian
@ 2010-07-23 17:41   ` CAI Qian
  2010-07-23 18:02     ` CAI Qian
  0 siblings, 1 reply; 23+ messages in thread
From: CAI Qian @ 2010-07-23 17:41 UTC (permalink / raw)
  To: caiqian
  Cc: linux-mm, linux-kernel, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Greg KH, Dan Magenheimer, Rik van Riel, Avi Kivity, Nitin Gupta


----- caiqian@redhat.com wrote:

> ----- "Nitin Gupta" <ngupta@vflare.org> wrote:
> 
> > Frequently accessed filesystem data is stored in memory to reduce
> > access to
> > (much) slower backing disks. Under memory pressure, these pages are
> > freed and
> > when needed again, they have to be read from disks again. When
> > combined working
> > set of all running application exceeds amount of physical RAM, we
> get
> > extereme
> > slowdown as reading a page from disk can take time in order of
> > milliseconds.
> > 
> > Memory compression increases effective memory size and allows more
> > pages to
> > stay in RAM. Since de/compressing memory pages is several orders of
> > magnitude
> > faster than disk I/O, this can provide signifant performance gains
> for
> > many
> > workloads. Also, with multi-cores becoming common, benefits of
> reduced
> > disk I/O
> > should easily outweigh the problem of increased CPU usage.
> > 
> > It is implemented as a "backend" for cleancache_ops [1] which
> > provides
> > callbacks for events such as when a page is to be removed from the
> > page cache
> > and when it is required again. We use them to implement a 'second
> > chance' cache
> > for these evicted page cache pages by compressing and storing them
> in
> > memory
> > itself.
> > 
> > We only keep pages that compress to PAGE_SIZE/2 or less. Compressed
> > chunks are
> > stored using xvmalloc memory allocator which is already being used
> by
> > zram
> > driver for the same purpose. Zero-filled pages are checked and no
> > memory is
> > allocated for them.
> > 
> > A separate "pool" is created for each mount instance for a
> > cleancache-aware
> > filesystem. Each incoming page is identified with <pool_id,
> inode_no,
> > index>
> > where inode_no identifies file within the filesystem corresponding
> to
> > pool_id
> > and index is offset of the page within this inode. Within a pool,
> > inodes are
> > maintained in an rb-tree and each of its nodes points to a separate
> > radix-tree
> > which maintains list of pages within that inode.
> > 
> > While compression reduces disk I/O, it also reduces the space
> > available for
> > normal (uncompressed) page cache. This can result in more frequent
> > page cache
> > reclaim and thus higher CPU overhead. Thus, it's important to
> maintain
> > good hit
> > rate for compressed cache or increased CPU overhead can nullify any
> > other
> > benefits. This requires adaptive (compressed) cache resizing and
> page
> > replacement policies that can maintain optimal cache size and
> quickly
> > reclaim
> > unused compressed chunks. This work is yet to be done. However, in
> the
> > current
> > state, it allows manually resizing cache size using (per-pool)
> sysfs
> > node
> > 'memlimit' which in turn frees any excess pages *sigh* randomly.
> > 
> > Finally, it uses percpu stats and compression buffers to allow
> better
> > performance on multi-cores. Still, there are known bottlenecks like
> a
> > single
> > xvmalloc mempool per zcache pool and few others. I will work on
> this
> > when I
> > start with profiling.
> > 
> >  * Performance numbers:
> >    - Tested using iozone filesystem benchmark
> >    - 4 CPUs, 1G RAM
> >    - Read performance gain: ~2.5X
> >    - Random read performance gain: ~3X
> >    - In general, performance gains for every kind of I/O
> > 
> > Test details with graphs can be found here:
> > http://code.google.com/p/compcache/wiki/zcacheIOzone
> > 
> > If I can get some help with testing, it would be intersting to find
> > its
> > effect in more real-life workloads. In particular, I'm intersted in
> > finding
> > out its effect in KVM virtualization case where it can potentially
> > allow
> > running more number of VMs per-host for a given amount of RAM. With
> > zcache
> > enabled, VMs can be assigned much smaller amount of memory since
> host
> > can now
> > hold bulk of page-cache pages, allowing VMs to maintain similar
> level
> > of
> > performance while a greater number of them can be hosted.
> > 
> >  * How to test:
> > All patches are against 2.6.35-rc5:
> > 
> >  - First, apply all prerequisite patches here:
> > http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
> > 
> >  - Then apply this patch series; also uploaded here:
> > http://compcache.googlecode.com/hg/sub-projects/zcache_patches
> > 
> > 
> > Nitin Gupta (8):
> >   Allow sharing xvmalloc for zram and zcache
> >   Basic zcache functionality
> >   Create sysfs nodes and export basic statistics
> >   Shrink zcache based on memlimit
> >   Eliminate zero-filled pages
> >   Compress pages using LZO
> >   Use xvmalloc to store compressed chunks
> >   Document sysfs entries
> > 
> >  Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
> >  drivers/staging/Makefile                         |    2 +
> >  drivers/staging/zram/Kconfig                     |   22 +
> >  drivers/staging/zram/Makefile                    |    5 +-
> >  drivers/staging/zram/xvmalloc.c                  |    8 +
> >  drivers/staging/zram/zcache_drv.c                | 1312
> > ++++++++++++++++++++++
> >  drivers/staging/zram/zcache_drv.h                |   90 ++
> >  7 files changed, 1491 insertions(+), 1 deletions(-)
> >  create mode 100644
> Documentation/ABI/testing/sysfs-kernel-mm-zcache
> >  create mode 100644 drivers/staging/zram/zcache_drv.c
> >  create mode 100644 drivers/staging/zram/zcache_drv.h
> By tested those patches on the top of the linus tree at this commit
> d0c6f6258478e1dba532bf7c28e2cd6e1047d3a4, the OOM was trigger even
> though there looked like still lots of swap.
> 
> # free -m
>              total       used       free     shared    buffers    
> cached
> Mem:           852        379        473          0          3        
> 15
> -/+ buffers/cache:        359        492
> Swap:         2015         14       2001
> 
> # ./usemem 1024
> 0: Mallocing 32 megabytes
> 1: Mallocing 32 megabytes
> 2: Mallocing 32 megabytes
> 3: Mallocing 32 megabytes
> 4: Mallocing 32 megabytes
> 5: Mallocing 32 megabytes
> 6: Mallocing 32 megabytes
> 7: Mallocing 32 megabytes
> 8: Mallocing 32 megabytes
> 9: Mallocing 32 megabytes
> 10: Mallocing 32 megabytes
> 11: Mallocing 32 megabytes
> 12: Mallocing 32 megabytes
> 13: Mallocing 32 megabytes
> 14: Mallocing 32 megabytes
> 15: Mallocing 32 megabytes
> Connection to 192.168.122.193 closed.
> 
> usemem invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0
> usemem cpuset=/ mems_allowed=0
> Pid: 1829, comm: usemem Not tainted 2.6.35-rc5+ #5
> Call Trace:
>  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
>  [<ffffffff81108520>] dump_header+0x70/0x190
>  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
>  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
>  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
>  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
>  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
>  [<ffffffff81140a69>] alloc_page_vma+0x89/0x140
>  [<ffffffff81125f76>] handle_mm_fault+0x6d6/0x990
>  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
>  [<ffffffff81121afd>] ? follow_page+0x19d/0x350
>  [<ffffffff8112639c>] __get_user_pages+0x16c/0x480
>  [<ffffffff810127c9>] ? sched_clock+0x9/0x10
>  [<ffffffff811276ef>] __mlock_vma_pages_range+0xef/0x1f0
>  [<ffffffff81127f01>] mlock_vma_pages_range+0x91/0xa0
>  [<ffffffff8112ad57>] mmap_region+0x307/0x5b0
>  [<ffffffff8112b354>] do_mmap_pgoff+0x354/0x3a0
>  [<ffffffff8112b3fc>] ? sys_mmap_pgoff+0x5c/0x200
>  [<ffffffff8112b41a>] sys_mmap_pgoff+0x7a/0x200
>  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8100fa09>] sys_mmap+0x29/0x30
>  [<ffffffff8100b032>] system_call_fastpath+0x16/0x1b
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: hi:  186, btch:  31 usd: 140
> CPU    1: hi:  186, btch:  31 usd:  47
> active_anon:128 inactive_anon:140 isolated_anon:0
>  active_file:0 inactive_file:9 isolated_file:0
>  unevictable:126855 dirty:0 writeback:125 unstable:0
>  free:1996 slab_reclaimable:4445 slab_unreclaimable:23646
>  mapped:923 shmem:7 pagetables:778 bounce:0
> Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 994 994 994
> Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB
> active_anon:512kB inactive_anon:560kB active_file:0kB
> inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> writeback:500kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726
> all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB
> 1*1024kB 1*2048kB 0*4096kB = 4032kB
> Node 0 DMA32: 476*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
> 1146 total pagecache pages
> 215 pages in swap cache
> Swap cache stats: add 19633, delete 19418, find 941/1333
> Free swap  = 2051080kB
> Total swap = 2064380kB
> 262138 pages RAM
> 43914 pages reserved
> 4832 pages shared
> 155665 pages non-shared
> Out of memory: kill process 1727 (console-kit-dae) score 1027939 or a
> child
> Killed process 1727 (console-kit-dae) vsz:4111756kB, anon-rss:0kB,
> file-rss:600kB
> console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
> console-kit-dae cpuset=/ mems_allowed=0
> Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
> Call Trace:
>  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
>  [<ffffffff81108520>] dump_header+0x70/0x190
>  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
>  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
>  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
>  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
>  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
>  [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
>  [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
>  [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
>  [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
>  [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
>  [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
>  [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
>  [<ffffffff81063a4b>] do_exit+0x12b/0x890
>  [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
>  [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
>  [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
>  [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
>  [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
>  [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
>  [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
>  [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
>  [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
>  [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
>  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8100b381>] int_signal+0x12/0x17
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: hi:  186, btch:  31 usd: 151
> CPU    1: hi:  186, btch:  31 usd:  61
> active_anon:128 inactive_anon:165 isolated_anon:0
>  active_file:0 inactive_file:9 isolated_file:0
>  unevictable:126855 dirty:0 writeback:25 unstable:0
>  free:1965 slab_reclaimable:4445 slab_unreclaimable:23646
>  mapped:923 shmem:7 pagetables:778 bounce:0
> Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 994 994 994
> Node 0 DMA32 free:3828kB min:4000kB low:5000kB high:6000kB
> active_anon:512kB inactive_anon:660kB active_file:0kB
> inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726
> all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB
> 1*1024kB 1*2048kB 0*4096kB = 4032kB
> Node 0 DMA32: 445*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3828kB
> 1146 total pagecache pages
> 230 pages in swap cache
> Swap cache stats: add 19649, delete 19419, find 942/1336
> Free swap  = 2051084kB
> Total swap = 2064380kB
> 262138 pages RAM
> 43914 pages reserved
> 4818 pages shared
> 155685 pages non-shared
> Out of memory: kill process 1806 (sshd) score 9474 or a child
> Killed process 1810 (bash) vsz:108384kB, anon-rss:0kB, file-rss:656kB
> console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
> console-kit-dae cpuset=/ mems_allowed=0
> Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
> Call Trace:
>  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
>  [<ffffffff81108520>] dump_header+0x70/0x190
>  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
>  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
>  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
>  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
>  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
>  [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
>  [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
>  [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
>  [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
>  [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
>  [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
>  [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
>  [<ffffffff81063a4b>] do_exit+0x12b/0x890
>  [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
>  [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
>  [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
>  [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
>  [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
>  [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
>  [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
>  [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
>  [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
>  [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
>  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8100b381>] int_signal+0x12/0x17
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: hi:  186, btch:  31 usd: 119
> CPU    1: hi:  186, btch:  31 usd:  73
> active_anon:50 inactive_anon:175 isolated_anon:0
>  active_file:0 inactive_file:9 isolated_file:0
>  unevictable:126855 dirty:0 writeback:25 unstable:0
>  free:1996 slab_reclaimable:4445 slab_unreclaimable:23663
>  mapped:923 shmem:7 pagetables:778 bounce:0
> Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 994 994 994
> Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB
> active_anon:200kB inactive_anon:700kB active_file:0kB
> inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> slab_unreclaimable:94652kB kernel_stack:1296kB pagetables:3088kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1536
> all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB 0*512kB
> 1*1024kB 1*2048kB 0*4096kB = 4032kB
> Node 0 DMA32: 470*4kB 3*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
> 1146 total pagecache pages
> 221 pages in swap cache
> Swap cache stats: add 19848, delete 19627, find 970/1386
> Free swap  = 2051428kB
> Total swap = 2064380kB
> 262138 pages RAM
> 43914 pages reserved
> 4669 pages shared
> 155659 pages non-shared
> Out of memory: kill process 1829 (usemem) score 8253 or a child
> Killed process 1829 (usemem) vsz:528224kB, anon-rss:502468kB,
> file-rss:376kB
> 
> # cat usemem.c
> # cat usemem.c 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mman.h>
> #define CHUNKS 32
> 
> int 
> main(int argc, char *argv[])
> {
> 	mlockall(MCL_FUTURE);
> 
> 	unsigned long mb;
> 	char *buf[CHUNKS];
> 	int i;
> 
> 	if (argc < 2) {
> 		fprintf(stderr, "usage: usemem megabytes\n");
> 		exit(1);
> 	}
> 	mb = strtoul(argv[1], NULL, 0);
> 
> 	for (i = 0; i < CHUNKS; i++) {
> 		fprintf(stderr, "%d: Mallocing %lu megabytes\n", i, mb/CHUNKS);
> 		buf[i] = (char *)malloc(mb/CHUNKS * 1024L * 1024L);
> 		if (!buf[i]) {
> 			fprintf(stderr, "malloc failure\n");
> 			exit(1);
> 		}
> 	}
> 
> 	for (i = 0; i < CHUNKS; i++) {
> 		fprintf(stderr, "%d: Zeroing %lu megabytes at %p\n", 
> 				i, mb/CHUNKS, buf[i]);
> 		memset(buf[i], 0, mb/CHUNKS * 1024L * 1024L);
> 	}
> 
> 
> 	exit(0);
> }
> 
In case it is relevant: this was tested inside a KVM guest. The host was RHEL6 with THP enabled.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-23 17:41   ` CAI Qian
@ 2010-07-23 18:02     ` CAI Qian
  2010-07-24 14:41       ` Valdis.Kletnieks
  0 siblings, 1 reply; 23+ messages in thread
From: CAI Qian @ 2010-07-23 18:02 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Greg KH, Dan Magenheimer, Rik van Riel, Avi Kivity, Nitin Gupta

Ignore me. The test case should not be using mlockall()!

----- "CAI Qian" <caiqian@redhat.com> wrote:

> ----- caiqian@redhat.com wrote:
> 
> > ----- "Nitin Gupta" <ngupta@vflare.org> wrote:
> > 
> > > Frequently accessed filesystem data is stored in memory to reduce
> > > access to
> > > (much) slower backing disks. Under memory pressure, these pages
> are
> > > freed and
> > > when needed again, they have to be read from disks again. When
> > > combined working
> > > set of all running application exceeds amount of physical RAM, we
> > get
> > > extereme
> > > slowdown as reading a page from disk can take time in order of
> > > milliseconds.
> > > 
> > > Memory compression increases effective memory size and allows
> more
> > > pages to
> > > stay in RAM. Since de/compressing memory pages is several orders
> of
> > > magnitude
> > > faster than disk I/O, this can provide signifant performance
> gains
> > for
> > > many
> > > workloads. Also, with multi-cores becoming common, benefits of
> > reduced
> > > disk I/O
> > > should easily outweigh the problem of increased CPU usage.
> > > 
> > > It is implemented as a "backend" for cleancache_ops [1] which
> > > provides
> > > callbacks for events such as when a page is to be removed from
> the
> > > page cache
> > > and when it is required again. We use them to implement a 'second
> > > chance' cache
> > > for these evicted page cache pages by compressing and storing
> them
> > in
> > > memory
> > > itself.
> > > 
> > > We only keep pages that compress to PAGE_SIZE/2 or less.
> Compressed
> > > chunks are
> > > stored using xvmalloc memory allocator which is already being
> used
> > by
> > > zram
> > > driver for the same purpose. Zero-filled pages are checked and no
> > > memory is
> > > allocated for them.
> > > 
> > > A separate "pool" is created for each mount instance for a
> > > cleancache-aware
> > > filesystem. Each incoming page is identified with <pool_id,
> > inode_no,
> > > index>
> > > where inode_no identifies file within the filesystem
> corresponding
> > to
> > > pool_id
> > > and index is offset of the page within this inode. Within a pool,
> > > inodes are
> > > maintained in an rb-tree and each of its nodes points to a
> separate
> > > radix-tree
> > > which maintains list of pages within that inode.
> > > 
> > > While compression reduces disk I/O, it also reduces the space
> > > available for
> > > normal (uncompressed) page cache. This can result in more
> frequent
> > > page cache
> > > reclaim and thus higher CPU overhead. Thus, it's important to
> > maintain
> > > good hit
> > > rate for compressed cache or increased CPU overhead can nullify
> any
> > > other
> > > benefits. This requires adaptive (compressed) cache resizing and
> > page
> > > replacement policies that can maintain optimal cache size and
> > quickly
> > > reclaim
> > > unused compressed chunks. This work is yet to be done. However,
> in
> > the
> > > current
> > > state, it allows manually resizing cache size using (per-pool)
> > sysfs
> > > node
> > > 'memlimit' which in turn frees any excess pages *sigh* randomly.
> > > 
> > > Finally, it uses percpu stats and compression buffers to allow
> > better
> > > performance on multi-cores. Still, there are known bottlenecks
> like
> > a
> > > single
> > > xvmalloc mempool per zcache pool and few others. I will work on
> > this
> > > when I
> > > start with profiling.
> > > 
> > >  * Performance numbers:
> > >    - Tested using iozone filesystem benchmark
> > >    - 4 CPUs, 1G RAM
> > >    - Read performance gain: ~2.5X
> > >    - Random read performance gain: ~3X
> > >    - In general, performance gains for every kind of I/O
> > > 
> > > Test details with graphs can be found here:
> > > http://code.google.com/p/compcache/wiki/zcacheIOzone
> > > 
> > > If I can get some help with testing, it would be intersting to
> find
> > > its
> > > effect in more real-life workloads. In particular, I'm intersted
> in
> > > finding
> > > out its effect in KVM virtualization case where it can
> potentially
> > > allow
> > > running more number of VMs per-host for a given amount of RAM.
> With
> > > zcache
> > > enabled, VMs can be assigned much smaller amount of memory since
> > host
> > > can now
> > > hold bulk of page-cache pages, allowing VMs to maintain similar
> > level
> > > of
> > > performance while a greater number of them can be hosted.
> > > 
> > >  * How to test:
> > > All patches are against 2.6.35-rc5:
> > > 
> > >  - First, apply all prerequisite patches here:
> > >
> http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
> > > 
> > >  - Then apply this patch series; also uploaded here:
> > > http://compcache.googlecode.com/hg/sub-projects/zcache_patches
> > > 
> > > 
> > > Nitin Gupta (8):
> > >   Allow sharing xvmalloc for zram and zcache
> > >   Basic zcache functionality
> > >   Create sysfs nodes and export basic statistics
> > >   Shrink zcache based on memlimit
> > >   Eliminate zero-filled pages
> > >   Compress pages using LZO
> > >   Use xvmalloc to store compressed chunks
> > >   Document sysfs entries
> > > 
> > >  Documentation/ABI/testing/sysfs-kernel-mm-zcache |   53 +
> > >  drivers/staging/Makefile                         |    2 +
> > >  drivers/staging/zram/Kconfig                     |   22 +
> > >  drivers/staging/zram/Makefile                    |    5 +-
> > >  drivers/staging/zram/xvmalloc.c                  |    8 +
> > >  drivers/staging/zram/zcache_drv.c                | 1312
> > > ++++++++++++++++++++++
> > >  drivers/staging/zram/zcache_drv.h                |   90 ++
> > >  7 files changed, 1491 insertions(+), 1 deletions(-)
> > >  create mode 100644
> > Documentation/ABI/testing/sysfs-kernel-mm-zcache
> > >  create mode 100644 drivers/staging/zram/zcache_drv.c
> > >  create mode 100644 drivers/staging/zram/zcache_drv.h
> > By tested those patches on the top of the linus tree at this commit
> > d0c6f6258478e1dba532bf7c28e2cd6e1047d3a4, the OOM was trigger even
> > though there looked like still lots of swap.
> > 
> > # free -m
> >              total       used       free     shared    buffers    
> > cached
> > Mem:           852        379        473          0          3      
>  
> > 15
> > -/+ buffers/cache:        359        492
> > Swap:         2015         14       2001
> > 
> > # ./usemem 1024
> > 0: Mallocing 32 megabytes
> > 1: Mallocing 32 megabytes
> > 2: Mallocing 32 megabytes
> > 3: Mallocing 32 megabytes
> > 4: Mallocing 32 megabytes
> > 5: Mallocing 32 megabytes
> > 6: Mallocing 32 megabytes
> > 7: Mallocing 32 megabytes
> > 8: Mallocing 32 megabytes
> > 9: Mallocing 32 megabytes
> > 10: Mallocing 32 megabytes
> > 11: Mallocing 32 megabytes
> > 12: Mallocing 32 megabytes
> > 13: Mallocing 32 megabytes
> > 14: Mallocing 32 megabytes
> > 15: Mallocing 32 megabytes
> > Connection to 192.168.122.193 closed.
> > 
> > usemem invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0
> > usemem cpuset=/ mems_allowed=0
> > Pid: 1829, comm: usemem Not tainted 2.6.35-rc5+ #5
> > Call Trace:
> >  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
> >  [<ffffffff81108520>] dump_header+0x70/0x190
> >  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
> >  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
> >  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
> >  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
> >  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
> >  [<ffffffff81140a69>] alloc_page_vma+0x89/0x140
> >  [<ffffffff81125f76>] handle_mm_fault+0x6d6/0x990
> >  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
> >  [<ffffffff81121afd>] ? follow_page+0x19d/0x350
> >  [<ffffffff8112639c>] __get_user_pages+0x16c/0x480
> >  [<ffffffff810127c9>] ? sched_clock+0x9/0x10
> >  [<ffffffff811276ef>] __mlock_vma_pages_range+0xef/0x1f0
> >  [<ffffffff81127f01>] mlock_vma_pages_range+0x91/0xa0
> >  [<ffffffff8112ad57>] mmap_region+0x307/0x5b0
> >  [<ffffffff8112b354>] do_mmap_pgoff+0x354/0x3a0
> >  [<ffffffff8112b3fc>] ? sys_mmap_pgoff+0x5c/0x200
> >  [<ffffffff8112b41a>] sys_mmap_pgoff+0x7a/0x200
> >  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff8100fa09>] sys_mmap+0x29/0x30
> >  [<ffffffff8100b032>] system_call_fastpath+0x16/0x1b
> > Mem-Info:
> > Node 0 DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > Node 0 DMA32 per-cpu:
> > CPU    0: hi:  186, btch:  31 usd: 140
> > CPU    1: hi:  186, btch:  31 usd:  47
> > active_anon:128 inactive_anon:140 isolated_anon:0
> >  active_file:0 inactive_file:9 isolated_file:0
> >  unevictable:126855 dirty:0 writeback:125 unstable:0
> >  free:1996 slab_reclaimable:4445 slab_unreclaimable:23646
> >  mapped:923 shmem:7 pagetables:778 bounce:0
> > Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> > inactive_anon:0kB active_file:0kB inactive_file:0kB
> > unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> > present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> > shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
> kernel_stack:0kB
> > pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:0 all_unreclaimable? yes
> > lowmem_reserve[]: 0 994 994 994
> > Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB
> > active_anon:512kB inactive_anon:560kB active_file:0kB
> > inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> > isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> > writeback:500kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> > slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB
> > unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726
> > all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 0
> > Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB
> 0*512kB
> > 1*1024kB 1*2048kB 0*4096kB = 4032kB
> > Node 0 DMA32: 476*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> > 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
> > 1146 total pagecache pages
> > 215 pages in swap cache
> > Swap cache stats: add 19633, delete 19418, find 941/1333
> > Free swap  = 2051080kB
> > Total swap = 2064380kB
> > 262138 pages RAM
> > 43914 pages reserved
> > 4832 pages shared
> > 155665 pages non-shared
> > Out of memory: kill process 1727 (console-kit-dae) score 1027939 or
> a
> > child
> > Killed process 1727 (console-kit-dae) vsz:4111756kB, anon-rss:0kB,
> > file-rss:600kB
> > console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0,
> oom_adj=0
> > console-kit-dae cpuset=/ mems_allowed=0
> > Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
> > Call Trace:
> >  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
> >  [<ffffffff81108520>] dump_header+0x70/0x190
> >  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
> >  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
> >  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
> >  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
> >  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
> >  [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
> >  [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
> >  [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
> >  [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
> >  [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
> >  [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
> >  [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
> >  [<ffffffff81063a4b>] do_exit+0x12b/0x890
> >  [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
> >  [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
> >  [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
> >  [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
> >  [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
> >  [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
> >  [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
> >  [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
> >  [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
> >  [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
> >  [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
> >  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff8100b381>] int_signal+0x12/0x17
> > Mem-Info:
> > Node 0 DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > Node 0 DMA32 per-cpu:
> > CPU    0: hi:  186, btch:  31 usd: 151
> > CPU    1: hi:  186, btch:  31 usd:  61
> > active_anon:128 inactive_anon:165 isolated_anon:0
> >  active_file:0 inactive_file:9 isolated_file:0
> >  unevictable:126855 dirty:0 writeback:25 unstable:0
> >  free:1965 slab_reclaimable:4445 slab_unreclaimable:23646
> >  mapped:923 shmem:7 pagetables:778 bounce:0
> > Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> > inactive_anon:0kB active_file:0kB inactive_file:0kB
> > unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> > present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> > shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
> kernel_stack:0kB
> > pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:0 all_unreclaimable? yes
> > lowmem_reserve[]: 0 994 994 994
> > Node 0 DMA32 free:3828kB min:4000kB low:5000kB high:6000kB
> > active_anon:512kB inactive_anon:660kB active_file:0kB
> > inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> > isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> > writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> > slab_unreclaimable:94584kB kernel_stack:1296kB pagetables:3088kB
> > unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1726
> > all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 0
> > Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB
> 0*512kB
> > 1*1024kB 1*2048kB 0*4096kB = 4032kB
> > Node 0 DMA32: 445*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> > 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3828kB
> > 1146 total pagecache pages
> > 230 pages in swap cache
> > Swap cache stats: add 19649, delete 19419, find 942/1336
> > Free swap  = 2051084kB
> > Total swap = 2064380kB
> > 262138 pages RAM
> > 43914 pages reserved
> > 4818 pages shared
> > 155685 pages non-shared
> > Out of memory: kill process 1806 (sshd) score 9474 or a child
> > Killed process 1810 (bash) vsz:108384kB, anon-rss:0kB,
> file-rss:656kB
> > console-kit-dae invoked oom-killer: gfp_mask=0xd0, order=0,
> oom_adj=0
> > console-kit-dae cpuset=/ mems_allowed=0
> > Pid: 1752, comm: console-kit-dae Not tainted 2.6.35-rc5+ #5
> > Call Trace:
> >  [<ffffffff814e10cb>] ? _raw_spin_unlock+0x2b/0x40
> >  [<ffffffff81108520>] dump_header+0x70/0x190
> >  [<ffffffff811086c1>] oom_kill_process+0x81/0x180
> >  [<ffffffff81108c08>] __out_of_memory+0x58/0xd0
> >  [<ffffffff81108ddc>] ? out_of_memory+0x15c/0x1f0
> >  [<ffffffff81108d8f>] out_of_memory+0x10f/0x1f0
> >  [<ffffffff8110cc7f>] __alloc_pages_nodemask+0x7af/0x7c0
> >  [<ffffffff8114522e>] kmem_getpages+0x6e/0x180
> >  [<ffffffff81147d79>] fallback_alloc+0x1c9/0x2b0
> >  [<ffffffff81147602>] ? cache_grow+0x4b2/0x520
> >  [<ffffffff81147a5b>] ____cache_alloc_node+0xab/0x200
> >  [<ffffffff810d55d5>] ? taskstats_exit+0x305/0x3b0
> >  [<ffffffff8114862b>] kmem_cache_alloc+0x1fb/0x290
> >  [<ffffffff810d55d5>] taskstats_exit+0x305/0x3b0
> >  [<ffffffff81063a4b>] do_exit+0x12b/0x890
> >  [<ffffffff810924fd>] ? trace_hardirqs_off+0xd/0x10
> >  [<ffffffff8108641f>] ? cpu_clock+0x6f/0x80
> >  [<ffffffff81095cbd>] ? lock_release_holdtime+0x3d/0x190
> >  [<ffffffff814e1010>] ? _raw_spin_unlock_irq+0x30/0x40
> >  [<ffffffff8106420e>] do_group_exit+0x5e/0xd0
> >  [<ffffffff81075b54>] get_signal_to_deliver+0x2d4/0x490
> >  [<ffffffff811ea6ad>] ? inode_has_perm+0x7d/0xf0
> >  [<ffffffff8100a2e5>] do_signal+0x75/0x7b0
> >  [<ffffffff81169d2d>] ? vfs_ioctl+0x3d/0xf0
> >  [<ffffffff8116a394>] ? do_vfs_ioctl+0x84/0x570
> >  [<ffffffff8100aa85>] do_notify_resume+0x65/0x80
> >  [<ffffffff814e02f2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >  [<ffffffff8100b381>] int_signal+0x12/0x17
> > Mem-Info:
> > Node 0 DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > Node 0 DMA32 per-cpu:
> > CPU    0: hi:  186, btch:  31 usd: 119
> > CPU    1: hi:  186, btch:  31 usd:  73
> > active_anon:50 inactive_anon:175 isolated_anon:0
> >  active_file:0 inactive_file:9 isolated_file:0
> >  unevictable:126855 dirty:0 writeback:25 unstable:0
> >  free:1996 slab_reclaimable:4445 slab_unreclaimable:23663
> >  mapped:923 shmem:7 pagetables:778 bounce:0
> > Node 0 DMA free:4032kB min:60kB low:72kB high:88kB active_anon:0kB
> > inactive_anon:0kB active_file:0kB inactive_file:0kB
> > unevictable:11896kB isolated(anon):0kB isolated(file):0kB
> > present:15756kB mlocked:11896kB dirty:0kB writeback:0kB mapped:0kB
> > shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
> kernel_stack:0kB
> > pagetables:24kB unstable:0kB bounce:0kB writeback_tmp:0kB
> > pages_scanned:0 all_unreclaimable? yes
> > lowmem_reserve[]: 0 994 994 994
> > Node 0 DMA32 free:3952kB min:4000kB low:5000kB high:6000kB
> > active_anon:200kB inactive_anon:700kB active_file:0kB
> > inactive_file:36kB unevictable:495524kB isolated(anon):0kB
> > isolated(file):0kB present:1018060kB mlocked:495524kB dirty:0kB
> > writeback:100kB mapped:3692kB shmem:28kB slab_reclaimable:17780kB
> > slab_unreclaimable:94652kB kernel_stack:1296kB pagetables:3088kB
> > unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1536
> > all_unreclaimable? yes
> > lowmem_reserve[]: 0 0 0 0
> > Node 0 DMA: 0*4kB 2*8kB 1*16kB 1*32kB 2*64kB 2*128kB 2*256kB
> 0*512kB
> > 1*1024kB 1*2048kB 0*4096kB = 4032kB
> > Node 0 DMA32: 470*4kB 3*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
> > 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3952kB
> > 1146 total pagecache pages
> > 221 pages in swap cache
> > Swap cache stats: add 19848, delete 19627, find 970/1386
> > Free swap  = 2051428kB
> > Total swap = 2064380kB
> > 262138 pages RAM
> > 43914 pages reserved
> > 4669 pages shared
> > 155659 pages non-shared
> > Out of memory: kill process 1829 (usemem) score 8253 or a child
> > Killed process 1829 (usemem) vsz:528224kB, anon-rss:502468kB,
> > file-rss:376kB
> > 
> > # cat usemem.c
> > # cat usemem.c 
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/mman.h>
> > #define CHUNKS 32
> > 
> > int 
> > main(int argc, char *argv[])
> > {
> > 	mlockall(MCL_FUTURE);
> > 
> > 	unsigned long mb;
> > 	char *buf[CHUNKS];
> > 	int i;
> > 
> > 	if (argc < 2) {
> > 		fprintf(stderr, "usage: usemem megabytes\n");
> > 		exit(1);
> > 	}
> > 	mb = strtoul(argv[1], NULL, 0);
> > 
> > 	for (i = 0; i < CHUNKS; i++) {
> > 		fprintf(stderr, "%d: Mallocing %lu megabytes\n", i, mb/CHUNKS);
> > 		buf[i] = (char *)malloc(mb/CHUNKS * 1024L * 1024L);
> > 		if (!buf[i]) {
> > 			fprintf(stderr, "malloc failure\n");
> > 			exit(1);
> > 		}
> > 	}
> > 
> > 	for (i = 0; i < CHUNKS; i++) {
> > 		fprintf(stderr, "%d: Zeroing %lu megabytes at %p\n", 
> > 				i, mb/CHUNKS, buf[i]);
> > 		memset(buf[i], 0, mb/CHUNKS * 1024L * 1024L);
> > 	}
> > 
> > 
> > 	exit(0);
> > }
> > 
> If this ever be relevant, this was tested inside the kvm guest. The
> host was a RHEL6 with THP enabled.
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-23 18:02     ` CAI Qian
@ 2010-07-24 14:41       ` Valdis.Kletnieks
  0 siblings, 0 replies; 23+ messages in thread
From: Valdis.Kletnieks @ 2010-07-24 14:41 UTC (permalink / raw)
  To: CAI Qian
  Cc: linux-mm, linux-kernel, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Greg KH, Dan Magenheimer, Rik van Riel, Avi Kivity, Nitin Gupta

[-- Attachment #1: Type: text/plain, Size: 234 bytes --]

On Fri, 23 Jul 2010 14:02:16 EDT, CAI Qian said:
> Ignore me. The test case should not be using mlockall()!

I'm confused. I don't see any mlockall() call in the usemem.c you posted? Or
was what you posted not what you actually ran?


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2010-07-16 12:37 Nitin Gupta
                   ` (3 preceding siblings ...)
  2010-07-22 19:14 ` Greg KH
@ 2011-01-10 13:16 ` Kirill A. Shutemov
  2011-01-18 17:53   ` Dan Magenheimer
  4 siblings, 1 reply; 23+ messages in thread
From: Kirill A. Shutemov @ 2011-01-10 13:16 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH,
	Dan Magenheimer, Rik van Riel, Avi Kivity, Christoph Hellwig,
	Minchan Kim, Konrad Rzeszutek Wilk, linux-mm, linux-kernel

Hi,

What is the status of the patchset?
Do you have an updated patchset with fixes?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH 0/8] zcache: page cache compression support
  2011-01-10 13:16 ` Kirill A. Shutemov
@ 2011-01-18 17:53   ` Dan Magenheimer
  2011-01-20 12:33     ` Nitin Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Dan Magenheimer @ 2011-01-18 17:53 UTC (permalink / raw)
  To: Kirill A. Shutemov, Nitin Gupta
  Cc: Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

> From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> Sent: Monday, January 10, 2011 6:16 AM
> To: Nitin Gupta
> Cc: Pekka Enberg; Hugh Dickins; Andrew Morton; Greg KH; Dan
> Magenheimer; Rik van Riel; Avi Kivity; Christoph Hellwig; Minchan Kim;
> Konrad Rzeszutek Wilk; linux-mm; linux-kernel
> Subject: Re: [PATCH 0/8] zcache: page cache compression support
> 
> Hi,
> 
> What is status of the patchset?
> Do you have updated patchset with fixes?
> 
> --
>  Kirill A. Shutemov

I wanted to give Nitin a week to respond, but I guess he
continues to be offline.

I believe zcache is completely superseded by kztmem.
Kztmem, like zcache, is dependent on cleancache
getting merged.

Kztmem may supersede zram as well, although frontswap (which
kztmem uses for a more dynamic in-memory swap compression)
and zram have some functional differences that support
both being merged.

For latest kztmem patches and description, see:

https://lkml.org/lkml/2011/1/18/170 


Thanks,
Dan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2011-01-18 17:53   ` Dan Magenheimer
@ 2011-01-20 12:33     ` Nitin Gupta
  2011-01-20 12:47       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Nitin Gupta @ 2011-01-20 12:33 UTC (permalink / raw)
  To: Dan Magenheimer
  Cc: Kirill A. Shutemov, Pekka Enberg, Hugh Dickins, Andrew Morton,
	Greg KH, Rik van Riel, Avi Kivity, Christoph Hellwig, Minchan Kim,
	Konrad Wilk, linux-mm, linux-kernel

On 01/18/2011 12:53 PM, Dan Magenheimer wrote:
>> From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
>> Sent: Monday, January 10, 2011 6:16 AM
>> To: Nitin Gupta
>> Cc: Pekka Enberg; Hugh Dickins; Andrew Morton; Greg KH; Dan
>> Magenheimer; Rik van Riel; Avi Kivity; Christoph Hellwig; Minchan Kim;
>> Konrad Rzeszutek Wilk; linux-mm; linux-kernel
>> Subject: Re: [PATCH 0/8] zcache: page cache compression support
>>
>> Hi,
>>
>> What is status of the patchset?
>> Do you have updated patchset with fixes?
>>
>> --
>>   Kirill A. Shutemov
> I wanted to give Nitin a week to respond, but I guess he
> continues to be offline.
>

Sorry, I was on a post-exam vacation, so I couldn't
look into it much :)

> I believe zcache is completely superceded by kztmem.
> Kztmem, like zcache, is dependent on cleancache
> getting merged.
>
> Kztmem may supercede zram also although frontswap (which
> kztmem uses for a more dynamic in-memory swap compression)
> and zram have some functional differences that support
> both being merged.
>
> For latest kztmem patches and description, see:
>
> https://lkml.org/lkml/2011/1/18/170
>

I just started looking into kztmem (weird name!), but at a
high level it seems quite similar to zcache, with some
dynamic resizing added (a callback for the shrinker interface).

Now I'll try rebuilding zcache against the new cleancache
API provided by this set of patches. This will help refresh
whatever issues I was having back then with page cache
compression, and maybe pick up useful bits/directions from
the new kztmem work.

(The PAM etc. synonyms make reading the kztmem code quite heavy,
and I still don't like the frontswap approach, but unfortunately
I don't have any better alternative ready yet.)

Thanks,
Nitin


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2011-01-20 12:33     ` Nitin Gupta
@ 2011-01-20 12:47       ` Christoph Hellwig
  2011-01-20 13:16         ` Pekka Enberg
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2011-01-20 12:47 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Dan Magenheimer, Kirill A. Shutemov, Pekka Enberg, Hugh Dickins,
	Andrew Morton, Greg KH, Rik van Riel, Avi Kivity,
	Christoph Hellwig, Minchan Kim, Konrad Wilk, linux-mm,
	linux-kernel

On Thu, Jan 20, 2011 at 07:33:29AM -0500, Nitin Gupta wrote:
> I just started looking into kztmem (weird name!) but on
> the high level it seems so much similar to zcache with some
> dynamic resizing added (callback for shrinker interface).
> 
> Now, I'll try rebuilding zcache according to new cleancache
> API as provided by these set of patches. This will help refresh
> whatever issues I was having back then with pagecache
> compression and maybe pick useful bits/directions from
> new kztmem work.

Yes, we shouldn't have two drivers doing almost the same thing in the
tree.  Also, adding core hooks for staging drivers really is against
the idea of staging as a separate crap tree.  So it would be
good to get zcache into a state where we can merge it into the
proper tree first.  And then we can discuss whether adding an abstraction
layer between it and the core VM really makes sense, and if it does,
how.  But I'm pretty sure there's no need for multiple layers of
abstraction for something that's relatively core VM functionality.

E.g. the abstraction should evolve because of its users; the
compressed caching code should not evolve just because it's needed to
present a user for otherwise useless code.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2011-01-20 12:47       ` Christoph Hellwig
@ 2011-01-20 13:16         ` Pekka Enberg
  2011-01-20 13:58           ` Nitin Gupta
  0 siblings, 1 reply; 23+ messages in thread
From: Pekka Enberg @ 2011-01-20 13:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nitin Gupta, Dan Magenheimer, Kirill A. Shutemov, Pekka Enberg,
	Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel, Avi Kivity,
	Minchan Kim, Konrad Wilk, linux-mm, linux-kernel

Hi Christoph,

On Thu, Jan 20, 2011 at 07:33:29AM -0500, Nitin Gupta wrote:
>> I just started looking into kztmem (weird name!) but on
>> the high level it seems so much similar to zcache with some
>> dynamic resizing added (callback for shrinker interface).
>>
>> Now, I'll try rebuilding zcache according to new cleancache
>> API as provided by these set of patches. This will help refresh
>> whatever issues I was having back then with pagecache
>> compression and maybe pick useful bits/directions from
>> new kztmem work.

On Thu, Jan 20, 2011 at 2:47 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Yes, we shouldn't have two drivers doing almost the same in the
> tree.  Also adding core hooks for staging drivers really is against
> the idea of staging of having a separate crap tree.  So it would be
> good to get zcache into a state where we can merge it into the
> proper tree first.  And then we can discuss if adding an abstraction
> layer between it and the core VM really makes sense, and if it does
> how.   But I'm pretty sure there's now need for multiple layers of
> abstraction for something that's relatively core VM functionality.
>
> E.g. the abstraction should involve because of it's users, not the
> compressed caching code should involve because it's needed to present
> a user for otherwise useless code.

I'm not sure which hooks you're referring to, but for zcache we did this:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b3a27d0529c6e5206f1b60f60263e3ecfd0d77cb

I completely agree with getting zcache merged properly before going
for the cleancache stuff.

                        Pekka


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/8] zcache: page cache compression support
  2011-01-20 13:16         ` Pekka Enberg
@ 2011-01-20 13:58           ` Nitin Gupta
  0 siblings, 0 replies; 23+ messages in thread
From: Nitin Gupta @ 2011-01-20 13:58 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Hellwig, Dan Magenheimer, Kirill A. Shutemov,
	Pekka Enberg, Hugh Dickins, Andrew Morton, Greg KH, Rik van Riel,
	Avi Kivity, Minchan Kim, Konrad Wilk, linux-mm, linux-kernel

On 01/20/2011 08:16 AM, Pekka Enberg wrote:
> Hi Christoph,
>
> On Thu, Jan 20, 2011 at 07:33:29AM -0500, Nitin Gupta wrote:
>>> I just started looking into kztmem (weird name!) but on
>>> the high level it seems so much similar to zcache with some
>>> dynamic resizing added (callback for shrinker interface).
>>>
>>> Now, I'll try rebuilding zcache according to new cleancache
>>> API as provided by these set of patches. This will help refresh
>>> whatever issues I was having back then with pagecache
>>> compression and maybe pick useful bits/directions from
>>> new kztmem work.
> On Thu, Jan 20, 2011 at 2:47 PM, Christoph Hellwig<hch@infradead.org>  wrote:
>> Yes, we shouldn't have two drivers doing almost the same in the
>> tree.  Also adding core hooks for staging drivers really is against
>> the idea of staging of having a separate crap tree.  So it would be
>> good to get zcache into a state where we can merge it into the
>> proper tree first.  And then we can discuss if adding an abstraction
>> layer between it and the core VM really makes sense, and if it does
>> how.   But I'm pretty sure there's now need for multiple layers of
>> abstraction for something that's relatively core VM functionality.
>>
>> E.g. the abstraction should involve because of it's users, not the
>> compressed caching code should involve because it's needed to present
>> a user for otherwise useless code.
> I'm not sure which hooks you're referring to but for zcache we did this:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b3a27d0529c6e5206f1b60f60263e3ecfd0d77cb
>
> I completely agree with getting zcache merged properly before going
> for the cleancache stuff.
>

These hooks are for zram (a generic, in-memory compressed block device),
which can also be used as a swap disk. Without that swap notify hook, we
could not free [compressed] swap pages as soon as they are marked free.
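
Roughly, and paraphrasing that commit from memory (names and details may
differ from the actual tree), the notify hook is an extra callback in
block_device_operations which the swap core invokes when a swap slot on
that device is freed, and zram wires it up to drop the compressed object
right away:

/* sketch only, not the exact in-tree code */
static void zram_slot_free_notify(struct block_device *bdev,
                                  unsigned long index)
{
	struct zram *zram = bdev->bd_disk->private_data;

	/* free the compressed page backing this swap slot immediately,
	 * instead of carrying it until the slot is overwritten */
	zram_free_page(zram, index);
}

static const struct block_device_operations zram_devops = {
	.swap_slot_free_notify = zram_slot_free_notify,
	.owner                 = THIS_MODULE,
};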

For zcache (which does page cache compression), we need a separate set
of hooks, currently known as "cleancache" [1]. These hooks are very
minimal, but I'm not sure whether they have been accepted yet (they are
present in the linux-next tree only; see mm/cleancache.c and
include/linux/cleancache.h).

[1] cleancache: http://lwn.net/Articles/393013/
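
For comparison, the cleancache side is just a small table of callbacks
that a backend such as zcache fills in. Very roughly (field names
paraphrased from the cleancache proposal; the version in linux-next may
differ):

/* sketch of the backend interface, not the exact header */
struct cleancache_ops {
	int  (*init_fs)(size_t pagesize);        /* new pool per mount */
	int  (*init_shared_fs)(char *uuid, size_t pagesize);
	int  (*get_page)(int pool, struct cleancache_filekey key,
	                 pgoff_t index, struct page *page);
	void (*put_page)(int pool, struct cleancache_filekey key,
	                 pgoff_t index, struct page *page);
	void (*flush_page)(int pool, struct cleancache_filekey key,
	                   pgoff_t index);
	void (*flush_inode)(int pool, struct cleancache_filekey key);
	void (*flush_fs)(int pool);
};

put_page() is called when a clean page cache page is evicted (that is
where zcache compresses and stores it), get_page() when the page is
needed again, and the flush_*() callbacks keep the backend coherent when
pages, inodes or whole filesystems go away.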

Nitin



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-01-20 13:57 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <575348163.1113381279906498028.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
2010-07-23 17:36 ` [PATCH 0/8] zcache: page cache compression support caiqian
2010-07-23 17:41   ` CAI Qian
2010-07-23 18:02     ` CAI Qian
2010-07-24 14:41       ` Valdis.Kletnieks
2010-07-16 12:37 Nitin Gupta
2010-07-17 21:13 ` Ed Tomlinson
2010-07-18  2:23   ` Nitin Gupta
2010-07-18  7:50 ` Pekka Enberg
2010-07-18  8:12   ` Nitin Gupta
2010-07-19 19:57 ` Dan Magenheimer
2010-07-20 13:50   ` Nitin Gupta
2010-07-20 14:28     ` Dan Magenheimer
2010-07-21  4:27       ` Nitin Gupta
2010-07-21 17:37         ` Dan Magenheimer
2010-07-22 19:14 ` Greg KH
2010-07-22 19:54   ` Dan Magenheimer
2010-07-22 21:00     ` Greg KH
2011-01-10 13:16 ` Kirill A. Shutemov
2011-01-18 17:53   ` Dan Magenheimer
2011-01-20 12:33     ` Nitin Gupta
2011-01-20 12:47       ` Christoph Hellwig
2011-01-20 13:16         ` Pekka Enberg
2011-01-20 13:58           ` Nitin Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).