Date: Thu, 21 Feb 2013 08:20:16 -0800 (PST)
From: Dan Magenheimer
To: Seth Jennings, Ric Mason
Cc: Andrew Morton, Greg Kroah-Hartman, Nitin Gupta, Minchan Kim,
    Konrad Wilk, Robert Jennings, Jenifer Hopper, Mel Gorman,
    Johannes Weiner, Rik van Riel, Larry Woodman,
    Benjamin Herrenschmidt, Dave Hansen, Joe Perches,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    devel@driverdev.osuosl.org
Subject: RE: [PATCHv5 2/8] zsmalloc: add documentation
References: <1360780731-11708-1-git-send-email-sjenning@linux.vnet.ibm.com>
 <1360780731-11708-3-git-send-email-sjenning@linux.vnet.ibm.com>
 <511F254D.2010909@gmail.com> <51227DF4.9020900@linux.vnet.ibm.com>
 <5125DFAA.4050706@gmail.com> <5126423F.7040705@linux.vnet.ibm.com>
In-Reply-To: <5126423F.7040705@linux.vnet.ibm.com>

> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: Re: [PATCHv5 2/8] zsmalloc: add documentation
>
> On 02/21/2013 02:49 AM, Ric Mason wrote:
> > On 02/19/2013 03:16 AM, Seth Jennings wrote:
> >> On 02/16/2013 12:21 AM, Ric Mason wrote:
> >>> On 02/14/2013 02:38 AM, Seth Jennings wrote:
> >>>> This patch adds a documentation file for zsmalloc at
> >>>> Documentation/vm/zsmalloc.txt
> >>>>
> >>>> Signed-off-by: Seth
Jennings
> >>>> ---
> >>>>  Documentation/vm/zsmalloc.txt | 68 +++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 68 insertions(+)
> >>>>  create mode 100644 Documentation/vm/zsmalloc.txt
> >>>>
> >>>> diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt
> >>>> new file mode 100644
> >>>> index 0000000..85aa617
> >>>> --- /dev/null
> >>>> +++ b/Documentation/vm/zsmalloc.txt
> >>>> @@ -0,0 +1,68 @@
> >>>> +zsmalloc Memory Allocator
> >>>> +
> >>>> +Overview
> >>>> +
> >>>> +zsmalloc is a new slab-based memory allocator for storing
> >>>> +compressed pages.  It is designed for low fragmentation and
> >>>> +high allocation success rate on large, but <= PAGE_SIZE,
> >>>> +allocations.
> >>>> +
> >>>> +zsmalloc differs from the kernel slab allocator in two primary
> >>>> +ways to achieve these design goals.
> >>>> +
> >>>> +zsmalloc never requires high order page allocations to back
> >>>> +slabs, or "size classes" in zsmalloc terms.  Instead it allows
> >>>> +multiple single-order pages to be stitched together into a
> >>>> +"zspage" which backs the slab.  This allows for a higher
> >>>> +allocation success rate under memory pressure.
> >>>> +
> >>>> +Also, zsmalloc allows objects to span page boundaries within the
> >>>> +zspage.  This allows for lower fragmentation than could be had
> >>>> +with the kernel slab allocator for objects between PAGE_SIZE/2
> >>>> +and PAGE_SIZE.  With the kernel slab allocator, if a page
> >>>> +compresses to 60% of its original size, the memory savings
> >>>> +gained through compression is lost in fragmentation because
> >>>> +another object of the same size can't be stored in the leftover
> >>>> +space.
> >>>> +
> >>>> +This ability to span pages results in zsmalloc allocations not
> >>>> +being directly addressable by the user.  The user is given a
> >>>> +non-dereferenceable handle in response to an allocation request.
> >>>> +That handle must be mapped, using zs_map_object(), which returns
> >>>> +a pointer to the mapped region that can be used.  The mapping is
> >>>> +necessary since the object data may reside in two different
> >>>> +noncontiguous pages.
> >>> Do you mean that the reason a zsmalloc object must be mapped
> >>> after allocation is that the object data may reside in two
> >>> noncontiguous pages?
> >> Yes, that is one reason for the mapping.  The other reason (more
> >> of an added bonus) is below.
> >>
> >>>> +
> >>>> +For 32-bit systems, zsmalloc has the added benefit of being
> >>>> +able to back slabs with HIGHMEM pages, something not possible
> >>> What's the meaning of "back slabs with HIGHMEM pages"?
> >> By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit
> >> systems with more than 1GB (actually a little less) of RAM.  The
> >> upper portion of physical memory, depending on kernel build
> >> options, is not directly addressable by the kernel, but can be
> >> mapped into the kernel address space with functions like kmap()
> >> or kmap_atomic().
> >>
> >> These pages can't be used by slab/slub because they are not
> >> permanently mapped into the kernel address space.  However, since
> >> zsmalloc requires a mapping anyway to handle objects that span
> >> non-contiguous page boundaries, we do the kernel mapping as part
> >> of the process.
> >>
> >> So zspages, the conceptual slabs in zsmalloc backed by
> >> single-order pages, can include pages from the HIGHMEM zone as
> >> well.
> >
> > Thanks for the clarification.
> > http://lwn.net/Articles/537422/, your article about zswap on LWN,
> > says:
> > "Additionally, the kernel slab allocator does not allow objects
> > that are less than a page in size to span a page boundary.  This
> > means that if an object is PAGE_SIZE/2 + 1 bytes in size, it
> > effectively uses an entire page, resulting in ~50% waste.  Hence
> > there are no kmalloc() cache sizes between PAGE_SIZE/2 and
> > PAGE_SIZE."
> > Are you sure?
> > It seems that the kmalloc caches do support big sizes; you can
> > check include/linux/kmalloc_sizes.h.
>
> Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are
> no cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE.  For example, on
> a system with 4k pages, there are no caches between kmalloc-2048 and
> kmalloc-4096.

Important and left unsaid here is that, in many workloads, the
distribution of compressed pages ("zpages") will have as many as half
or more with compressed size ("zsize") between PAGE_SIZE/2 and
PAGE_SIZE.  And, in many workloads, the majority of zsize values will
be much closer to PAGE_SIZE/2 than to PAGE_SIZE, which would result
in a great deal of wasted space if slab were used.

And, also very important, kmalloc requires page allocations with
"order > 0" (2**n contiguous pages) to deal with big objects.
In-kernel compression would need many of these, and they are
difficult (often impossible) to allocate when the system is under
memory pressure.

As a result, various other allocators have been written: first
xvmalloc, then zbud, then zsmalloc.  Each of these depends only on
order==0 page allocations, and each has ways of dealing with high
quantities of zpages with PAGE_SIZE/2 < zsize < PAGE_SIZE.

Hope that helps!