Message-ID: <510698F5.5060205@linux.vnet.ibm.com>
Date: Mon, 28 Jan 2013 09:27:49 -0600
From: Seth Jennings
To: Rik van Riel
CC: Greg Kroah-Hartman, Andrew Morton, Nitin Gupta, Minchan Kim,
 Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings, Jenifer Hopper,
 Mel Gorman, Johannes Weiner, Larry Woodman, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, devel@driverdev.osuosl.org
Subject: Re: [PATCHv2 8/9] zswap: add to mm/
References: <1357590280-31535-1-git-send-email-sjenning@linux.vnet.ibm.com>
 <1357590280-31535-9-git-send-email-sjenning@linux.vnet.ibm.com>
 <51030ADA.8030403@redhat.com>
In-Reply-To: <51030ADA.8030403@redhat.com>

On 01/25/2013 04:44 PM, Rik van Riel wrote:
> On 01/07/2013 03:24 PM, Seth Jennings wrote:
>> zswap is a thin compression backend for frontswap. It receives
>> pages from frontswap and attempts to store them in a compressed
>> memory pool, resulting in an effective partial memory reclaim and
>> dramatically reduced swap device I/O.
>>
>> Additionally, in most cases, pages can be retrieved from this
>> compressed store much more quickly than reading from traditional
>> swap devices, resulting in faster performance for many workloads.
>>
>> This patch adds the zswap driver to mm/.
>>
>> Signed-off-by: Seth Jennings
>
> I like the approach of flushing pages into actual disk based
> swap when compressed swap is full. I would like it if that
> was advertised more prominently in the changelog :)

Thanks so much for the review!

> The code looks mostly good, complaints are at the nitpick level.
>
> One worry is that the pool can grow to whatever maximum was
> decided, and there is no way to shrink it when memory is
> required for something else.
>
> Would it be an idea to add a shrinker for the zcache pool,
> that can also shrink the zcache pool when required?
>
> Of course, that does lead to the question of how to balance
> the pressure from that shrinker, with the new memory entering
> zcache from the swap side. I have no clear answers here, just
> something to think about...

Yes, I prototyped a shrinker interface for zswap but, as we both
figured, it shrinks the zswap compressed pool too aggressively, to
the point of being useless.

Right now I'm working on a zswap thread that will "leak" pages out
to the swap device on an LRU basis over time. That way, if a page
is rarely accessed, it will eventually be written out to the swap
device and its memory freed, even if the zswap pool isn't full.
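Roughly, I have something like this in mind (just a sketch to show the
idea, not code from this patch set; the zswap_lru list, zswap_lru_lock,
the lru list_head in struct zswap_entry, and zswap_writeback_entry()
are made-up names for illustration):

/*
 * Illustrative sketch only: zswap_lru, zswap_lru_lock, entry->lru,
 * and zswap_writeback_entry() do not exist in the posted patches.
 */
static int zswap_lru_writeback_thread(void *unused)
{
	struct zswap_entry *entry;

	while (!kthread_should_stop()) {
		/* wake up periodically and age out the coldest entries */
		schedule_timeout_interruptible(msecs_to_jiffies(1000));

		spin_lock(&zswap_lru_lock);
		if (list_empty(&zswap_lru)) {
			spin_unlock(&zswap_lru_lock);
			continue;
		}
		/* the oldest entry sits at the tail of the LRU list */
		entry = list_entry(zswap_lru.prev, struct zswap_entry, lru);
		list_del_init(&entry->lru);
		spin_unlock(&zswap_lru_lock);

		/* decompress and write the page back to the swap device */
		zswap_writeback_entry(entry);
	}
	return 0;
}

The thread would be started with kthread_run() at init time; the open
questions are how often to wake up and how many entries to write back
per wakeup.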
Would this address your concerns?

>> +static void zswap_flush_entries(unsigned type, int nr)
>> +{
>> +	struct zswap_tree *tree = zswap_trees[type];
>> +	struct zswap_entry *entry;
>> +	int i, ret;
>> +
>> +/*
>> + * This limit is arbitrary for now until a better
>> + * policy can be implemented. This is so we don't
>> + * eat all of RAM decompressing pages for writeback.
>> + */
>> +#define ZSWAP_MAX_OUTSTANDING_FLUSHES 64
>> +	if (atomic_read(&zswap_outstanding_flushes) >
>> +		ZSWAP_MAX_OUTSTANDING_FLUSHES)
>> +		return;
>
> Having this #define right in the middle of the function is
> rather ugly. Might be worth moving it to the top.

Yes. In my mind, this policy was going to be replaced by a better one
soon. Checking may_write_to_queue() was my idea. I didn't spend too
much time making that part pretty.

>> +static int __init zswap_debugfs_init(void)
>> +{
>> +	if (!debugfs_initialized())
>> +		return -ENODEV;
>> +
>> +	zswap_debugfs_root = debugfs_create_dir("zswap", NULL);
>> +	if (!zswap_debugfs_root)
>> +		return -ENOMEM;
>> +
>> +	debugfs_create_u64("saved_by_flush", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_saved_by_flush);
>> +	debugfs_create_u64("pool_limit_hit", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_pool_limit_hit);
>> +	debugfs_create_u64("reject_flush_attempted", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_flush_attempted);
>> +	debugfs_create_u64("reject_tmppage_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_tmppage_fail);
>> +	debugfs_create_u64("reject_flush_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_flush_fail);
>> +	debugfs_create_u64("reject_zsmalloc_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_zsmalloc_fail);
>> +	debugfs_create_u64("reject_kmemcache_fail", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>> +	debugfs_create_u64("reject_compress_poor", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_reject_compress_poor);
>> +	debugfs_create_u64("flushed_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_flushed_pages);
>> +	debugfs_create_u64("duplicate_entry", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_duplicate_entry);
>> +	debugfs_create_atomic_t("pool_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_pool_pages);
>> +	debugfs_create_atomic_t("stored_pages", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_stored_pages);
>> +	debugfs_create_atomic_t("outstanding_flushes", S_IRUGO,
>> +			zswap_debugfs_root, &zswap_outstanding_flushes);
>> +
>
> Some of these statistics would be very useful to system
> administrators, who will not be mounting debugfs on
> production systems.
>
> Would it make sense to export some of these statistics
> through sysfs?

That's fine. Which of these stats do you think should be in sysfs?

Thanks again for taking time to look at this!

Seth