Re: [PATCH 2/8] Basic zcache functionality


From: Nitin Gupta <ngupta@vflare.org>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg KH <greg@kroah.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Rik van Riel <riel@redhat.com>, Avi Kivity <avi@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	linux-mm <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/8] Basic zcache functionality
Date: Sun, 18 Jul 2010 15:15:54 +0530
Message-ID: <4C42CD52.3070601@vflare.org>
In-Reply-To: <4C42B7EA.4020409@cs.helsinki.fi>


On 07/18/2010 01:44 PM, Pekka Enberg wrote:
> Nitin Gupta wrote:
>> +/*
>> + * Individual percpu values can go negative but the sum across all CPUs
>> + * must always be positive (we store various counts). So, return sum as
>> + * unsigned value.
>> + */
>> +static u64 zcache_get_stat(struct zcache_pool *zpool,
>> +        enum zcache_pool_stats_index idx)
>> +{
>> +    int cpu;
>> +    s64 val = 0;
>> +
>> +    for_each_possible_cpu(cpu) {
>> +        unsigned int start;
>> +        struct zcache_pool_stats_cpu *stats;
>> +
>> +        stats = per_cpu_ptr(zpool->stats, cpu);
>> +        do {
>> +            start = u64_stats_fetch_begin(&stats->syncp);
>> +            val += stats->count[idx];
>> +        } while (u64_stats_fetch_retry(&stats->syncp, start));
> 
> Can we use 'struct percpu_counter' for this? OTOH, the warning on top of include/linux/percpu_counter.h makes me think not.
>

Yes, that warning only scared me :)

 
>> +    }
>> +
>> +    BUG_ON(val < 0);
> 
> BUG_ON() seems overly aggressive. How about
> 
>   if (WARN_ON(val < 0))
>           return 0;
> 

Yes, this sounds better. I will change it.
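
As a rough user-space sketch of the changed behaviour (not the actual patch):
sum the signed per-CPU values, and if the total is negative, warn and report 0
instead of crashing. The name sum_stat() and the plain array standing in for
the percpu data are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for zcache_get_stat(): each array slot plays the
 * role of one CPU's counter. Individual slots may be negative, but the
 * total is expected to be non-negative.
 */
static uint64_t sum_stat(const int64_t percpu[], int ncpus)
{
    int64_t val = 0;
    int cpu;

    for (cpu = 0; cpu < ncpus; cpu++)
        val += percpu[cpu];

    /* Plays the role of: if (WARN_ON(val < 0)) return 0; */
    if (val < 0) {
        fprintf(stderr, "warning: negative stat sum %lld\n", (long long)val);
        return 0;
    }

    return (uint64_t)val;
}
```

A negative total then shows up as a warning plus a zero reading rather than
taking the machine down.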


>> +    return val;
>> +}
>> +
>> +static void zcache_add_stat(struct zcache_pool *zpool,
>> +        enum zcache_pool_stats_index idx, s64 val)
>> +{
>> +    struct zcache_pool_stats_cpu *stats;
>> +
>> +    preempt_disable();
>> +    stats = __this_cpu_ptr(zpool->stats);
>> +    u64_stats_update_begin(&stats->syncp);
>> +    stats->count[idx] += val;
>> +    u64_stats_update_end(&stats->syncp);
>> +    preempt_enable();
> 
> What is the preempt_disable/preempt_enable trying to do here?
>

On 64-bit there will be no seqcount to protect this value, and on 32-bit
the seqcount write side itself needs exclusive access. So, if we get
preempted after __this_cpu_ptr(), two CPUs can end up racily writing
to the same variable. I think this_cpu_add() ultimately does its
increment with preemption disabled for the same reason.

Also, I think we shouldn't use this_cpu_add() (as you suggested in
another mail) since we have to do this_cpu_ptr() first anyway to get
access to the seqcount (stats->syncp). So, a simple increment through
the per-CPU pointer obtained this way should be okay.
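
To illustrate why the update side needs exclusive access, here is a minimal
user-space sketch of a sequence counter built on C11 atomics. It is not the
kernel's u64_stats_sync implementation; struct stat64 and both function names
are made up for this example. The 64-bit value is kept as two 32-bit halves
to mimic a 32-bit machine. The counter only lets a reader detect that an
update overlapped its read; it does nothing to serialize two writers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit counter stored as two 32-bit halves. */
struct stat64 {
    atomic_uint seq;        /* even: stable, odd: update in progress */
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

/*
 * Update side. Assumes the caller guarantees a single writer at a time,
 * which is the role preempt_disable() plays for per-CPU data.
 */
static void stat64_add(struct stat64 *s, int64_t delta)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
    uint64_t v;

    /* Mark the update as in progress (odd sequence number). */
    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    v = ((uint64_t)atomic_load_explicit(&s->hi, memory_order_relaxed) << 32) |
        atomic_load_explicit(&s->lo, memory_order_relaxed);
    v += (uint64_t)delta;
    atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
    atomic_store_explicit(&s->hi, (uint32_t)(v >> 32), memory_order_relaxed);

    /* Publish the update (even sequence number again). */
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Read side: retry if an update was in progress or completed meanwhile. */
static int64_t stat64_read(struct stat64 *s)
{
    unsigned int start;
    uint32_t lo, hi;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
        hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             atomic_load_explicit(&s->seq, memory_order_relaxed) != start);

    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

If two writers ran stat64_add() on the same structure concurrently, both the
sequence number and the halves could be corrupted, so writers have to be
serialized by other means.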

 
>> +static void zcache_destroy_pool(struct zcache_pool *zpool)
>> +{
>> +    int i;
>> +
>> +    if (!zpool)
>> +        return;
>> +
>> +    spin_lock(&zcache->pool_lock);
>> +    zcache->num_pools--;
>> +    for (i = 0; i < MAX_ZCACHE_POOLS; i++)
>> +        if (zcache->pools[i] == zpool)
>> +            break;
>> +    zcache->pools[i] = NULL;
>> +    spin_unlock(&zcache->pool_lock);
>> +
>> +    if (!RB_EMPTY_ROOT(&zpool->inode_tree)) {
> 
> Use WARN_ON here to get a stack trace?
>

This sounds better, will change it.

 
>> +        pr_warn("Memory leak detected. Freeing non-empty pool!\n");
>> +        zcache_dump_stats(zpool);
>> +    }
>> +
>> +    free_percpu(zpool->stats);
>> +    kfree(zpool);
>> +}
>> +
>> +/*
>> + * Allocate a new zcache pool and set default memlimit.
>> + *
>> + * Returns pool_id on success, negative error code otherwise.
>> + */
>> +int zcache_create_pool(void)
>> +{
>> +    int ret;
>> +    u64 memlimit;
>> +    struct zcache_pool *zpool = NULL;
>> +
>> +    spin_lock(&zcache->pool_lock);
>> +    if (zcache->num_pools == MAX_ZCACHE_POOLS) {
>> +        spin_unlock(&zcache->pool_lock);
>> +        pr_info("Cannot create new pool (limit: %u)\n",
>> +                    MAX_ZCACHE_POOLS);
>> +        ret = -EPERM;
>> +        goto out;
>> +    }
>> +    zcache->num_pools++;
>> +    spin_unlock(&zcache->pool_lock);
>> +
>> +    zpool = kzalloc(sizeof(*zpool), GFP_KERNEL);
>> +    if (!zpool) {
>> +        spin_lock(&zcache->pool_lock);
>> +        zcache->num_pools--;
>> +        spin_unlock(&zcache->pool_lock);
>> +        ret = -ENOMEM;
>> +        goto out;
>> +    }
> 
> Why not kmalloc() a new struct zcache_pool object first and then take zcache->pool_lock and check for MAX_ZCACHE_POOLS? That should make the locking a little less confusing here.
> 

Doing the kmalloc() before this check should be better. It also avoids the
otherwise unnecessary num_pools decrement when kmalloc() fails.
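
Roughly, the reordered flow could look like the user-space sketch below, with
a pthread mutex standing in for the spinlock and calloc() for kzalloc().
MAX_POOLS, struct pool, create_pool() and the slot search are illustrative
assumptions, not code from the patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_POOLS 2     /* hypothetical small limit for the sketch */

struct pool {
    int id;
};

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pool *pools[MAX_POOLS];
static int num_pools;

/* Returns the pool id on success, a negative error code otherwise. */
static int create_pool(void)
{
    struct pool *p;
    int i;

    /* Allocate before taking the lock: nothing to undo if this fails. */
    p = calloc(1, sizeof(*p));
    if (!p)
        return -ENOMEM;

    pthread_mutex_lock(&pool_lock);
    if (num_pools == MAX_POOLS) {
        pthread_mutex_unlock(&pool_lock);
        free(p);
        return -EPERM;
    }

    /* num_pools < MAX_POOLS here, so a free slot must exist. */
    for (i = 0; i < MAX_POOLS; i++)
        if (!pools[i])
            break;
    p->id = i;
    pools[i] = p;
    num_pools++;
    pthread_mutex_unlock(&pool_lock);

    return i;
}
```

The limit check, slot assignment and count increment now sit in a single
critical section, and the only cleanup on the failure path is freeing the
unused allocation.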


>> +
>> +    src_data = kmap_atomic(page, KM_USER0);
>> +    dest_data = kmap_atomic(zpage, KM_USER1);
>> +    memcpy(dest_data, src_data, PAGE_SIZE);
>> +    kunmap_atomic(src_data, KM_USER0);
>> +    kunmap_atomic(dest_data, KM_USER1);
> 
> copy_highpage()
>

Ok. But we will have to open-code this memcpy() again when we start using
xvmalloc (patch 7/8). The same applies to the other instance you pointed out.
 

>> +static int zcache_get_page(int pool_id, ino_t inode_no,
>> +            pgoff_t index, struct page *page)
>> +{
>> +    int ret = -1;
>> +    unsigned long flags;
>> +    struct page *src_page;
>> +    void *src_data, *dest_data;
>> +    struct zcache_inode_rb *znode;
>> +    struct zcache_pool *zpool = zcache->pools[pool_id];
>> +
>> +    znode = zcache_find_inode(zpool, inode_no);
>> +    if (!znode)
>> +        goto out;
>> +
>> +    BUG_ON(znode->inode_no != inode_no);
> 
> Maybe use WARN_ON here and return -1?
>

okay.


Thanks for the review.
Nitin
