From: Vlastimil Babka <vbabka@suse.cz>
To: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Junil Lee <junil0814.lee@lge.com>,
ngupta@vflare.org, akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] zsmalloc: fix migrate_zspage-zs_free race condition
Date: Mon, 18 Jan 2016 13:18:31 +0100
Message-ID: <569CD817.7090309@suse.cz>
In-Reply-To: <20160118082000.GA20244@bbox>
On 01/18/2016 09:20 AM, Minchan Kim wrote:
> On Mon, Jan 18, 2016 at 08:54:07AM +0100, Vlastimil Babka wrote:
>> On 18.1.2016 8:39, Sergey Senozhatsky wrote:
>>> On (01/18/16 16:11), Minchan Kim wrote:
>>> [..]
>>>>> so, even if clear_bit_unlock/test_and_set_bit_lock do smp_mb or
>>>>> barrier(), there is no corresponding barrier from record_obj()->WRITE_ONCE().
>>>>> so I don't think WRITE_ONCE() will help the compiler, or am I missing
>>>>> something?
>>>>
>>>> We need two things
>>>> 1. compiler barrier
>>>> 2. memory barrier.
>>>>
>>>> As a compiler barrier, WRITE_ONCE works here to prevent store tearing
>>>> by the compiler.
>>>> However, if we omit unpin_tag here, we lose the memory barrier (e.g. smp_mb),
>>>> so another CPU could see stale data caused by CPU memory reordering.
>>>
>>> oh... good find! lost release semantic of unpin_tag()...
>>
>> Ah, release semantic, good point indeed. OK then we need the v2 approach again,
>> with WRITE_ONCE() in record_obj(). Or some kind of record_obj_release() with
>> release semantic, which would be a bit more effective, but I guess migration is
>> not that critical path to be worth introducing it.
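To make the record_obj_release() idea concrete, here is a minimal userspace sketch using C11 atomics. In the kernel this would presumably be an smp_store_release() on the handle. The name record_obj_release(), the reader helper, and the pointer-typed handle are assumptions for illustration only, not existing zsmalloc code.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical record_obj_release(): publish obj with release ordering,
 * so stores made before this call are visible to a reader that
 * observes the new value with an acquire load.
 */
static void record_obj_release(_Atomic unsigned long *handle, unsigned long obj)
{
	atomic_store_explicit(handle, obj, memory_order_release);
}

/* Hypothetical reader side: the matching acquire load. */
static unsigned long handle_to_obj_acquire(_Atomic unsigned long *handle)
{
	return atomic_load_explicit(handle, memory_order_acquire);
}

/* Single-threaded smoke check; it shows the API, not the ordering. */
static void demo_release(void)
{
	_Atomic unsigned long slot = 0;

	record_obj_release(&slot, 0x1000UL);
	assert(handle_to_obj_acquire(&slot) == 0x1000UL);
}
```

The release store only pays off if the other side pairs it with an acquire (or a lock operation with acquire semantics), which is why I called it a separate helper rather than a change to record_obj() itself.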
>
> WRITE_ONCE in record_obj would add more memory operations in obj_malloc
A simple WRITE_ONCE would just add a compiler barrier. What you suggest
below does indeed add more operations, which are actually needed only in
the migration path. What I suggested is the v2 approach of setting the PIN
bit in the value before calling record_obj(), *and* simply doing a
WRITE_ONCE in record_obj() to make sure the handle is written in a single
store with the PIN bit already applied, and not as two separate writes.
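A rough userspace sketch of what I mean, not a tested patch. The WRITE_ONCE() macro below is a stand-in for the kernel one, migrate_record_obj() is a hypothetical name for the call site in the migration path, and I'm assuming the encoded obj value leaves bit HANDLE_PIN_BIT free.

```c
#include <assert.h>

/* Assumption: bit 0 of the handle value is the pin (lock) bit. */
#define HANDLE_PIN_BIT 0

/* Userspace stand-in for the kernel's WRITE_ONCE(): one volatile store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* record_obj() stays a single store and never reads the old value. */
static void record_obj(unsigned long handle, unsigned long obj)
{
	WRITE_ONCE(*(unsigned long *)handle, obj);
}

/* Hypothetical migration-side caller: fold the pin bit in first. */
static void migrate_record_obj(unsigned long handle, unsigned long free_obj)
{
	/* Assumption: the encoded object never uses the pin bit itself. */
	assert(!(free_obj & (1UL << HANDLE_PIN_BIT)));

	free_obj |= 1UL << HANDLE_PIN_BIT;
	record_obj(handle, free_obj);
}

/* Small demo: the handle stays pinned across the update. */
static void demo_migrate(void)
{
	unsigned long slot = 1UL << HANDLE_PIN_BIT;	/* pinned, no object yet */

	migrate_record_obj((unsigned long)&slot, 0x1000UL);
	assert(slot == 0x1001UL);
}
```

This way zs_malloc() keeps calling plain record_obj() and pays nothing extra; only the migration caller knows about the pin bit.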
> but I don't feel it's too heavy in this phase so,
I'm afraid it's dangerous for the usage of record_obj() in zs_malloc()
where the handle is freshly allocated by alloc_handle(). Are we sure the
bit is not set?
The code in alloc_handle() is:

	return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
		pool->flags & ~__GFP_HIGHMEM);
There's no explicit __GFP_ZERO, so the handles are not guaranteed to be
zeroed on allocation, are they? And expecting all zpool users to include
__GFP_ZERO in their flags would be too subtle and error-prone.
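To illustrate the concern, here is a small userspace simulation, under the assumption that a fresh handle may contain arbitrary leftover bits. record_obj_keep_bit() mirrors the read-modify-write variant proposed in the patch quoted below; the helper names and the 0xdeadbeef "garbage" value are made up for the example.

```c
#include <assert.h>

/* Assumption: bit 0 of the handle value is the pin (lock) bit. */
#define HANDLE_PIN_BIT 0

/* Userspace stand-in for the kernel's WRITE_ONCE(): one volatile store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Variant that carries over whatever pin bit is currently stored. */
static void record_obj_keep_bit(unsigned long handle, unsigned long obj)
{
	unsigned long locked = *(unsigned long *)handle & (1UL << HANDLE_PIN_BIT);

	WRITE_ONCE(*(unsigned long *)handle, obj | locked);
}

/* Returns nonzero if the handle currently looks pinned. */
static int handle_is_pinned(unsigned long handle)
{
	return (*(unsigned long *)handle >> HANDLE_PIN_BIT) & 1;
}

/* Demo: a non-zeroed handle ends up "pinned" although nobody pinned it. */
static void demo_stale_handle(void)
{
	unsigned long stale = 0xdeadbeefUL;	/* pretend leftover slab contents */
	unsigned long zeroed = 0;		/* what __GFP_ZERO would give us */

	record_obj_keep_bit((unsigned long)&stale, 0x1000UL);
	assert(handle_is_pinned((unsigned long)&stale));

	record_obj_keep_bit((unsigned long)&zeroed, 0x1000UL);
	assert(!handle_is_pinned((unsigned long)&zeroed));
}
```

In the first case the next pin_tag() on that handle would presumably spin forever, since nothing will ever clear the bit.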
> How about this? Junil, Could you resend patch if others agree this?
> Thanks.
>
> +/*
> + * record_obj updates handle's value to free_obj and it shouldn't
> + * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, otherwise
> + * it breaks synchronization using pin_tag(e,g, zs_free) so let's
> + * keep the lock bit.
> + */
>  static void record_obj(unsigned long handle, unsigned long obj)
>  {
> -	*(unsigned long *)handle = obj;
> +	int locked = (*(unsigned long *)handle) & (1<<HANDLE_PIN_BIT);
> +	unsigned long val = obj | locked;
> +
> +	/*
> +	 * WRITE_ONCE could prevent store tearing like below
> +	 * *(unsigned long *)handle = free_obj
> +	 * *(unsigned long *)handle |= locked;
> +	 */
> +	WRITE_ONCE(*(unsigned long *)handle, val);
>  }
>
>
>
>>
>> Thanks,
>> Vlastimil
>>
>>>
>>> -ss
>>>
>>