From: Vladimir Davydov <vdavydov-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Michel Lespinasse
<walken-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
Cyrill Gorcunov
<gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Christoph Lameter
<cl-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
"Paul E. McKenney"
<paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
Peter Zijlstra
<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>
Subject: Re: [PATCH v3 3/3] proc: add kpageidle file
Date: Sun, 10 May 2015 13:34:29 +0300
Message-ID: <20150510103429.GA17628@esperanza>
In-Reply-To: <20150509151031.GA24141@blaptop>
On Sun, May 10, 2015 at 12:12:38AM +0900, Minchan Kim wrote:
> On Fri, May 08, 2015 at 12:56:04PM +0300, Vladimir Davydov wrote:
> > On Mon, May 04, 2015 at 07:54:59PM +0900, Minchan Kim wrote:
> > > So, I guess once the compiler optimization below happens in __page_set_anon_rmap,
> > > it could be corrupted in page_referenced.
> > >
> > > __page_set_anon_rmap:
> > > page->mapping = (struct address_space *) anon_vma;
> > > page->mapping = (struct address_space *)((void *)page->mapping + PAGE_MAPPING_ANON);
> > >
> > > Because page_referenced checks it with PageAnon which has no memory barrier.
> > > So if the above compiler optimization happens, page_referenced can pass the anon
> > > page to rmap_walk_file, not rmap_walk_anon. It's my theory. :)
> >
> > FWIW
> >
> > If such splits were possible, we would have bugs all over the kernel
> > IMO. An example is do_wp_page() vs shrink_active_list(). In do_wp_page()
> > we can call page_move_anon_rmap(), which sets page->mapping in exactly
> > the same fashion as above-mentioned __page_set_anon_rmap():
> >
> > anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
> > page->mapping = (struct address_space *) anon_vma;
> >
> > The page in question may be on an LRU list, because nowhere in
> > do_wp_page() we remove it from the list, neither do we take any LRU
> > related locks. The page is locked, that's true, but shrink_active_list()
> > calls page_referenced() on an unlocked page, so according to your logic
> > they can race with the latter receiving a page with page->mapping equal
> > to anon_vma w/o PAGE_MAPPING_ANON bit set:
> >
> > CPU0                              CPU1
> > ----                              ----
> > do_wp_page                        shrink_active_list
> >  lock_page                         page_referenced
> >                                     PageAnon->yes, so skip trylock_page
> >  page_move_anon_rmap
> >   page->mapping = anon_vma
> >                                     rmap_walk
> >                                      PageAnon->no
> >                                      rmap_walk_file
> >                                       BUG
> >   page->mapping = page->mapping+PAGE_MAPPING_ANON
> > However, this does not happen.
>
> Good spot.
>
> However, it doesn't mean it's right so you are okay to rely on it.
> Normally, store tearing is not common and such race would be hard to hit
> but I want to call it as BUG.
But then we should call atomic64_set/atomic_long_set a big fat bug,
because they do not use ACCESS_ONCE/volatile stuff on their argument, so
by that logic they are prone to write tearing and therefore not atomic
at all.
>
> Rik wrote the code and added this comment:
>
> "Protected against the rmap code by the page lock"
>
> But unfortunately, page_referenced in shrink_active_list doesn't hold
> a page lock so isn't it a bug? Rik?
>
> Please read the store tearing section in Documentation/memory-barriers.txt.
> If you get confused due to aligned memory, please read this link.
>
> https://lkml.org/lkml/2014/7/16/262
I've read it. It describes tearing of

	p = 0x00010002;

into

	*(u16 *)&p = 0x2;
	*((u16 *)&p + 1) = 0x1;

so that the compiler can avoid materializing the constant 0x00010002 and
use two 16-bit store-immediate instructions instead.

AFAIU that is nowhere near the case in __page_set_anon_rmap:

	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
	page->mapping = (struct address_space *) anon_vma;

The compiler doesn't know the value of anon_vma, so there is absolutely
no benefit in tearing the store: it would only result in two stores
instead of one. I admit we cannot rule out that some mad compiler would
do that, but IMO that would be a compiler bug, which would result in the
kernel tearing apart.
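
To make the distinction concrete, here is a minimal userspace sketch, not
kernel code. WRITE_ONCE_SKETCH and the function names are made up for
illustration, and __typeof__ is a GCC/Clang extension. The first function
stores a compile-time constant, which is the case the documentation warns
about; the last one stores a value the compiler cannot know, which is the
__page_set_anon_rmap situation.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's WRITE_ONCE():
 * performs the store through a volatile-qualified lvalue. */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static uint32_t p;

/* The stored value is a compile-time constant. On an architecture with
 * 16-bit store-immediate instructions, a compiler could in principle emit
 * two 16-bit stores here instead of building the 32-bit constant. */
void store_constant_plain(void)
{
	p = 0x00010002;
}

/* Same constant, but stored through a volatile lvalue. In practice
 * compilers emit one aligned 32-bit store for this. */
void store_constant_once(void)
{
	WRITE_ONCE_SKETCH(p, 0x00010002);
}

/* The value is not known at compile time, so splitting the store would
 * only cost an extra instruction and gain nothing. */
void store_unknown_plain(uint32_t v)
{
	p = v;
}

/* Helper to read the variable back. */
uint32_t load_p(void)
{
	return p;
}
```

Single-threaded, all three stores leave the same value in p; the difference
is only in what the compiler is allowed to emit for them.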
>
> Other quote from Paul in https://lkml.org/lkml/2015/5/1/229
> "
> ..
> If the thing read/written does fit into a machine word and if the location
> read/written is properly aligned, I would be quite surprised if either
> READ_ONCE() or WRITE_ONCE() resulted in any sort of tearing.
> "
>
> I parsed it as "store tearing can happen even for a machine word at an
> aligned address, and that's why WRITE_ONCE is there to prevent it"
That's reading between the lines; I can't see that written there.
>
> If you want to claim GCC doesn't do it, please read below links
>
> https://lkml.org/lkml/2015/4/16/527
> http://yarchive.net/comp/linux/ACCESS_ONCE.html
>
> Quote from Linus
> "
> The thing is, you can't _prove_ that the compiler won't do it, especially
> if you end up changing the code later (without thinking about the fact
> that you're loading things without locking).
>
> So the rule is: if you access unlocked values, you use ACCESS_ONCE(). You
> don't say "but it can't matter". Because you simply don't know.
> "
You took this quote out of its context, which has nothing to do with
load/store tearing. It's about the consistency of a value across several
statements. E.g. in the following code

	int i = x;
	if (i > y)
		y = i;

we do need ACCESS_ONCE around x, because the compiler is free to fetch
its value twice, once for the comparison and once for the assignment.
But it's not about read/write tearing.
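
Spelled out as a compilable userspace sketch (ACCESS_ONCE_SKETCH and the
function names are hypothetical stand-ins, and __typeof__ is a GCC/Clang
extension), the two variants would look roughly like this:

```c
/* Hypothetical userspace stand-in for the kernel's ACCESS_ONCE(). */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

int x;	/* assumed to be modified concurrently by another thread */
int y;

/* Plain read: the compiler may discard the local copy and read x again
 * for the assignment, so the value compared and the value stored can
 * differ if x changes in between. */
void update_max_plain(void)
{
	int i = x;

	if (i > y)
		y = i;
}

/* The volatile access is performed exactly once, so the comparison and
 * the assignment are guaranteed to use the same value. */
void update_max_once(void)
{
	int i = ACCESS_ONCE_SKETCH(x);

	if (i > y)
		y = i;
}
```

Note that neither variant says anything about whether the single load of x
can be torn; the macro is there only to stop the refetch.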
>
> Yes, I might be paranoid, but my point is that it might work now on most
> archs but it seems to be buggy/fragile/subtle because we couldn't prove
> that no arch/compiler makes any trouble. So, instead of adding more
> logic based on something fragile, please use the right lock model. If the
> lock becomes big trouble due to overhead, let's fix it (for instance, use
> WRITE_ONCE for update-side and READ_ONCE for read-side) if I don't miss something.
IMO, locking would be an overkill. READ_ONCE is OK, because it has no
performance implications, but I would prefer to be convinced that it is
100% necessary before adding it just in case.
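
For reference, the WRITE_ONCE/READ_ONCE variant being suggested would look
roughly like the userspace sketch below. This is not a patch: struct
page_sketch, PAGE_MAPPING_ANON_SKETCH, the two macros and both function
names are made up for illustration, and __typeof__ is a GCC/Clang
extension. The tagged pointer is computed in a local variable first and
then published with a single volatile store, so a reader sees either the
old value or the fully tagged one.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions. */
#define PAGE_MAPPING_ANON_SKETCH 1UL
#define READ_ONCE_SKETCH(x)       (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct page_sketch {
	void *mapping;	/* low bit set means "anonymous" in this sketch */
};

/* Update side: build the tagged value locally, then store it once. */
void set_anon_mapping(struct page_sketch *page, void *anon_vma)
{
	uintptr_t tagged = (uintptr_t)anon_vma + PAGE_MAPPING_ANON_SKETCH;

	WRITE_ONCE_SKETCH(page->mapping, (void *)tagged);
}

/* Read side: load the pointer once, then test the tag bit on the copy. */
int page_is_anon(struct page_sketch *page)
{
	uintptr_t m = (uintptr_t)READ_ONCE_SKETCH(page->mapping);

	return (m & PAGE_MAPPING_ANON_SKETCH) != 0;
}
```

The sketch assumes anon_vma is at least 2-byte aligned, so that the low bit
is free to carry the tag.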
Thanks,
Vladimir