linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: kbuild test robot <lkp@intel.com>,
	Michael Holzheu <holzheu@linux.vnet.ibm.com>,
	kbuild-all@01.org, Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-mm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] numa: fix /proc/<pid>/numa_maps on s390
Date: Fri, 22 Jan 2016 14:37:04 +0100	[thread overview]
Message-ID: <20160122143704.515a5816@thinkpad> (raw)
In-Reply-To: <20160121111919.8327b22f04e790147eefc043@linux-foundation.org>

On Thu, 21 Jan 2016 11:19:19 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 21 Jan 2016 15:57:32 +0100 Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> 
> > > --- a/fs/proc/task_mmu.c~numa-fix-proc-pid-numa_maps-on-s390-fix
> > > +++ a/fs/proc/task_mmu.c
> > > @@ -1523,6 +1523,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > >  	pte_t *pte;
> > > 
> > >  	ptl = pmd_trans_huge_lock(pmd, vma);
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > >  	if (ptl) {
> > >  		pte_t huge_pte = huge_ptep_get((pte_t *)pmd);
> > >  		struct page *page;
> > > @@ -1534,6 +1535,7 @@ static int gather_pte_stats(pmd_t *pmd,
> > >  		spin_unlock(ptl);
> > >  		return 0;
> > >  	}
> > > +#endif
> > > 
> > >  	if (pmd_trans_unstable(pmd))
> > >  		return 0;
> > 
> > Hi Andrew,
> > 
> > Unfortunately this seems to be a lot more complicated than we thought.
> > huge_ptep_get() is only defined when CONFIG_HUGETLB_PAGE=y, independent
> > from CONFIG_TRANSPARENT_HUGEPAGE. This is because asm/hugetlb.h is only
> > included from linux/hugetlb.h when CONFIG_HUGETLB_PAGE=y.
> > 
> > So this fix won't fix the build error when CONFIG_HUGETLBFS=n and
> > CONFIG_TRANSPARENT_HUGEPAGE=y.
> > 
> > Since the THP code did not repeat the flaws of the hugtelbfs code, i.e.
> > it is actually working on PMD entries and not PTE entries, there was
> > no need for huge_ptep_get() for THP so far.
> > 
> > Now it seems that the THP code in gather_pte_stats() is an exception to
> > this, as it is not working on a PMD like the rest of the THP code, but
> > also on a fake "PTE" like the hugetlbfs code.
> > 
> > I guess this needs more thinking, two options are crossing my mind:
> > - Fix the THP code in gather_pte_stats() to properly use a PMD instead of
> >   PTE. This would probably require something like a "_pmd" version of
> >   "can_gather_numa_stats()" and a pmd_dirty() check for the
> >   gather_stats() parameter.
> > - Make huge_ptep_get() also available for CONFIG_HUGETLBFS=n, perhaps
> >   by introducing something like HAVE_ARCH_HUGE_PTEP_GET and implementing
> >   the default NOP version in linux/hugetlbfs.h instead of the individual
> >   asm/hugetlbfs.h files for all archs.
> > 
> > The first option seems more correct, but it might entail other problems.
> > The second option would also introduce new problems on s390, where the
> > implementation of huge_ptep_get() in arch/s390/mm/hugetlbpage.c is currently
> > only built with CONFIG_HUGETLBFS=y, but I guess we could handle that.
> > 
> > Any thoughts / more ideas?
> > 
> 
> The first option does of course sound better.
> 
> But you need numa_maps fixed, presumably in 4.5 and possibly backported
> into -stable? (The changelog doesn't describe the end-user-visible
> effects of the bug.  Naughty changelog!)
> 
> So is there some minimal thing we can do for now to get things working
> properly and fix it for-real in 4.6?
> 

Right, the effects are worth a closer look. NUMA accounting is affected
for THP and hugetlbfs mappings on s390, and only the latter can be fixed
with huge_ptep_get() as in Michaels patch.

On current kernel levels, at least since we have NUMA on s390, both
accountings should actually work by chance for most pages, apart from
PROT_NONE PMDs where pte_present() will return 0. Dirty page accounting is
also broken, as pte_dirty() will always return 0 for PMDs.

Initially, I also had concerns about a potentially faulty memory access,
caused by falsely calculated pfn values and struct page addresses, but it
looks like we can only get "offsets" within one large page frame, where
all struct pages should exist.

So, the minimal thing to do for now would probably be to fix only
gather_hugetlb_stats(), i.e. take only the second hunk of Michaels patch.
I guess it would be best if we send a new patch, maybe even with a better
description, and revert what has been added so far.

      reply	other threads:[~2016-01-22 13:37 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-20 18:17 [PATCH] numa: fix /proc/<pid>/numa_maps on s390 Michael Holzheu
2016-01-20 19:32 ` kbuild test robot
2016-01-20 20:26   ` Andrew Morton
2016-01-21 14:57     ` Gerald Schaefer
2016-01-21 19:19       ` Andrew Morton
2016-01-22 13:37         ` Gerald Schaefer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160122143704.515a5816@thinkpad \
    --to=gerald.schaefer@de.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=heiko.carstens@de.ibm.com \
    --cc=holzheu@linux.vnet.ibm.com \
    --cc=kbuild-all@01.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).