From: Dave Hansen <dave.hansen@linux.intel.com>
To: Rui Teng <rui.teng@linux.vnet.ibm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Michal Hocko <mhocko@suse.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Santhosh G <santhog4@in.ibm.com>
Subject: Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()
Date: Tue, 20 Sep 2016 07:53:24 -0700
Message-ID: <57E14D64.6090609@linux.intel.com>
In-Reply-To: <7e642622-72ee-87f6-ceb0-890ce9c28382@linux.vnet.ibm.com>

On 09/20/2016 07:45 AM, Rui Teng wrote:
> On 9/17/16 12:25 AM, Dave Hansen wrote:
>>
>> That's an interesting data point, but it still doesn't quite explain
>> what is going on.
>>
>> It seems like there might be parts of gigantic pages that have
>> PageHuge() set on tail pages, while other parts don't.  If that's true,
>> we have another bug and your patch just papers over the issue.
>>
>> I think you really need to find the root cause before we apply this
>> patch.
>>
> The root cause is that the test script
> (tools/testing/selftests/memory-hotplug/mem-on-off-test.sh) changes the
> online/offline status of memory blocks other than the one holding the
> head page. It *randomly* selects 10% of the memory blocks from
> /sys/devices/system/memory/memory* and changes their online/offline
> status.

Ahh, that does explain it!  Thanks for digging into that!

> That's why we need a PageHead() check now, and why this problem does
> not happen on systems with smaller huge pages such as 16M.
> 
> As for PageHuge(), I think it will return true for all tail pages,
> because it gets the compound_head of the tail page and then checks
> its huge page flag:
>     page = compound_head(page);
> 
> And as for the failure message, if a memory block is in use,
> offlining it will return failure.

That's good, but aren't we still left with a situation where we've
offlined and dissolved the _middle_ of a gigantic huge page while the
head page is still in place and online?

That seems bad.


