* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
@ 2015-05-14 15:48 ` Haren Myneni
0 siblings, 0 replies; 18+ messages in thread
From: Haren Myneni @ 2015-05-14 15:48 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-mm, linux-kernel, linuxppc-dev, Haren Myneni, aneesh.kumar,
srikar
On 5/14/15, Mel Gorman <mgorman@suse.de> wrote:
> On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote:
>> Hi,
>>
>> I am getting BUG_ON in migration_entry_to_page() with 4.1.0-rc2
>> kernel on powerpc system which has 512 CPUs (64 cores - 16 nodes) and
>> 1.6 TB memory. We can easily recreate this issue with kernel compile
>> (make -j500). But I could not reproduce with numa_balancing=disable.
>>
>
> Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not
> match up with a BUG_ON. It's close to a PageLocked check but I want to
> be sure there are no other modifications.
Mel, Thanks for your help. I added some printks and dump_page() to get
the page struct and swp_entry information.
>
> Otherwise, when was the last time this worked? Was 4.0 ok? As it can be
> easily reproduced, can the problem be bisected please?
I did not try previous versions other than RHEL kernel (3.10.*). I
will try with previous versions.
In the failure case, I also noticed that the pte and address values
match between try_to_unmap_one() and remove_migration_pte(), but the
entry (swp_entry_t) value is different. So it looks like the page struct
address in migration_entry_to_page() is not valid.

try_to_unmap_one()
{
	...
	} else if (IS_ENABLED(CONFIG_MIGRATION)) {
		/*
		 * Store the pfn of the page in a special migration
		 * pte. do_swap_page() will wait until the migration
		 * pte is removed and then restart fault handling.
		 */
		BUG_ON(!(flags & TTU_MIGRATION));
		entry = make_migration_entry(page, pte_write(pteval));
	}
	swp_pte = swp_entry_to_pte(entry);
	if (pte_soft_dirty(pteval))
		swp_pte = pte_swp_mksoft_dirty(swp_pte);
	set_pte_at(mm, address, pte, swp_pte);

	/* pte=0xb16b8d0f80000000 address=0x100008150000
	 * page=0xf000000513f3e1e0 entry=0x3e0000000ec5ae34 */
	...
}

remove_migration_pte()
{
	...
	/* address=0x100008150000 pte=0xb16b8d0f80000000
	 * old=0xf000000513f3e1e0 */
	if (!is_migration_entry(entry) ||
	    migration_entry_to_page(entry) != old)
		goto unlock;
	...
}

migration_entry_to_page() {
	pte=0xb16b8d0f80000000 entry=0x3e00000002c5ae34
	page=0xf0000000f3f3e1e0
}

Thanks
Haren
>
> --
> Mel Gorman
> SUSE Labs
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
2015-05-14 15:48 ` Haren Myneni
(?)
@ 2015-05-18 7:32 ` Haren Myneni
-1 siblings, 0 replies; 18+ messages in thread
From: Haren Myneni @ 2015-05-18 7:32 UTC (permalink / raw)
To: Mel Gorman; +Cc: srikar, linux-kernel, linux-mm, aneesh.kumar, linuxppc-dev
Mel,
I am hitting this issue with the 4.0 kernel and even with the 3.19 and
3.17 kernels. I will also try earlier versions. Please let me know if
you have any suggestions for debugging.
Thanks
Haren
On 5/14/15, Haren Myneni <hmyneni@gmail.com> wrote:
> On 5/14/15, Mel Gorman <mgorman@suse.de> wrote:
>> On Wed, May 13, 2015 at 01:17:54AM -0700, Haren Myneni wrote:
>>> Hi,
>>>
>>> I am getting BUG_ON in migration_entry_to_page() with 4.1.0-rc2
>>> kernel on powerpc system which has 512 CPUs (64 cores - 16 nodes) and
>>> 1.6 TB memory. We can easily recreate this issue with kernel compile
>>> (make -j500). But I could not reproduce with numa_balancing=disable.
>>>
>>
>> Is this patched in any way? I ask because line 134 on 4.1.0-rc2 does not
>> match up with a BUG_ON. It's close to a PageLocked check but I want to
>> be sure there are no other modifications.
>
> Mel, Thanks for your help. I added some printks and dump_page() to get
> the page struct and swp_entry information.
>
>>
>> Otherwise, when was the last time this worked? Was 4.0 ok? As it can be
>> easily reproduced, can the problem be bisected please?
>
> I did not try previous versions other than RHEL kernel (3.10.*). I
> will try with previous versions.
>
> In the failure case, also noticed pte and address values are matched
> in try_to_unmap_one() and remove_migration_pte(), but entry
> (swp_entry_t) value is different. So looks like page struct address in
> migration_entry_to_page() is not valid.
>
> try_to_unmap_one()
> {
>
> ...
> } else if (IS_ENABLED(CONFIG_MIGRATION)) {
> /*
> * Store the pfn of the page in a special migration
> * pte. do_swap_page() will wait until the migration
> * pte is removed and then restart fault handling.
> */
> BUG_ON(!(flags & TTU_MIGRATION));
> entry = make_migration_entry(page, pte_write(pteval));
> }
> swp_pte = swp_entry_to_pte(entry);
> if (pte_soft_dirty(pteval))
> swp_pte = pte_swp_mksoft_dirty(swp_pte);
> set_pte_at(mm, address, pte, swp_pte);
>
> /*pte=0xb16b8d0f80000000 address=0x100008150000
> page=0xf000000513f3e1e0 entry=0x3e0000000ec5ae34 */
> ...
> }
>
> remove_migration_pte()
> {
> ...
> /* address=0x100008150000 pte=0xb16b8d0f80000000
> *old=0xf000000513f3e1e0 */
> if (!is_migration_entry(entry) ||
> migration_entry_to_page(entry) != old)
> goto unlock;
> ...
> }
>
> migration_entry_to_page() {
> pte=0xb16b8d0f80000000 entry=0x3e00000002c5ae34
> page=0xf0000000f3f3e1e0
> }
>
>
> Thanks
> Haren
>
>>
>> --
>> Mel Gorman
>> SUSE Labs
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
2015-05-18 7:32 ` Haren Myneni
(?)
@ 2015-05-18 8:11 ` Mel Gorman
-1 siblings, 0 replies; 18+ messages in thread
From: Mel Gorman @ 2015-05-18 8:11 UTC (permalink / raw)
To: Haren Myneni; +Cc: srikar, linux-kernel, linux-mm, aneesh.kumar, linuxppc-dev
On Mon, May 18, 2015 at 12:32:29AM -0700, Haren Myneni wrote:
> Mel,
> I am hitting this issue with 4.0 kernel and even with 3.19 and
> 3.17 kernels. I will also try with previous versions. Please let me
> know any suggestions on the debugging.
>
Please keep going further back in time to see if there was a point where
this was ever working. It could be a ppc64-specific bug but right now,
I'm still drawing a blank.
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: mm: BUG_ON with NUMA_BALANCING (kernel BUG at include/linux/swapops.h:131!)
2015-05-18 8:11 ` Mel Gorman
(?)
@ 2015-05-18 8:18 ` Haren Myneni
-1 siblings, 0 replies; 18+ messages in thread
From: Haren Myneni @ 2015-05-18 8:18 UTC (permalink / raw)
To: Mel Gorman; +Cc: srikar, linux-kernel, linux-mm, aneesh.kumar, linuxppc-dev
On 5/18/15, Mel Gorman <mgorman@suse.de> wrote:
> On Mon, May 18, 2015 at 12:32:29AM -0700, Haren Myneni wrote:
>> Mel,
>> I am hitting this issue with 4.0 kernel and even with 3.19 and
>> 3.17 kernels. I will also try with previous versions. Please let me
>> know any suggestions on the debugging.
>>
>
> Please keep going further back in time to see if there was a point where
> this was ever working. It could be a ppc64-specific bug but right now,
> I'm still drawing a blank.
Sure, will do. I am running a PPC64 LE (little-endian) kernel, but I
have not seen any LE-specific issue so far.
Thanks
Haren
>
> --
> Mel Gorman
> SUSE Labs
>
^ permalink raw reply [flat|nested] 18+ messages in thread