* ARM64: kernel oops in 4.4-rc4+
From: Ming Lei @ 2015-12-08 6:30 UTC
To: linux-arm-kernel
Hi,
The attached kernel oops can be triggered immediately after
running the following command on APM Mustang:
$stress-ng --all 8 -t 10m
[1] kernel oops log
stress-ng: info: [5220] 5 failures reached, aborting stress process
[ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
[ 265.788726] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 265.794186] Modules linked in:
[ 265.797241] CPU: 1 PID: 15830 Comm: stress-ng-cache Tainted: G W 4.4.0-rc4+ #54
[ 265.805638] Hardware name: AppliedMicro Mustang/Mustang, BIOS 2.0.0 Oct 23 2015
[ 265.812911] task: fffffe02c4795180 ti: fffffe02c4870000 task.ti: fffffe02c4870000
[ 265.820364] PC is at ptep_set_access_flags+0xdc/0x104
[ 265.825395] LR is at handle_mm_fault+0xfe0/0x11e8
[ 265.830077] pc : [<fffffe00001e39c0>] lr : [<fffffe00001d62b4>] pstate: 20000145
[ 265.837436] sp : fffffe02c4873c90
[ 265.840733] x29: fffffe02c4873c90 x28: 00680047e20a0bd3
[ 265.846042] x27: fffffe02c07cf780 x26: 000000000000ffe0
[ 265.851352] x25: fffffe02c48e0000 x24: fffffe02c48c0000
[ 265.856662] x23: 0000000000000054 x22: fffffe07dedc5400
[ 265.861971] x21: fffffe02c12bab48 x20: 000003ff9f670000
[ 265.867282] x19: 0000000000000001 x18: 0000000000000000
[ 265.872593] x17: 0000000000474688 x16: 000003ff9f914258
[ 265.877902] x15: 0012df030706601b x14: 000003ff9f862600
[ 265.883211] x13: 00000003e8000000 x12: 0000000000000018
[ 265.888519] x11: 0000000000000007 x10: fffffe02c4873b00
[ 265.893828] x9 : 0000000000000000 x8 : fffffe02c4873c20
[ 265.899138] x7 : fffffe00001d55ac x6 : 0000000000000000
[ 265.904448] x5 : 0008000000000000 x4 : 00680047e20a0bd3
[ 265.909759] x3 : 00680047e20a0fd3 x2 : fffffe02c48efb38
[ 265.915070] x1 : 0008000000000000 x0 : 0008000000000080
[ 265.920381]
[ 265.921868] Process stress-ng-cache (pid: 15830, stack limit = 0xfffffe02c4870020)
[ 265.929401] Stack: (0xfffffe02c4873c90 to 0xfffffe02c4874000)
[ 265.935120] 3c80: fffffe02c4873cd0 fffffe00001d62b4
[ 265.942912] 3ca0: fffffe02c12bab48 000003ff9f670000 000000000000fb38 fffffe00001d5350
[ 265.950703] 3cc0: fffffe02c12bab48 fffffe00001d55ac fffffe02c4873db0 fffffe00000a2e90
[ 265.958495] 3ce0: fffffe02c4873ed0 fffffe02c4795180 fffffe07dedc5400 000000009200000b
[ 265.966288] 3d00: 000003ff9f6723b6 0000000000000007 0000000000000054 fffffe02c12bab48
[ 265.974080] 3d20: fffffe07dedc5498 0000000000020000 fffffe02c4873da0 fffffe0000110af4
[ 265.981872] 3d40: fffffe02c48efb38 fffffe02c4795180 fffffe02c4873d80 fffffe00000f0394
[ 265.989664] 3d60: fffffe0000aaca00 fffffe02c48cffe0 0000000000000000 000000009200000b
[ 265.997456] 3d80: fffffe02c4873db0 fffffe00000a2ed4 fffffe02c4873db0 fffffe00000a2d6c
[ 266.005248] 3da0: fffffe02c4873ed0 fffffe02c4795180 fffffe02c4873e20 fffffe00000902c8
[ 266.013040] 3dc0: 000000009200000b fffffe0000cd60f0 000003ff9f6723b6 fffffe02c4873ed0
[ 266.020832] 3de0: 0000000020000000 0000000000000024 000000009200000b 000003ff9f6723b6
[ 266.028624] 3e00: 0000000000000001 fffffe02c4870000 0000000000000001 fffffe00000944d4
[ 266.036416] 3e20: 000003ffc104a160 fffffe000009391c 0000000000000000 000003ff9f79d7a0
[ 266.044209] 3e40: ffffffffffffffff 00000000004128f0 ffffffffffffffff 0000000000412900
[ 266.052001] 3e60: 0000000020000000 fffffe0000097cb0 fffffe02c4873e90 fffffe00000944d4
[ 266.059793] 3e80: fffffe02c4796180 fffffe02c4796300 fffffe02c4873eb0 fffffe0000097cb0
[ 266.067585] 3ea0: 0000000000000008 000003ff9f79d7a0 000003ffc104a160 fffffe0000093ba8
[ 266.075377] 3ec0: 0000000000000000 fffffe0000093a98 00000000000e23b6 0000000000000001
[ 266.083169] 3ee0: 000000000000010d 00000000001d9e3c 0000000000000008 00000000000003e8
[ 266.090960] 3f00: 000003ff9fb20000 0000000000475ae0 000000000011dc49 0000000000008001
[ 266.098752] 3f20: 0000000000007504 0000000000000007 0000000000000018 00000003e8000000
[ 266.106544] 3f40: 000003ff9f862600 0012df030706601b 000003ff9f914258 0000000000474688
[ 266.114336] 3f60: 0000000000000000 000003ff9f590000 000003ff9f79d7a0 0000000000009069
[ 266.122128] 3f80: 0000000000004650 0000000000475ae0 00000000220d97ab 00000000001fffff
[ 266.129919] 3fa0: 000003ffc104a1e8 0000000000000001 0000000000475698 000003ffc104a160
[ 266.137712] 3fc0: 0000000000412850 000003ffc104a160 00000000004128f0 0000000020000000
[ 266.145504] 3fe0: 0000000000000000 ffffffffffffffff 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b
[ 266.153295] Call trace:
[ 266.155732] [<fffffe00001e39c0>] ptep_set_access_flags+0xdc/0x104
[ 266.161799] [<fffffe00001d62b4>] handle_mm_fault+0xfe0/0x11e8
[ 266.167519] [<fffffe00000a2e90>] do_page_fault+0x200/0x324
[ 266.172979] [<fffffe00000902c8>] do_mem_abort+0x40/0xa0
[ 266.178170] Exception stack(0xfffffe02c4873e30 to 0xfffffe02c4873f50)
[ 266.184578] 3e20: 0000000000000000 000003ff9f79d7a0
[ 266.192370] 3e40: ffffffffffffffff 00000000004128f0 ffffffffffffffff 0000000000412900
[ 266.200162] 3e60: 0000000020000000 fffffe0000097cb0 fffffe02c4873e90 fffffe00000944d4
[ 266.207954] 3e80: fffffe02c4796180 fffffe02c4796300 fffffe02c4873eb0 fffffe0000097cb0
[ 266.215746] 3ea0: 0000000000000008 000003ff9f79d7a0 000003ffc104a160 fffffe0000093ba8
[ 266.223538] 3ec0: 0000000000000000 fffffe0000093a98 00000000000e23b6 0000000000000001
[ 266.231329] 3ee0: 000000000000010d 00000000001d9e3c 0000000000000008 00000000000003e8
[ 266.239121] 3f00: 000003ff9fb20000 0000000000475ae0 000000000011dc49 0000000000008001
[ 266.246913] 3f20: 0000000000007504 0000000000000007 0000000000000018 00000003e8000000
[ 266.254705] 3f40: 000003ff9f862600 0012df030706601b
[ 266.259561] [<fffffe000009391c>] el0_da+0x18/0x1c
[ 266.264244] Code: 8a000060 d2e00101 eb01001f 54fffb60 (d4210000)
[ 266.270370] ---[ end trace 9db51a647dfce800 ]---
* ARM64: kernel oops in 4.4-rc4+
From: Will Deacon @ 2015-12-08 10:30 UTC
To: linux-arm-kernel
On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
> Hi,
[adding Catalin to cc]
> The attached kernel oops can be triggered immediately after
> running the following command on APM Mustang:
>
> $stress-ng --all 8 -t 10m
>
> [1] kernel oops log
> stress-ng: info: [5220] 5 failures reached, aborting stress process
> [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
Yikes, this means we're replacing a writable pte with a clean pte, so
there's a potential race w/ hardware DBM.
Could you dump pte and *ptep please?
> [ 265.788726] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 265.794186] Modules linked in:
> [ 265.797241] CPU: 1 PID: 15830 Comm: stress-ng-cache Tainted: G W 4.4.0-rc4+ #54
What changes do you have on top of -rc4?
Will
* ARM64: kernel oops in 4.4-rc4+
From: Will Deacon @ 2015-12-08 10:51 UTC
To: linux-arm-kernel
On Tue, Dec 08, 2015 at 10:30:13AM +0000, Will Deacon wrote:
> On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
> > The attached kernel oops can be triggered immediately after
> > running the following command on APM Mustang:
> >
> > $stress-ng --all 8 -t 10m
> >
> > [1] kernel oops log
> > stress-ng: info: [5220] 5 failures reached, aborting stress process
> > [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
>
> Yikes, this means we're replacing a writable pte with a clean pte, so
> there's a potential race w/ hardware DBM.
>
> Could you dump pte and *ptep please?
I tried running this on my Juno and pretty quickly saw the OOM killer
coming in. Perhaps, in your case, pte is a swap entry and it's confusing
the checks (so pte_dirty/pte_young are looking at random bits of the
file offset)?
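
A minimal user-space sketch of that concern, not kernel code: the young/dirty helpers only test bit positions, so on a non-present entry they read whatever the swap or file encoding happens to store there. The bit positions are assumptions based on the arm64 headers of this era, and young_check_unguarded()/young_check_guarded() are hypothetical names for the check without and with a pte_valid_user(pte) guard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 bit positions (illustrative, not copied from the tree). */
#define PTE_VALID (UINT64_C(1) << 0)
#define PTE_USER  (UINT64_C(1) << 6)
#define PTE_AF    (UINT64_C(1) << 10)

static bool pte_valid(uint64_t pte)
{
    return (pte & PTE_VALID) != 0;
}

static bool pte_valid_user(uint64_t pte)
{
    return (pte & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER);
}

/* Only tests bit 10; it has no idea whether the entry is a real mapping. */
static bool pte_young(uint64_t pte)
{
    return (pte & PTE_AF) != 0;
}

/* Hypothetical: the "new pte must be young" check applied to any new entry. */
static bool young_check_unguarded(uint64_t old_pte, uint64_t new_pte)
{
    return pte_valid(old_pte) && !pte_young(new_pte);
}

/* Hypothetical: the same check, applied only to valid user mappings. */
static bool young_check_guarded(uint64_t old_pte, uint64_t new_pte)
{
    return pte_valid_user(new_pte) && pte_valid(old_pte) &&
           !pte_young(new_pte);
}
```

With a valid existing entry and a non-present new entry whose bit 10 happens to be clear, the unguarded variant reports a violation while the guarded one does not.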
Anyway, I'll wait for you to dump those values.
Will
* ARM64: kernel oops in 4.4-rc4+
From: Catalin Marinas @ 2015-12-08 12:06 UTC
To: linux-arm-kernel
On Tue, Dec 08, 2015 at 10:51:52AM +0000, Will Deacon wrote:
> On Tue, Dec 08, 2015 at 10:30:13AM +0000, Will Deacon wrote:
> > On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
> > > The attached kernel oops can be triggered immediately after
> > > running the following command on APM Mustang:
> > >
> > > $stress-ng --all 8 -t 10m
> > >
> > > [1] kernel oops log
> > > stress-ng: info: [5220] 5 failures reached, aborting stress process
> > > [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
> >
> > Yikes, this means we're replacing a writable pte with a clean pte, so
> > there's a potential race w/ hardware DBM.
> >
> > Could you dump pte and *ptep please?
>
> I tried running this on my Juno and pretty quickly saw the OOM killer
> coming in. Perhaps, in your case, pte is a swap entry and it's confusing
> the checks (so pte_dirty/pte_young are looking at random bits of the
> file offset)?
It could indeed be that the new pte is swap or file and the check misses
that. The easiest is to move the check inside the if (pte_valid_user(pte))
block:
--------------8<------------------------
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f93f383..12d89ee5ab7f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -269,17 +269,17 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			pte_val(pte) &= ~PTE_RDONLY;
 		else
 			pte_val(pte) |= PTE_RDONLY;
-	}

-	/*
-	 * If the existing pte is valid, check for potential race with
-	 * hardware updates of the pte (ptep_set_access_flags safely changes
-	 * valid ptes without going through an invalid entry).
-	 */
-	if (IS_ENABLED(CONFIG_DEBUG_VM) && IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
-	    pte_valid(*ptep)) {
-		BUG_ON(!pte_young(pte));
-		BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
+		/*
+		 * If the existing pte is valid, check for potential race with
+		 * hardware updates of the pte (ptep_set_access_flags safely
+		 * changes valid ptes without going through an invalid entry).
+		 */
+		if (IS_ENABLED(CONFIG_DEBUG_VM) && IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+		    pte_valid(*ptep)) {
+			BUG_ON(!pte_young(pte));
+			BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
+		}
 	}

 	set_pte(ptep, pte);
--------------8<------------------------
--
Catalin
* ARM64: kernel oops in 4.4-rc4+
From: Ming Lei @ 2015-12-08 13:08 UTC
To: linux-arm-kernel
On Tue, Dec 8, 2015 at 6:30 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
>> Hi,
>
> [adding Catalin to cc]
>
>> The attached kernel oops can be triggered immediately after
>> running the following command on APM Mustang:
>>
>> $stress-ng --all 8 -t 10m
>>
>> [1] kernel oops log
>> stress-ng: info: [5220] 5 failures reached, aborting stress process
>> [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
>
> Yikes, this means we're replacing a writable pte with a clean pte, so
> there's a potential race w/ hardware DBM.
>
> Could you dump pte and *ptep please?
They are dumped as so:
set_pte_at: addr 470000, ptep fffffe00bc870238, *ptep 680047348a0bd3, pte 680047348a0fd3
with the following change:
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f9..1c5f0ee 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -278,6 +278,12 @@ static inline void set_pte_at(struct mm_struct *mm, unsigne
 	 */
 	if (IS_ENABLED(CONFIG_DEBUG_VM) && IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
 	    pte_valid(*ptep)) {
+
+		if (!pte_young(pte) || (pte_write(*ptep) && !pte_dirty(pte)))
+			trace_printk("%s: addr %lx, ptep %p, *ptep %llx, pte %ll
+				__func__, addr, ptep,
+				(unsigned long long)pte_val(*ptep),
+				(unsigned long long)pte_val(pte));
 		BUG_ON(!pte_young(pte));
 		BUG_ON(pte_write(*ptep) && !pte_dirty(pte));
>
>> [ 265.788726] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> [ 265.794186] Modules linked in:
>> [ 265.797241] CPU: 1 PID: 15830 Comm: stress-ng-cache Tainted: G W 4.4.0-rc4+ #54
>
> What changes do you have on top of -rc4?
Nothing else; it is built from the latest Linus tree.
Thanks,
Ming Lei
* ARM64: kernel oops in 4.4-rc4+
From: Will Deacon @ 2015-12-08 13:49 UTC
To: linux-arm-kernel
On Tue, Dec 08, 2015 at 09:08:32PM +0800, Ming Lei wrote:
> On Tue, Dec 8, 2015 at 6:30 PM, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
> >> The attached kernel oops can be triggered immediately after
> >> running the following command on APM Mustang:
> >>
> >> $stress-ng --all 8 -t 10m
> >>
> >> [1] kernel oops log
> >> stress-ng: info: [5220] 5 failures reached, aborting stress process
> >> [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
> >
> > Yikes, this means we're replacing a writable pte with a clean pte, so
> > there's a potential race w/ hardware DBM.
> >
> > Could you dump pte and *ptep please?
>
> They are dumped as so:
>
> set_pte_at: addr 470000, ptep fffffe00bc870238, *ptep 680047348a0bd3,
> pte 680047348a0fd3
Thanks for dumping these.
It looks like we're trying to set the access flag in the pte, so it's
got nothing to do with swp entries (although they may well be broken
anyway with these BUG_ONs). With H/W DBM enabled, we shouldn't be doing
software management of the access flag, so the BUG_ON looks like a red
herring in this case.
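
The dumped values can be compared bit by bit. A small sketch, assuming these arm64 bit positions (PTE_RDONLY bit 7, PTE_AF bit 10, PTE_DBM bit 51, software PTE_DIRTY bit 55); pte_diff(), pte_decode() and struct pte_flags are hypothetical helpers used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 bit positions (illustrative). */
#define PTE_RDONLY (UINT64_C(1) << 7)
#define PTE_AF     (UINT64_C(1) << 10)
#define PTE_DBM    (UINT64_C(1) << 51)
#define PTE_DIRTY  (UINT64_C(1) << 55)

/* Hypothetical summary of the bits relevant to this thread. */
struct pte_flags {
    bool rdonly;   /* AP[2] set: hardware treats the page as clean */
    bool af;       /* access flag: entry is "young" */
    bool dbm;      /* dirty bit management: entry is writable */
    bool sw_dirty; /* software dirty bit */
};

/* Bits that differ between the existing entry and the proposed one. */
static uint64_t pte_diff(uint64_t old_pte, uint64_t new_pte)
{
    return old_pte ^ new_pte;
}

static struct pte_flags pte_decode(uint64_t pte)
{
    struct pte_flags f = {
        .rdonly   = (pte & PTE_RDONLY) != 0,
        .af       = (pte & PTE_AF) != 0,
        .dbm      = (pte & PTE_DBM) != 0,
        .sw_dirty = (pte & PTE_DIRTY) != 0,
    };
    return f;
}
```

Under those assumptions, *ptep (0x680047348a0bd3) and pte (0x680047348a0fd3) differ only in bit 10, the access flag; both have DBM and RDONLY set and the software dirty bit clear.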
I'm not sure on the best fix for this, though. We can either make the
BUG_ON dependent on the hardware supporting DBM or we could override
ptep_set_access_flags to avoid the debug check.
Will
* ARM64: kernel oops in 4.4-rc4+
From: Catalin Marinas @ 2015-12-08 16:08 UTC
To: linux-arm-kernel
On Tue, Dec 08, 2015 at 01:49:52PM +0000, Will Deacon wrote:
> On Tue, Dec 08, 2015 at 09:08:32PM +0800, Ming Lei wrote:
> > On Tue, Dec 8, 2015 at 6:30 PM, Will Deacon <will.deacon@arm.com> wrote:
> > > On Tue, Dec 08, 2015 at 02:30:33PM +0800, Ming Lei wrote:
> > >> The attached kernel oops can be triggered immediately after
> > >> running the following command on APM Mustang:
> > >>
> > >> $stress-ng --all 8 -t 10m
> > >>
> > >> [1] kernel oops log
> > >> stress-ng: info: [5220] 5 failures reached, aborting stress process
> > >> [ 265.782659] kernel BUG at ./arch/arm64/include/asm/pgtable.h:282!
> > >
> > > Yikes, this means we're replacing a writable pte with a clean pte, so
> > > there's a potential race w/ hardware DBM.
> > >
> > > Could you dump pte and *ptep please?
> >
> > They are dumped as so:
> >
> > set_pte_at: addr 470000, ptep fffffe00bc870238, *ptep 680047348a0bd3,
> > pte 680047348a0fd3
>
> Thanks for dumping these.
>
> It looks like we're trying to set the access flag in the pte, so it's
> got nothing to do with swp entries (although they may well be broken
> anyway with these BUG_ONs). With H/W DBM enabled, we shouldn't be doing
> software management of the access flag, so the BUG_ON looks like a red
> herring in this case.
Setting the access flag is fine, clearing it is a potential problem (the
first BUG_ON). If the line numbers match mainline, 282 means the second
BUG_ON in set_pte_at(). The ptep_set_access_flags() is meant to update
the access or dirty flags without corrupting the state. I think the
scenario is:
1. *ptep is writable and clean and old (PTE_WRITE && PTE_RDONLY && !PTE_AF)
2. ptep_set_access_flags() tries to set PTE_AF while keeping the rest
unchanged
3. BUG_ON triggers because a writable PTE is overwritten with a writable
&& clean PTE, potentially overwriting a hardware-updated dirty entry
Point 3 could easily cause us problems if you mix DBM and non-DBM agents
in the same system.
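
The three steps above can be modelled in user space. This is a sketch under assumptions: the bit positions and the pte_* helpers are written to mirror how the arm64 helpers are understood to behave with hardware AF/DBM enabled (PTE_WRITE aliasing PTE_DBM), and debug_check() is a hypothetical stand-in that returns a code where set_pte_at() would BUG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 bit positions (illustrative). */
#define PTE_VALID  (UINT64_C(1) << 0)
#define PTE_RDONLY (UINT64_C(1) << 7)
#define PTE_AF     (UINT64_C(1) << 10)
#define PTE_DBM    (UINT64_C(1) << 51)
#define PTE_DIRTY  (UINT64_C(1) << 55)
#define PTE_WRITE  PTE_DBM /* assumption: writable == DBM set */

static bool pte_valid(uint64_t p)    { return (p & PTE_VALID) != 0; }
static bool pte_young(uint64_t p)    { return (p & PTE_AF) != 0; }
static bool pte_write(uint64_t p)    { return (p & PTE_WRITE) != 0; }
static bool pte_sw_dirty(uint64_t p) { return (p & PTE_DIRTY) != 0; }
static bool pte_hw_dirty(uint64_t p) { return pte_write(p) && !(p & PTE_RDONLY); }
static bool pte_dirty(uint64_t p)    { return pte_sw_dirty(p) || pte_hw_dirty(p); }

enum check_result {
    CHECK_OK,
    CHECK_NOT_YOUNG,  /* would hit the first BUG_ON */
    CHECK_LOST_DIRTY, /* would hit the second BUG_ON */
};

/* Hypothetical model of the debug checks on a valid existing entry. */
static enum check_result debug_check(uint64_t old_pte, uint64_t new_pte)
{
    if (!pte_valid(old_pte))
        return CHECK_OK;
    if (!pte_young(new_pte))
        return CHECK_NOT_YOUNG;
    if (pte_write(old_pte) && !pte_dirty(new_pte))
        return CHECK_LOST_DIRTY;
    return CHECK_OK;
}
```

Fed the dumped pair (old 0x680047348a0bd3, new 0x680047348a0fd3), this model reports CHECK_LOST_DIRTY: the new entry is young, but the old one is writable and the new one is clean.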
Another scenario when all agents can do DBM is for two CPUs to take a
read fault on a non-present PTE. The first CPU that takes the mm
semaphore updates the PTE from non-present to valid && writable && clean
&& young. The second CPU comes in and finds the PTE present in
handle_pte_fault() and goes on to do a ptep_set_access_flags(). At this
point we have a race with any other agent updating the dirty status.
I think the BUG_ON check is still useful; we just need an atomic
implementation of ptep_set_access_flags(). I'll post something and take
it from there (I need to think some more about how this would interact
with the set_pte_at() checks for the dirty bit).
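
One possible shape for such an atomic update, sketched in user-space C11 with a compare-and-swap loop. This illustrates the idea only and is not the implementation that was later posted; set_access_flags_atomic() is a hypothetical name and the bit positions are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed arm64 bit positions (illustrative). */
#define PTE_RDONLY (UINT64_C(1) << 7)
#define PTE_AF     (UINT64_C(1) << 10)

/*
 * Hypothetical helper: fold the access/dirty state of 'entry' into *ptep.
 * The update only ever sets AF and only ever clears RDONLY, and it is
 * recomputed from the freshly observed value on every retry, so a
 * concurrent update that marked the entry dirty is not overwritten.
 * Returns true if the stored value changed.
 */
static bool set_access_flags_atomic(_Atomic uint64_t *ptep, uint64_t entry)
{
    uint64_t old_pte = atomic_load(ptep);
    uint64_t new_pte;

    do {
        new_pte = old_pte | (entry & PTE_AF);
        if (!(entry & PTE_RDONLY))
            new_pte &= ~PTE_RDONLY;
        /* On failure, old_pte is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(ptep, &old_pte, new_pte));

    return new_pte != old_pte;
}
```

In this sketch, if another agent clears RDONLY between the fault and the update, a stale clean 'entry' still sets AF but leaves RDONLY clear, which is the property the second BUG_ON is guarding.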
BTW, I don't think the check for pte_valid(pte) is needed in
set_pte_at() as we should never go from valid pte to file without
break-before-make. With proper b-b-m, pte_valid(*ptep) is false, so no
BUG_ON checks.
--
Catalin