* Linux 3.3+ and memory cgroup kernel panics
From: David Strauss @ 2012-12-22 2:44 UTC
To: cgroups@vger.kernel.org
The kernel seemed to replace the cgroups memory "charging" mechanism
in 3.3 with a more efficient implementation [1], but we think it may
be broken under Xen virtualization and load. We do not see any issue
in Linux 3.2 and earlier.
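For readers new to the term: memcg "charging" is the bookkeeping that counts each page against the memory cgroup of the task that allocated it, and removes that count ("uncharge") when the page is released. The cgroup's recorded usage is what a limit such as MemoryLimit= is enforced against. The sketch below is a deliberately simplified user-space model of that bookkeeping, not the kernel's implementation; the names `Memcg`, `PageOwners`, `try_charge` and `move_charge` are hypothetical. It also models moving a page's charge to another cgroup, the re-attribution this message later suspects.

```python
class Memcg:
    """Toy model of one memory cgroup's charge counter (not kernel code)."""

    def __init__(self, name, limit_pages):
        self.name = name
        self.limit_pages = limit_pages
        self.usage_pages = 0

    def try_charge(self, n=1):
        # Refuse a charge that would exceed the limit. The real kernel would
        # first try reclaim (and possibly the OOM killer) before failing.
        if self.usage_pages + n > self.limit_pages:
            return False
        self.usage_pages += n
        return True

    def uncharge(self, n=1):
        assert self.usage_pages >= n, "uncharge underflow"
        self.usage_pages -= n


class PageOwners:
    """Hypothetical stand-in for the per-page record of which cgroup
    a page is currently charged to."""

    def __init__(self):
        self.owner = {}

    def charge_page(self, pfn, memcg):
        if pfn in self.owner:
            raise ValueError(f"page {pfn} is already charged")
        if not memcg.try_charge():
            return False
        self.owner[pfn] = memcg
        return True

    def uncharge_page(self, pfn):
        # Uncharge whichever cgroup the page is *currently* attributed to.
        self.owner.pop(pfn).uncharge()

    def move_charge(self, pfn, target):
        # Re-attribute a page: charge the target first, then release the source.
        source = self.owner[pfn]
        if source is target:
            return True
        if not target.try_charge():
            return False
        source.uncharge()
        self.owner[pfn] = target
        return True
```

Charging the target before uncharging the source means a failed move leaves the page's original attribution untouched; an uncharge that hits the wrong owner is exactly the kind of inconsistency the `uncharge underflow` assertion would catch in this model.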
We have documented panics for Fedora kernels 3.3.4-5.fc17.x86_64,
3.3.5-2.fc16.x86_64, and 3.6.10-2.fc16.x86_64 but *not* on Fedora
kernels 3.1.0-7.fc16.x86_64 or 3.2.6-3.fc16.x86_64.
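When auditing a fleet, the boundary reported here can be turned into a quick check. This is a minimal sketch assuming release strings in the `uname -r` format shown above; the helper names are hypothetical, and the rule encodes only what this report observed (panics on 3.3 and later, none on 3.1 or 3.2), not a confirmed affected or fixed range.

```python
import re


def kernel_series(release):
    """Return (major, minor) from a release string like '3.3.4-5.fc17.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def in_reported_range(release):
    # Hypothetical triage rule from this report only: 3.3 and later panicked.
    # Tuple comparison orders 3.10 after 3.3, which plain string comparison would not.
    return kernel_series(release) >= (3, 3)


panicked = ["3.3.4-5.fc17.x86_64", "3.3.5-2.fc16.x86_64", "3.6.10-2.fc16.x86_64"]
clean = ["3.1.0-7.fc16.x86_64", "3.2.6-3.fc16.x86_64"]
for r in panicked + clean:
    print(r, in_reported_range(r))
```

On a live host, `platform.release()` from the standard library returns the running kernel's release string in the same format.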
Many of our services use MemoryLimit= and similar systemd options that
create a memory cgroup for the service. This correlates with kernel
panics under the following call path (full listing here [2]):
[20488075.457394] [<ffffffff811825e7>] ? mem_cgroup_charge_statistics+0x17/0x60
[20488075.457403] [<ffffffff81184ade>] __mem_cgroup_uncharge_common+0xfe/0x330
[20488075.457410] [<ffffffff8100632d>] ? xen_pte_val+0x1d/0x40
[20488075.457417] [<ffffffff81188457>] mem_cgroup_uncharge_page+0x37/0x40
[20488075.457424] [<ffffffff8115e6d1>] page_remove_rmap+0xb1/0x140
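Read bottom-up, the excerpt shows page_remove_rmap() calling mem_cgroup_uncharge_page(), which calls __mem_cgroup_uncharge_common(). Frames prefixed with "?" are addresses the unwinder found on the stack but could not confirm as part of the call chain, so they may be stale. A small sketch for pulling frames out of such a listing and separating the reliable ones; the regular expression assumes exactly the `[<addr>] [?] func+0xoff/0xsize` layout shown above.

```python
import re

FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"            # printk timestamp
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"          # return address
    r"(?P<unreliable>\?\s+)?"                # '?' marks an unconfirmed frame
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack frame found in an oops call-trace excerpt."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "addr": int(m["addr"], 16),
                "func": m["func"],
                "offset": int(m["off"], 16),
                "size": int(m["size"], 16),
                "reliable": m["unreliable"] is None,
            })
    return frames


TRACE = """\
[20488075.457394] [<ffffffff811825e7>] ? mem_cgroup_charge_statistics+0x17/0x60
[20488075.457403] [<ffffffff81184ade>] __mem_cgroup_uncharge_common+0xfe/0x330
[20488075.457410] [<ffffffff8100632d>] ? xen_pte_val+0x1d/0x40
[20488075.457417] [<ffffffff81188457>] mem_cgroup_uncharge_page+0x37/0x40
[20488075.457424] [<ffffffff8115e6d1>] page_remove_rmap+0xb1/0x140
"""

for f in parse_frames(TRACE):
    mark = "" if f["reliable"] else "? "
    print(f"{mark}{f['func']} +{f['offset']:#x}/{f['size']:#x}")
```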
It culminates in this failure:
[20488075.457183] kernel BUG at arch/x86/mm/fault.c:396!
[20488075.457189] invalid opcode: 0000 [#1] SMP
There are also reports of similar failures [3] unrelated to systemd
use and on non-Fedora kernels.
It appears to be an issue with re-attributing the charge for a page to
a different cgroup. Any ideas why we would be seeing this with Linux
3.3+? I can generally reproduce the issue (often minutes after
booting) on any heavily loaded machine in order to collect any
additional data to help troubleshooting.
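Since the panic can be reproduced on demand, one kind of additional data worth capturing before each run is the state of every memory cgroup. A minimal sketch, assuming the cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory; because the per-service directory layout that systemd uses differs between versions, it walks the whole hierarchy instead of guessing unit paths. The helper names are hypothetical; the `memory.*` files are the standard cgroup-v1 counters.

```python
import json
import os

MEMCG_FILES = (
    "memory.usage_in_bytes",
    "memory.max_usage_in_bytes",
    "memory.limit_in_bytes",
    "memory.failcnt",
)


def snapshot_memcg(cgroup_dir):
    """Read whichever cgroup-v1 memory counters exist in cgroup_dir."""
    stats = {}
    for name in MEMCG_FILES:
        try:
            with open(os.path.join(cgroup_dir, name)) as f:
                stats[name] = int(f.read().strip())
        except OSError:
            continue  # missing or unreadable counter: skip it
    return stats


def snapshot_tree(root):
    """Snapshot every memory cgroup below root, keyed by its relative path."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.usage_in_bytes" in filenames:
            result[os.path.relpath(dirpath, root)] = snapshot_memcg(dirpath)
    return result


if __name__ == "__main__":
    # Assumed mount point; prints {} if the controller is not mounted there.
    print(json.dumps(snapshot_tree("/sys/fs/cgroup/memory"), indent=2))
```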
[1] https://lwn.net/Articles/443241/
[2] https://gist.github.com/raw/70afc901a73e427a0a71
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073238/comments/6
--
David Strauss
| david@davidstrauss.net
| +1 512 577 5827 [mobile]
* Re: Linux 3.3+ and memory cgroup kernel panics
From: Michal Hocko @ 2012-12-27 14:53 UTC
To: David Strauss
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org
[Adding linux-mm to CC]
On Fri 21-12-12 18:44:23, David Strauss wrote:
> The kernel seemed to replace the cgroups memory "charging" mechanism
> in 3.3 with a more efficient implementation [1], but we think it may
> be broken under Xen virtualization and load.
What are the steps to reproduce this?
> We do not see any issue in Linux 3.2 and earlier.
>
> We have documented panics for Fedora kernels 3.3.4-5.fc17.x86_64,
> 3.3.5-2.fc16.x86_64, and 3.6.10-2.fc16.x86_64 but *not* on Fedora
> kernels 3.1.0-7.fc16.x86_64 or 3.2.6-3.fc16.x86_64.
Are you able to reproduce with the vanilla kernel as well? Ideally with
the current Linus tree?
> Many of our services use MemoryLimit= and similar systemd options that
> create a memory cgroup for the service. This correlates with kernel
> panics under the following call path (full listing here [2]):
>
> [20488075.457394] [<ffffffff811825e7>] ? mem_cgroup_charge_statistics+0x17/0x60
> [20488075.457403] [<ffffffff81184ade>] __mem_cgroup_uncharge_common+0xfe/0x330
> [20488075.457410] [<ffffffff8100632d>] ? xen_pte_val+0x1d/0x40
> [20488075.457417] [<ffffffff81188457>] mem_cgroup_uncharge_page+0x37/0x40
> [20488075.457424] [<ffffffff8115e6d1>] page_remove_rmap+0xb1/0x140
>
> It culminates in this failure:
>
> [20488075.457183] kernel BUG at arch/x86/mm/fault.c:396!
> [20488075.457189] invalid opcode: 0000 [#1] SMP
>
> There are also reports of similar failures [3] unrelated to systemd
> use and on non-Fedora kernels.
>
> It appears to be an issue with re-attributing the charge for a page to
> a different cgroup. Any ideas why we would be seeing this with Linux
> 3.3+? I can generally reproduce the issue (often minutes after
> booting) on any heavily loaded machine in order to collect any
> additional data to help troubleshooting.
>
> [1] https://lwn.net/Articles/443241/
> [2] https://gist.github.com/raw/70afc901a73e427a0a71
> [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073238/comments/6
>
> --
> David Strauss
> | david@davidstrauss.net
> | +1 512 577 5827 [mobile]
--
Michal Hocko
SUSE Labs
* Re: Linux 3.3+ and memory cgroup kernel panics
From: Kamezawa Hiroyuki @ 2012-12-28 1:50 UTC
To: David Strauss; +Cc: cgroups@vger.kernel.org
(2012/12/22 11:44), David Strauss wrote:
> The kernel seemed to replace the cgroups memory "charging" mechanism
> in 3.3 with a more efficient implementation [1], but we think it may
> be broken under Xen virtualization and load. We do not see any issue
> in Linux 3.2 and earlier.
>
> We have documented panics for Fedora kernels 3.3.4-5.fc17.x86_64,
> 3.3.5-2.fc16.x86_64, and 3.6.10-2.fc16.x86_64 but *not* on Fedora
> kernels 3.1.0-7.fc16.x86_64 or 3.2.6-3.fc16.x86_64.
>
> Many of our services use MemoryLimit= and similar systemd options that
> create a memory cgroup for the service. This correlates with kernel
> panics under the following call path (full listing here [2]):
>
> [20488075.457394] [<ffffffff811825e7>] ? mem_cgroup_charge_statistics+0x17/0x60
> [20488075.457403] [<ffffffff81184ade>] __mem_cgroup_uncharge_common+0xfe/0x330
> [20488075.457410] [<ffffffff8100632d>] ? xen_pte_val+0x1d/0x40
> [20488075.457417] [<ffffffff81188457>] mem_cgroup_uncharge_page+0x37/0x40
> [20488075.457424] [<ffffffff8115e6d1>] page_remove_rmap+0xb1/0x140
>
> It culminates in this failure:
>
> [20488075.457183] kernel BUG at arch/x86/mm/fault.c:396!
> [20488075.457189] invalid opcode: 0000 [#1] SMP
>
> There are also reports of similar failures [3] unrelated to systemd
> use and on non-Fedora kernels.
>
> It appears to be an issue with re-attributing the charge for a page to
> a different cgroup. Any ideas why we would be seeing this with Linux
> 3.3+? I can generally reproduce the issue (often minutes after
> booting) on any heavily loaded machine in order to collect any
> additional data to help troubleshooting.
>
> [1] https://lwn.net/Articles/443241/
> [2] https://gist.github.com/raw/70afc901a73e427a0a71
> [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073238/comments/6
>
This is the first time I have seen this kind of backtrace... The final EIP is at
a BUG_ON() in vmalloc_fault(). The fault address was in the VMALLOC range:
VMALLOC_START < address < VMALLOC_END.
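To check whether an address from the full listing [2] falls in that window, the bounds can be compared directly. This sketch assumes the x86_64 4-level page-table layout of 3.x-era kernels (vmalloc/ioremap space ffffc90000000000 to ffffe8ffffffffff, and the kernel image mapped from 0xffffffff80000000); later kernels with KASLR or 5-level paging use different values. The helper names are hypothetical, and the half-open test mirrors the range check at the start of vmalloc_fault() in kernels of that era.

```python
# x86_64 virtual memory map for 3.x-era kernels with 4-level paging
# (assumed values; not valid for KASLR or 5-level-paging kernels).
VMALLOC_START = 0xFFFFC90000000000
VMALLOC_END = 0xFFFFE8FFFFFFFFFF
KERNEL_TEXT_START = 0xFFFFFFFF80000000  # __START_KERNEL_map


def is_vmalloc_addr(addr):
    # Half-open range: VMALLOC_START <= addr < VMALLOC_END.
    return VMALLOC_START <= addr < VMALLOC_END


def classify(addr):
    if is_vmalloc_addr(addr):
        return "vmalloc"
    if addr >= KERNEL_TEXT_START:
        return "kernel image or modules"
    return "other"
```

The return addresses in the call trace (0xffffffff81...) classify as kernel image, as expected for code; it is the faulting data address that Kame reports in the vmalloc window.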
Maybe it's the percpu area used by memcg->stat, which is backed by the vmalloc area.
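For context on why a statistics update can reach vmalloc_fault() at all: memcg->stat is a per-CPU counter set, and dynamically allocated per-CPU memory can live in vmalloc space, whose top-level page-table entries x86_64 copies into a process's page tables lazily, on first access, via vmalloc_fault(). The per-CPU counter pattern itself is sketched below as a toy user-space model (the class name is hypothetical; it shows the update/read discipline, not the kernel's memory layout).

```python
class PerCpuCounter:
    """Toy model of a per-CPU statistics counter such as memcg->stat."""

    def __init__(self, nr_cpus):
        # One slot per CPU. In the kernel each slot lives in that CPU's
        # per-cpu area rather than in a shared list.
        self.slots = [0] * nr_cpus

    def add(self, cpu, delta):
        # Writers touch only their own CPU's slot, so no shared lock is needed.
        self.slots[cpu] += delta

    def read(self):
        # Readers sum every slot; the total can be momentarily stale while
        # other CPUs are updating.
        return sum(self.slots)
```

The design trades read cost (summing all CPUs) for cheap, contention-free updates, which suits counters such as memcg page statistics that are bumped on every charge and uncharge.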
Hmm... do you see any trouble on a native (non-Xen) host?
Thanks,
-Kame