* kernel warning percpu ref in obj_cgroup_release (was: Re: linux-next: Tree for Mar 29)
       [not found] <20210329205249.6b557510@canb.auug.org.au>
@ 2021-03-30 11:32 ` Christian Borntraeger
  2021-03-30 13:27   ` kernel warning percpu ref in obj_cgroup_release Christian Borntraeger
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2021-03-30 11:32 UTC
  To: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton, Muchun Song
  Cc: Linux Kernel Mailing List, linux-s390

On 29.03.21 11:52, Stephen Rothwell wrote:
> Hi all,
>
> News: there will be no linux-next release on Friday or the following
> Monday.
>
> Changes since 20210326:
>
> The arm64 tree gained a conflict against Linus' tree.
>
> I applied a supplied patch for clang breakage in the kbuild tree.
>
> The net-next tree gained a conflict against the bpf tree.
>
> The drm tree gained a conflict against Linus' tree.
>
> The staging tree gained a conflict against the scmi tree and a semantic
> conflict against the spi tree.
>
> The rust tree gained a conflict against the kbuild tree.
>
> The akpm-current tree lost its build failure and gained a conflict
> against the gpio-brgl tree.
>
> Non-merge commits (relative to Linus' tree): 7289
>  7213 files changed, 432170 insertions(+), 147471 deletions(-)
>

This next (328 is fine) triggers several bugs during our KVM CI run:

[ 1506.494716] ------------[ cut here ]------------
[ 1506.494730] percpu ref (obj_cgroup_release) <= 0 (-1) after switching to atomic
[ 1506.494766] WARNING: CPU: 6 PID: 0 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8
[ 1506.494774] Modules linked in: kvm vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dm_service_time nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink zfcp scsi_transport_fc rpcrdma sunrpc dm_multipath rdma_ucm scsi_dh_rdac scsi_dh_emc rdma_cm scsi_dh_alua iw_cm ib_cm mlx5_ib ib_uverbs dm_mod ib_core s390_trng vfio_ccw vfio_mdev mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 mlx5_core sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: vfio_ap]
[ 1506.494832] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.12.0-20210330.rc4.git0.9d49ed9ca93b.300.fc33.s390x+next #1
[ 1506.494834] Hardware name: IBM 8561 T01 703 (LPAR)
[ 1506.494836] Krnl PSW : 0704c00180000000 00000002d71dd21e (percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8)
[ 1506.494840]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 1506.494842] Krnl GPRS: c0000000fffeffff 00000002f7256818 0000000000000043 00000000fffeffff
[ 1506.494844]            00000000ffffffea 0000038000000001 0000000000000000 000003800000017c
[ 1506.494846]            00000002d7924988 0000000227eb97a0 000003ff5413c7e0 7fffffffffffffff
[ 1506.494848]            0000000080360000 00000002f726b570 00000002d71dd21a 00000380000bba28
[ 1506.494856] Krnl Code: 00000002d71dd20e: e3309fe8ff04  lg    %r3,-24(%r9)
                          00000002d71dd214: c0e5001eb556  brasl %r14,00000002d75b3cc0
                         #00000002d71dd21a: af000000      mc    0,0
                         >00000002d71dd21e: a7f4ffcc      brc   15,00000002d71dd1b6
                          00000002d71dd222: 0707          bcr   0,%r7
                          00000002d71dd224: 0707          bcr   0,%r7
                          00000002d71dd226: 0707          bcr   0,%r7
                          00000002d71dd228: eb6ff0480024  stmg  %r6,%r15,72(%r15)
[ 1506.494928] Call Trace:
[ 1506.494933]  [<00000002d71dd21e>] percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8
[ 1506.494940] ([<00000002d71dd21a>] percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8)
[ 1506.494942]  [<00000002d6b8a6c6>] rcu_do_batch+0x146/0x608
[ 1506.494946]  [<00000002d6b8ec04>] rcu_core+0x124/0x1d0
[ 1506.494948]  [<00000002d75d0222>] __do_softirq+0x13a/0x3c8
[ 1506.494952]  [<00000002d6b05306>] irq_exit+0xce/0xf8
[ 1506.494955]  [<00000002d75c1eb4>] do_ext_irq+0xdc/0x170
[ 1506.494957]  [<00000002d75cdea4>] ext_int_handler+0xc4/0xf4
[ 1506.494959]  [<0000000000000000>] 0x0
[ 1506.494963]  [<00000002d75cd9c2>] default_idle_call+0x42/0x110
[ 1506.494965]  [<00000002d6b411a0>] do_idle+0xd8/0x168
[ 1506.494968]  [<00000002d6b413ee>] cpu_startup_entry+0x36/0x40
[ 1506.494971]  [<00000002d6ac730a>] smp_start_secondary+0x82/0x88
[ 1506.494974] Last Breaking-Event-Address:
[ 1506.494975]  [<00000002d6b71898>] vprintk_emit+0xa8/0x110
[ 1506.494978] Kernel panic - not syncing: panic_on_warn set ...

I will try to bisect this, but if anyone has an idea. CC some candidates.

* Re: kernel warning percpu ref in obj_cgroup_release
  2021-03-30 11:32 ` kernel warning percpu ref in obj_cgroup_release (was: Re: linux-next: Tree for Mar 29) Christian Borntraeger
@ 2021-03-30 13:27   ` Christian Borntraeger
  2021-03-30 13:49     ` [External] " Muchun Song
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2021-03-30 13:27 UTC
  To: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton, Muchun Song
  Cc: Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

So bisect shows this for the warning below:

636c3ef8229ecb4e7d045e86f36505d24a8f019a is the first bad commit
commit 636c3ef8229ecb4e7d045e86f36505d24a8f019a
Author: Muchun Song <songmuchun@bytedance.com>
Date:   Mon Mar 29 11:12:06 2021 +1100

    mm: memcontrol: use obj_cgroup APIs to charge kmem pages

    Since Roman's series "The new cgroup slab memory controller" applied.  All
    slab objects are charged via the new APIs of obj_cgroup.  The new APIs
    introduce a struct obj_cgroup to charge slab objects.  It prevents
    long-living objects from pinning the original memory cgroup in the memory.
    But there are still some corner objects (e.g.  allocations larger than
    order-1 page on SLUB) which are not charged via the new APIs.  Those
    objects (include the pages which are allocated from buddy allocator
    directly) are charged as kmem pages which still hold a reference to the
    memory cgroup.

    We want to reuse the obj_cgroup APIs to charge the kmem pages.  If we do
    that, we should store an object cgroup pointer to page->memcg_data for the
    kmem pages.

    Finally, page->memcg_data will have 3 different meanings.

      1) For the slab pages, page->memcg_data points to an object cgroups
         vector.

      2) For the kmem pages (exclude the slab pages), page->memcg_data
         points to an object cgroup.

      3) For the user pages (e.g. the LRU pages), page->memcg_data points
         to a memory cgroup.

    We do not change the behavior of page_memcg() and page_memcg_rcu().  They
    are also suitable for LRU pages and kmem pages.  Why?

    Because memory allocations pinning memcgs for a long time - it exists at a
    larger scale and is causing recurring problems in the real world: page
    cache doesn't get reclaimed for a long time, or is used by the second,
    third, fourth, ...  instance of the same job that was restarted into a new
    cgroup every time.  Unreclaimable dying cgroups pile up, waste memory, and
    make page reclaim very inefficient.

    We can convert LRU pages and most other raw memcg pins to the objcg
    direction to fix this problem, and then the page->memcg will always point
    to an object cgroup pointer.  At that time, LRU pages and kmem pages will
    be treated the same.  The implementation of page_memcg() will remove the
    kmem page check.

    This patch aims to charge the kmem pages by using the new APIs of
    obj_cgroup.  Finally, the page->memcg_data of the kmem page points to an
    object cgroup.  We can use the __page_objcg() to get the object cgroup
    associated with a kmem page.  Or we can use page_memcg() to get the memory
    cgroup associated with a kmem page, but caller must ensure that the
    returned memcg won't be released (e.g.  acquire the rcu_read_lock or
    css_set_lock).

    Link: https://lkml.kernel.org/r/20210319163821.20704-6-songmuchun@bytedance.com
    Signed-off-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

 include/linux/memcontrol.h | 116 +++++++++++++++++++++++++++++++++++----------
 mm/memcontrol.c            | 110 +++++++++++++++++++++---------------------
 2 files changed, 145 insertions(+), 81 deletions(-)

On 30.03.21 13:32, Christian Borntraeger wrote:
[...]
> This next (328 is fine) triggers several bugs during our KVM CI run:
>
> [ 1506.494716] ------------[ cut here ]------------
> [ 1506.494730] percpu ref (obj_cgroup_release) <= 0 (-1) after switching to atomic
> [ 1506.494766] WARNING: CPU: 6 PID: 0 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8
> [...]
>
> I will try to bisect this, but if anyone has an idea. CC some candidates.

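The three meanings of page->memcg_data listed in the commit message above can be pictured as a tagged pointer. What follows is only a userspace sketch under stated assumptions, not the kernel's code: the names model_page, DATA_OBJCGS, DATA_KMEM and the model_page_* helpers are hypothetical, and the choice of the two low pointer bits as tags is an assumption made for illustration.

```c
/* Userspace model only; none of these names are kernel definitions. */
#include <assert.h>
#include <stdint.h>

/* Assumed tag layout: the two low bits of the word select the meaning. */
#define DATA_OBJCGS     ((uintptr_t)0x1) /* slab page: object cgroups vector */
#define DATA_KMEM       ((uintptr_t)0x2) /* non-slab kmem page: one object cgroup */
#define DATA_FLAGS_MASK (DATA_OBJCGS | DATA_KMEM)

enum page_kind { PAGE_USER, PAGE_SLAB, PAGE_KMEM };

/* Hypothetical stand-in for struct page, reduced to the one field. */
struct model_page {
	uintptr_t memcg_data;
};

/* Store a pointer together with its tag (flag 0 means a user/LRU page). */
static void model_page_set(struct model_page *page, void *ptr, uintptr_t flag)
{
	/* The tag bits must be free, so the pointer needs 4-byte alignment. */
	assert(((uintptr_t)ptr & DATA_FLAGS_MASK) == 0);
	page->memcg_data = (uintptr_t)ptr | flag;
}

/* Decide which of the three meanings applies to this page. */
static enum page_kind model_page_kind(const struct model_page *page)
{
	if (page->memcg_data & DATA_OBJCGS)
		return PAGE_SLAB;
	if (page->memcg_data & DATA_KMEM)
		return PAGE_KMEM;
	return PAGE_USER;
}

/* Strip the tag bits to recover the stored pointer. */
static void *model_page_ptr(const struct model_page *page)
{
	return (void *)(page->memcg_data & ~DATA_FLAGS_MASK);
}
```

In this model a caller first checks model_page_kind() and only then interprets the result of model_page_ptr(), which mirrors the commit message's point that the same field means different things for slab, kmem and user pages.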
* Re: [External] Re: kernel warning percpu ref in obj_cgroup_release
  2021-03-30 13:27   ` kernel warning percpu ref in obj_cgroup_release Christian Borntraeger
@ 2021-03-30 13:49     ` Muchun Song
  2021-03-30 15:10       ` Christian Borntraeger
  0 siblings, 1 reply; 9+ messages in thread
From: Muchun Song @ 2021-03-30 13:49 UTC
  To: Christian Borntraeger
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton, Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
> So bisect shows this for the warning below:

Thanks for your effort on this. Can you share your config?

> 636c3ef8229ecb4e7d045e86f36505d24a8f019a is the first bad commit
> commit 636c3ef8229ecb4e7d045e86f36505d24a8f019a
> Author: Muchun Song <songmuchun@bytedance.com>
> Date:   Mon Mar 29 11:12:06 2021 +1100
>
>     mm: memcontrol: use obj_cgroup APIs to charge kmem pages
>
> [...]

* RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-30 13:49     ` [External] " Muchun Song
@ 2021-03-30 15:10       ` Christian Borntraeger
  2021-03-30 16:25         ` [External] " Muchun Song
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Borntraeger @ 2021-03-30 15:10 UTC
  To: Muchun Song
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton, Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

[-- Attachment #1: Type: text/plain, Size: 8370 bytes --]

On 30.03.21 15:49, Muchun Song wrote:
> On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>>
>> So bisect shows this for the warning below:
>
> Thanks for your effort on this. Can you share your config?

Attached (but it's s390x) for next-20210330.

The problem goes away when I add
cgroup_controllers = [ ]
to /etc/libvirt/qemu.conf

The testcase that triggers the problem starts and stops multiple KVM guests with 248 CPUs.
Do we happen to have maybe only a byte of refcount space?

>> 636c3ef8229ecb4e7d045e86f36505d24a8f019a is the first bad commit
>> commit 636c3ef8229ecb4e7d045e86f36505d24a8f019a
>> Author: Muchun Song <songmuchun@bytedance.com>
>> Date:   Mon Mar 29 11:12:06 2021 +1100
>>
>>     mm: memcontrol: use obj_cgroup APIs to charge kmem pages
>>
>> [...]

[-- Attachment #2: config-bad.gz --]
[-- Type: application/gzip, Size: 20017 bytes --]

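As a reading aid for the warning text "percpu ref (obj_cgroup_release) <= 0 (-1) after switching to atomic", here is a deliberately simplified userspace model of a per-CPU reference count. It is a sketch under assumptions, not the code in lib/percpu-refcount.c: the struct and function names are hypothetical, RCU and the kernel's internal bias value are left out, and the counters are modelled as `long` on the assumption that the real ones are word-sized as well.

```c
/* Userspace model only; names and layout are illustrative assumptions. */
#include <assert.h>

#define MODEL_NR_CPUS 4

struct model_percpu_ref {
	long percpu_count[MODEL_NR_CPUS]; /* per-CPU deltas, may go negative */
	long atomic_count;                /* shared counter, holds the initial ref */
};

static void model_ref_init(struct model_percpu_ref *ref)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ref->percpu_count[cpu] = 0;
	ref->atomic_count = 1; /* the initial reference */
}

/* Take nr references; only the local CPU's counter is touched. */
static void model_ref_get_many(struct model_percpu_ref *ref, int cpu, long nr)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ref->percpu_count[cpu] += nr;
}

/* Drop nr references, possibly on a different CPU than the matching get. */
static void model_ref_put_many(struct model_percpu_ref *ref, int cpu, long nr)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ref->percpu_count[cpu] -= nr;
}

/*
 * Fold all per-CPU deltas into the shared counter and return the result.
 * A result <= 0 at this point means more puts than gets were issued,
 * which is the condition the quoted warning reports.
 */
static long model_ref_switch_to_atomic(struct model_percpu_ref *ref)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		ref->atomic_count += ref->percpu_count[cpu];
		ref->percpu_count[cpu] = 0;
	}
	return ref->atomic_count;
}
```

In this model, balanced gets and puts leave the folded count at 1 regardless of which CPUs they ran on, while a surplus of puts only becomes visible as a non-positive total once the per-CPU deltas are summed.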
* Re: [External] RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-30 15:10       ` Christian Borntraeger
@ 2021-03-30 16:25         ` Muchun Song
  2021-03-31  6:22           ` Christian Borntraeger
  0 siblings, 1 reply; 9+ messages in thread
From: Muchun Song @ 2021-03-30 16:25 UTC
  To: Christian Borntraeger
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton, Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

On Tue, Mar 30, 2021 at 11:10 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
>
> On 30.03.21 15:49, Muchun Song wrote:
> > On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
> > <borntraeger@de.ibm.com> wrote:
> >>
> >> So bisect shows this for belows warning:
> >
> > Thanks for your effort on this. Can you share your config?
>
> attached (but its s390x) for next-20210330

Thanks. Can you apply the following patch and help me test it? Thanks a lot.

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7fdc92e1983e..579408e4d46f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -793,6 +793,12 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
 	percpu_ref_get(&objcg->refcnt);
 }
 
+static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
+{
+	percpu_ref_get_many(&objcg->refcnt, nr);
+}
+
 static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 {
 	percpu_ref_put(&objcg->refcnt);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c0b83a396299..1634dba1044c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3133,7 +3133,10 @@ void split_page_memcg(struct page *head, unsigned int nr)
 	for (i = 1; i < nr; i++)
 		head[i].memcg_data = head->memcg_data;
 
-	css_get_many(&memcg->css, nr - 1);
+	if (PageMemcgKmem(head))
+		obj_cgroup_get_many(__page_objcg(head), nr - 1);
+	else
+		css_get_many(&memcg->css, nr - 1);
 }
 
 #ifdef CONFIG_MEMCG_SWAP

>
> The problem goes away when I add
> cgroup_controllers = [ ] > to /etc/libvirt/qemu.conf > > The testcase that triggers the problem starts and stops multipe KVM guests with 248 CPUs. > Do we happen to have maybe only a byte of refcount space? > > > > > >> > >> 636c3ef8229ecb4e7d045e86f36505d24a8f019a is the first bad commit > >> commit 636c3ef8229ecb4e7d045e86f36505d24a8f019a > >> Author: Muchun Song <songmuchun@bytedance.com> > >> Date: Mon Mar 29 11:12:06 2021 +1100 > >> > >> mm: memcontrol: use obj_cgroup APIs to charge kmem pages > >> > >> Since Roman's series "The new cgroup slab memory controller" applied. All > >> slab objects are charged via the new APIs of obj_cgroup. The new APIs > >> introduce a struct obj_cgroup to charge slab objects. It prevents > >> long-living objects from pinning the original memory cgroup in the memory. > >> But there are still some corner objects (e.g. allocations larger than > >> order-1 page on SLUB) which are not charged via the new APIs. Those > >> objects (include the pages which are allocated from buddy allocator > >> directly) are charged as kmem pages which still hold a reference to the > >> memory cgroup. > >> > >> We want to reuse the obj_cgroup APIs to charge the kmem pages. If we do > >> that, we should store an object cgroup pointer to page->memcg_data for the > >> kmem pages. > >> > >> Finally, page->memcg_data will have 3 different meanings. > >> > >> 1) For the slab pages, page->memcg_data points to an object cgroups > >> vector. > >> > >> 2) For the kmem pages (exclude the slab pages), page->memcg_data > >> points to an object cgroup. > >> > >> 3) For the user pages (e.g. the LRU pages), page->memcg_data points > >> to a memory cgroup. > >> > >> We do not change the behavior of page_memcg() and page_memcg_rcu(). They > >> are also suitable for LRU pages and kmem pages. Why? 
> >> > >> Because memory allocations pinning memcgs for a long time - it exists at a > >> larger scale and is causing recurring problems in the real world: page > >> cache doesn't get reclaimed for a long time, or is used by the second, > >> third, fourth, ... instance of the same job that was restarted into a new > >> cgroup every time. Unreclaimable dying cgroups pile up, waste memory, and > >> make page reclaim very inefficient. > >> > >> We can convert LRU pages and most other raw memcg pins to the objcg > >> direction to fix this problem, and then the page->memcg will always point > >> to an object cgroup pointer. At that time, LRU pages and kmem pages will > >> be treated the same. The implementation of page_memcg() will remove the > >> kmem page check. > >> > >> This patch aims to charge the kmem pages by using the new APIs of > >> obj_cgroup. Finally, the page->memcg_data of the kmem page points to an > >> object cgroup. We can use the __page_objcg() to get the object cgroup > >> associated with a kmem page. Or we can use page_memcg() to get the memory > >> cgroup associated with a kmem page, but caller must ensure that the > >> returned memcg won't be released (e.g. acquire the rcu_read_lock or > >> css_set_lock). 
> >> > >> Link: https://lkml.kernel.org/r/20210319163821.20704-6-songmuchun@bytedance.com > >> Signed-off-by: Muchun Song <songmuchun@bytedance.com> > >> Acked-by: Johannes Weiner <hannes@cmpxchg.org> > >> Cc: Michal Hocko <mhocko@kernel.org> > >> Cc: Roman Gushchin <guro@fb.com> > >> Cc: Shakeel Butt <shakeelb@google.com> > >> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> > >> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> > >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > >> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> > >> > >> include/linux/memcontrol.h | 116 +++++++++++++++++++++++++++++++++++---------- > >> mm/memcontrol.c | 110 +++++++++++++++++++++--------------------- > >> 2 files changed, 145 insertions(+), 81 deletions(-) > >> > >> > >> > >> > >> > >> On 30.03.21 13:32, Christian Borntraeger wrote: > >> [...] > >>> > >>> This next (328 is fine) triggers several bugs during our KVM CI run: > >>> > >>> [ 1506.494716] ------------[ cut here ]------------ > >>> [ 1506.494730] percpu ref (obj_cgroup_release) <= 0 (-1) after switching to atomic > >>> [ 1506.494766] WARNING: CPU: 6 PID: 0 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8 > >>> [ 1506.494774] Modules linked in: kvm vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp nft_counter bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct dm_service_time nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink zfcp scsi_transport_fc rpcrdma sunrpc dm_multipath rdma_ucm scsi_dh_rdac scsi_dh_emc rdma_cm scsi_dh_alua iw_cm ib_cm mlx5_ib ib_uverbs dm_mod ib_core s390_trng vfio_ccw vfio_mdev mdev vfio_iommu_type1 zcrypt_cex4 vfio eadm_sch sch_fq_codel configfs ip_tables x_tables ghash_s390 prng aes_s390 des_s390 libdes 
sha3_512_s390 sha3_256_s390 mlx5_core sha512_s390 sha256_s390 sha1_s390 sha_common nvme nvme_core pkey zcrypt rng_core autofs4 [last unloaded: vfio_ap] > >>> [ 1506.494832] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.12.0-20210330.rc4.git0.9d49ed9ca93b.300.fc33.s390x+next #1 > >>> [ 1506.494834] Hardware name: IBM 8561 T01 703 (LPAR) > >>> [ 1506.494836] Krnl PSW : 0704c00180000000 00000002d71dd21e (percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8) > >>> [ 1506.494840] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > >>> [ 1506.494842] Krnl GPRS: c0000000fffeffff 00000002f7256818 0000000000000043 00000000fffeffff > >>> [ 1506.494844] 00000000ffffffea 0000038000000001 0000000000000000 000003800000017c > >>> [ 1506.494846] 00000002d7924988 0000000227eb97a0 000003ff5413c7e0 7fffffffffffffff > >>> [ 1506.494848] 0000000080360000 00000002f726b570 00000002d71dd21a 00000380000bba28 > >>> [ 1506.494856] Krnl Code: 00000002d71dd20e: e3309fe8ff04 lg %r3,-24(%r9) > >>> 00000002d71dd214: c0e5001eb556 brasl %r14,00000002d75b3cc0 > >>> #00000002d71dd21a: af000000 mc 0,0 > >>> >00000002d71dd21e: a7f4ffcc brc 15,00000002d71dd1b6 > >>> 00000002d71dd222: 0707 bcr 0,%r7 > >>> 00000002d71dd224: 0707 bcr 0,%r7 > >>> 00000002d71dd226: 0707 bcr 0,%r7 > >>> 00000002d71dd228: eb6ff0480024 stmg %r6,%r15,72(%r15) > >>> [ 1506.494928] Call Trace: > >>> [ 1506.494933] [<00000002d71dd21e>] percpu_ref_switch_to_atomic_rcu+0x1ee/0x1f8 > >>> [ 1506.494940] ([<00000002d71dd21a>] percpu_ref_switch_to_atomic_rcu+0x1ea/0x1f8) > >>> [ 1506.494942] [<00000002d6b8a6c6>] rcu_do_batch+0x146/0x608 > >>> [ 1506.494946] [<00000002d6b8ec04>] rcu_core+0x124/0x1d0 > >>> [ 1506.494948] [<00000002d75d0222>] __do_softirq+0x13a/0x3c8 > >>> [ 1506.494952] [<00000002d6b05306>] irq_exit+0xce/0xf8 > >>> [ 1506.494955] [<00000002d75c1eb4>] do_ext_irq+0xdc/0x170 > >>> [ 1506.494957] [<00000002d75cdea4>] ext_int_handler+0xc4/0xf4 > >>> [ 1506.494959] [<0000000000000000>] 0x0 > >>> [ 1506.494963] 
[<00000002d75cd9c2>] default_idle_call+0x42/0x110 > >>> [ 1506.494965] [<00000002d6b411a0>] do_idle+0xd8/0x168 > >>> [ 1506.494968] [<00000002d6b413ee>] cpu_startup_entry+0x36/0x40 > >>> [ 1506.494971] [<00000002d6ac730a>] smp_start_secondary+0x82/0x88 > >>> [ 1506.494974] Last Breaking-Event-Address: > >>> [ 1506.494975] [<00000002d6b71898>] vprintk_emit+0xa8/0x110 > >>> [ 1506.494978] Kernel panic - not syncing: panic_on_warn set ... > >>> > >>> > >>> > >>> I will try to bisect this, but if anyone has an idea. CC some candidates. > >> ^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-30 16:25 ` [External] " Muchun Song
@ 2021-03-31 6:22 ` Christian Borntraeger
  2021-03-31 11:42 ` [External] " Muchun Song
  2021-03-31 14:45 ` Muchun Song
  0 siblings, 2 replies; 9+ messages in thread
From: Christian Borntraeger @ 2021-03-31 6:22 UTC (permalink / raw)
  To: Muchun Song
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton,
      Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko,
      Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

On 30.03.21 18:25, Muchun Song wrote:
> On Tue, Mar 30, 2021 at 11:10 PM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>>
>>
>> On 30.03.21 15:49, Muchun Song wrote:
>>> On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
>>> <borntraeger@de.ibm.com> wrote:
>>>>
>>>> So bisect shows this for belows warning:
>>>
>>> Thanks for your effort on this. Can you share your config?
>>
>> attached (but its s390x) for next-20210330
>
> Thanks. Can you apply the following patch and help me test?
> Very Thanks.
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 7fdc92e1983e..579408e4d46f 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -793,6 +793,12 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
>         percpu_ref_get(&objcg->refcnt);
>  }
>
> +static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
> +                                       unsigned long nr)
> +{
> +       percpu_ref_get_many(&objcg->refcnt, nr);
> +}
> +
>  static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>  {
>         percpu_ref_put(&objcg->refcnt);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c0b83a396299..1634dba1044c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3133,7 +3133,10 @@ void split_page_memcg(struct page *head, unsigned int nr)
>
>         for (i = 1; i < nr; i++)
>                 head[i].memcg_data = head->memcg_data;
> -       css_get_many(&memcg->css, nr - 1);
> +       if (PageMemcgKmem(head))
> +               obj_cgroup_get_many(__page_objcg(head), nr - 1);
> +       else
> +               css_get_many(&memcg->css, nr - 1);
>  }
>
>  #ifdef CONFIG_MEMCG_SWAP
>

This one seems to do the trick, I can no longer see the warning.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [External] RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-31 6:22 ` Christian Borntraeger
@ 2021-03-31 11:42 ` Muchun Song
  2021-03-31 14:45 ` Muchun Song
  1 sibling, 0 replies; 9+ messages in thread
From: Muchun Song @ 2021-03-31 11:42 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi, Andrew Morton,
      Linux Kernel Mailing List, linux-s390, Johannes Weiner, Michal Hocko,
      Roman Gushchin, Shakeel Butt, Vladimir Davydov, Xiongchun Duan

On Wed, Mar 31, 2021 at 2:22 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
>
>
> On 30.03.21 18:25, Muchun Song wrote:
> > On Tue, Mar 30, 2021 at 11:10 PM Christian Borntraeger
> > <borntraeger@de.ibm.com> wrote:
> >>
> >>
> >> On 30.03.21 15:49, Muchun Song wrote:
> >>> On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
> >>> <borntraeger@de.ibm.com> wrote:
> >>>>
> >>>> So bisect shows this for belows warning:
> >>>
> >>> Thanks for your effort on this. Can you share your config?
> >>
> >> attached (but its s390x) for next-20210330
> >
> > Thanks. Can you apply the following patch and help me test?
> > Very Thanks.
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 7fdc92e1983e..579408e4d46f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -793,6 +793,12 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
> >         percpu_ref_get(&objcg->refcnt);
> >  }
> >
> > +static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
> > +                                       unsigned long nr)
> > +{
> > +       percpu_ref_get_many(&objcg->refcnt, nr);
> > +}
> > +
> >  static inline void obj_cgroup_put(struct obj_cgroup *objcg)
> >  {
> >         percpu_ref_put(&objcg->refcnt);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c0b83a396299..1634dba1044c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3133,7 +3133,10 @@ void split_page_memcg(struct page *head, unsigned int nr)
> >
> >         for (i = 1; i < nr; i++)
> >                 head[i].memcg_data = head->memcg_data;
> > -       css_get_many(&memcg->css, nr - 1);
> > +       if (PageMemcgKmem(head))
> > +               obj_cgroup_get_many(__page_objcg(head), nr - 1);
> > +       else
> > +               css_get_many(&memcg->css, nr - 1);
> >  }
> >
> >  #ifdef CONFIG_MEMCG_SWAP
> >
>
> This one seems to do the trick, I can no longer see the warning.

Thanks for your testing. I will send a fix patch.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [External] RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-31 6:22 ` Christian Borntraeger
  2021-03-31 11:42 ` [External] " Muchun Song
@ 2021-03-31 14:45 ` Muchun Song
  2021-04-01 0:25 ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Muchun Song @ 2021-03-31 14:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi,
      Christian Borntraeger, Linux Kernel Mailing List, linux-s390,
      Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
      Vladimir Davydov, Xiongchun Duan

On Wed, Mar 31, 2021 at 2:22 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
>
>
> On 30.03.21 18:25, Muchun Song wrote:
> > On Tue, Mar 30, 2021 at 11:10 PM Christian Borntraeger
> > <borntraeger@de.ibm.com> wrote:
> >>
> >>
> >> On 30.03.21 15:49, Muchun Song wrote:
> >>> On Tue, Mar 30, 2021 at 9:27 PM Christian Borntraeger
> >>> <borntraeger@de.ibm.com> wrote:
> >>>>
> >>>> So bisect shows this for belows warning:
> >>>
> >>> Thanks for your effort on this. Can you share your config?
> >>
> >> attached (but its s390x) for next-20210330
> >
> > Thanks. Can you apply the following patch and help me test?
> > Very Thanks.
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 7fdc92e1983e..579408e4d46f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -793,6 +793,12 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
> >         percpu_ref_get(&objcg->refcnt);
> >  }
> >
> > +static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
> > +                                       unsigned long nr)
> > +{
> > +       percpu_ref_get_many(&objcg->refcnt, nr);
> > +}
> > +
> >  static inline void obj_cgroup_put(struct obj_cgroup *objcg)
> >  {
> >         percpu_ref_put(&objcg->refcnt);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c0b83a396299..1634dba1044c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3133,7 +3133,10 @@ void split_page_memcg(struct page *head, unsigned int nr)
> >
> >         for (i = 1; i < nr; i++)
> >                 head[i].memcg_data = head->memcg_data;
> > -       css_get_many(&memcg->css, nr - 1);
> > +       if (PageMemcgKmem(head))
> > +               obj_cgroup_get_many(__page_objcg(head), nr - 1);
> > +       else
> > +               css_get_many(&memcg->css, nr - 1);
> >  }
> >
> >  #ifdef CONFIG_MEMCG_SWAP

Hi Andrew,

Now we have two choices to fix this issue.

1) Send a v6 patchset (Use obj_cgroup APIs to charge kmem pages)
   to fix this issue.
2) Send a separate fix patch (Just like above).

Both ways are ok for me. But I want to know which one is more
convenient for you.

Thanks.

> >
>
> This one seems to do the trick, I can no longer see the warning.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [External] RE: kernel warning percpu ref in obj_cgroup_release
  2021-03-31 14:45 ` Muchun Song
@ 2021-04-01 0:25 ` Andrew Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2021-04-01 0:25 UTC (permalink / raw)
  To: Muchun Song
  Cc: Stephen Rothwell, Linux Next Mailing List, Yang Shi,
      Christian Borntraeger, Linux Kernel Mailing List, linux-s390,
      Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
      Vladimir Davydov, Xiongchun Duan

On Wed, 31 Mar 2021 22:45:12 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

>
> Hi Andrew,
>
> Now we have two choices to fix this issue.
>
> 1) Send a v6 patchset (Use obj_cgroup APIs to charge kmem pages)
>    to fix this issue.
> 2) Send a separate fix patch (Just like above).
>
> Both ways are ok for me. But I want to know which one is more
> convenient for you.

Either is OK. 2) is easier.

^ permalink raw reply [flat|nested] 9+ messages in thread
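
[Editorial sketch, not part of the thread] The patch discussed above changes
which counter split_page_memcg() bumps for kmem pages. The toy model below is
only an illustration of why that could matter, under the assumption (my
reading of the bisected commit, not something stated in the thread) that every
kmem page drops one obj_cgroup reference when it is freed. The struct and
function names (toy_counts, split_then_free) are hypothetical, and plain longs
stand in for the kernel's percpu_ref and css counters.

```c
#include <assert.h>

/* Toy stand-ins for the two counters involved; NOT the kernel types. */
struct toy_counts {
	long objcg_refs;	/* references held on the object cgroup */
	long memcg_refs;	/* references held on the memory cgroup css */
};

/*
 * Model: one charged kmem page (holding one objcg reference) is split
 * into nr pages, then all nr pages are freed.
 *
 * fixed != 0: the nr - 1 extra references go to the objcg, as in the
 *             PageMemcgKmem() branch of the patch.
 * fixed == 0: they go to the memcg css, as before the patch.
 *
 * Assumption: each freed page drops exactly one objcg reference.
 */
static struct toy_counts split_then_free(unsigned int nr, int fixed)
{
	struct toy_counts c = { .objcg_refs = 1, .memcg_refs = 0 };
	unsigned int i;

	assert(nr >= 1);

	if (fixed)
		c.objcg_refs += (long)(nr - 1);
	else
		c.memcg_refs += (long)(nr - 1);

	/* Free every page produced by the split. */
	for (i = 0; i < nr; i++)
		c.objcg_refs--;

	return c;
}
```

In this model, split_then_free(4, 0) ends with objcg_refs below zero and
memcg_refs left elevated, while split_then_free(4, 1) ends balanced at zero.
That matches the shape of the "<= 0" percpu ref warning reported at the top of
the thread, but the model says nothing about the real kernel's timing or
per-CPU behaviour.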
end of thread, other threads:[~2021-04-01 0:26 UTC | newest]
Thread overview: 9+ messages
[not found] <20210329205249.6b557510@canb.auug.org.au>
2021-03-30 11:32 ` kernel warning percpu ref in obj_cgroup_release (was: Re: linux-next: Tree for Mar 29) Christian Borntraeger
2021-03-30 13:27 ` kernel warning percpu ref in obj_cgroup_release Christian Borntraeger
2021-03-30 13:49 ` [External] " Muchun Song
2021-03-30 15:10 ` Christian Borntraeger
2021-03-30 16:25 ` [External] " Muchun Song
2021-03-31 6:22 ` Christian Borntraeger
2021-03-31 11:42 ` [External] " Muchun Song
2021-03-31 14:45 ` Muchun Song
2021-04-01 0:25 ` Andrew Morton