Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
       [not found] <CAMuHMdXC=zEjbZADE5wELjOq_kBiFNewpdUrMCe8d3Utu98h8A@mail.gmail.com>
@ 2016-06-14  6:24 ` Joonsoo Kim
  2016-06-14  7:31   ` Geert Uytterhoeven
  2016-06-14 13:10 ` Geert Uytterhoeven
  1 sibling, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-14  6:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,

Hello,

> 
> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> > To check precisely whether free objects exist or not, we need to grab a
> > lock.  But accuracy isn't that important, because the race window would be
> > very small, and if there are too many free objects, the cache reaper would
> > reap them.  So this patch makes the check for free object existence not
> > hold a lock.  This will reduce lock contention in the heavy-allocation case.
> >
> > Note that until now, n->shared can be freed during the processing by
> > writing slabinfo, but, with some trick in this patch, we can access it
> > freely within interrupt disabled period.
> >
> > Below is the result of concurrent allocation/free in slab allocation
> > benchmark made by Christoph a long time ago.  I make the output simpler.
> > The number shows cycle count during alloc/free respectively so less is
> > better.
> >
> > * Before
> > Kmalloc N*alloc N*free(32): Average=248/966
> > Kmalloc N*alloc N*free(64): Average=261/949
> > Kmalloc N*alloc N*free(128): Average=314/1016
> > Kmalloc N*alloc N*free(256): Average=741/1061
> > Kmalloc N*alloc N*free(512): Average=1246/1152
> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >
> > * After
> > Kmalloc N*alloc N*free(32): Average=344/792
> > Kmalloc N*alloc N*free(64): Average=347/882
> > Kmalloc N*alloc N*free(128): Average=390/959
> > Kmalloc N*alloc N*free(256): Average=393/1067
> > Kmalloc N*alloc N*free(512): Average=683/1229
> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >
> > It shows that allocation performance decreases for the object size up to
> > 128 and it may be due to extra checks in cache_alloc_refill().  But, with
> > considering improvement of free performance, net result looks the same.
> > Result for other size class looks very promising, roughly, 50% performance
> > improvement.
> >
> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> (Renesas dual Cortex A9) to this patch, which is upstream commit
> 801faf0db8947e01877920e848a4d338dd7a99e7.
> 
> I've attached my .config. I don't know if it also happens with
> shmobile_defconfig, as something went wrong with my remote access to the board,
> preventing further testing. I also couldn't verify if the issue persists in
> v4.7-rc3.
> 
> Do you have a clue?

Not yet. Could you help me narrow down the problem?
The following diff is a half-revert, to check that synchronize_sched()
is not the problem.

Thanks.

----->8-----
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..257a0eb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3016,9 +3016,6 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
        n = get_node(cachep, node);
 
        BUG_ON(ac->avail > 0 || !n);
-       shared = READ_ONCE(n->shared);
-       if (!n->free_objects && (!shared || !shared->avail))
-               goto direct_grow;
 
        spin_lock(&n->list_lock);
        shared = READ_ONCE(n->shared);
@@ -3047,7 +3044,6 @@ alloc_done:
        spin_unlock(&n->list_lock);
        fixup_objfreelist_debug(cachep, &list);
 
-direct_grow:
        if (unlikely(!ac->avail)) {
                /* Check if we can use obj in pfmemalloc slab */
                if (sk_memalloc_socks()) {
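
For readers following along, the check removed above can be modelled in
user space roughly as below. This is only a sketch, not the mm/slab.c
code: struct cache_node, struct shared_cache and should_grow_directly()
are hypothetical stand-ins, the list_lock itself is left out, and C11
relaxed atomic loads stand in for the kernel's unlocked reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node shared array cache. */
struct shared_cache {
	atomic_uint avail;		/* objects currently cached */
};

/* Hypothetical stand-in for struct kmem_cache_node (lock omitted). */
struct cache_node {
	atomic_ulong free_objects;	/* free objects on the node's slabs */
	_Atomic(struct shared_cache *) shared;
};

/*
 * Decide, without taking any lock, whether the caller can skip the
 * locked refill path and grow the cache directly.  The reads are racy
 * on purpose: the answer may already be stale when it is used.
 */
static bool should_grow_directly(struct cache_node *n)
{
	struct shared_cache *shared =
		atomic_load_explicit(&n->shared, memory_order_relaxed);

	if (atomic_load_explicit(&n->free_objects, memory_order_relaxed))
		return false;	/* free objects exist: take the locked path */
	if (shared &&
	    atomic_load_explicit(&shared->avail, memory_order_relaxed))
		return false;	/* shared cache has objects: locked path */
	return true;		/* nothing to take: grow directly */
}
```

A stale answer is tolerable in this model: at worst the caller grows the
cache once when it did not need to, or takes the lock once when it did
not need to.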


* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-14  6:24 ` Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache) Joonsoo Kim
@ 2016-06-14  7:31   ` Geert Uytterhoeven
  2016-06-14  8:11     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-14  7:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joonsoo,

On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
>> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
>> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> > To check precisely whether free objects exist or not, we need to grab a
>> > lock.  But accuracy isn't that important, because the race window would be
>> > very small, and if there are too many free objects, the cache reaper would
>> > reap them.  So this patch makes the check for free object existence not
>> > hold a lock.  This will reduce lock contention in the heavy-allocation case.
>> >
>> > Note that until now, n->shared can be freed during the processing by
>> > writing slabinfo, but, with some trick in this patch, we can access it
>> > freely within interrupt disabled period.
>> >
>> > Below is the result of concurrent allocation/free in slab allocation
>> > benchmark made by Christoph a long time ago.  I make the output simpler.
>> > The number shows cycle count during alloc/free respectively so less is
>> > better.
>> >
>> > * Before
>> > Kmalloc N*alloc N*free(32): Average=248/966
>> > Kmalloc N*alloc N*free(64): Average=261/949
>> > Kmalloc N*alloc N*free(128): Average=314/1016
>> > Kmalloc N*alloc N*free(256): Average=741/1061
>> > Kmalloc N*alloc N*free(512): Average=1246/1152
>> > Kmalloc N*alloc N*free(1024): Average=2437/1259
>> > Kmalloc N*alloc N*free(2048): Average=4980/1800
>> > Kmalloc N*alloc N*free(4096): Average=9000/2078
>> >
>> > * After
>> > Kmalloc N*alloc N*free(32): Average=344/792
>> > Kmalloc N*alloc N*free(64): Average=347/882
>> > Kmalloc N*alloc N*free(128): Average=390/959
>> > Kmalloc N*alloc N*free(256): Average=393/1067
>> > Kmalloc N*alloc N*free(512): Average=683/1229
>> > Kmalloc N*alloc N*free(1024): Average=1295/1325
>> > Kmalloc N*alloc N*free(2048): Average=2513/1664
>> > Kmalloc N*alloc N*free(4096): Average=4742/2172
>> >
>> > It shows that allocation performance decreases for the object size up to
>> > 128 and it may be due to extra checks in cache_alloc_refill().  But, with
>> > considering improvement of free performance, net result looks the same.
>> > Result for other size class looks very promising, roughly, 50% performance
>> > improvement.
>> >
>> > v2: replace kick_all_cpus_sync() with synchronize_sched().
>> >
>> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>
>> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
>> (Renesas dual Cortex A9) to this patch, which is upstream commit
>> 801faf0db8947e01877920e848a4d338dd7a99e7.
>>
>> I've attached my .config. I don't know if it also happens with
>> shmobile_defconfig, as something went wrong with my remote access to the board,
>> preventing further testing. I also couldn't verify if the issue persists in
>> v4.7-rc3.

In the meantime, I've verified it also happens with shmobile_defconfig.

>>
>> Do you have a clue?
>
> Not yet. Could you help me narrow down the problem?
> The following diff is a half-revert, to check that synchronize_sched()
> is not the problem.

Thanks!

Unfortunately the half revert is not sufficient. The full revert is.

> ----->8-----
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..257a0eb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3016,9 +3016,6 @@ static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
>         n = get_node(cachep, node);
>
>         BUG_ON(ac->avail > 0 || !n);
> -       shared = READ_ONCE(n->shared);
> -       if (!n->free_objects && (!shared || !shared->avail))
> -               goto direct_grow;
>
>         spin_lock(&n->list_lock);
>         shared = READ_ONCE(n->shared);
> @@ -3047,7 +3044,6 @@ alloc_done:
>         spin_unlock(&n->list_lock);
>         fixup_objfreelist_debug(cachep, &list);
>
> -direct_grow:
>         if (unlikely(!ac->avail)) {
>                 /* Check if we can use obj in pfmemalloc slab */
>                 if (sk_memalloc_socks()) {

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-14  7:31   ` Geert Uytterhoeven
@ 2016-06-14  8:11     ` Joonsoo Kim
  2016-06-14 10:45       ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-14  8:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> > To check precisely whether free objects exist or not, we need to grab a
> >> > lock.  But accuracy isn't that important, because the race window would be
> >> > very small, and if there are too many free objects, the cache reaper would
> >> > reap them.  So this patch makes the check for free object existence not
> >> > hold a lock.  This will reduce lock contention in the heavy-allocation case.
> >> >
> >> > Note that until now, n->shared can be freed during the processing by
> >> > writing slabinfo, but, with some trick in this patch, we can access it
> >> > freely within interrupt disabled period.
> >> >
> >> > Below is the result of concurrent allocation/free in slab allocation
> >> > benchmark made by Christoph a long time ago.  I make the output simpler.
> >> > The number shows cycle count during alloc/free respectively so less is
> >> > better.
> >> >
> >> > * Before
> >> > Kmalloc N*alloc N*free(32): Average=248/966
> >> > Kmalloc N*alloc N*free(64): Average=261/949
> >> > Kmalloc N*alloc N*free(128): Average=314/1016
> >> > Kmalloc N*alloc N*free(256): Average=741/1061
> >> > Kmalloc N*alloc N*free(512): Average=1246/1152
> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >> >
> >> > * After
> >> > Kmalloc N*alloc N*free(32): Average=344/792
> >> > Kmalloc N*alloc N*free(64): Average=347/882
> >> > Kmalloc N*alloc N*free(128): Average=390/959
> >> > Kmalloc N*alloc N*free(256): Average=393/1067
> >> > Kmalloc N*alloc N*free(512): Average=683/1229
> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >> >
> >> > It shows that allocation performance decreases for the object size up to
> >> > 128 and it may be due to extra checks in cache_alloc_refill().  But, with
> >> > considering improvement of free performance, net result looks the same.
> >> > Result for other size class looks very promising, roughly, 50% performance
> >> > improvement.
> >> >
> >> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >> >
> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >>
> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> >>
> >> I've attached my .config. I don't know if it also happens with
> >> shmobile_defconfig, as something went wrong with my remote access to the board,
> >> preventing further testing. I also couldn't verify if the issue persists in
> >> v4.7-rc3.
> 
> In the meantime, I've verified it also happens with shmobile_defconfig.
> 
> >>
> >> Do you have a clue?
> >
> > Not yet. Could you help me narrow down the problem?
> > The following diff is a half-revert, to check that synchronize_sched()
> > is not the problem.
> 
> Thanks!
> 
> Unfortunately the half revert is not sufficient. The full revert is.

Thanks for the quick testing!

Could I ask you once more to check whether synchronize_sched() is the root
cause of the problem? Testing the following two diffs would help me.

Thanks.

------->8--------
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..d892364 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
         * freed after synchronize_sched().
         */
        if (force_change)
-               synchronize_sched();
+               kick_all_cpus_sync();
 
 fail:
        kfree(old_shared);

------->8------
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..38d99c2 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -964,8 +964,6 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
         * guaranteed to be valid until irq is re-enabled, because it will be
         * freed after synchronize_sched().
         */
-       if (force_change)
-               synchronize_sched();
 
 fail:
        kfree(old_shared);


* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-14  8:11     ` Joonsoo Kim
@ 2016-06-14 10:45       ` Geert Uytterhoeven
  2016-06-15  2:23         ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-14 10:45 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joonsoo,

On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
>> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
>> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
>> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >> > To check precisely whether free objects exist or not, we need to grab a
>> >> > lock.  But accuracy isn't that important, because the race window would be
>> >> > very small, and if there are too many free objects, the cache reaper would
>> >> > reap them.  So this patch makes the check for free object existence not
>> >> > hold a lock.  This will reduce lock contention in the heavy-allocation case.
>> >> >
>> >> > Note that until now, n->shared can be freed during the processing by
>> >> > writing slabinfo, but, with some trick in this patch, we can access it
>> >> > freely within interrupt disabled period.
>> >> >
>> >> > Below is the result of concurrent allocation/free in slab allocation
>> >> > benchmark made by Christoph a long time ago.  I make the output simpler.
>> >> > The number shows cycle count during alloc/free respectively so less is
>> >> > better.
>> >> >
>> >> > * Before
>> >> > Kmalloc N*alloc N*free(32): Average=248/966
>> >> > Kmalloc N*alloc N*free(64): Average=261/949
>> >> > Kmalloc N*alloc N*free(128): Average=314/1016
>> >> > Kmalloc N*alloc N*free(256): Average=741/1061
>> >> > Kmalloc N*alloc N*free(512): Average=1246/1152
>> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259
>> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800
>> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078
>> >> >
>> >> > * After
>> >> > Kmalloc N*alloc N*free(32): Average=344/792
>> >> > Kmalloc N*alloc N*free(64): Average=347/882
>> >> > Kmalloc N*alloc N*free(128): Average=390/959
>> >> > Kmalloc N*alloc N*free(256): Average=393/1067
>> >> > Kmalloc N*alloc N*free(512): Average=683/1229
>> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325
>> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664
>> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172
>> >> >
>> >> > It shows that allocation performance decreases for the object size up to
>> >> > 128 and it may be due to extra checks in cache_alloc_refill().  But, with
>> >> > considering improvement of free performance, net result looks the same.
>> >> > Result for other size class looks very promising, roughly, 50% performance
>> >> > improvement.
>> >> >
>> >> > v2: replace kick_all_cpus_sync() with synchronize_sched().
>> >> >
>> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >>
>> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
>> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
>> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
>> >>
>> >> I've attached my .config. I don't know if it also happens with
>> >> shmobile_defconfig, as something went wrong with my remote access to the board,
>> >> preventing further testing. I also couldn't verify if the issue persists in
>> >> v4.7-rc3.
>>
>> In the meantime, I've verified it also happens with shmobile_defconfig.
>>
>> >> Do you have a clue?
>> >
>> > Not yet. Could you help me narrow down the problem?
>> > The following diff is a half-revert, to check that synchronize_sched()
>> > is not the problem.
>>
>> Thanks!
>>
>> Unfortunately the half revert is not sufficient. The full revert is.
>
> Thanks for the quick testing!
>
> Could I ask you once more to check whether synchronize_sched() is the root
> cause of the problem? Testing the following two diffs would help me.
>
> Thanks.
>
> ------->8--------
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..d892364 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * freed after synchronize_sched().
>          */
>         if (force_change)
> -               synchronize_sched();
> +               kick_all_cpus_sync();
>
>  fail:
>         kfree(old_shared);

Works.

> ------->8------
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..38d99c2 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -964,8 +964,6 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * guaranteed to be valid until irq is re-enabled, because it will be
>          * freed after synchronize_sched().
>          */
> -       if (force_change)
> -               synchronize_sched();
>
>  fail:
>         kfree(old_shared);
>

Also works.

Note that I do not see this problem on any of the other boards I use, one
of which is also a dual Cortex A9.

Gr{oetje,eeting}s,

                        Geert



* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
       [not found] <CAMuHMdXC=zEjbZADE5wELjOq_kBiFNewpdUrMCe8d3Utu98h8A@mail.gmail.com>
  2016-06-14  6:24 ` Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache) Joonsoo Kim
@ 2016-06-14 13:10 ` Geert Uytterhoeven
  1 sibling, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-14 13:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joonsoo,

On Mon, Jun 13, 2016 at 9:43 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
>> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> To check precisely whether free objects exist or not, we need to grab a
>> lock.  But accuracy isn't that important, because the race window would be
>> very small, and if there are too many free objects, the cache reaper would
>> reap them.  So this patch makes the check for free object existence not
>> hold a lock.  This will reduce lock contention in the heavy-allocation case.

>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> (Renesas dual Cortex A9) to this patch, which is upstream commit
> 801faf0db8947e01877920e848a4d338dd7a99e7.

BTW, when disabling SMP, the problem goes away.

Gr{oetje,eeting}s,

                        Geert



* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-14 10:45       ` Geert Uytterhoeven
@ 2016-06-15  2:23         ` Joonsoo Kim
  2016-06-15  8:39           ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-15  2:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >> > To check precisely whether free objects exist or not, we need to grab a
> >> >> > lock.  But accuracy isn't that important, because the race window would be
> >> >> > very small, and if there are too many free objects, the cache reaper would
> >> >> > reap them.  So this patch makes the check for free object existence not
> >> >> > hold a lock.  This will reduce lock contention in the heavy-allocation case.
> >> >> >
> >> >> > Note that until now, n->shared can be freed during the processing by
> >> >> > writing slabinfo, but, with some trick in this patch, we can access it
> >> >> > freely within interrupt disabled period.
> >> >> >
> >> >> > Below is the result of concurrent allocation/free in slab allocation
> >> >> > benchmark made by Christoph a long time ago.  I make the output simpler.
> >> >> > The number shows cycle count during alloc/free respectively so less is
> >> >> > better.
> >> >> >
> >> >> > * Before
> >> >> > Kmalloc N*alloc N*free(32): Average=248/966
> >> >> > Kmalloc N*alloc N*free(64): Average=261/949
> >> >> > Kmalloc N*alloc N*free(128): Average=314/1016
> >> >> > Kmalloc N*alloc N*free(256): Average=741/1061
> >> >> > Kmalloc N*alloc N*free(512): Average=1246/1152
> >> >> > Kmalloc N*alloc N*free(1024): Average=2437/1259
> >> >> > Kmalloc N*alloc N*free(2048): Average=4980/1800
> >> >> > Kmalloc N*alloc N*free(4096): Average=9000/2078
> >> >> >
> >> >> > * After
> >> >> > Kmalloc N*alloc N*free(32): Average=344/792
> >> >> > Kmalloc N*alloc N*free(64): Average=347/882
> >> >> > Kmalloc N*alloc N*free(128): Average=390/959
> >> >> > Kmalloc N*alloc N*free(256): Average=393/1067
> >> >> > Kmalloc N*alloc N*free(512): Average=683/1229
> >> >> > Kmalloc N*alloc N*free(1024): Average=1295/1325
> >> >> > Kmalloc N*alloc N*free(2048): Average=2513/1664
> >> >> > Kmalloc N*alloc N*free(4096): Average=4742/2172
> >> >> >
> >> >> > It shows that allocation performance decreases for the object size up to
> >> >> > 128 and it may be due to extra checks in cache_alloc_refill().  But, with
> >> >> > considering improvement of free performance, net result looks the same.
> >> >> > Result for other size class looks very promising, roughly, 50% performance
> >> >> > improvement.
> >> >> >
> >> >> > v2: replace kick_all_cpus_sync() with synchronize_sched().
> >> >> >
> >> >> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >>
> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> >> >>
> >> >> I've attached my .config. I don't know if it also happens with
> >> >> shmobile_defconfig, as something went wrong with my remote access to the board,
> >> >> preventing further testing. I also couldn't verify if the issue persists in
> >> >> v4.7-rc3.
> >>
> >> In the meantime, I've verified it also happens with shmobile_defconfig.
> >>
> >> >> Do you have a clue?
> >> >
> >> > Not yet. Could you help me narrow down the problem?
> >> > The following diff is a half-revert, to check that synchronize_sched()
> >> > is not the problem.
> >>
> >> Thanks!
> >>
> >> Unfortunately the half revert is not sufficient. The full revert is.
> >
> > Thanks for the quick testing!
> >
> > Could I ask you once more to check whether synchronize_sched() is the root
> > cause of the problem? Testing the following two diffs would help me.
> >
> > Thanks.
> >
> > ------->8--------
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..d892364 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -965,7 +965,7 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >          * freed after synchronize_sched().
> >          */
> >         if (force_change)
> > -               synchronize_sched();
> > +               kick_all_cpus_sync();
> >
> >  fail:
> >         kfree(old_shared);
> 
> Works.
> 
> > ------->8------
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..38d99c2 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -964,8 +964,6 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >          * guaranteed to be valid until irq is re-enabled, because it will be
> >          * freed after synchronize_sched().
> >          */
> > -       if (force_change)
> > -               synchronize_sched();
> >
> >  fail:
> >         kfree(old_shared);
> >
> 
> Also works.
> 
> Note that I do not see this problem on any of the other boards I use, one
> of which is also a dual Cortex A9.

Thanks for your help!

It's curious that synchronize_sched() has any effect in this early
phase. synchronize_sched() calls rcu_blocking_is_gp(), which checks
num_online_cpus() <= 1. If that holds, synchronize_sched() does nothing.

It may be related to the might_sleep() in rcu_blocking_is_gp(), but I'm not sure yet.
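
To make that reasoning concrete, here is a small user-space model of the
early return. It is a sketch under stated assumptions, not the RCU
source: online_cpus, blocking_is_gp() and model_synchronize_sched() are
hypothetical names, the might_sleep() call is not modelled, and the
counter merely stands in for blocking until a grace period ends.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the value of num_online_cpus(). */
static unsigned int online_cpus = 1;

/* Counts how often the model would have had to block. */
static int grace_periods_waited;

/*
 * With at most one CPU online, no other CPU can be inside a read-side
 * section, so a blocking call already amounts to a grace period.
 */
static bool blocking_is_gp(void)
{
	return online_cpus <= 1;
}

static void model_synchronize_sched(void)
{
	if (blocking_is_gp())
		return;			/* single CPU: nothing to wait for */
	grace_periods_waited++;		/* real code would block here */
}
```

In this model the call is a no-op while online_cpus is 1 and only has
work to do once a second CPU has been brought up.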

First, I'd like to confirm that num_online_cpus() is correct.
Could you try the following patch and send me the dmesg output?

Thanks.

------->8----------
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..5b7300a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
         * guaranteed to be valid until irq is re-enabled, because it will be
         * freed after synchronize_sched().
         */
-       if (force_change)
-               synchronize_sched();
+       if (force_change) {
+               WARN_ON_ONCE(num_online_cpus() <= 1);
+               WARN_ON_ONCE(num_online_cpus() > 1);
+       }
 
 fail:
        kfree(old_shared);
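
The two conditions are complementary, so exactly one warning fires the
first time this path is reached with force_change set, and the file:line
in the trace says which one. A user-space sketch of that once-per-call-site
behaviour follows; MODEL_WARN_ON_ONCE, warnings_printed and probe() are
hypothetical stand-ins, not the kernel macro or the patched function.

```c
#include <stdbool.h>
#include <stdio.h>

/* Total number of warnings emitted by the model. */
static int warnings_printed;

/* Warn at most once per call site, the first time cond is true. */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			warnings_printed++;				\
			fprintf(stderr, "WARNING at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/* Mirrors the shape of the debugging hunk above. */
static void probe(unsigned int online_cpus)
{
	MODEL_WARN_ON_ONCE(online_cpus <= 1);
	MODEL_WARN_ON_ONCE(online_cpus > 1);
}
```

Calling probe() repeatedly with the same CPU count prints a single
warning; a second warning appears only once the other case is hit.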


* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-15  2:23         ` Joonsoo Kim
@ 2016-06-15  8:39           ` Geert Uytterhoeven
  2016-06-20  6:39             ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-15  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joonsoo,

On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
>> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
>> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
>> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
>> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> >> >> > To check precisely whether free objects exist or not, we need to grab a
>> >> >> > lock.  But accuracy isn't that important, because the race window would be
>> >> >> > very small, and if there are too many free objects, the cache reaper would
>> >> >> > reap them.  So this patch makes the check for free object existence not
>> >> >> > hold a lock.  This will reduce lock contention in the heavy-allocation case.

>> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
>> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
>> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.

> It's curious that synchronize_sched() has any effect in this early
> phase. synchronize_sched() calls rcu_blocking_is_gp(), which checks
> num_online_cpus() <= 1. If that holds, synchronize_sched() does nothing.
>
> It may be related to the might_sleep() in rcu_blocking_is_gp(), but I'm not sure yet.
>
> First, I'd like to confirm that num_online_cpus() is correct.
> Could you try the following patch and send me the dmesg output?
>
> Thanks.
>
> ------->8----------
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..5b7300a 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * guaranteed to be valid until irq is re-enabled, because it will be
>          * freed after synchronize_sched().
>          */
> -       if (force_change)
> -               synchronize_sched();
> +       if (force_change) {
> +               WARN_ON_ONCE(num_online_cpus() <= 1);
> +               WARN_ON_ONCE(num_online_cpus() > 1);
> +       }

Full dmesg output below.

I also tested whether it's the call to synchronize_sched() before or after
secondary CPU bringup that hangs.

        if (force_change && num_online_cpus() <= 1)
                synchronize_sched();

boots.

        if (force_change && num_online_cpus() > 1)
                synchronize_sched();

hangs.

Booting Linux on physical CPU 0x0
Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert@ramsan) (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016
CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Machine model: EMEV2 KZM9D Board
debug: ignoring loglevel setting.
Memory policy: Data cache writealloc
On node 0 totalpages: 32768
free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000
  Normal zone: 256 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 32768 pages, LIFO batch:7
percpu: Embedded 12 pages/cpu @c7ed9000 s19264 r8192 d21696 u49152
pcpu-alloc: s19264 r8192 d21696 u49152 alloc=12*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: console=ttyS1,115200n81 ignore_loglevel root=/dev/nfs ip=dhcp
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 121144K/131072K available (4243K kernel code, 165K rwdata, 1344K rodata, 2048K init, 264K bss, 9928K reserved, 0K cma-reserved, 0K highmem)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
    vmalloc : 0xc8800000 - 0xff800000   ( 880 MB)
    lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc0674eb8   (6580 kB)
      .init : 0xc0700000 - 0xc0900000   (2048 kB)
      .data : 0xc0900000 - 0xc0929420   ( 166 kB)
       .bss : 0xc092b000 - 0xc096d1e8   ( 265 kB)
Hierarchical RCU implementation.
 Build-time adjustment of leaf fanout to 32.
 RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
clocksource_probe: no matching clocksources found
sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
21474836475000000ns
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/slab.c:975 setup_kmem_cache_node+0x160/0x1c8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
[<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
[<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
[<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
[<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
[<c011fad0>] (warn_slowpath_null) from [<c01ced4c>]
(setup_kmem_cache_node+0x160/0x1c8)
[<c01ced4c>] (setup_kmem_cache_node) from [<c01cf02c>]
(__do_tune_cpucache+0xf4/0x114)
[<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
[<c01cf0b0>] (enable_cpucache) from [<c0710324>]
(kmem_cache_init_late+0x40/0x84)
[<c0710324>] (kmem_cache_init_late) from [<c0700af8>] (start_kernel+0x238/0x36c)
[<c0700af8>] (start_kernel) from [<4000807c>] (0x4000807c)
---[ end trace cb88537fdc8fa200 ]---
Console: colour dummy device 80x30
Calibrating delay loop (skipped) preset value.. 1066.00 BogoMIPS (lpj=5330000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
CPU: Testing write buffer coherency: ok
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x40100000 - 0x40100058
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
Brought up 2 CPUs
SMP: Total of 2 processors activated (2132.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at mm/slab.c:976 setup_kmem_cache_node+0x198/0x1c8
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W
4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
[<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
[<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
[<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
[<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
[<c011fad0>] (warn_slowpath_null) from [<c01ced84>]
(setup_kmem_cache_node+0x198/0x1c8)
[<c01ced84>] (setup_kmem_cache_node) from [<c01cf02c>]
(__do_tune_cpucache+0xf4/0x114)
[<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
[<c01cf0b0>] (enable_cpucache) from [<c01cf4cc>]
(__kmem_cache_create+0x1a0/0x1c8)
[<c01cf4cc>] (__kmem_cache_create) from [<c01b2904>]
(kmem_cache_create+0xbc/0x190)
[<c01b2904>] (kmem_cache_create) from [<c070d724>] (shmem_init+0x34/0xb0)
[<c070d724>] (shmem_init) from [<c0700cc4>] (kernel_init_freeable+0x98/0x1ec)
[<c0700cc4>] (kernel_init_freeable) from [<c0497760>] (kernel_init+0x8/0x110)
[<c0497760>] (kernel_init) from [<c0106c78>] (ret_from_fork+0x14/0x3c)
---[ end trace cb88537fdc8fa201 ]---
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 19112604462750000 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
sh-pfc e0140200.pfc: emev2_pfc support registered
gpiochip_find_base: found new base at 992
gpio gpiochip0: (e0050000.gpio): created GPIO range 0->31 ==>
e0140200.pfc PIN 0->31
gpio gpiochip0: (e0050000.gpio): added GPIO chardev (254:0)
gpiochip_setup_dev: registered GPIOs 992 to 1023 on device: gpiochip0
(e0050000.gpio)
gpiochip_find_base: found new base at 960
gpio gpiochip1: (e0050080.gpio): created GPIO range 0->31 ==>
e0140200.pfc PIN 32->63
gpio gpiochip1: (e0050080.gpio): added GPIO chardev (254:1)
gpiochip_setup_dev: registered GPIOs 960 to 991 on device: gpiochip1
(e0050080.gpio)
gpiochip_find_base: found new base at 928
gpio gpiochip2: (e0050100.gpio): created GPIO range 0->31 ==>
e0140200.pfc PIN 64->95
gpio gpiochip2: (e0050100.gpio): added GPIO chardev (254:2)
gpiochip_setup_dev: registered GPIOs 928 to 959 on device: gpiochip2
(e0050100.gpio)
gpiochip_find_base: found new base at 896
gpio gpiochip3: (e0050180.gpio): created GPIO range 0->31 ==>
e0140200.pfc PIN 96->127
gpio gpiochip3: (e0050180.gpio): added GPIO chardev (254:3)
gpiochip_setup_dev: registered GPIOs 896 to 927 on device: gpiochip3
(e0050180.gpio)
gpiochip_find_base: found new base at 865
gpio gpiochip4: (e0050200.gpio): created GPIO range 0->30 ==>
e0140200.pfc PIN 128->158
gpio gpiochip4: (e0050200.gpio): added GPIO chardev (254:4)
gpiochip_setup_dev: registered GPIOs 865 to 895 on device: gpiochip4
(e0050200.gpio)
gio: map hw irq = 1, irq = 35
gio: sense irq = 1, mode = 8
No ATAGs?
hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 4 bytes.
of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator at 0[0]'
of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator at 1[0]'
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
em_sti e0180000.timer: used for clock events
em_sti e0180000.timer: used for oneshot clock events
em_sti e0180000.timer: used as clock source
clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
0x1ef4687b1, max_idle_ns: 3697658158765000000 ns
Advanced Linux Sound Architecture Driver Initialized.
NET: Registered protocol family 23
clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
0x1ef4687b1, max_idle_ns: 112843571739654 ns
clocksource: Switched to clocksource e0180000.timer
NET: Registered protocol family 2
TCP established hash table entries: 1024 (order: 0, 4096 bytes)
TCP bind hash table entries: 1024 (order: 2, 20480 bytes)
TCP: Hash tables configured (established 1024 bind 1024)
UDP hash table entries: 256 (order: 1, 12288 bytes)
UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
Clockevents: could not switch to one-shot mode:
Clockevents: could not switch to one-shot mode: dummy_timer is not functional.
Could not switch to high resolution mode on CPU 0
 dummy_timer is not functional.
Could not switch to high resolution mode on CPU 1
hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
futex hash table entries: 512 (order: 3, 32768 bytes)
workingset: timestamp_bits=28 max_order=15 bucket_order=0
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
io scheduler noop registered (default)
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
e1020000.serial: ttyS0 at MMIO 0xe1020000 (irq = 19, base_baud =
796444) is a 16550A
e1030000.serial: ttyS1 at MMIO 0xe1030000 (irq = 20, base_baud =
7168000) is a 16550A
console [ttyS1] enabled
e1040000.serial: ttyS2 at MMIO 0xe1040000 (irq = 21, base_baud =
14336000) is a 16550A
e1050000.serial: ttyS3 at MMIO 0xe1050000 (irq = 22, base_baud =
2389333) is a 16550A
gio: sense irq = 1, mode = 8
libphy: smsc911x-mdio: probed
Generic PHY 20000000.etherne:01: attached PHY driver [Generic PHY]
(mii_bus:phy_addr=20000000.etherne:01, irq=-1)
smsc911x 20000000.ethernet eth0: MAC Address: 00:01:9b:04:03:cf
usbcore: registered new interface driver usb-storage
i2c /dev entries driver
em-i2c e0070000.i2c: Added i2c controller 0, irq 33
em-i2c e10a0000.i2c: Added i2c controller 1, irq 34
cpu cpu0: failed to get clock: -2
cpufreq-dt: probe of cpufreq-dt failed with error -2
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
NET: Registered protocol family 17
Key type dns_resolver registered
Registering SWP/SWPB emulation handler
of_get_named_gpiod_flags: parsed 'gpios' property of node
'/gpio_keys/button at 1[0]' - status (0)
of_get_named_gpiod_flags: parsed 'gpios' property of node
'/gpio_keys/button at 2[0]' - status (0)
of_get_named_gpiod_flags: parsed 'gpios' property of node
'/gpio_keys/button at 3[0]' - status (0)
of_get_named_gpiod_flags: parsed 'gpios' property of node
'/gpio_keys/button at 4[0]' - status (0)
gpio-1006 (DSW2-1): gpiod_set_debounce: missing set() or
set_debounce() operations
gio: map hw irq = 14, irq = 36
gio: sense irq = 14, mode = 12
gpio-1007 (DSW2-2): gpiod_set_debounce: missing set() or
set_debounce() operations
gio: map hw irq = 15, irq = 37
gio: sense irq = 15, mode = 12
gpio-1008 (DSW2-3): gpiod_set_debounce: missing set() or
set_debounce() operations
gio: map hw irq = 16, irq = 38
gio: sense irq = 16, mode = 12
gpio-1009 (DSW2-4): gpiod_set_debounce: missing set() or
set_debounce() operations
gio: map hw irq = 17, irq = 39
gio: sense irq = 17, mode = 12
input: gpio_keys as /devices/platform/gpio_keys/input/input0
hctosys: unable to open rtc device (rtc0)
smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 35
Sending DHCP requests .., OK
IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.215
IP-Config: Complete:
     device=eth0, hwaddr=00:01:9b:04:03:cf, ipaddr=192.168.97.215,
mask=255.255.255.0, gw=192.168.97.254
     host=192.168.97.215, domain=of.borg, nis-domain=(none)
     bootserver=192.168.97.254, rootserver=192.168.97.254, rootpath=
  nameserver0=192.168.97.254
ALSA device list:
  No soundcards found.
Freeing unused kernel memory: 2048K (c0700000 - c0900000)
sysctl: error: 'kernel.hotplug' is an unknown key

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-15  8:39           ` Geert Uytterhoeven
@ 2016-06-20  6:39             ` Joonsoo Kim
  2016-06-20 13:12               ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-20  6:39 UTC (permalink / raw)
  To: linux-arm-kernel

CCing Paul to ask a question.

On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> >> >> > To check whther free objects exist or not precisely, we need to grab a
> >> >> >> > lock.  But, accuracy isn't that important because race window would be
> >> >> >> > even small and if there is too much free object, cache reaper would reap
> >> >> >> > it.  So, this patch makes the check for free object exisistence not to
> >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> 
> >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> 
> > It's curious that synchronize_sched() has some effect in this early
> > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> >
> > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> >
> > First, I'd like to confirm that num_online_cpus() is correct.
> > Could you try following patch and give me a dmesg?
> >
> > Thanks.
> >
> > ------->8----------
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..5b7300a 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >          * guaranteed to be valid until irq is re-enabled, because it will be
> >          * freed after synchronize_sched().
> >          */
> > -       if (force_change)
> > -               synchronize_sched();
> > +       if (force_change) {
> > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > +       }
> 
> Full dmesg output below.
> 
> I also tested whether it's the call to synchronize_sched() before or after
> secondary CPU bringup that hangs.
> 
>         if (force_change && num_online_cpus() <= 1)
>                 synchronize_sched();
> 
> boots.
> 
>         if (force_change && num_online_cpus() > 1)
>                 synchronize_sched();
> 
> hangs.

Hello, Paul.

I changed slab.c to use synchronize_sched() as a full memory barrier. The
first call happens in kmem_cache_init_late(), which should not be a problem
because num_online_cpus() <= 1 at that point, so synchronize_sched()
returns immediately. The second call comes from shmem_init(), and the
system seems to hang there. Since SMP is already initialized by then,
synchronize_sched() no longer returns early, but I can't see what goes
wrong. Is that an invalid moment to call synchronize_sched()?

Note that my x86 virtual machine works fine even when synchronize_sched()
is called from shmem_init(), but some of Geert's ARM machines (not all of
them) hang.
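
The early-return behaviour described above can be modelled in userspace as follows. This is only a sketch of the num_online_cpus() <= 1 check mentioned earlier in the thread, not the kernel implementation; the model_* names and the enum are made up for illustration.

```c
#include <assert.h>

/* Hypothetical outcomes of the modelled synchronize_sched() call. */
enum model_sync_result {
	MODEL_SYNC_RETURNS_IMMEDIATELY,	/* single CPU: nothing to wait for */
	MODEL_SYNC_WAITS_FOR_GP		/* several CPUs: a real grace period is needed */
};

/* Models the check attributed to rcu_blocking_is_gp() in this thread. */
static int model_blocking_is_gp(unsigned int online_cpus)
{
	assert(online_cpus >= 1);	/* the boot CPU is always online */
	return online_cpus <= 1;
}

/* Models the decision only; it never actually blocks. */
static enum model_sync_result model_synchronize_sched(unsigned int online_cpus)
{
	if (model_blocking_is_gp(online_cpus))
		return MODEL_SYNC_RETURNS_IMMEDIATELY;
	return MODEL_SYNC_WAITS_FOR_GP;
}
```

With online_cpus = 1 (the kmem_cache_init_late() case) the model returns immediately; with online_cpus = 2 (the shmem_init() case on this board) it has to wait for a grace period, which is the path that hangs.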

Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-20  6:39             ` Joonsoo Kim
@ 2016-06-20 13:12               ` Paul E. McKenney
  2016-06-21  6:43                 ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-20 13:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> CCing Paul to ask a question.
> 
> On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > Hi Joonsoo,
> > 
> > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > >> >> >> > To check whther free objects exist or not precisely, we need to grab a
> > >> >> >> > lock.  But, accuracy isn't that important because race window would be
> > >> >> >> > even small and if there is too much free object, cache reaper would reap
> > >> >> >> > it.  So, this patch makes the check for free object exisistence not to
> > >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> > 
> > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > 
> > > It's curious that synchronize_sched() has some effect in this early
> > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > >
> > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> > >
> > > First, I'd like to confirm that num_online_cpus() is correct.
> > > Could you try following patch and give me a dmesg?
> > >
> > > Thanks.
> > >
> > > ------->8----------
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index 763096a..5b7300a 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > >          * freed after synchronize_sched().
> > >          */
> > > -       if (force_change)
> > > -               synchronize_sched();
> > > +       if (force_change) {
> > > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > > +       }
> > 
> > Full dmesg output below.
> > 
> > I also tested whether it's the call to synchronize_sched() before or after
> > secondary CPU bringup that hangs.
> > 
> >         if (force_change && num_online_cpus() <= 1)
> >                 synchronize_sched();
> > 
> > boots.
> > 
> >         if (force_change && num_online_cpus() > 1)
> >                 synchronize_sched();
> > 
> > hangs.
> 
> Hello, Paul.
> 
> I changed slab.c to use synchronize_sched() as a full memory barrier. The
> first call happens in kmem_cache_init_late(), which should not be a problem
> because num_online_cpus() <= 1 at that point, so synchronize_sched()
> returns immediately. The second call comes from shmem_init(), and the
> system seems to hang there. Since SMP is already initialized by then,
> synchronize_sched() no longer returns early, but I can't see what goes
> wrong. Is that an invalid moment to call synchronize_sched()?
> 
> Note that my x86 virtual machine works fine even when synchronize_sched()
> is called from shmem_init(), but some of Geert's ARM machines (not all of
> them) hang.

Color me confused.

Is Geert's ARM system somehow adding the second CPU before
rcu_spawn_gp_kthread() is called, that is, before or during
early_initcall() time?

RCU does assume that its kthreads are created before the second
CPU shows up.
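
To make the question concrete, here is a minimal userspace sketch of the
hypothesis, not a diagnosis. It models the short-circuit Joonsoo described
(synchronize_sched() does nothing when num_online_cpus() <= 1) plus the
assumption that, with more CPUs online, a grace period only completes if
the grace-period kthread is already running. Every toy_* name is
hypothetical and none of this is kernel code.

```c
#include <stdbool.h>

/* Toy userspace model; all toy_* names are hypothetical, not kernel APIs. */
struct toy_system {
	int online_cpus;         /* stands in for num_online_cpus() */
	bool gp_kthread_running; /* stands in for "rcu_spawn_gp_kthread() has run" */
};

enum toy_result {
	TOY_RETURNS_IMMEDIATELY,
	TOY_WAITS_AND_COMPLETES,
	TOY_HANGS,
};

/*
 * Mirrors the check described in the thread: with at most one CPU
 * online, the blocking call itself counts as a grace period.
 */
bool toy_blocking_is_gp(const struct toy_system *s)
{
	return s->online_cpus <= 1;
}

/*
 * Assumption of this sketch: with several CPUs online, a grace period
 * needs a running kthread to drive it, otherwise the caller waits forever.
 */
enum toy_result toy_synchronize_sched(const struct toy_system *s)
{
	if (toy_blocking_is_gp(s))
		return TOY_RETURNS_IMMEDIATELY;
	return s->gp_kthread_running ? TOY_WAITS_AND_COMPLETES : TOY_HANGS;
}
```

Under this model, only the combination of a second CPU online and no
grace-period kthread yields TOY_HANGS, which is the situation the
question above asks about.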

							Thanx, Paul

> Thanks.
> 
> > 
> > Booting Linux on physical CPU 0x0
> > Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert@ramsan)
> > (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016
> > CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
> > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> > Machine model: EMEV2 KZM9D Board
> > debug: ignoring loglevel setting.
> > Memory policy: Data cache writealloc
> > On node 0 totalpages: 32768
> > free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000
> >   Normal zone: 256 pages used for memmap
> >   Normal zone: 0 pages reserved
> >   Normal zone: 32768 pages, LIFO batch:7
> > percpu: Embedded 12 pages/cpu @c7ed9000 s19264 r8192 d21696 u49152
> > pcpu-alloc: s19264 r8192 d21696 u49152 alloc=12*4096
> > pcpu-alloc: [0] 0 [0] 1
> > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
> > Kernel command line: console=ttyS1,115200n81 ignore_loglevel
> > root=/dev/nfs ip=dhcp
> > PID hash table entries: 512 (order: -1, 2048 bytes)
> > Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> > Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> > Memory: 121144K/131072K available (4243K kernel code, 165K rwdata,
> > 1344K rodata, 2048K init, 264K bss, 9928K reserved, 0K cma-reserved,
> > 0K highmem)
> > Virtual kernel memory layout:
> >     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> >     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> >     vmalloc : 0xc8800000 - 0xff800000   ( 880 MB)
> >     lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
> >     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
> >     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
> >       .text : 0xc0008000 - 0xc0674eb8   (6580 kB)
> >       .init : 0xc0700000 - 0xc0900000   (2048 kB)
> >       .data : 0xc0900000 - 0xc0929420   ( 166 kB)
> >        .bss : 0xc092b000 - 0xc096d1e8   ( 265 kB)
> > Hierarchical RCU implementation.
> >  Build-time adjustment of leaf fanout to 32.
> >  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> > RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
> > NR_IRQS:16 nr_irqs:16 16
> > clocksource_probe: no matching clocksources found
> > sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
> > 21474836475000000ns
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 0 at mm/slab.c:975 setup_kmem_cache_node+0x160/0x1c8
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > [<c011fad0>] (warn_slowpath_null) from [<c01ced4c>]
> > (setup_kmem_cache_node+0x160/0x1c8)
> > [<c01ced4c>] (setup_kmem_cache_node) from [<c01cf02c>]
> > (__do_tune_cpucache+0xf4/0x114)
> > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > [<c01cf0b0>] (enable_cpucache) from [<c0710324>]
> > (kmem_cache_init_late+0x40/0x84)
> > [<c0710324>] (kmem_cache_init_late) from [<c0700af8>] (start_kernel+0x238/0x36c)
> > [<c0700af8>] (start_kernel) from [<4000807c>] (0x4000807c)
> > ---[ end trace cb88537fdc8fa200 ]---
> > Console: colour dummy device 80x30
> > Calibrating delay loop (skipped) preset value.. 1066.00 BogoMIPS (lpj=5330000)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > CPU: Testing write buffer coherency: ok
> > CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > Setting up static identity map for 0x40100000 - 0x40100058
> > CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > Brought up 2 CPUs
> > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > CPU: All CPU(s) started in SVC mode.
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 1 at mm/slab.c:976 setup_kmem_cache_node+0x198/0x1c8
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W
> > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > [<c011fad0>] (warn_slowpath_null) from [<c01ced84>]
> > (setup_kmem_cache_node+0x198/0x1c8)
> > [<c01ced84>] (setup_kmem_cache_node) from [<c01cf02c>]
> > (__do_tune_cpucache+0xf4/0x114)
> > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > [<c01cf0b0>] (enable_cpucache) from [<c01cf4cc>]
> > (__kmem_cache_create+0x1a0/0x1c8)
> > [<c01cf4cc>] (__kmem_cache_create) from [<c01b2904>]
> > (kmem_cache_create+0xbc/0x190)
> > [<c01b2904>] (kmem_cache_create) from [<c070d724>] (shmem_init+0x34/0xb0)
> > [<c070d724>] (shmem_init) from [<c0700cc4>] (kernel_init_freeable+0x98/0x1ec)
> > [<c0700cc4>] (kernel_init_freeable) from [<c0497760>] (kernel_init+0x8/0x110)
> > [<c0497760>] (kernel_init) from [<c0106c78>] (ret_from_fork+0x14/0x3c)
> > ---[ end trace cb88537fdc8fa201 ]---
> > devtmpfs: initialized
> > VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > max_idle_ns: 19112604462750000 ns
> > pinctrl core: initialized pinctrl subsystem
> > NET: Registered protocol family 16
> > DMA: preallocated 256 KiB pool for atomic coherent allocations
> > sh-pfc e0140200.pfc: emev2_pfc support registered
> > gpiochip_find_base: found new base at 992
> > gpio gpiochip0: (e0050000.gpio): created GPIO range 0->31 ==>
> > e0140200.pfc PIN 0->31
> > gpio gpiochip0: (e0050000.gpio): added GPIO chardev (254:0)
> > gpiochip_setup_dev: registered GPIOs 992 to 1023 on device: gpiochip0
> > (e0050000.gpio)
> > gpiochip_find_base: found new base at 960
> > gpio gpiochip1: (e0050080.gpio): created GPIO range 0->31 ==>
> > e0140200.pfc PIN 32->63
> > gpio gpiochip1: (e0050080.gpio): added GPIO chardev (254:1)
> > gpiochip_setup_dev: registered GPIOs 960 to 991 on device: gpiochip1
> > (e0050080.gpio)
> > gpiochip_find_base: found new base at 928
> > gpio gpiochip2: (e0050100.gpio): created GPIO range 0->31 ==>
> > e0140200.pfc PIN 64->95
> > gpio gpiochip2: (e0050100.gpio): added GPIO chardev (254:2)
> > gpiochip_setup_dev: registered GPIOs 928 to 959 on device: gpiochip2
> > (e0050100.gpio)
> > gpiochip_find_base: found new base at 896
> > gpio gpiochip3: (e0050180.gpio): created GPIO range 0->31 ==>
> > e0140200.pfc PIN 96->127
> > gpio gpiochip3: (e0050180.gpio): added GPIO chardev (254:3)
> > gpiochip_setup_dev: registered GPIOs 896 to 927 on device: gpiochip3
> > (e0050180.gpio)
> > gpiochip_find_base: found new base at 865
> > gpio gpiochip4: (e0050200.gpio): created GPIO range 0->30 ==>
> > e0140200.pfc PIN 128->158
> > gpio gpiochip4: (e0050200.gpio): added GPIO chardev (254:4)
> > gpiochip_setup_dev: registered GPIOs 865 to 895 on device: gpiochip4
> > (e0050200.gpio)
> > gio: map hw irq = 1, irq = 35
> > gio: sense irq = 1, mode = 8
> > No ATAGs?
> > hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
> > hw-breakpoint: maximum watchpoint size is 4 bytes.
> > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator@0[0]'
> > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator@1[0]'
> > SCSI subsystem initialized
> > usbcore: registered new interface driver usbfs
> > usbcore: registered new interface driver hub
> > usbcore: registered new device driver usb
> > em_sti e0180000.timer: used for clock events
> > em_sti e0180000.timer: used for oneshot clock events
> > em_sti e0180000.timer: used as clock source
> > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > 0x1ef4687b1, max_idle_ns: 3697658158765000000 ns
> > Advanced Linux Sound Architecture Driver Initialized.
> > NET: Registered protocol family 23
> > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > 0x1ef4687b1, max_idle_ns: 112843571739654 ns
> > clocksource: Switched to clocksource e0180000.timer
> > NET: Registered protocol family 2
> > TCP established hash table entries: 1024 (order: 0, 4096 bytes)
> > TCP bind hash table entries: 1024 (order: 2, 20480 bytes)
> > TCP: Hash tables configured (established 1024 bind 1024)
> > UDP hash table entries: 256 (order: 1, 12288 bytes)
> > UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
> > NET: Registered protocol family 1
> > RPC: Registered named UNIX socket transport module.
> > RPC: Registered udp transport module.
> > RPC: Registered tcp transport module.
> > RPC: Registered tcp NFSv4.1 backchannel transport module.
> > Clockevents: could not switch to one-shot mode:
> > Clockevents: could not switch to one-shot mode: dummy_timer is not functional.
> > Could not switch to high resolution mode on CPU 0
> >  dummy_timer is not functional.
> > Could not switch to high resolution mode on CPU 1
> > hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
> > futex hash table entries: 512 (order: 3, 32768 bytes)
> > workingset: timestamp_bits=28 max_order=15 bucket_order=0
> > NFS: Registering the id_resolver key type
> > Key type id_resolver registered
> > Key type id_legacy registered
> > nfs4filelayout_init: NFSv4 File Layout Driver Registering...
> > io scheduler noop registered (default)
> > Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > e1020000.serial: ttyS0 at MMIO 0xe1020000 (irq = 19, base_baud =
> > 796444) is a 16550A
> > e1030000.serial: ttyS1 at MMIO 0xe1030000 (irq = 20, base_baud =
> > 7168000) is a 16550A
> > console [ttyS1] enabled
> > e1040000.serial: ttyS2 at MMIO 0xe1040000 (irq = 21, base_baud =
> > 14336000) is a 16550A
> > e1050000.serial: ttyS3 at MMIO 0xe1050000 (irq = 22, base_baud =
> > 2389333) is a 16550A
> > gio: sense irq = 1, mode = 8
> > libphy: smsc911x-mdio: probed
> > Generic PHY 20000000.etherne:01: attached PHY driver [Generic PHY]
> > (mii_bus:phy_addr=20000000.etherne:01, irq=-1)
> > smsc911x 20000000.ethernet eth0: MAC Address: 00:01:9b:04:03:cf
> > usbcore: registered new interface driver usb-storage
> > i2c /dev entries driver
> > em-i2c e0070000.i2c: Added i2c controller 0, irq 33
> > em-i2c e10a0000.i2c: Added i2c controller 1, irq 34
> > cpu cpu0: failed to get clock: -2
> > cpufreq-dt: probe of cpufreq-dt failed with error -2
> > ledtrig-cpu: registered to indicate activity on CPUs
> > usbcore: registered new interface driver usbhid
> > usbhid: USB HID core driver
> > NET: Registered protocol family 17
> > Key type dns_resolver registered
> > Registering SWP/SWPB emulation handler
> > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > '/gpio_keys/button@1[0]' - status (0)
> > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > '/gpio_keys/button@2[0]' - status (0)
> > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > '/gpio_keys/button@3[0]' - status (0)
> > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > '/gpio_keys/button@4[0]' - status (0)
> > gpio-1006 (DSW2-1): gpiod_set_debounce: missing set() or
> > set_debounce() operations
> > gio: map hw irq = 14, irq = 36
> > gio: sense irq = 14, mode = 12
> > gpio-1007 (DSW2-2): gpiod_set_debounce: missing set() or
> > set_debounce() operations
> > gio: map hw irq = 15, irq = 37
> > gio: sense irq = 15, mode = 12
> > gpio-1008 (DSW2-3): gpiod_set_debounce: missing set() or
> > set_debounce() operations
> > gio: map hw irq = 16, irq = 38
> > gio: sense irq = 16, mode = 12
> > gpio-1009 (DSW2-4): gpiod_set_debounce: missing set() or
> > set_debounce() operations
> > gio: map hw irq = 17, irq = 39
> > gio: sense irq = 17, mode = 12
> > input: gpio_keys as /devices/platform/gpio_keys/input/input0
> > hctosys: unable to open rtc device (rtc0)
> > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 35
> > Sending DHCP requests .., OK
> > IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.215
> > IP-Config: Complete:
> >      device=eth0, hwaddr=00:01:9b:04:03:cf, ipaddr=192.168.97.215,
> > mask=255.255.255.0, gw=192.168.97.254
> >      host=192.168.97.215, domain=of.borg, nis-domain=(none)
> >      bootserver=192.168.97.254, rootserver=192.168.97.254, rootpath=
> >   nameserver0=192.168.97.254
> > ALSA device list:
> >   No soundcards found.
> > Freeing unused kernel memory: 2048K (c0700000 - c0900000)
> > sysctl: error: 'kernel.hotplug' is an unknown key
> > 
> > Gr{oetje,eeting}s,
> > 
> >                         Geert
> > 
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> > 
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> >                                 -- Linus Torvalds
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo at kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email at kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-20 13:12               ` Paul E. McKenney
@ 2016-06-21  6:43                 ` Joonsoo Kim
  2016-06-21 12:54                   ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-21  6:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote:
> On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> > CCing Paul to ask some question.
> > 
> > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > > Hi Joonsoo,
> > > 
> > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > > >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > >> >> >> > To check whether free objects exist or not precisely, we need to grab a
> > > >> >> >> > lock.  But, accuracy isn't that important because race window would be
> > > >> >> >> > even small and if there is too much free object, cache reaper would reap
> > > >> >> >> > it.  So, this patch makes the check for free object existence not to
> > > >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> > > 
> > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > > 
> > > > It's curious that synchronize_sched() has some effect in this early
> > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > > >
> > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> > > >
> > > > First, I'd like to confirm that num_online_cpus() is correct.
> > > > Could you try following patch and give me a dmesg?
> > > >
> > > > Thanks.
> > > >
> > > > ------->8----------
> > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > index 763096a..5b7300a 100644
> > > > --- a/mm/slab.c
> > > > +++ b/mm/slab.c
> > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > >          * freed after synchronize_sched().
> > > >          */
> > > > -       if (force_change)
> > > > -               synchronize_sched();
> > > > +       if (force_change) {
> > > > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > > > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > > > +       }
> > > 
> > > Full dmesg output below.
> > > 
> > > I also tested whether it's the call to synchronize_sched() before or after
> > > secondary CPU bringup that hangs.
> > > 
> > >         if (force_change && num_online_cpus() <= 1)
> > >                 synchronize_sched();
> > > 
> > > boots.
> > > 
> > >         if (force_change && num_online_cpus() > 1)
> > >                 synchronize_sched();
> > > 
> > > hangs.
> > 
> > Hello, Paul.
> > 
> > I changed slab.c to use synchronize_sched() for full memory barrier. First
> > call happens on kmem_cache_init_late() and it would not be a problem
> > because, at this time, num_online_cpus() <= 1 and synchronize_sched()
> > would return immediately. Second call site would be shmem_init()
> > and it seems that system hangs on it. Since smp is already initialized
> > at that time, there would be some effect of synchronize_sched() but I
> > can't imagine what's wrong here. Is it invalid moment to call
> > synchronize_sched()?
> > 
> > Note that my x86 virtual machine works fine even if
> > synchronize_sched() is called in shmem_init() but Geert's some ARM
> > machines (not all ARM machine) don't work well with it.
> 
> Color me confused.
> 
> Is Geert's ARM system somehow adding the second CPU before
> rcu_spawn_gp_kthread() is called, that is, before or during
> early_initcall() time?

The hang appears to happen in shmem_init(), which is called from
do_basic_setup(). do_basic_setup() runs after the early_initcall() phase,
so rcu_spawn_gp_kthread() should already have run by then.
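
As a rough illustration of that ordering, here is a minimal userspace
sketch. The step list is an assumption pieced together from this thread
and the quoted dmesg (kmem_cache_init_late() from start_kernel() with one
CPU online, "Brought up 2 CPUs", then shmem_init() from
kernel_init_freeable()); the toy_* names are hypothetical and nothing here
is copied from init/main.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Boot steps in the order suggested by this thread and the quoted dmesg.
 * Illustrative assumption only; the list is not copied from init/main.c.
 */
const char *const toy_boot_order[] = {
	"kmem_cache_init_late", /* from start_kernel(), one CPU online */
	"early_initcall",       /* rcu_spawn_gp_kthread() time, per Paul */
	"smp_init",             /* dmesg: "Brought up 2 CPUs" */
	"do_basic_setup",       /* calls shmem_init() */
};

/* Return the position of a step in toy_boot_order, or -1 if unknown. */
int toy_step_index(const char *name)
{
	size_t n = sizeof(toy_boot_order) / sizeof(toy_boot_order[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(toy_boot_order[i], name) == 0)
			return (int)i;
	return -1;
}

/* Return 1 if step a is known to run before step b, else 0. */
int toy_runs_before(const char *a, const char *b)
{
	int ia = toy_step_index(a);
	int ib = toy_step_index(b);

	return ia >= 0 && ib >= 0 && ia < ib;
}
```

With this list, toy_runs_before("early_initcall", "do_basic_setup")
is true, which matches the point above that the RCU kthreads should
exist before shmem_init() runs.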

Hmm... Is it okay to call synchronize_sched() from a kernel thread?

Thanks.

> 
> RCU does assume that its kthreads are created before the second
> CPU shows up.
> 
> 							Thanx, Paul
> 
> > Thanks.
> > 
> > > 
> > > Booting Linux on physical CPU 0x0
> > > Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert@ramsan)
> > > (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016
> > > CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
> > > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> > > Machine model: EMEV2 KZM9D Board
> > > debug: ignoring loglevel setting.
> > > Memory policy: Data cache writealloc
> > > On node 0 totalpages: 32768
> > > free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000
> > >   Normal zone: 256 pages used for memmap
> > >   Normal zone: 0 pages reserved
> > >   Normal zone: 32768 pages, LIFO batch:7
> > > percpu: Embedded 12 pages/cpu @c7ed9000 s19264 r8192 d21696 u49152
> > > pcpu-alloc: s19264 r8192 d21696 u49152 alloc=12*4096
> > > pcpu-alloc: [0] 0 [0] 1
> > > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
> > > Kernel command line: console=ttyS1,115200n81 ignore_loglevel
> > > root=/dev/nfs ip=dhcp
> > > PID hash table entries: 512 (order: -1, 2048 bytes)
> > > Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> > > Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> > > Memory: 121144K/131072K available (4243K kernel code, 165K rwdata,
> > > 1344K rodata, 2048K init, 264K bss, 9928K reserved, 0K cma-reserved,
> > > 0K highmem)
> > > Virtual kernel memory layout:
> > >     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> > >     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> > >     vmalloc : 0xc8800000 - 0xff800000   ( 880 MB)
> > >     lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
> > >     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
> > >     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
> > >       .text : 0xc0008000 - 0xc0674eb8   (6580 kB)
> > >       .init : 0xc0700000 - 0xc0900000   (2048 kB)
> > >       .data : 0xc0900000 - 0xc0929420   ( 166 kB)
> > >        .bss : 0xc092b000 - 0xc096d1e8   ( 265 kB)
> > > Hierarchical RCU implementation.
> > >  Build-time adjustment of leaf fanout to 32.
> > >  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> > > RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
> > > NR_IRQS:16 nr_irqs:16 16
> > > clocksource_probe: no matching clocksources found
> > > sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
> > > 21474836475000000ns
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 0 at mm/slab.c:975 setup_kmem_cache_node+0x160/0x1c8
> > > Modules linked in:
> > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > > [<c011fad0>] (warn_slowpath_null) from [<c01ced4c>]
> > > (setup_kmem_cache_node+0x160/0x1c8)
> > > [<c01ced4c>] (setup_kmem_cache_node) from [<c01cf02c>]
> > > (__do_tune_cpucache+0xf4/0x114)
> > > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > > [<c01cf0b0>] (enable_cpucache) from [<c0710324>]
> > > (kmem_cache_init_late+0x40/0x84)
> > > [<c0710324>] (kmem_cache_init_late) from [<c0700af8>] (start_kernel+0x238/0x36c)
> > > [<c0700af8>] (start_kernel) from [<4000807c>] (0x4000807c)
> > > ---[ end trace cb88537fdc8fa200 ]---
> > > Console: colour dummy device 80x30
> > > Calibrating delay loop (skipped) preset value.. 1066.00 BogoMIPS (lpj=5330000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > > Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > > Setting up static identity map for 0x40100000 - 0x40100058
> > > CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > > Brought up 2 CPUs
> > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > CPU: All CPU(s) started in SVC mode.
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 1 at mm/slab.c:976 setup_kmem_cache_node+0x198/0x1c8
> > > Modules linked in:
> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W
> > > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > > [<c011fad0>] (warn_slowpath_null) from [<c01ced84>]
> > > (setup_kmem_cache_node+0x198/0x1c8)
> > > [<c01ced84>] (setup_kmem_cache_node) from [<c01cf02c>]
> > > (__do_tune_cpucache+0xf4/0x114)
> > > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > > [<c01cf0b0>] (enable_cpucache) from [<c01cf4cc>]
> > > (__kmem_cache_create+0x1a0/0x1c8)
> > > [<c01cf4cc>] (__kmem_cache_create) from [<c01b2904>]
> > > (kmem_cache_create+0xbc/0x190)
> > > [<c01b2904>] (kmem_cache_create) from [<c070d724>] (shmem_init+0x34/0xb0)
> > > [<c070d724>] (shmem_init) from [<c0700cc4>] (kernel_init_freeable+0x98/0x1ec)
> > > [<c0700cc4>] (kernel_init_freeable) from [<c0497760>] (kernel_init+0x8/0x110)
> > > [<c0497760>] (kernel_init) from [<c0106c78>] (ret_from_fork+0x14/0x3c)
> > > ---[ end trace cb88537fdc8fa201 ]---
> > > devtmpfs: initialized
> > > VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > > clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > > max_idle_ns: 19112604462750000 ns
> > > pinctrl core: initialized pinctrl subsystem
> > > NET: Registered protocol family 16
> > > DMA: preallocated 256 KiB pool for atomic coherent allocations
> > > sh-pfc e0140200.pfc: emev2_pfc support registered
> > > gpiochip_find_base: found new base at 992
> > > gpio gpiochip0: (e0050000.gpio): created GPIO range 0->31 ==>
> > > e0140200.pfc PIN 0->31
> > > gpio gpiochip0: (e0050000.gpio): added GPIO chardev (254:0)
> > > gpiochip_setup_dev: registered GPIOs 992 to 1023 on device: gpiochip0
> > > (e0050000.gpio)
> > > gpiochip_find_base: found new base at 960
> > > gpio gpiochip1: (e0050080.gpio): created GPIO range 0->31 ==>
> > > e0140200.pfc PIN 32->63
> > > gpio gpiochip1: (e0050080.gpio): added GPIO chardev (254:1)
> > > gpiochip_setup_dev: registered GPIOs 960 to 991 on device: gpiochip1
> > > (e0050080.gpio)
> > > gpiochip_find_base: found new base at 928
> > > gpio gpiochip2: (e0050100.gpio): created GPIO range 0->31 ==>
> > > e0140200.pfc PIN 64->95
> > > gpio gpiochip2: (e0050100.gpio): added GPIO chardev (254:2)
> > > gpiochip_setup_dev: registered GPIOs 928 to 959 on device: gpiochip2
> > > (e0050100.gpio)
> > > gpiochip_find_base: found new base at 896
> > > gpio gpiochip3: (e0050180.gpio): created GPIO range 0->31 ==>
> > > e0140200.pfc PIN 96->127
> > > gpio gpiochip3: (e0050180.gpio): added GPIO chardev (254:3)
> > > gpiochip_setup_dev: registered GPIOs 896 to 927 on device: gpiochip3
> > > (e0050180.gpio)
> > > gpiochip_find_base: found new base at 865
> > > gpio gpiochip4: (e0050200.gpio): created GPIO range 0->30 ==>
> > > e0140200.pfc PIN 128->158
> > > gpio gpiochip4: (e0050200.gpio): added GPIO chardev (254:4)
> > > gpiochip_setup_dev: registered GPIOs 865 to 895 on device: gpiochip4
> > > (e0050200.gpio)
> > > gio: map hw irq = 1, irq = 35
> > > gio: sense irq = 1, mode = 8
> > > No ATAGs?
> > > hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
> > > hw-breakpoint: maximum watchpoint size is 4 bytes.
> > > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator@0[0]'
> > > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator@1[0]'
> > > SCSI subsystem initialized
> > > usbcore: registered new interface driver usbfs
> > > usbcore: registered new interface driver hub
> > > usbcore: registered new device driver usb
> > > em_sti e0180000.timer: used for clock events
> > > em_sti e0180000.timer: used for oneshot clock events
> > > em_sti e0180000.timer: used as clock source
> > > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > > 0x1ef4687b1, max_idle_ns: 3697658158765000000 ns
> > > Advanced Linux Sound Architecture Driver Initialized.
> > > NET: Registered protocol family 23
> > > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > > 0x1ef4687b1, max_idle_ns: 112843571739654 ns
> > > clocksource: Switched to clocksource e0180000.timer
> > > NET: Registered protocol family 2
> > > TCP established hash table entries: 1024 (order: 0, 4096 bytes)
> > > TCP bind hash table entries: 1024 (order: 2, 20480 bytes)
> > > TCP: Hash tables configured (established 1024 bind 1024)
> > > UDP hash table entries: 256 (order: 1, 12288 bytes)
> > > UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
> > > NET: Registered protocol family 1
> > > RPC: Registered named UNIX socket transport module.
> > > RPC: Registered udp transport module.
> > > RPC: Registered tcp transport module.
> > > RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > Clockevents: could not switch to one-shot mode:
> > > Clockevents: could not switch to one-shot mode: dummy_timer is not functional.
> > > Could not switch to high resolution mode on CPU 0
> > >  dummy_timer is not functional.
> > > Could not switch to high resolution mode on CPU 1
> > > hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
> > > futex hash table entries: 512 (order: 3, 32768 bytes)
> > > workingset: timestamp_bits=28 max_order=15 bucket_order=0
> > > NFS: Registering the id_resolver key type
> > > Key type id_resolver registered
> > > Key type id_legacy registered
> > > nfs4filelayout_init: NFSv4 File Layout Driver Registering...
> > > io scheduler noop registered (default)
> > > Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > > e1020000.serial: ttyS0 at MMIO 0xe1020000 (irq = 19, base_baud =
> > > 796444) is a 16550A
> > > e1030000.serial: ttyS1 at MMIO 0xe1030000 (irq = 20, base_baud =
> > > 7168000) is a 16550A
> > > console [ttyS1] enabled
> > > e1040000.serial: ttyS2 at MMIO 0xe1040000 (irq = 21, base_baud =
> > > 14336000) is a 16550A
> > > e1050000.serial: ttyS3 at MMIO 0xe1050000 (irq = 22, base_baud =
> > > 2389333) is a 16550A
> > > gio: sense irq = 1, mode = 8
> > > libphy: smsc911x-mdio: probed
> > > Generic PHY 20000000.etherne:01: attached PHY driver [Generic PHY]
> > > (mii_bus:phy_addr=20000000.etherne:01, irq=-1)
> > > smsc911x 20000000.ethernet eth0: MAC Address: 00:01:9b:04:03:cf
> > > usbcore: registered new interface driver usb-storage
> > > i2c /dev entries driver
> > > em-i2c e0070000.i2c: Added i2c controller 0, irq 33
> > > em-i2c e10a0000.i2c: Added i2c controller 1, irq 34
> > > cpu cpu0: failed to get clock: -2
> > > cpufreq-dt: probe of cpufreq-dt failed with error -2
> > > ledtrig-cpu: registered to indicate activity on CPUs
> > > usbcore: registered new interface driver usbhid
> > > usbhid: USB HID core driver
> > > NET: Registered protocol family 17
> > > Key type dns_resolver registered
> > > Registering SWP/SWPB emulation handler
> > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > '/gpio_keys/button@1[0]' - status (0)
> > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > '/gpio_keys/button@2[0]' - status (0)
> > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > '/gpio_keys/button@3[0]' - status (0)
> > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > '/gpio_keys/button@4[0]' - status (0)
> > > gpio-1006 (DSW2-1): gpiod_set_debounce: missing set() or
> > > set_debounce() operations
> > > gio: map hw irq = 14, irq = 36
> > > gio: sense irq = 14, mode = 12
> > > gpio-1007 (DSW2-2): gpiod_set_debounce: missing set() or
> > > set_debounce() operations
> > > gio: map hw irq = 15, irq = 37
> > > gio: sense irq = 15, mode = 12
> > > gpio-1008 (DSW2-3): gpiod_set_debounce: missing set() or
> > > set_debounce() operations
> > > gio: map hw irq = 16, irq = 38
> > > gio: sense irq = 16, mode = 12
> > > gpio-1009 (DSW2-4): gpiod_set_debounce: missing set() or
> > > set_debounce() operations
> > > gio: map hw irq = 17, irq = 39
> > > gio: sense irq = 17, mode = 12
> > > input: gpio_keys as /devices/platform/gpio_keys/input/input0
> > > hctosys: unable to open rtc device (rtc0)
> > > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 35
> > > Sending DHCP requests .., OK
> > > IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.215
> > > IP-Config: Complete:
> > >      device=eth0, hwaddr=00:01:9b:04:03:cf, ipaddr=192.168.97.215,
> > > mask=255.255.255.0, gw=192.168.97.254
> > >      host=192.168.97.215, domain=of.borg, nis-domain=(none)
> > >      bootserver=192.168.97.254, rootserver=192.168.97.254, rootpath=
> > >   nameserver0=192.168.97.254
> > > ALSA device list:
> > >   No soundcards found.
> > > Freeing unused kernel memory: 2048K (c0700000 - c0900000)
> > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > 
> > > Gr{oetje,eeting}s,
> > > 
> > >                         Geert
> > > 
> > > --
> > > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> > > 
> > > In personal conversations with technical people, I call myself a hacker. But
> > > when I'm talking to journalists I just say "programmer" or something like that.
> > >                                 -- Linus Torvalds
> > > 
> > > --
> > > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > the body to majordomo at kvack.org.  For more info on Linux MM,
> > > see: http://www.linux-mm.org/ .
> > > Don't email: <a href=mailto:"dont@kvack.org"> email at kvack.org </a>
> > 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-21  6:43                 ` Joonsoo Kim
@ 2016-06-21 12:54                   ` Paul E. McKenney
  2016-06-22  0:52                     ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-21 12:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 21, 2016 at 03:43:02PM +0900, Joonsoo Kim wrote:
> On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> > > CCing Paul to ask some question.
> > > 
> > > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > > > Hi Joonsoo,
> > > > 
> > > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> > > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > > > >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > >> >> >> > To check whther free objects exist or not precisely, we need to grab a
> > > > >> >> >> > lock.  But, accuracy isn't that important because race window would be
> > > > >> >> >> > even small and if there is too much free object, cache reaper would reap
> > > > >> >> >> > it.  So, this patch makes the check for free object exisistence not to
> > > > >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> > > > 
> > > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> > > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > > > 
> > > > > It's curious that synchronize_sched() has some effect in this early
> > > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > > > >
> > > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> > > > >
> > > > > First, I'd like to confirm that num_online_cpus() is correct.
> > > > > Could you try following patch and give me a dmesg?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > ------->8----------
> > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > index 763096a..5b7300a 100644
> > > > > --- a/mm/slab.c
> > > > > +++ b/mm/slab.c
> > > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > >          * freed after synchronize_sched().
> > > > >          */
> > > > > -       if (force_change)
> > > > > -               synchronize_sched();
> > > > > +       if (force_change) {
> > > > > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > > > > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > > > > +       }
> > > > 
> > > > Full dmesg output below.
> > > > 
> > > > I also tested whether it's the call to synchronize_sched() before or after
> > > > secondary CPU bringup that hangs.
> > > > 
> > > >         if (force_change && num_online_cpus() <= 1)
> > > >                 synchronize_sched();
> > > > 
> > > > boots.
> > > > 
> > > >         if (force_change && num_online_cpus() > 1)
> > > >                 synchronize_sched();
> > > > 
> > > > hangs.
> > > 
> > > Hello, Paul.
> > > 
> > > I changed slab.c to use synchronize_sched() for full memory barrier. First
> > > call happens on kmem_cache_init_late() and it would not be a problem
> > > because, at this time, num_online_cpus() <= 1 and synchronize_sched()
> > > would return immediately. Second call site would be shmem_init()
> > > and it seems that system hangs on it. Since smp is already initialized
> > > at that time, there would be some effect of synchronize_sched() but I
> > > can't imagine what's wrong here. Is it invalid moment to call
> > > synchronize_sched()?
> > > 
> > > Note that my x86 virtual machine works fine even if
> > > synchronize_sched() is called in shmem_init() but Geert's some ARM
> > > machines (not all ARM machine) don't work well with it.
> > 
> > Color me confused.
> > 
> > Is Geert's ARM system somehow adding the second CPU before
> > rcu_spawn_gp_kthread() is called, that is, before or during
> > early_initcall() time?
> 
> Hang would happen on shmem_init() which is called in do_basic_setup().
> do_basic_setup() is called after early_initcall().

Thank you for the info!

That should be late enough that the RCU kthreads are alive and well.

Can you get sysrq-t output?

> Hmm... Is it okay to call synchronize_sched() from a kernel thread?

Yes, it is.  In fact, rcutorture does this all the time, as do any
number of other kthreads.

							Thanx, Paul

> Thanks.
> 
> > 
> > RCU does assume that its kthreads are created before the second
> > CPU shows up.
> > 
> > 							Thanx, Paul
> > 
> > > Thanks.
> > > 
> > > > 
> > > > Booting Linux on physical CPU 0x0
> > > > Linux version 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty (geert at ramsan)
> > > > (gcc version 4.9.0 (GCC) ) #84 SMP Wed Jun 15 10:20:12 CEST 2016
> > > > CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
> > > > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> > > > Machine model: EMEV2 KZM9D Board
> > > > debug: ignoring loglevel setting.
> > > > Memory policy: Data cache writealloc
> > > > On node 0 totalpages: 32768
> > > > free_area_init_node: node 0, pgdat c09286c0, node_mem_map c7efa000
> > > >   Normal zone: 256 pages used for memmap
> > > >   Normal zone: 0 pages reserved
> > > >   Normal zone: 32768 pages, LIFO batch:7
> > > > percpu: Embedded 12 pages/cpu @c7ed9000 s19264 r8192 d21696 u49152
> > > > pcpu-alloc: s19264 r8192 d21696 u49152 alloc=12*4096
> > > > pcpu-alloc: [0] 0 [0] 1
> > > > Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
> > > > Kernel command line: console=ttyS1,115200n81 ignore_loglevel
> > > > root=/dev/nfs ip=dhcp
> > > > PID hash table entries: 512 (order: -1, 2048 bytes)
> > > > Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
> > > > Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
> > > > Memory: 121144K/131072K available (4243K kernel code, 165K rwdata,
> > > > 1344K rodata, 2048K init, 264K bss, 9928K reserved, 0K cma-reserved,
> > > > 0K highmem)
> > > > Virtual kernel memory layout:
> > > >     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> > > >     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> > > >     vmalloc : 0xc8800000 - 0xff800000   ( 880 MB)
> > > >     lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
> > > >     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
> > > >     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
> > > >       .text : 0xc0008000 - 0xc0674eb8   (6580 kB)
> > > >       .init : 0xc0700000 - 0xc0900000   (2048 kB)
> > > >       .data : 0xc0900000 - 0xc0929420   ( 166 kB)
> > > >        .bss : 0xc092b000 - 0xc096d1e8   ( 265 kB)
> > > > Hierarchical RCU implementation.
> > > >  Build-time adjustment of leaf fanout to 32.
> > > >  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> > > > RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
> > > > NR_IRQS:16 nr_irqs:16 16
> > > > clocksource_probe: no matching clocksources found
> > > > sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
> > > > 21474836475000000ns
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 0 at mm/slab.c:975 setup_kmem_cache_node+0x160/0x1c8
> > > > Modules linked in:
> > > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> > > > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > > > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > > > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > > > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > > > [<c011fad0>] (warn_slowpath_null) from [<c01ced4c>]
> > > > (setup_kmem_cache_node+0x160/0x1c8)
> > > > [<c01ced4c>] (setup_kmem_cache_node) from [<c01cf02c>]
> > > > (__do_tune_cpucache+0xf4/0x114)
> > > > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > > > [<c01cf0b0>] (enable_cpucache) from [<c0710324>]
> > > > (kmem_cache_init_late+0x40/0x84)
> > > > [<c0710324>] (kmem_cache_init_late) from [<c0700af8>] (start_kernel+0x238/0x36c)
> > > > [<c0700af8>] (start_kernel) from [<4000807c>] (0x4000807c)
> > > > ---[ end trace cb88537fdc8fa200 ]---
> > > > Console: colour dummy device 80x30
> > > > Calibrating delay loop (skipped) preset value.. 1066.00 BogoMIPS (lpj=5330000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > > > Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > > > Setting up static identity map for 0x40100000 - 0x40100058
> > > > CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > > > Brought up 2 CPUs
> > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > CPU: All CPU(s) started in SVC mode.
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 1 at mm/slab.c:976 setup_kmem_cache_node+0x198/0x1c8
> > > > Modules linked in:
> > > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W
> > > > 4.6.0-kzm9d-05060-g801faf0db8947e01-dirty #84
> > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > [<c010de08>] (unwind_backtrace) from [<c010a620>] (show_stack+0x10/0x14)
> > > > [<c010a620>] (show_stack) from [<c02b3178>] (dump_stack+0x7c/0x9c)
> > > > [<c02b3178>] (dump_stack) from [<c011f9fc>] (__warn+0xcc/0xfc)
> > > > [<c011f9fc>] (__warn) from [<c011fad0>] (warn_slowpath_null+0x1c/0x24)
> > > > [<c011fad0>] (warn_slowpath_null) from [<c01ced84>]
> > > > (setup_kmem_cache_node+0x198/0x1c8)
> > > > [<c01ced84>] (setup_kmem_cache_node) from [<c01cf02c>]
> > > > (__do_tune_cpucache+0xf4/0x114)
> > > > [<c01cf02c>] (__do_tune_cpucache) from [<c01cf0b0>] (enable_cpucache+0x64/0xb4)
> > > > [<c01cf0b0>] (enable_cpucache) from [<c01cf4cc>]
> > > > (__kmem_cache_create+0x1a0/0x1c8)
> > > > [<c01cf4cc>] (__kmem_cache_create) from [<c01b2904>]
> > > > (kmem_cache_create+0xbc/0x190)
> > > > [<c01b2904>] (kmem_cache_create) from [<c070d724>] (shmem_init+0x34/0xb0)
> > > > [<c070d724>] (shmem_init) from [<c0700cc4>] (kernel_init_freeable+0x98/0x1ec)
> > > > [<c0700cc4>] (kernel_init_freeable) from [<c0497760>] (kernel_init+0x8/0x110)
> > > > [<c0497760>] (kernel_init) from [<c0106c78>] (ret_from_fork+0x14/0x3c)
> > > > ---[ end trace cb88537fdc8fa201 ]---
> > > > devtmpfs: initialized
> > > > VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > > > clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > > > max_idle_ns: 19112604462750000 ns
> > > > pinctrl core: initialized pinctrl subsystem
> > > > NET: Registered protocol family 16
> > > > DMA: preallocated 256 KiB pool for atomic coherent allocations
> > > > sh-pfc e0140200.pfc: emev2_pfc support registered
> > > > gpiochip_find_base: found new base at 992
> > > > gpio gpiochip0: (e0050000.gpio): created GPIO range 0->31 ==>
> > > > e0140200.pfc PIN 0->31
> > > > gpio gpiochip0: (e0050000.gpio): added GPIO chardev (254:0)
> > > > gpiochip_setup_dev: registered GPIOs 992 to 1023 on device: gpiochip0
> > > > (e0050000.gpio)
> > > > gpiochip_find_base: found new base at 960
> > > > gpio gpiochip1: (e0050080.gpio): created GPIO range 0->31 ==>
> > > > e0140200.pfc PIN 32->63
> > > > gpio gpiochip1: (e0050080.gpio): added GPIO chardev (254:1)
> > > > gpiochip_setup_dev: registered GPIOs 960 to 991 on device: gpiochip1
> > > > (e0050080.gpio)
> > > > gpiochip_find_base: found new base at 928
> > > > gpio gpiochip2: (e0050100.gpio): created GPIO range 0->31 ==>
> > > > e0140200.pfc PIN 64->95
> > > > gpio gpiochip2: (e0050100.gpio): added GPIO chardev (254:2)
> > > > gpiochip_setup_dev: registered GPIOs 928 to 959 on device: gpiochip2
> > > > (e0050100.gpio)
> > > > gpiochip_find_base: found new base at 896
> > > > gpio gpiochip3: (e0050180.gpio): created GPIO range 0->31 ==>
> > > > e0140200.pfc PIN 96->127
> > > > gpio gpiochip3: (e0050180.gpio): added GPIO chardev (254:3)
> > > > gpiochip_setup_dev: registered GPIOs 896 to 927 on device: gpiochip3
> > > > (e0050180.gpio)
> > > > gpiochip_find_base: found new base at 865
> > > > gpio gpiochip4: (e0050200.gpio): created GPIO range 0->30 ==>
> > > > e0140200.pfc PIN 128->158
> > > > gpio gpiochip4: (e0050200.gpio): added GPIO chardev (254:4)
> > > > gpiochip_setup_dev: registered GPIOs 865 to 895 on device: gpiochip4
> > > > (e0050200.gpio)
> > > > gio: map hw irq = 1, irq = 35
> > > > gio: sense irq = 1, mode = 8
> > > > No ATAGs?
> > > > hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
> > > > hw-breakpoint: maximum watchpoint size is 4 bytes.
> > > > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator at 0[0]'
> > > > of_get_named_gpiod_flags: can't parse 'gpio' property of node '/regulator at 1[0]'
> > > > SCSI subsystem initialized
> > > > usbcore: registered new interface driver usbfs
> > > > usbcore: registered new interface driver hub
> > > > usbcore: registered new device driver usb
> > > > em_sti e0180000.timer: used for clock events
> > > > em_sti e0180000.timer: used for oneshot clock events
> > > > em_sti e0180000.timer: used as clock source
> > > > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > > > 0x1ef4687b1, max_idle_ns: 3697658158765000000 ns
> > > > Advanced Linux Sound Architecture Driver Initialized.
> > > > NET: Registered protocol family 23
> > > > clocksource: e0180000.timer: mask: 0xffffffffffff max_cycles:
> > > > 0x1ef4687b1, max_idle_ns: 112843571739654 ns
> > > > clocksource: Switched to clocksource e0180000.timer
> > > > NET: Registered protocol family 2
> > > > TCP established hash table entries: 1024 (order: 0, 4096 bytes)
> > > > TCP bind hash table entries: 1024 (order: 2, 20480 bytes)
> > > > TCP: Hash tables configured (established 1024 bind 1024)
> > > > UDP hash table entries: 256 (order: 1, 12288 bytes)
> > > > UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
> > > > NET: Registered protocol family 1
> > > > RPC: Registered named UNIX socket transport module.
> > > > RPC: Registered udp transport module.
> > > > RPC: Registered tcp transport module.
> > > > RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > > Clockevents: could not switch to one-shot mode:
> > > > Clockevents: could not switch to one-shot mode: dummy_timer is not functional.
> > > > Could not switch to high resolution mode on CPU 0
> > > >  dummy_timer is not functional.
> > > > Could not switch to high resolution mode on CPU 1
> > > > hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
> > > > futex hash table entries: 512 (order: 3, 32768 bytes)
> > > > workingset: timestamp_bits=28 max_order=15 bucket_order=0
> > > > NFS: Registering the id_resolver key type
> > > > Key type id_resolver registered
> > > > Key type id_legacy registered
> > > > nfs4filelayout_init: NFSv4 File Layout Driver Registering...
> > > > io scheduler noop registered (default)
> > > > Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> > > > e1020000.serial: ttyS0 at MMIO 0xe1020000 (irq = 19, base_baud =
> > > > 796444) is a 16550A
> > > > e1030000.serial: ttyS1 at MMIO 0xe1030000 (irq = 20, base_baud =
> > > > 7168000) is a 16550A
> > > > console [ttyS1] enabled
> > > > e1040000.serial: ttyS2 at MMIO 0xe1040000 (irq = 21, base_baud =
> > > > 14336000) is a 16550A
> > > > e1050000.serial: ttyS3 at MMIO 0xe1050000 (irq = 22, base_baud =
> > > > 2389333) is a 16550A
> > > > gio: sense irq = 1, mode = 8
> > > > libphy: smsc911x-mdio: probed
> > > > Generic PHY 20000000.etherne:01: attached PHY driver [Generic PHY]
> > > > (mii_bus:phy_addr=20000000.etherne:01, irq=-1)
> > > > smsc911x 20000000.ethernet eth0: MAC Address: 00:01:9b:04:03:cf
> > > > usbcore: registered new interface driver usb-storage
> > > > i2c /dev entries driver
> > > > em-i2c e0070000.i2c: Added i2c controller 0, irq 33
> > > > em-i2c e10a0000.i2c: Added i2c controller 1, irq 34
> > > > cpu cpu0: failed to get clock: -2
> > > > cpufreq-dt: probe of cpufreq-dt failed with error -2
> > > > ledtrig-cpu: registered to indicate activity on CPUs
> > > > usbcore: registered new interface driver usbhid
> > > > usbhid: USB HID core driver
> > > > NET: Registered protocol family 17
> > > > Key type dns_resolver registered
> > > > Registering SWP/SWPB emulation handler
> > > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > > '/gpio_keys/button at 1[0]' - status (0)
> > > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > > '/gpio_keys/button at 2[0]' - status (0)
> > > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > > '/gpio_keys/button at 3[0]' - status (0)
> > > > of_get_named_gpiod_flags: parsed 'gpios' property of node
> > > > '/gpio_keys/button at 4[0]' - status (0)
> > > > gpio-1006 (DSW2-1): gpiod_set_debounce: missing set() or
> > > > set_debounce() operations
> > > > gio: map hw irq = 14, irq = 36
> > > > gio: sense irq = 14, mode = 12
> > > > gpio-1007 (DSW2-2): gpiod_set_debounce: missing set() or
> > > > set_debounce() operations
> > > > gio: map hw irq = 15, irq = 37
> > > > gio: sense irq = 15, mode = 12
> > > > gpio-1008 (DSW2-3): gpiod_set_debounce: missing set() or
> > > > set_debounce() operations
> > > > gio: map hw irq = 16, irq = 38
> > > > gio: sense irq = 16, mode = 12
> > > > gpio-1009 (DSW2-4): gpiod_set_debounce: missing set() or
> > > > set_debounce() operations
> > > > gio: map hw irq = 17, irq = 39
> > > > gio: sense irq = 17, mode = 12
> > > > input: gpio_keys as /devices/platform/gpio_keys/input/input0
> > > > hctosys: unable to open rtc device (rtc0)
> > > > smsc911x 20000000.ethernet eth0: SMSC911x/921x identified at 0xc8880000, IRQ: 35
> > > > Sending DHCP requests .., OK
> > > > IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.215
> > > > IP-Config: Complete:
> > > >      device=eth0, hwaddr=00:01:9b:04:03:cf, ipaddr=192.168.97.215,
> > > > mask=255.255.255.0, gw=192.168.97.254
> > > >      host=192.168.97.215, domain=of.borg, nis-domain=(none)
> > > >      bootserver=192.168.97.254, rootserver=192.168.97.254, rootpath=
> > > >   nameserver0=192.168.97.254
> > > > ALSA device list:
> > > >   No soundcards found.
> > > > Freeing unused kernel memory: 2048K (c0700000 - c0900000)
> > > > sysctl: error: 'kernel.hotplug' is an unknown key
> > > > 
> > > > Gr{oetje,eeting}s,
> > > > 
> > > >                         Geert
> > > > 
> > > > --
> > > > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> > > > 
> > > > In personal conversations with technical people, I call myself a hacker. But
> > > > when I'm talking to journalists I just say "programmer" or something like that.
> > > >                                 -- Linus Torvalds
> > > > 
> > > > --
> > > > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > > the body to majordomo at kvack.org.  For more info on Linux MM,
> > > > see: http://www.linux-mm.org/ .
> > > > Don't email: <a href=mailto:"dont@kvack.org"> email at kvack.org </a>
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-21 12:54                   ` Paul E. McKenney
@ 2016-06-22  0:52                     ` Joonsoo Kim
  2016-06-22  3:15                       ` Paul E. McKenney
  2016-06-22 15:01                       ` Geert Uytterhoeven
  0 siblings, 2 replies; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-22  0:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 21, 2016 at 05:54:06AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 21, 2016 at 03:43:02PM +0900, Joonsoo Kim wrote:
> > On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> > > > CCing Paul to ask some question.
> > > > 
> > > > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > > > > Hi Joonsoo,
> > > > > 
> > > > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > > > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > > > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> > > > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > > > > >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > > >> >> >> > To check whther free objects exist or not precisely, we need to grab a
> > > > > >> >> >> > lock.  But, accuracy isn't that important because race window would be
> > > > > >> >> >> > even small and if there is too much free object, cache reaper would reap
> > > > > >> >> >> > it.  So, this patch makes the check for free object exisistence not to
> > > > > >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> > > > > 
> > > > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> > > > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > > > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > > > > 
> > > > > > It's curious that synchronize_sched() has some effect in this early
> > > > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > > > > >
> > > > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> > > > > >
> > > > > > First, I'd like to confirm that num_online_cpus() is correct.
> > > > > > Could you try following patch and give me a dmesg?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > ------->8----------
> > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > index 763096a..5b7300a 100644
> > > > > > --- a/mm/slab.c
> > > > > > +++ b/mm/slab.c
> > > > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > >          * freed after synchronize_sched().
> > > > > >          */
> > > > > > -       if (force_change)
> > > > > > -               synchronize_sched();
> > > > > > +       if (force_change) {
> > > > > > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > > > > > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > > > > > +       }
> > > > > 
> > > > > Full dmesg output below.
> > > > > 
> > > > > I also tested whether it's the call to synchronize_sched() before or after
> > > > > secondary CPU bringup that hangs.
> > > > > 
> > > > >         if (force_change && num_online_cpus() <= 1)
> > > > >                 synchronize_sched();
> > > > > 
> > > > > boots.
> > > > > 
> > > > >         if (force_change && num_online_cpus() > 1)
> > > > >                 synchronize_sched();
> > > > > 
> > > > > hangs.
> > > > 
> > > > Hello, Paul.
> > > > 
> > > > I changed slab.c to use synchronize_sched() for full memory barrier. First
> > > > call happens on kmem_cache_init_late() and it would not be a problem
> > > > because, at this time, num_online_cpus() <= 1 and synchronize_sched()
> > > > would return immediately. Second call site would be shmem_init()
> > > > and it seems that system hangs on it. Since smp is already initialized
> > > > at that time, there would be some effect of synchronize_sched() but I
> > > > can't imagine what's wrong here. Is it invalid moment to call
> > > > synchronize_sched()?
> > > > 
> > > > Note that my x86 virtual machine works fine even if
> > > > synchronize_sched() is called in shmem_init() but Geert's some ARM
> > > > machines (not all ARM machine) don't work well with it.
> > > 
> > > Color me confused.
> > > 
> > > Is Geert's ARM system somehow adding the second CPU before
> > > rcu_spawn_gp_kthread() is called, that is, before or during
> > > early_initcall() time?
> > 
> > Hang would happen on shmem_init() which is called in do_basic_setup().
> > do_basic_setup() is called after early_initcall().
> 
> Thank you for the info!
> 
> That should be late enough that the RCU kthreads are alive and well.
> 
> Can you get sysrq-t output?
> 
> > Hmm... Is it okay to call synchronize_sched() from a kernel thread?
> 
> Yes, it is.  In fact, rcutorture does this all the time, as do any
> number of other kthreads.

Paul, thanks for confirmation.

Geert, we need to do some more debugging.

Could you try the patch below to check which caller triggers the hang?

And, if sysrq-t still works while the system is hung, could you capture its
output? I haven't used it before, but Paul may be able to spot the culprit in it. :)

Thanks.


----->8-----
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..9652d38 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
         * guaranteed to be valid until irq is re-enabled, because it will be
         * freed after synchronize_sched().
         */
-       if (force_change)
+       if (force_change) {
+               if (num_online_cpus() > 1)
+                       dump_stack();
                synchronize_sched();
+               if (num_online_cpus() > 1)
+                       dump_stack();
+       }
 
 fail:
        kfree(old_shared);
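
A minimal userspace sketch of how the output of the patch above could be
read. It assumes only what has been said in this thread: synchronize_sched()
is a no-op while a single CPU is online, and with more than one CPU it has to
wait for a grace period. Every name below (boot_state, gp_can_progress,
model_*) is hypothetical and exists only for illustration; none of it is
kernel API, and "grace period cannot make progress" is just the hypothesis
being tested, not a diagnosis.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum sync_result { SYNC_NOOP, SYNC_WAITED, SYNC_WOULD_HANG };

struct boot_state {
	int online_cpus;      /* stands in for num_online_cpus() */
	bool gp_can_progress; /* assumption: an RCU grace period can complete */
};

/* Mirrors the single-CPU shortcut described for rcu_blocking_is_gp(). */
bool model_blocking_is_gp(const struct boot_state *s)
{
	assert(s->online_cpus >= 1);
	return s->online_cpus <= 1;
}

/* What the modelled synchronize_sched() would do in a given boot state. */
enum sync_result model_synchronize_sched(const struct boot_state *s)
{
	if (model_blocking_is_gp(s))
		return SYNC_NOOP;
	return s->gp_can_progress ? SYNC_WAITED : SYNC_WOULD_HANG;
}

/*
 * Number of dump_stack() traces the debug patch would print around one
 * call: none on a single CPU, two if the call returns, one if it hangs.
 */
int model_traces_printed(const struct boot_state *s)
{
	if (s->online_cpus <= 1)
		return 0;
	return model_synchronize_sched(s) == SYNC_WAITED ? 2 : 1;
}
```

Under these assumptions, a boot log that ends with a single trace after
secondary CPU bringup would point at the caller whose synchronize_sched()
never returned, while paired traces would mean that call completed.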

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-22  0:52                     ` Joonsoo Kim
@ 2016-06-22  3:15                       ` Paul E. McKenney
  2016-06-22 15:01                       ` Geert Uytterhoeven
  1 sibling, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-22  3:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 09:52:08AM +0900, Joonsoo Kim wrote:
> On Tue, Jun 21, 2016 at 05:54:06AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 21, 2016 at 03:43:02PM +0900, Joonsoo Kim wrote:
> > > On Mon, Jun 20, 2016 at 06:12:54AM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jun 20, 2016 at 03:39:43PM +0900, Joonsoo Kim wrote:
> > > > > CCing Paul to ask some question.
> > > > > 
> > > > > On Wed, Jun 15, 2016 at 10:39:47AM +0200, Geert Uytterhoeven wrote:
> > > > > > Hi Joonsoo,
> > > > > > 
> > > > > > On Wed, Jun 15, 2016 at 4:23 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > > On Tue, Jun 14, 2016 at 12:45:14PM +0200, Geert Uytterhoeven wrote:
> > > > > > >> On Tue, Jun 14, 2016 at 10:11 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > >> > On Tue, Jun 14, 2016 at 09:31:23AM +0200, Geert Uytterhoeven wrote:
> > > > > > >> >> On Tue, Jun 14, 2016 at 8:24 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > >> >> > On Mon, Jun 13, 2016 at 09:43:13PM +0200, Geert Uytterhoeven wrote:
> > > > > > >> >> >> On Tue, Apr 12, 2016 at 6:51 AM,  <js1304@gmail.com> wrote:
> > > > > > >> >> >> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > > > >> >> >> > To check whther free objects exist or not precisely, we need to grab a
> > > > > > >> >> >> > lock.  But, accuracy isn't that important because race window would be
> > > > > > >> >> >> > even small and if there is too much free object, cache reaper would reap
> > > > > > >> >> >> > it.  So, this patch makes the check for free object exisistence not to
> > > > > > >> >> >> > hold a lock.  This will reduce lock contention in heavily allocation case.
> > > > > > 
> > > > > > >> >> >> I've bisected a boot failure (no output at all) in v4.7-rc2 on emev2/kzm9d
> > > > > > >> >> >> (Renesas dual Cortex A9) to this patch, which is upstream commit
> > > > > > >> >> >> 801faf0db8947e01877920e848a4d338dd7a99e7.
> > > > > > 
> > > > > > > It's curious that synchronize_sched() has some effect in this early
> > > > > > > phase. In synchronize_sched(), rcu_blocking_is_gp() is called and
> > > > > > > it checks num_online_cpus <= 1. If so, synchronize_sched() does nothing.
> > > > > > >
> > > > > > > It would be related to might_sleep() in rcu_blocking_is_gp() but I'm not sure now.
> > > > > > >
> > > > > > > First, I'd like to confirm that num_online_cpus() is correct.
> > > > > > > Could you try following patch and give me a dmesg?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > ------->8----------
> > > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > > index 763096a..5b7300a 100644
> > > > > > > --- a/mm/slab.c
> > > > > > > +++ b/mm/slab.c
> > > > > > > @@ -964,8 +964,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > > >          * freed after synchronize_sched().
> > > > > > >          */
> > > > > > > -       if (force_change)
> > > > > > > -               synchronize_sched();
> > > > > > > +       if (force_change) {
> > > > > > > +               WARN_ON_ONCE(num_online_cpus() <= 1);
> > > > > > > +               WARN_ON_ONCE(num_online_cpus() > 1);
> > > > > > > +       }
> > > > > > 
> > > > > > Full dmesg output below.
> > > > > > 
> > > > > > I also tested whether it's the call to synchronize_sched() before or after
> > > > > > secondary CPU bringup that hangs.
> > > > > > 
> > > > > >         if (force_change && num_online_cpus() <= 1)
> > > > > >                 synchronize_sched();
> > > > > > 
> > > > > > boots.
> > > > > > 
> > > > > >         if (force_change && num_online_cpus() > 1)
> > > > > >                 synchronize_sched();
> > > > > > 
> > > > > > hangs.
> > > > > 
> > > > > Hello, Paul.
> > > > > 
> > > > > I changed slab.c to use synchronize_sched() as a full memory barrier. The
> > > > > first call happens in kmem_cache_init_late(), which should not be a problem
> > > > > because, at that time, num_online_cpus() <= 1 and synchronize_sched()
> > > > > returns immediately. The second call site is shmem_init(),
> > > > > and it seems that the system hangs there. Since SMP is already initialized
> > > > > at that time, synchronize_sched() would have some effect, but I
> > > > > can't imagine what's wrong here. Is it an invalid moment to call
> > > > > synchronize_sched()?
> > > > > 
> > > > > Note that my x86 virtual machine works fine even if
> > > > > synchronize_sched() is called in shmem_init(), but some of Geert's ARM
> > > > > machines (not all of them) don't work well with it.
> > > > 
> > > > Color me confused.
> > > > 
> > > > Is Geert's ARM system somehow adding the second CPU before
> > > > rcu_spawn_gp_kthread() is called, that is, before or during
> > > > early_initcall() time?
> > > 
> > > Hang would happen on shmem_init() which is called in do_basic_setup().
> > > do_basic_setup() is called after early_initcall().
> > 
> > Thank you for the info!
> > 
> > That should be late enough that the RCU kthreads are alive and well.
> > 
> > Can you get sysalt-t output?
> > 
> > > Hmm... Is it okay to call synchronize_sched() by kernel thread?
> > 
> > Yes, it can, in fact, rcutorture does this all the time.  As do any
> > number of other kthreads.
> 
> Paul, thanks for confirmation.
> 
> Geert, we need to try more debugging.
> 
> Could you try below patch to check who causes the hang?

Nice!  That might be quite valuable!

> And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> used it before but Paul could find some culprit on it. :)

And the other thing to do is to read the last portion of
Documentation/RCU/stallwarn.txt, the part starting with
"What Causes RCU CPU Stall Warnings?".  I would expect any
of these things to also result in an RCU CPU stall warning,
but perhaps something is preventing them from being printed.
Short summary:  If a CPU gets stuck badly enough, RCU grace
periods won't end and therefore synchronize_sched() won't
ever return.

							Thanx, Paul

> Thanks.
> 
> 
> ----->8-----
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..9652d38 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * guaranteed to be valid until irq is re-enabled, because it will be
>          * freed after synchronize_sched().
>          */
> -       if (force_change)
> +       if (force_change) {
> +               if (num_online_cpus() > 1)
> +                       dump_stack();
>                 synchronize_sched();
> +               if (num_online_cpus() > 1)
> +                       dump_stack();
> +       }
> 
>  fail:
>         kfree(old_shared);
> 

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-22  0:52                     ` Joonsoo Kim
  2016-06-22  3:15                       ` Paul E. McKenney
@ 2016-06-22 15:01                       ` Geert Uytterhoeven
  2016-06-22 19:08                         ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-22 15:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Could you try below patch to check who causes the hang?
>
> And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> used it before but Paul could find some culprit on it. :)
>
> Thanks.
>
>
> ----->8-----
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..9652d38 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * guaranteed to be valid until irq is re-enabled, because it will be
>          * freed after synchronize_sched().
>          */
> -       if (force_change)
> +       if (force_change) {
> +               if (num_online_cpus() > 1)
> +                       dump_stack();
>                 synchronize_sched();
> +               if (num_online_cpus() > 1)
> +                       dump_stack();
> +       }

I've only added the first one, as I would never see the second one. All of
this happens before the serial console is activated, earlycon is not supported,
and I only have remote access.

Brought up 2 CPUs
SMP: Total of 2 processors activated (2132.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
[<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
[<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
[<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
[<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
(__do_tune_cpucache+0xf4/0x114)
[<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
[<c01cff54>] (enable_cpucache) from [<c01d0190>]
(__kmem_cache_create+0x1a8/0x1d0)
[<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
(kmem_cache_create+0xbc/0x190)
[<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
[<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
[<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
[<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
devtmpfs: initialized

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-22 15:01                       ` Geert Uytterhoeven
@ 2016-06-22 19:08                         ` Paul E. McKenney
  2016-06-23  0:49                           ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-22 19:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > Could you try below patch to check who causes the hang?
> >
> > And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> > used it before but Paul could find some culprit on it. :)
> >
> > Thanks.
> >
> >
> > ----->8-----
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..9652d38 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >          * guaranteed to be valid until irq is re-enabled, because it will be
> >          * freed after synchronize_sched().
> >          */
> > -       if (force_change)
> > +       if (force_change) {
> > +               if (num_online_cpus() > 1)
> > +                       dump_stack();
> >                 synchronize_sched();
> > +               if (num_online_cpus() > 1)
> > +                       dump_stack();
> > +       }
> 
> I've only added the first one, as I would never see the second one. All of
> this happens before the serial console is activated, earlycon is not supported,
> and I only have remote access.
> 
> Brought up 2 CPUs
> SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> CPU: All CPU(s) started in SVC mode.
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> (__do_tune_cpucache+0xf4/0x114)
> [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> (__kmem_cache_create+0x1a8/0x1d0)
> [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> (kmem_cache_create+0xbc/0x190)
> [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> devtmpfs: initialized

I don't see anything here that would prevent grace periods from completing.

The CPUs are using the normal hotplug sequence to come online, correct?

							Thanx, Paul

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-22 19:08                         ` Paul E. McKenney
@ 2016-06-23  0:49                           ` Paul E. McKenney
  2016-06-23  2:37                             ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-23  0:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > Could you try below patch to check who causes the hang?
> > >
> > > And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> > > used it before but Paul could find some culprit on it. :)
> > >
> > > Thanks.
> > >
> > >
> > > ----->8-----
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index 763096a..9652d38 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > >          * freed after synchronize_sched().
> > >          */
> > > -       if (force_change)
> > > +       if (force_change) {
> > > +               if (num_online_cpus() > 1)
> > > +                       dump_stack();
> > >                 synchronize_sched();
> > > +               if (num_online_cpus() > 1)
> > > +                       dump_stack();
> > > +       }
> > 
> > I've only added the first one, as I would never see the second one. All of
> > this happens before the serial console is activated, earlycon is not supported,
> > and I only have remote access.
> > 
> > Brought up 2 CPUs
> > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > CPU: All CPU(s) started in SVC mode.
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > (__do_tune_cpucache+0xf4/0x114)
> > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > (__kmem_cache_create+0x1a8/0x1d0)
> > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > (kmem_cache_create+0xbc/0x190)
> > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > devtmpfs: initialized
> 
> I don't see anything here that would prevent grace periods from completing.
> 
> The CPUs are using the normal hotplug sequence to come online, correct?

And either way, could you please apply the patch below and then
invoke rcu_dump_rcu_sched_tree() just before the offending call to
synchronize_sched()?  That will tell me what CPUs RCU believes exist,
and perhaps also which CPU is holding it up.

							Thanx, Paul

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-23  0:49                           ` Paul E. McKenney
@ 2016-06-23  2:37                             ` Joonsoo Kim
  2016-06-23  2:47                               ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-23  2:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > Could you try below patch to check who causes the hang?
> > > >
> > > > And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> > > > used it before but Paul could find some culprit on it. :)
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > ----->8-----
> > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > index 763096a..9652d38 100644
> > > > --- a/mm/slab.c
> > > > +++ b/mm/slab.c
> > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > >          * freed after synchronize_sched().
> > > >          */
> > > > -       if (force_change)
> > > > +       if (force_change) {
> > > > +               if (num_online_cpus() > 1)
> > > > +                       dump_stack();
> > > >                 synchronize_sched();
> > > > +               if (num_online_cpus() > 1)
> > > > +                       dump_stack();
> > > > +       }
> > > 
> > > I've only added the first one, as I would never see the second one. All of
> > > this happens before the serial console is activated, earlycon is not supported,
> > > and I only have remote access.
> > > 
> > > Brought up 2 CPUs
> > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > CPU: All CPU(s) started in SVC mode.
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > > (__do_tune_cpucache+0xf4/0x114)
> > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > > (__kmem_cache_create+0x1a8/0x1d0)
> > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > > (kmem_cache_create+0xbc/0x190)
> > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > > devtmpfs: initialized
> > 
> > I don't see anything here that would prevent grace periods from completing.
> > 
> > The CPUs are using the normal hotplug sequence to come online, correct?
> 
> And either way, could you please apply the patch below and then
> invoke rcu_dump_rcu_sched_tree() just before the offending call to
> synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> and perhaps also which CPU is holding it up.

I can't find rcu_dump_rcu_sched_tree(). Do you mean
rcu_dump_rcu_node_tree()? Anyway, there is no patch below so I attach
one which does what Paul wants, maybe.

Thanks.

------->8---------
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 88d3f95..6b650f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void)
  * Dump out the structure of the rcu_node combining tree associated
  * with the rcu_state structure referenced by rsp.
  */
-static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
+static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 {
        int level = 0;
        struct rcu_node *rnp;
@@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
        pr_cont("\n");
 }
 
+void rcu_dump_rcu_sched_tree(void)
+{
+       rcu_dump_rcu_node_tree(&rcu_sched_state);
+}
+
 void __init rcu_init(void)
 {
        int cpu;
diff --git a/mm/slab.c b/mm/slab.c
index 763096a..d88976c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
        return 0;
 }
 
+extern void rcu_dump_rcu_sched_tree(void);
+
 static int setup_kmem_cache_node(struct kmem_cache *cachep,
                                int node, gfp_t gfp, bool force_change)
 {
@@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
         * guaranteed to be valid until irq is re-enabled, because it will be
         * freed after synchronize_sched().
         */
-       if (force_change)
+       if (force_change) {
+               rcu_dump_rcu_sched_tree();
                synchronize_sched();
+       }
 
 fail:
        kfree(old_shared);

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-23  2:37                             ` Joonsoo Kim
@ 2016-06-23  2:47                               ` Paul E. McKenney
  2016-06-23  2:53                                 ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-23  2:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
> On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > Could you try below patch to check who causes the hang?
> > > > >
> > > > > And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> > > > > used it before but Paul could find some culprit on it. :)
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > ----->8-----
> > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > index 763096a..9652d38 100644
> > > > > --- a/mm/slab.c
> > > > > +++ b/mm/slab.c
> > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > >          * freed after synchronize_sched().
> > > > >          */
> > > > > -       if (force_change)
> > > > > +       if (force_change) {
> > > > > +               if (num_online_cpus() > 1)
> > > > > +                       dump_stack();
> > > > >                 synchronize_sched();
> > > > > +               if (num_online_cpus() > 1)
> > > > > +                       dump_stack();
> > > > > +       }
> > > > 
> > > > I've only added the first one, as I would never see the second one. All of
> > > > this happens before the serial console is activated, earlycon is not supported,
> > > > and I only have remote access.
> > > > 
> > > > Brought up 2 CPUs
> > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > CPU: All CPU(s) started in SVC mode.
> > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > > > (__do_tune_cpucache+0xf4/0x114)
> > > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > > > (__kmem_cache_create+0x1a8/0x1d0)
> > > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > > > (kmem_cache_create+0xbc/0x190)
> > > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > > > devtmpfs: initialized
> > > 
> > > I don't see anything here that would prevent grace periods from completing.
> > > 
> > > The CPUs are using the normal hotplug sequence to come online, correct?
> > 
> > And either way, could you please apply the patch below and then
> > invoke rcu_dump_rcu_sched_tree() just before the offending call to
> > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> > and perhaps also which CPU is holding it up.
> 
> I can't find rcu_dump_rcu_sched_tree(). Do you mean
> rcu_dump_rcu_node_tree()? Anyway, there is no patch below so I attach
> one which does what Paul wants, maybe.

One of those days, I guess!  :-/

Your patch is exactly what I intended to send, thank you!

							Thanx, Paul

> Thanks.
> 
> ------->8---------
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 88d3f95..6b650f0 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void)
>   * Dump out the structure of the rcu_node combining tree associated
>   * with the rcu_state structure referenced by rsp.
>   */
> -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>  {
>         int level = 0;
>         struct rcu_node *rnp;
> @@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>         pr_cont("\n");
>  }
> 
> +void rcu_dump_rcu_sched_tree(void)
> +{
> +       rcu_dump_rcu_node_tree(&rcu_sched_state);
> +}
> +
>  void __init rcu_init(void)
>  {
>         int cpu;
> diff --git a/mm/slab.c b/mm/slab.c
> index 763096a..d88976c 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
>         return 0;
>  }
> 
> +extern void rcu_dump_rcu_sched_tree(void);
> +
>  static int setup_kmem_cache_node(struct kmem_cache *cachep,
>                                 int node, gfp_t gfp, bool force_change)
>  {
> @@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>          * guaranteed to be valid until irq is re-enabled, because it will be
>          * freed after synchronize_sched().
>          */
> -       if (force_change)
> +       if (force_change) {
> +               rcu_dump_rcu_sched_tree();
>                 synchronize_sched();
> +       }
> 
>  fail:
>         kfree(old_shared);
> 

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-23  2:47                               ` Paul E. McKenney
@ 2016-06-23  2:53                                 ` Paul E. McKenney
  2016-06-28  0:12                                   ` Paul E. McKenney
  2016-06-29 14:54                                   ` Geert Uytterhoeven
  0 siblings, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-23  2:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
> > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > Could you try below patch to check who causes the hang?
> > > > > >
> > > > > > And, if sysalt-t works when hang, could you get sysalt-t output? I haven't
> > > > > > used it before but Paul could find some culprit on it. :)
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > ----->8-----
> > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > index 763096a..9652d38 100644
> > > > > > --- a/mm/slab.c
> > > > > > +++ b/mm/slab.c
> > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > >          * freed after synchronize_sched().
> > > > > >          */
> > > > > > -       if (force_change)
> > > > > > +       if (force_change) {
> > > > > > +               if (num_online_cpus() > 1)
> > > > > > +                       dump_stack();
> > > > > >                 synchronize_sched();
> > > > > > +               if (num_online_cpus() > 1)
> > > > > > +                       dump_stack();
> > > > > > +       }
> > > > > 
> > > > > I've only added the first one, as I would never see the second one. All of
> > > > > this happens before the serial console is activated, earlycon is not supported,
> > > > > and I only have remote access.
> > > > > 
> > > > > Brought up 2 CPUs
> > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > > CPU: All CPU(s) started in SVC mode.
> > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > > > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > > > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > > > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > > > > (__do_tune_cpucache+0xf4/0x114)
> > > > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > > > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > > > > (__kmem_cache_create+0x1a8/0x1d0)
> > > > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > > > > (kmem_cache_create+0xbc/0x190)
> > > > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > > > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > > > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > > > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > > > > devtmpfs: initialized
> > > > 
> > > > I don't see anything here that would prevent grace periods from completing.
> > > > 
> > > > The CPUs are using the normal hotplug sequence to come online, correct?
> > > 
> > > And either way, could you please apply the patch below and then
> > > invoke rcu_dump_rcu_sched_tree() just before the offending call to
> > > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> > > and perhaps also which CPU is holding it up.
> > 
> > I can't find rcu_dump_rcu_sched_tree(). Do you mean
> > rcu_dump_rcu_node_tree()? Anyway, there is no patch below so I attach
> > > one which does what Paul wants, maybe.
> 
> One of those days, I guess!  :-/
> 
> Your patch is exactly what I intended to send, thank you!

Ah, but your telepathy was not sufficient to intuit the additional
information I need.  Please see the patch at the end.  Your hunk
in mm/slab.c is needed on top of my patch.

So I am clearly having difficulties reading as well as including patches
today...

							Thanx, Paul

> > Thanks.
> > 
> > ------->8---------
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 88d3f95..6b650f0 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void)
> >   * Dump out the structure of the rcu_node combining tree associated
> >   * with the rcu_state structure referenced by rsp.
> >   */
> > -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >  {
> >         int level = 0;
> >         struct rcu_node *rnp;
> > @@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >         pr_cont("\n");
> >  }
> > 
> > +void rcu_dump_rcu_sched_tree(void)
> > +{
> > +       rcu_dump_rcu_node_tree(&rcu_sched_state);
> > +}
> > +
> >  void __init rcu_init(void)
> >  {
> >         int cpu;
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 763096a..d88976c 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
> >         return 0;
> >  }
> > 
> > +extern void rcu_dump_rcu_sched_tree(void);
> > +
> >  static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >                                 int node, gfp_t gfp, bool force_change)
> >  {
> > @@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> >          * guaranteed to be valid until irq is re-enabled, because it will be
> >          * freed after synchronize_sched().
> >          */
> > -       if (force_change)
> > +       if (force_change) {
> > +               rcu_dump_rcu_sched_tree();
> >                 synchronize_sched();
> > +       }
> > 
> >  fail:
> >         kfree(old_shared);
> > 

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c7f1bc4f817c..2eda7bece401 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4707,7 +4707,7 @@ static void __init rcu_init_geometry(void)
  * Dump out the structure of the rcu_node combining tree associated
  * with the rcu_state structure referenced by rsp.
  */
-static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
+static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 {
 	int level = 0;
 	struct rcu_node *rnp;
@@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 			pr_info(" ");
 			level = rnp->level;
 		}
-		pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
+		pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
+			rnp->qsmask,
+			rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
 	}
 	pr_cont("\n");
 }
 
+void rcu_dump_rcu_sched_tree(void)
+{
+	rcu_dump_rcu_node_tree(&rcu_sched_state);
+}
+
 void __init rcu_init(void)
 {
 	int cpu;
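
For anyone decoding the resulting dmesg output, here is a minimal userspace
sketch that parses one entry in the format the pr_cont() above emits
("grplo:grphi/qsmask/qsmaskinit-bits ^grpnum") and checks whether a given
CPU is still holding up the grace period. The struct and function names
are hypothetical helpers for this sketch, and it assumes a leaf rcu_node,
where bit (cpu - grplo) of qsmask stands for that CPU:

```c
#include <assert.h>
#include <stdio.h>

/* One entry as printed by the patched rcu_dump_rcu_node_tree(). */
struct node_entry {
	int grplo, grphi, grpnum;
	unsigned long qsmask;	/* bits the current grace period still waits on */
	unsigned long qsinit;	/* qsmaskinit | qsmaskinitnext */
};

/* Parse "lo:hi/qsmask/qsinit ^grpnum"; returns 0 on success, -1 on error. */
static int parse_node_entry(const char *s, struct node_entry *e)
{
	int n;

	assert(s && e);
	/* %lx accepts both "0x2" and a bare "0", matching %#lx output. */
	n = sscanf(s, "%d:%d/%lx/%lx ^%d", &e->grplo, &e->grphi,
		   &e->qsmask, &e->qsinit, &e->grpnum);
	return n == 5 ? 0 : -1;
}

/* Leaf node only: is 'cpu' still blocking the grace period? */
static int cpu_is_holdout(const struct node_entry *e, int cpu)
{
	if (cpu < e->grplo || cpu > e->grphi)
		return 0;
	if ((unsigned int)(cpu - e->grplo) >= 8 * sizeof(e->qsmask))
		return 0;
	return (int)((e->qsmask >> (cpu - e->grplo)) & 1UL);
}
```

For a made-up entry such as "0:1/0x2/0x3 ^0" (not taken from Geert's
board), cpu_is_holdout() would flag CPU 1 but not CPU 0, and the 0x3 in
the third field would mean RCU believes both CPUs exist.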

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-23  2:53                                 ` Paul E. McKenney
@ 2016-06-28  0:12                                   ` Paul E. McKenney
  2016-06-28  8:33                                     ` Joonsoo Kim
  2016-06-29 14:54                                   ` Geert Uytterhoeven
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-28  0:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 22, 2016 at 07:53:29PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
> > > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > > Could you try below patch to check who causes the hang?
> > > > > > >
> > > > > > > > And if SysRq-t works during the hang, could you capture its output? I haven't
> > > > > > > > used it before, but Paul might be able to find the culprit from it. :)
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > ----->8-----
> > > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > > index 763096a..9652d38 100644
> > > > > > > --- a/mm/slab.c
> > > > > > > +++ b/mm/slab.c
> > > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > > >          * freed after synchronize_sched().
> > > > > > >          */
> > > > > > > -       if (force_change)
> > > > > > > +       if (force_change) {
> > > > > > > +               if (num_online_cpus() > 1)
> > > > > > > +                       dump_stack();
> > > > > > >                 synchronize_sched();
> > > > > > > +               if (num_online_cpus() > 1)
> > > > > > > +                       dump_stack();
> > > > > > > +       }
> > > > > > 
> > > > > > I've only added the first one, as I would never see the second one. All of
> > > > > > this happens before the serial console is activated, earlycon is not supported,
> > > > > > and I only have remote access.
> > > > > > 
> > > > > > Brought up 2 CPUs
> > > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > > > CPU: All CPU(s) started in SVC mode.
> > > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > > > > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > > > > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > > > > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > > > > > (__do_tune_cpucache+0xf4/0x114)
> > > > > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > > > > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > > > > > (__kmem_cache_create+0x1a8/0x1d0)
> > > > > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > > > > > (kmem_cache_create+0xbc/0x190)
> > > > > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > > > > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > > > > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > > > > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > > > > > devtmpfs: initialized
> > > > > 
> > > > > I don't see anything here that would prevent grace periods from completing.
> > > > > 
> > > > > The CPUs are using the normal hotplug sequence to come online, correct?
> > > > 
> > > > And either way, could you please apply the patch below and then
> > > > invoke rcu_dump_rcu_sched_tree() just before the offending call to
> > > > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> > > > and perhaps also which CPU is holding it up.
> > > 
> > > > I can't find rcu_dump_rcu_sched_tree(). Do you mean
> > > > rcu_dump_rcu_node_tree()? Anyway, there is no patch below, so I attach
> > > > one which maybe does what Paul wants.
> > 
> > One of those days, I guess!  :-/
> > 
> > Your patch is exactly what I intended to send, thank you!
> 
> Ah, but your telepathy was not sufficient to intuit the additional
> information I need.  Please see the patch at the end.  Your hunk
> in mm/slab.c is needed on top of my patch.
> 
> So I am clearly having difficulties reading as well as including patches
> today...

Just following up, any news using my diagnostic patch?

							Thanx, Paul

> > > Thanks.
> > > 
> > > ------->8---------
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 88d3f95..6b650f0 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -4171,7 +4171,7 @@ static void __init rcu_init_geometry(void)
> > >   * Dump out the structure of the rcu_node combining tree associated
> > >   * with the rcu_state structure referenced by rsp.
> > >   */
> > > -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > > +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > >  {
> > >         int level = 0;
> > >         struct rcu_node *rnp;
> > > @@ -4189,6 +4189,11 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > >         pr_cont("\n");
> > >  }
> > > 
> > > +void rcu_dump_rcu_sched_tree(void)
> > > +{
> > > +       rcu_dump_rcu_node_tree(&rcu_sched_state);
> > > +}
> > > +
> > >  void __init rcu_init(void)
> > >  {
> > >         int cpu;
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index 763096a..d88976c 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
> > >         return 0;
> > >  }
> > > 
> > > +extern void rcu_dump_rcu_sched_tree(void);
> > > +
> > >  static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > >                                 int node, gfp_t gfp, bool force_change)
> > >  {
> > > @@ -964,8 +966,10 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > >          * freed after synchronize_sched().
> > >          */
> > > -       if (force_change)
> > > +       if (force_change) {
> > > +               rcu_dump_rcu_sched_tree();
> > >                 synchronize_sched();
> > > +       }
> > > 
> > >  fail:
> > >         kfree(old_shared);
> > > 
> 
> ------------------------------------------------------------------------
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c7f1bc4f817c..2eda7bece401 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4707,7 +4707,7 @@ static void __init rcu_init_geometry(void)
>   * Dump out the structure of the rcu_node combining tree associated
>   * with the rcu_state structure referenced by rsp.
>   */
> -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>  {
>  	int level = 0;
>  	struct rcu_node *rnp;
> @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>  			pr_info(" ");
>  			level = rnp->level;
>  		}
> -		pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> +		pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> +			rnp->qsmask,
> +			rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>  	}
>  	pr_cont("\n");
>  }
>  
> +void rcu_dump_rcu_sched_tree(void)
> +{
> +	rcu_dump_rcu_node_tree(&rcu_sched_state);
> +}
> +
>  void __init rcu_init(void)
>  {
>  	int cpu;

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-28  0:12                                   ` Paul E. McKenney
@ 2016-06-28  8:33                                     ` Joonsoo Kim
  0 siblings, 0 replies; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-28  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 27, 2016 at 05:12:43PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2016 at 07:53:29PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
> > > > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
> > > > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
> > > > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > > > > Could you try below patch to check who causes the hang?
> > > > > > > >
> > > > > > > > > And if SysRq-t works during the hang, could you capture its output? I haven't
> > > > > > > > > used it before, but Paul might be able to find the culprit from it. :)
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > ----->8-----
> > > > > > > > diff --git a/mm/slab.c b/mm/slab.c
> > > > > > > > index 763096a..9652d38 100644
> > > > > > > > --- a/mm/slab.c
> > > > > > > > +++ b/mm/slab.c
> > > > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
> > > > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
> > > > > > > >          * freed after synchronize_sched().
> > > > > > > >          */
> > > > > > > > -       if (force_change)
> > > > > > > > +       if (force_change) {
> > > > > > > > +               if (num_online_cpus() > 1)
> > > > > > > > +                       dump_stack();
> > > > > > > >                 synchronize_sched();
> > > > > > > > +               if (num_online_cpus() > 1)
> > > > > > > > +                       dump_stack();
> > > > > > > > +       }
> > > > > > > 
> > > > > > > I've only added the first one, as I would never see the second one. All of
> > > > > > > this happens before the serial console is activated, earlycon is not supported,
> > > > > > > and I only have remote access.
> > > > > > > 
> > > > > > > Brought up 2 CPUs
> > > > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > > > > > CPU: All CPU(s) started in SVC mode.
> > > > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> > > > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
> > > > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
> > > > > > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
> > > > > > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
> > > > > > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
> > > > > > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
> > > > > > > (__do_tune_cpucache+0xf4/0x114)
> > > > > > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
> > > > > > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
> > > > > > > (__kmem_cache_create+0x1a8/0x1d0)
> > > > > > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
> > > > > > > (kmem_cache_create+0xbc/0x190)
> > > > > > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
> > > > > > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
> > > > > > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
> > > > > > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
> > > > > > > devtmpfs: initialized
> > > > > > 
> > > > > > I don't see anything here that would prevent grace periods from completing.
> > > > > > 
> > > > > > The CPUs are using the normal hotplug sequence to come online, correct?
> > > > > 
> > > > > And either way, could you please apply the patch below and then
> > > > > invoke rcu_dump_rcu_sched_tree() just before the offending call to
> > > > > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
> > > > > and perhaps also which CPU is holding it up.
> > > > 
> > > > > I can't find rcu_dump_rcu_sched_tree(). Do you mean
> > > > > rcu_dump_rcu_node_tree()? Anyway, there is no patch below, so I attach
> > > > > one which maybe does what Paul wants.
> > > 
> > > One of those days, I guess!  :-/
> > > 
> > > Your patch is exactly what I intended to send, thank you!
> > 
> > Ah, but your telepathy was not sufficient to intuit the additional
> > information I need.  Please see the patch at the end.  Your hunk
> > in mm/slab.c is needed on top of my patch.
> > 
> > So I am clearly having difficulties reading as well as including patches
> > today...
> 
> Just following up, any news using my diagnostic patch?

Hello, Paul.

Unfortunately, I have no hardware to reproduce it, so we need to wait for
Geert's feedback.

Thanks.

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-23  2:53                                 ` Paul E. McKenney
  2016-06-28  0:12                                   ` Paul E. McKenney
@ 2016-06-29 14:54                                   ` Geert Uytterhoeven
  2016-06-29 16:44                                     ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-29 14:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Paul,

On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
>> On Thu, Jun 23, 2016 at 11:37:56AM +0900, Joonsoo Kim wrote:
>> > On Wed, Jun 22, 2016 at 05:49:35PM -0700, Paul E. McKenney wrote:
>> > > On Wed, Jun 22, 2016 at 12:08:59PM -0700, Paul E. McKenney wrote:
>> > > > On Wed, Jun 22, 2016 at 05:01:35PM +0200, Geert Uytterhoeven wrote:
>> > > > > On Wed, Jun 22, 2016 at 2:52 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > > > > > Could you try below patch to check who causes the hang?
>> > > > > >
>> > > > > > And if SysRq-t works during the hang, could you capture its output? I haven't
>> > > > > > used it before, but Paul might be able to find the culprit from it. :)
>> > > > > >
>> > > > > > Thanks.
>> > > > > >
>> > > > > >
>> > > > > > ----->8-----
>> > > > > > diff --git a/mm/slab.c b/mm/slab.c
>> > > > > > index 763096a..9652d38 100644
>> > > > > > --- a/mm/slab.c
>> > > > > > +++ b/mm/slab.c
>> > > > > > @@ -964,8 +964,13 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>> > > > > >          * guaranteed to be valid until irq is re-enabled, because it will be
>> > > > > >          * freed after synchronize_sched().
>> > > > > >          */
>> > > > > > -       if (force_change)
>> > > > > > +       if (force_change) {
>> > > > > > +               if (num_online_cpus() > 1)
>> > > > > > +                       dump_stack();
>> > > > > >                 synchronize_sched();
>> > > > > > +               if (num_online_cpus() > 1)
>> > > > > > +                       dump_stack();
>> > > > > > +       }
>> > > > >
>> > > > > I've only added the first one, as I would never see the second one. All of
>> > > > > this happens before the serial console is activated, earlycon is not supported,
>> > > > > and I only have remote access.
>> > > > >
>> > > > > Brought up 2 CPUs
>> > > > > SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>> > > > > CPU: All CPU(s) started in SVC mode.
>> > > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> > > > > 4.7.0-rc4-kzm9d-00404-g4a235e6dde4404dd-dirty #89
>> > > > > Hardware name: Generic Emma Mobile EV2 (Flattened Device Tree)
>> > > > > [<c010de68>] (unwind_backtrace) from [<c010a658>] (show_stack+0x10/0x14)
>> > > > > [<c010a658>] (show_stack) from [<c02b5cf8>] (dump_stack+0x7c/0x9c)
>> > > > > [<c02b5cf8>] (dump_stack) from [<c01cfa4c>] (setup_kmem_cache_node+0x140/0x170)
>> > > > > [<c01cfa4c>] (setup_kmem_cache_node) from [<c01cfe3c>]
>> > > > > (__do_tune_cpucache+0xf4/0x114)
>> > > > > [<c01cfe3c>] (__do_tune_cpucache) from [<c01cff54>] (enable_cpucache+0xf8/0x148)
>> > > > > [<c01cff54>] (enable_cpucache) from [<c01d0190>]
>> > > > > (__kmem_cache_create+0x1a8/0x1d0)
>> > > > > [<c01d0190>] (__kmem_cache_create) from [<c01b32d0>]
>> > > > > (kmem_cache_create+0xbc/0x190)
>> > > > > [<c01b32d0>] (kmem_cache_create) from [<c070d968>] (shmem_init+0x34/0xb0)
>> > > > > [<c070d968>] (shmem_init) from [<c0700cc8>] (kernel_init_freeable+0x98/0x1ec)
>> > > > > [<c0700cc8>] (kernel_init_freeable) from [<c049fdbc>] (kernel_init+0x8/0x110)
>> > > > > [<c049fdbc>] (kernel_init) from [<c0106cb8>] (ret_from_fork+0x14/0x3c)
>> > > > > devtmpfs: initialized
>> > > >
>> > > > I don't see anything here that would prevent grace periods from completing.
>> > > >
>> > > > The CPUs are using the normal hotplug sequence to come online, correct?
>> > >
>> > > And either way, could you please apply the patch below and then
>> > > invoke rcu_dump_rcu_sched_tree() just before the offending call to
>> > > synchronize_sched()?  That will tell me what CPUs RCU believes exist,
>> > > and perhaps also which CPU is holding it up.
>> >
>> > I can't find rcu_dump_rcu_sched_tree(). Do you mean
>> > rcu_dump_rcu_node_tree()? Anyway, there is no patch below, so I attach
>> > one which maybe does what Paul wants.
>>
>> One of those days, I guess!  :-/
>>
>> Your patch is exactly what I intended to send, thank you!
>
> Ah, but your telepathy was not sufficient to intuit the additional
> information I need.  Please see the patch at the end.  Your hunk
> in mm/slab.c is needed on top of my patch.

> @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>                         pr_info(" ");
>                         level = rnp->level;
>                 }
> -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> +                       rnp->qsmask,
> +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>         }
>         pr_cont("\n");
>  }

For me it always crashes during the 37th call of synchronize_sched() in
setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
With your and my debug code, I get:

  CPU: Testing write buffer coherency: ok
  CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
  Setting up static identity map for 0x40100000 - 0x40100058
  cnt = 36, sync
  CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
  Brought up 2 CPUs
  SMP: Total of 2 processors activated (2132.00 BogoMIPS).
  CPU: All CPU(s) started in SVC mode.
  rcu_node tree layout dump
   0:1/0x0/0x3 ^0
  devtmpfs: initialized
  VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
  clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 19112604462750000 ns

I hope it helps. Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-29 14:54                                   ` Geert Uytterhoeven
@ 2016-06-29 16:44                                     ` Paul E. McKenney
  2016-06-29 17:52                                       ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-29 16:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:

[ . . . ]

> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >                         pr_info(" ");
> >                         level = rnp->level;
> >                 }
> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > +                       rnp->qsmask,
> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >         }
> >         pr_cont("\n");
> >  }
> 
> For me it always crashes during the 37th call of synchronize_sched() in
> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> With your and my debug code, I get:
> 
>   CPU: Testing write buffer coherency: ok
>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>   Setting up static identity map for 0x40100000 - 0x40100058
>   cnt = 36, sync
>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>   Brought up 2 CPUs
>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>   CPU: All CPU(s) started in SVC mode.
>   rcu_node tree layout dump
>    0:1/0x0/0x3 ^0

Thank you for running this!

OK, so RCU knows about both CPUs (the "0x3"), and the previous
grace period has seen quiescent states from both of them (the "0x0").
That would indicate that your synchronize_sched() showed up when RCU was
idle, so it had to start a new grace period.  It also rules out failure
modes where RCU thinks that there are more CPUs than really exist.
(Don't laugh, such things have really happened.)
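
For readers decoding the dump line quoted above, here is a minimal
userspace sketch (not kernel code) that parses one entry in the
"grplo:grphi/qsmask/known ^grpnum" layout printed by the diagnostic
patch.  The struct and function names are made up for illustration; only
the field order comes from the pr_cont() format string in the patch.

```c
#include <stdio.h>

/* One entry of the "rcu_node tree layout dump" line, e.g. "0:1/0x0/0x3 ^0".
 * Hypothetical userspace mirror of the printed fields, not a kernel type. */
struct rcu_node_entry {
	int grplo;            /* lowest CPU number covered by this node */
	int grphi;            /* highest CPU number covered by this node */
	unsigned long qsmask; /* leaf node: CPUs the current GP still waits on */
	unsigned long known;  /* qsmaskinit | qsmaskinitnext as printed */
	int grpnum;           /* this node's number within the next level up */
};

/* Parse one dump entry; returns 0 on success, -1 on malformed input. */
static int parse_rcu_node_entry(const char *s, struct rcu_node_entry *e)
{
	/* %lx accepts the "0x" prefix that the kernel's %#lx emits. */
	int n = sscanf(s, "%d:%d/%lx/%lx ^%d", &e->grplo, &e->grphi,
		       &e->qsmask, &e->known, &e->grpnum);

	return n == 5 ? 0 : -1;
}

/* Count the set bits in a mask, i.e. the number of CPUs it names. */
static int count_bits(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

Applied to " 0:1/0x0/0x3 ^0", the known mask has two bits set (both
CPUs) and the qsmask has none set (no CPU is currently holding up a grace
period), which matches the reading given above.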

>   devtmpfs: initialized
>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> max_idle_ns: 19112604462750000 ns
> 
> I hope it helps. Thanks!

I am going to guess that this was the first grace period since the second
CPU came online.  When there is only one CPU online, synchronize_sched()
is a no-op.
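
The single-CPU shortcut and the bitmask bookkeeping can be pictured with
a toy userspace model.  This is only a sketch under simplifying
assumptions (one node, no locking, no kthread); every name in it is
hypothetical and it is not how kernel/rcu/tree.c is actually structured.

```c
#include <stdbool.h>

/* Toy model of a single-node RCU tree; NOT the kernel's data structure. */
struct toy_rcu {
	unsigned long online;  /* one bit per CPU that RCU knows about */
	unsigned long qsmask;  /* CPUs that still owe a quiescent state */
	bool gp_in_progress;   /* true while a grace period is running */
};

/* Number of CPUs the toy model believes are online. */
static int toy_online_cpus(const struct toy_rcu *r)
{
	unsigned long m = r->online;
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/* With at most one CPU online there is nobody else to wait for. */
static bool toy_synchronize_is_noop(const struct toy_rcu *r)
{
	return toy_online_cpus(r) <= 1;
}

/* Start a grace period: every known CPU now owes a quiescent state. */
static void toy_start_gp(struct toy_rcu *r)
{
	r->qsmask = r->online;
	r->gp_in_progress = true;
}

/* A CPU reports a quiescent state; the GP ends once no bits remain. */
static void toy_report_qs(struct toy_rcu *r, int cpu)
{
	r->qsmask &= ~(1UL << cpu);
	if (r->gp_in_progress && r->qsmask == 0)
		r->gp_in_progress = false;
}
```

In this model, a waiter on a two-CPU system stays blocked until both
toy_report_qs() calls have happened, so anything that prevents a grace
period from starting or a CPU from reporting would show up as a hang.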

OK, this showed some things that aren't a problem.  What might the
problem be?

o	The grace-period kthread has not yet started.  It -should- start
	at early_initcall() time, but who knows?  Adding code to print
	out that kthread's task_struct address.

o	The grace-period kthread might not be responding to wakeups.
	Checking this requires that a grace period be in progress,
	so please put a call_rcu_sched() just before the call to
	rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
	to my patch to print out more GP-kthread state as well.

o	One of the CPUs might not be responding to RCU.  That -should-
	result in an RCU CPU stall warning, so I will ignore this
	possibility for the moment.

	That said, do you have some way to determine whether scheduling
	clock interrupts are really happening?  Without these interrupts,
	no RCU CPU stall warnings.

OK, that should be enough for the next phase, please see the end for the
patch.  This patch applies on top of my previous one.

Could you please set this up as follows?

	struct rcu_head rh;

	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
	call_rcu(&rh, do_nothing_cb);
	schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
	synchronize_sched();  /* Probably hangs. */
	rcu_barrier();  /* Drop RCU's references to rh before return. */

							Thanx, Paul

------------------------------------------------------------------------

commit 82829ec76c2c0de18874a60ebd7ff8ee80f244d1
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Jun 29 09:42:13 2016 -0700

    rcu: More diagnostics
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2eda7bece401..ff55c569473c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4712,6 +4712,11 @@ static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 	int level = 0;
 	struct rcu_node *rnp;
 
+	pr_info("RCU: %s GP kthread: %p state: %d flags: %#x g:%ld c:%ld\n",
+		rsp->name, rsp->gp_kthread, rsp->gp_state, rsp->gp_flags,
+		(long)rsp->gpnum, (long)rsp->completed);
+	pr_info("     jiffies: %#lx  GP start: %#lx Last GP activity: %#lx\n",
+		jiffies, rsp->gp_start, rsp->gp_activity);
 	pr_info("rcu_node tree layout dump\n");
 	pr_info(" ");
 	rcu_for_each_node_breadth_first(rsp, rnp) {

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-29 16:44                                     ` Paul E. McKenney
@ 2016-06-29 17:52                                       ` Geert Uytterhoeven
  2016-06-29 18:12                                         ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-29 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Paul,

On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
>> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com> wrote:
>> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
>
> [ . . . ]
>
>> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>> >                         pr_info(" ");
>> >                         level = rnp->level;
>> >                 }
>> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
>> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
>> > +                       rnp->qsmask,
>> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>> >         }
>> >         pr_cont("\n");
>> >  }
>>
>> For me it always crashes during the 37th call of synchronize_sched() in
>> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
>> With your and my debug code, I get:
>>
>>   CPU: Testing write buffer coherency: ok
>>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>>   Setting up static identity map for 0x40100000 - 0x40100058
>>   cnt = 36, sync
>>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>>   Brought up 2 CPUs
>>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>>   CPU: All CPU(s) started in SVC mode.
>>   rcu_node tree layout dump
>>    0:1/0x0/0x3 ^0
>
> Thank you for running this!
>
> OK, so RCU knows about both CPUs (the "0x3"), and the previous
> grace period has seen quiescent states from both of them (the "0x0").
> That would indicate that your synchronize_sched() showed up when RCU was
> idle, so it had to start a new grace period.  It also rules out failure
> modes where RCU thinks that there are more CPUs than really exist.
> (Don't laugh, such things have really happened.)
>
>>   devtmpfs: initialized
>>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
>> max_idle_ns: 19112604462750000 ns
>>
>> I hope it helps. Thanks!
>
> I am going to guess that this was the first grace period since the second
> CPU came online.  When there is only one CPU online, synchronize_sched()
> is a no-op.
>
> OK, this showed some things that aren't a problem.  What might the
> problem be?
>
> o       The grace-period kthread has not yet started.  It -should- start
>         at early_initcall() time, but who knows?  Adding code to print
>         out that kthread's task_struct address.
>
> o       The grace-period kthread might not be responding to wakeups.
>         Checking this requires that a grace period be in progress,
>         so please put a call_rcu_sched() just before the call to
>         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
>         to my patch to print out more GP-kthread state as well.
>
> o       One of the CPUs might not be responding to RCU.  That -should-
>         result in an RCU CPU stall warning, so I will ignore this
>         possibility for the moment.
>
>         That said, do you have some way to determine whether scheduling
>         clock interrupts are really happening?  Without these interrupts,
>         no RCU CPU stall warnings.

I believe there are no clocksources yet. The jiffies clocksource is the first
clocksource found, and that happens after the first call to
synchronize_sched(), cf. my dmesg snippet above.

In a working boot:
# cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
e0180000.timer jiffies
# cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
e0180000.timer

> OK, that should be enough for the next phase, please see the end for the
> patch.  This patch applies on top of my previous one.
>
> Could you please set this up as follows?
>
>         struct rcu_head rh;
>
>         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
>         call_rcu(&rh, do_nothing_cb);

I added an empty do_nothing_cb() for this:

    static void do_nothing_cb(struct rcu_head *rcu_head)
    {
    }

According to the debugging technique "comment everything out until it boots",
it now hangs in call_rcu().

>         schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
>         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
>         synchronize_sched();  /* Probably hangs. */
>         rcu_barrier();  /* Drop RCU's references to rh before return. */

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-29 17:52                                       ` Geert Uytterhoeven
@ 2016-06-29 18:12                                         ` Paul E. McKenney
  2016-06-30  7:47                                           ` Joonsoo Kim
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-29 18:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >> >                         pr_info(" ");
> >> >                         level = rnp->level;
> >> >                 }
> >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> >> > +                       rnp->qsmask,
> >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >> >         }
> >> >         pr_cont("\n");
> >> >  }
> >>
> >> For me it always crashes during the 37th call of synchronize_sched() in
> >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> >> With your and my debug code, I get:
> >>
> >>   CPU: Testing write buffer coherency: ok
> >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >>   Setting up static identity map for 0x40100000 - 0x40100058
> >>   cnt = 36, sync
> >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >>   Brought up 2 CPUs
> >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> >>   CPU: All CPU(s) started in SVC mode.
> >>   rcu_node tree layout dump
> >>    0:1/0x0/0x3 ^0
> >
> > Thank you for running this!
> >
> > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > grace period has seen quiescent states from both of them (the "0x0").
> > That would indicate that your synchronize_sched() showed up when RCU was
> > idle, so it had to start a new grace period.  It also rules out failure
> > modes where RCU thinks that there are more CPUs than really exist.
> > (Don't laugh, such things have really happened.)
> >
> >>   devtmpfs: initialized
> >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> >> max_idle_ns: 19112604462750000 ns
> >>
> >> I hope it helps. Thanks!
> >
> > I am going to guess that this was the first grace period since the second
> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> > is a no-op.
> >
> > OK, this showed some things that aren't a problem.  What might the
> > problem be?
> >
> > o       The grace-period kthread has not yet started.  It -should- start
> >         at early_initcall() time, but who knows?  Adding code to print
> >         out that kthread's task_struct address.
> >
> > o       The grace-period kthread might not be responding to wakeups.
> >         Checking this requires that a grace period be in progress,
> >         so please put a call_rcu_sched() just before the call to
> >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> >         to my patch to print out more GP-kthread state as well.
> >
> > o       One of the CPUs might not be responding to RCU.  That -should-
> >         result in an RCU CPU stall warning, so I will ignore this
> >         possibility for the moment.
> >
> >         That said, do you have some way to determine whether scheduling
> >         clock interrupts are really happening?  Without these interrupts,
> >         no RCU CPU stall warnings.
> 
> I believe there are no clocksources yet. The jiffies clocksource is the first
> clocksource found, and that happens after the first call to
> synchronize_sched(), cfr. my dmesg snippet above.
> 
> In a working boot:
> # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> e0180000.timer jiffies
> # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> e0180000.timer

Ah!  But if there is no jiffies clocksource, then schedule_timeout()
and friends will never return, correct?  If so, I guarantee you that
synchronize_sched() will unconditionally hang.

So if I understand correctly, the fix is to get the jiffies clocksource
running before the first call to synchronize_sched().

							Thanx, Paul

> > OK, that should be enough for the next phase, please see the end for the
> > patch.  This patch applies on top of my previous one.
> >
> > Could you please set this up as follows?
> >
> >         struct rcu_head rh;
> >
> >         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
> >         call_rcu(&rh, do_nothing_cb);
> 
> I added an empty do_nothing_cb() for this:
> 
>     static void do_nothing_cb(struct rcu_head *rcu_head)
>     {
>     }
> 
> According to the debugging technique "comment everything out until it boots",
> it now hangs in call_rcu().
> 
> >         schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
> >         rcu_dump_rcu_node_tree(&rcu_sched_state);  /* GP state. */
> >         synchronize_sched();  /* Probably hangs. */
> >         rcu_barrier();  /* Drop RCU's references to rh before return. */
> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-29 18:12                                         ` Paul E. McKenney
@ 2016-06-30  7:47                                           ` Joonsoo Kim
  2016-06-30  7:58                                             ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Joonsoo Kim @ 2016-06-30  7:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> > Hi Paul,
> > 
> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> > >> <paulmck@linux.vnet.ibm.com> wrote:
> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > >
> > > [ . . . ]
> > >
> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > >> >                         pr_info(" ");
> > >> >                         level = rnp->level;
> > >> >                 }
> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > >> > +                       rnp->qsmask,
> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> > >> >         }
> > >> >         pr_cont("\n");
> > >> >  }
> > >>
> > >> For me it always crashes during the 37th call of synchronize_sched() in
> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> > >> With your and my debug code, I get:
> > >>
> > >>   CPU: Testing write buffer coherency: ok
> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> > >>   cnt = 36, sync
> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > >>   Brought up 2 CPUs
> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > >>   CPU: All CPU(s) started in SVC mode.
> > >>   rcu_node tree layout dump
> > >>    0:1/0x0/0x3 ^0
> > >
> > > Thank you for running this!
> > >
> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > > grace period has seen quiescent states from both of them (the "0x0").
> > > That would indicate that your synchronize_sched() showed up when RCU was
> > > idle, so it had to start a new grace period.  It also rules out failure
> > > modes where RCU thinks that there are more CPUs than really exist.
> > > (Don't laugh, such things have really happened.)
> > >
> > >>   devtmpfs: initialized
> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > >> max_idle_ns: 19112604462750000 ns
> > >>
> > >> I hope it helps. Thanks!
> > >
> > > I am going to guess that this was the first grace period since the second
> > > > CPU came online.  When there is only one CPU online, synchronize_sched()
> > > is a no-op.
> > >
> > > OK, this showed some things that aren't a problem.  What might the
> > > problem be?
> > >
> > > o       The grace-period kthread has not yet started.  It -should- start
> > >         at early_initcall() time, but who knows?  Adding code to print
> > >         out that kthread's task_struct address.
> > >
> > > o       The grace-period kthread might not be responding to wakeups.
> > >         Checking this requires that a grace period be in progress,
> > >         so please put a call_rcu_sched() just before the call to
> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> > >         to my patch to print out more GP-kthread state as well.
> > >
> > > o       One of the CPUs might not be responding to RCU.  That -should-
> > >         result in an RCU CPU stall warning, so I will ignore this
> > >         possibility for the moment.
> > >
> > >         That said, do you have some way to determine whether scheduling
> > >         clock interrupts are really happening?  Without these interrupts,
> > >         no RCU CPU stall warnings.
> > 
> > I believe there are no clocksources yet. The jiffies clocksource is the first
> > clocksource found, and that happens after the first call to
> > synchronize_sched(), cfr. my dmesg snippet above.
> > 
> > In a working boot:
> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> > e0180000.timer jiffies
> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> > e0180000.timer
> 
> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> and friends will never return, correct?  If so, I guarantee you that
> synchronize_sched() will unconditionally hang.
> 
> So if I understand correctly, the fix is to get the jiffies clocksource
> running before the first call to synchronize_sched().

If so, the following change would be sufficient.

Thanks.

------>8-------
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 555e21f..4f6471f 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
        return __clocksource_register(&clocksource_jiffies);
 }
 
-core_initcall(init_jiffies_clocksource);
+early_initcall(init_jiffies_clocksource);
 
 struct clocksource * __init __weak clocksource_default_clock(void)
 {
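
A rough user-space model of why the initcall level matters here.  It
assumes the usual boot order (early initcalls, then secondary CPU
bringup, then core_initcall and later levels).  The enum and
model_boot() are made-up names for illustration, not kernel APIs.

```c
#include <string.h>

/* Hypothetical, simplified boot phases; the real kernel has more levels. */
enum boot_step {
	STEP_EARLY_INITCALL,	/* early_initcall(): before secondary CPUs come up */
	STEP_SMP_INIT,		/* secondary CPU bringup */
	STEP_CORE_INITCALL,	/* core_initcall(): after secondary CPU bringup */
	STEP_COUNT
};

/*
 * Write the order of events into buf (len must be at least 1) for a
 * jiffies clocksource registered at jiffies_level.
 */
static void model_boot(enum boot_step jiffies_level, char *buf, size_t len)
{
	buf[0] = '\0';
	for (int step = 0; step < STEP_COUNT; step++) {
		if (step == STEP_SMP_INIT)
			strncat(buf, "smp_init;", len - strlen(buf) - 1);
		if (step == (int)jiffies_level)
			strncat(buf, "init_jiffies_clocksource;",
				len - strlen(buf) - 1);
	}
}
```

Calling model_boot() with STEP_CORE_INITCALL puts the clocksource
registration after smp_init, and with STEP_EARLY_INITCALL before it,
which is the only thing the one-line change above is meant to alter.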

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30  7:47                                           ` Joonsoo Kim
@ 2016-06-30  7:58                                             ` Geert Uytterhoeven
  2016-06-30 13:24                                               ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-30  7:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joonsoo,

On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
>> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
>> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
>> > <paulmck@linux.vnet.ibm.com> wrote:
>> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
>> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
>> > >> <paulmck@linux.vnet.ibm.com> wrote:
>> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
>> > >
>> > > [ . . . ]
>> > >
>> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>> > >> >                         pr_info(" ");
>> > >> >                         level = rnp->level;
>> > >> >                 }
>> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
>> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
>> > >> > +                       rnp->qsmask,
>> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>> > >> >         }
>> > >> >         pr_cont("\n");
>> > >> >  }
>> > >>
>> > >> For me it always crashes during the 37th call of synchronize_sched() in
>> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
>> > >> With your and my debug code, I get:
>> > >>
>> > >>   CPU: Testing write buffer coherency: ok
>> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>> > >>   Setting up static identity map for 0x40100000 - 0x40100058
>> > >>   cnt = 36, sync
>> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>> > >>   Brought up 2 CPUs
>> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>> > >>   CPU: All CPU(s) started in SVC mode.
>> > >>   rcu_node tree layout dump
>> > >>    0:1/0x0/0x3 ^0
>> > >
>> > > Thank you for running this!
>> > >
>> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
>> > > grace period has seen quiescent states from both of them (the "0x0").
>> > > That would indicate that your synchronize_sched() showed up when RCU was
>> > > idle, so it had to start a new grace period.  It also rules out failure
>> > > modes where RCU thinks that there are more CPUs than really exist.
>> > > (Don't laugh, such things have really happened.)
>> > >
>> > >>   devtmpfs: initialized
>> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
>> > >> max_idle_ns: 19112604462750000 ns
>> > >>
>> > >> I hope it helps. Thanks!
>> > >
>> > > I am going to guess that this was the first grace period since the second
>> > > CPU came online.  When there is only one CPU online, synchronize_sched()
>> > > is a no-op.
>> > >
>> > > OK, this showed some things that aren't a problem.  What might the
>> > > problem be?
>> > >
>> > > o       The grace-period kthread has not yet started.  It -should- start
>> > >         at early_initcall() time, but who knows?  Adding code to print
>> > >         out that kthread's task_struct address.
>> > >
>> > > o       The grace-period kthread might not be responding to wakeups.
>> > >         Checking this requires that a grace period be in progress,
>> > >         so please put a call_rcu_sched() just before the call to
>> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
>> > >         to my patch to print out more GP-kthread state as well.
>> > >
>> > > o       One of the CPUs might not be responding to RCU.  That -should-
>> > >         result in an RCU CPU stall warning, so I will ignore this
>> > >         possibility for the moment.
>> > >
>> > >         That said, do you have some way to determine whether scheduling
>> > >         clock interrupts are really happening?  Without these interrupts,
>> > >         no RCU CPU stall warnings.
>> >
>> > I believe there are no clocksources yet. The jiffies clocksource is the first
>> > clocksource found, and that happens after the first call to
>> > synchronize_sched(), cfr. my dmesg snippet above.
>> >
>> > In a working boot:
>> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
>> > e0180000.timer jiffies
>> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
>> > e0180000.timer
>>
>> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
>> and friends will never return, correct?  If so, I guarantee you that
>> synchronize_sched() will unconditionally hang.
>>
>> So if I understand correctly, the fix is to get the jiffies clocksource
>> running before the first call to synchronize_sched().
>
> If so, following change would be sufficient.
>
> Thanks.
>
> ------>8-------
> diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> index 555e21f..4f6471f 100644
> --- a/kernel/time/jiffies.c
> +++ b/kernel/time/jiffies.c
> @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
>         return __clocksource_register(&clocksource_jiffies);
>  }
>
> -core_initcall(init_jiffies_clocksource);
> +early_initcall(init_jiffies_clocksource);
>
>  struct clocksource * __init __weak clocksource_default_clock(void)
>  {

Thanks for your patch!

While this does move jiffies clocksource initialization before secondary CPU
bringup, it still hangs when calling call_rcu() or synchronize_sched():

  CPU: Testing write buffer coherency: ok
  CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
  Setting up static identity map for 0x40100000 - 0x40100058
  cnt = 36, sync
  clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
  CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
  Brought up 2 CPUs
  SMP: Total of 2 processors activated (2132.00 BogoMIPS).
  CPU: All CPU(s) started in SVC mode.
  RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
       jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
  rcu_node tree layout dump
   0:1/0x0/0x3 ^0
  devtmpfs: initialized
  VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
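
One possible way to read the jiffies value in the dump above, as a
sketch only.  The kernel starts the jiffies counter at -300 seconds'
worth of ticks, truncated to 32 bits.  HZ=100 is an assumption here (the
log does not state HZ), and both helper names are made up for
illustration.

```c
#include <stdint.h>

/*
 * 32-bit starting value of the jiffies counter for a given tick rate:
 * -300 * hz, truncated to 32 bits.  hz is an assumed parameter.
 */
static uint32_t initial_jiffies32(int hz)
{
	return (uint32_t)(-300 * hz);
}

/* Ticks counted since boot, given a jiffies value read from a dump. */
static uint32_t ticks_since_boot(uint32_t jiffies_now, int hz)
{
	return jiffies_now - initial_jiffies32(hz);
}
```

Under that assumption, initial_jiffies32(100) is 0xffff8ad0, the same
value the dump prints, so ticks_since_boot(0xffff8ad0, 100) is 0.  That
would be consistent with no timer ticks having been counted yet at the
point of the dump, but it proves nothing if HZ is different.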

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30  7:58                                             ` Geert Uytterhoeven
@ 2016-06-30 13:24                                               ` Paul E. McKenney
  2016-06-30 13:31                                                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-30 13:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 30, 2016 at 09:58:51AM +0200, Geert Uytterhoeven wrote:
> Hi Joonsoo,
> 
> On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> >> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> >> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> >> > <paulmck@linux.vnet.ibm.com> wrote:
> >> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> >> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> >> > >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> >> > >
> >> > > [ . . . ]
> >> > >
> >> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >> > >> >                         pr_info(" ");
> >> > >> >                         level = rnp->level;
> >> > >> >                 }
> >> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> >> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> >> > >> > +                       rnp->qsmask,
> >> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >> > >> >         }
> >> > >> >         pr_cont("\n");
> >> > >> >  }
> >> > >>
> >> > >> For me it always crashes during the 37th call of synchronize_sched() in
> >> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> >> > >> With your and my debug code, I get:
> >> > >>
> >> > >>   CPU: Testing write buffer coherency: ok
> >> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> >> > >>   cnt = 36, sync
> >> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >> > >>   Brought up 2 CPUs
> >> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> >> > >>   CPU: All CPU(s) started in SVC mode.
> >> > >>   rcu_node tree layout dump
> >> > >>    0:1/0x0/0x3 ^0
> >> > >
> >> > > Thank you for running this!
> >> > >
> >> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> >> > > grace period has seen quiescent states from both of them (the "0x0").
> >> > > That would indicate that your synchronize_sched() showed up when RCU was
> >> > > idle, so it had to start a new grace period.  It also rules out failure
> >> > > modes where RCU thinks that there are more CPUs than really exist.
> >> > > (Don't laugh, such things have really happened.)
> >> > >
> >> > >>   devtmpfs: initialized
> >> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> >> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> >> > >> max_idle_ns: 19112604462750000 ns
> >> > >>
> >> > >> I hope it helps. Thanks!
> >> > >
> >> > > I am going to guess that this was the first grace period since the second
> >> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> >> > > is a no-op.
> >> > >
> >> > > OK, this showed some things that aren't a problem.  What might the
> >> > > problem be?
> >> > >
> >> > > o       The grace-period kthread has not yet started.  It -should- start
> >> > >         at early_initcall() time, but who knows?  Adding code to print
> >> > >         out that kthread's task_struct address.
> >> > >
> >> > > o       The grace-period kthread might not be responding to wakeups.
> >> > >         Checking this requires that a grace period be in progress,
> >> > >         so please put a call_rcu_sched() just before the call to
> >> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> >> > >         to my patch to print out more GP-kthread state as well.
> >> > >
> >> > > o       One of the CPUs might not be responding to RCU.  That -should-
> >> > >         result in an RCU CPU stall warning, so I will ignore this
> >> > >         possibility for the moment.
> >> > >
> >> > >         That said, do you have some way to determine whether scheduling
> >> > >         clock interrupts are really happening?  Without these interrupts,
> >> > >         no RCU CPU stall warnings.
> >> >
> >> > I believe there are no clocksources yet. The jiffies clocksource is the first
> >> > clocksource found, and that happens after the first call to
> >> > synchronize_sched(), cfr. my dmesg snippet above.
> >> >
> >> > In a working boot:
> >> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> >> > e0180000.timer jiffies
> >> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> >> > e0180000.timer
> >>
> >> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> >> and friends will never return, correct?  If so, I guarantee you that
> >> synchronize_sched() will unconditionally hang.
> >>
> >> So if I understand correctly, the fix is to get the jiffies clocksource
> >> running before the first call to synchronize_sched().
> >
> > If so, following change would be sufficient.
> >
> > Thanks.
> >
> > ------>8-------
> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > index 555e21f..4f6471f 100644
> > --- a/kernel/time/jiffies.c
> > +++ b/kernel/time/jiffies.c
> > @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
> >         return __clocksource_register(&clocksource_jiffies);
> >  }
> >
> > -core_initcall(init_jiffies_clocksource);
> > +early_initcall(init_jiffies_clocksource);
> >
> >  struct clocksource * __init __weak clocksource_default_clock(void)
> >  {
> 
> Thanks for your patch!
> 
> While this does move jiffies clocksource initialization before secondary CPU
> bringup, it still hangs when calling call_rcu() or synchronize_sched():
> 
>   CPU: Testing write buffer coherency: ok
>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>   Setting up static identity map for 0x40100000 - 0x40100058
>   cnt = 36, sync
>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> max_idle_ns: 19112604462750000 ns
>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>   Brought up 2 CPUs
>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>   CPU: All CPU(s) started in SVC mode.
>   RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
>        jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
>   rcu_node tree layout dump
>    0:1/0x0/0x3 ^0

This is in fact the initial state for RCU grace periods.  In other words,
all the earlier calls to synchronize_sched() likely happened while there
was only one CPU online.
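
For anyone reading these dumps by hand, a small user-space sketch that
splits one rcu_node entry into its fields.  The field order follows the
pr_cont() format in the debug patch quoted earlier in the thread; the
struct and function names are made up for illustration.

```c
#include <stdio.h>

/* One entry of the "rcu_node tree layout dump", e.g. "0:1/0x0/0x3 ^0". */
struct rnp_dump {
	int grplo;			/* lowest CPU covered by this node */
	int grphi;			/* highest CPU covered by this node */
	unsigned long qsmask;		/* CPUs still owing a quiescent state */
	unsigned long qsmaskinit;	/* qsmaskinit | qsmaskinitnext */
	int grpnum;			/* position within the parent node */
};

/* Parse one dump entry; returns 0 on success, -1 on malformed input. */
static int parse_rnp_dump(const char *s, struct rnp_dump *d)
{
	int n = sscanf(s, " %d:%d/%lx/%lx ^%d", &d->grplo, &d->grphi,
		       &d->qsmask, &d->qsmaskinit, &d->grpnum);

	return n == 5 ? 0 : -1;
}

/* True if no CPU in this node still owes a quiescent state. */
static int no_qs_pending(const struct rnp_dump *d)
{
	return d->qsmask == 0;
}
```

Fed the line from the dump above, this gives CPUs 0 through 1, a qsmask
of 0 and an init mask of 0x3, i.e. both CPUs known to RCU and no
quiescent states outstanding.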

>   devtmpfs: initialized
>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1

Could you please add the call_rcu() and timed delay as described in my
earlier email?  That would hopefully help me see the state of the stalled
grace period.

							Thanx, Paul

> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30 13:24                                               ` Paul E. McKenney
@ 2016-06-30 13:31                                                 ` Geert Uytterhoeven
  2016-06-30 15:18                                                   ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-30 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Paul,

On Thu, Jun 30, 2016 at 3:24 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Jun 30, 2016 at 09:58:51AM +0200, Geert Uytterhoeven wrote:
>> On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
>> >> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
>> >> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
>> >> > <paulmck@linux.vnet.ibm.com> wrote:
>> >> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
>> >> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
>> >> > >> <paulmck@linux.vnet.ibm.com> wrote:
>> >> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
>> >> > >
>> >> > > [ . . . ]
>> >> > >
>> >> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>> >> > >> >                         pr_info(" ");
>> >> > >> >                         level = rnp->level;
>> >> > >> >                 }
>> >> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
>> >> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
>> >> > >> > +                       rnp->qsmask,
>> >> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>> >> > >> >         }
>> >> > >> >         pr_cont("\n");
>> >> > >> >  }
>> >> > >>
>> >> > >> For me it always crashes during the 37th call of synchronize_sched() in
>> >> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
>> >> > >> With your and my debug code, I get:
>> >> > >>
>> >> > >>   CPU: Testing write buffer coherency: ok
>> >> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>> >> > >>   Setting up static identity map for 0x40100000 - 0x40100058
>> >> > >>   cnt = 36, sync
>> >> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>> >> > >>   Brought up 2 CPUs
>> >> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>> >> > >>   CPU: All CPU(s) started in SVC mode.
>> >> > >>   rcu_node tree layout dump
>> >> > >>    0:1/0x0/0x3 ^0
>> >> > >
>> >> > > Thank you for running this!
>> >> > >
>> >> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
>> >> > > grace period has seen quiescent states from both of them (the "0x0").
>> >> > > That would indicate that your synchronize_sched() showed up when RCU was
>> >> > > idle, so it had to start a new grace period.  It also rules out failure
>> >> > > modes where RCU thinks that there are more CPUs than really exist.
>> >> > > (Don't laugh, such things have really happened.)
>> >> > >
>> >> > >>   devtmpfs: initialized
>> >> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>> >> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
>> >> > >> max_idle_ns: 19112604462750000 ns
>> >> > >>
>> >> > >> I hope it helps. Thanks!
>> >> > >
>> >> > > I am going to guess that this was the first grace period since the second
>> >> > > CPU came online.  When there is only one CPU online, synchronize_sched()
>> >> > > is a no-op.
>> >> > >
>> >> > > OK, this showed some things that aren't a problem.  What might the
>> >> > > problem be?
>> >> > >
>> >> > > o       The grace-period kthread has not yet started.  It -should- start
>> >> > >         at early_initcall() time, but who knows?  Adding code to print
>> >> > >         out that kthread's task_struct address.
>> >> > >
>> >> > > o       The grace-period kthread might not be responding to wakeups.
>> >> > >         Checking this requires that a grace period be in progress,
>> >> > >         so please put a call_rcu_sched() just before the call to
>> >> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
>> >> > >         to my patch to print out more GP-kthread state as well.
>> >> > >
>> >> > > o       One of the CPUs might not be responding to RCU.  That -should-
>> >> > >         result in an RCU CPU stall warning, so I will ignore this
>> >> > >         possibility for the moment.
>> >> > >
>> >> > >         That said, do you have some way to determine whether scheduling
>> >> > >         clock interrupts are really happening?  Without these interrupts,
>> >> > >         no RCU CPU stall warnings.
>> >> >
>> >> > I believe there are no clocksources yet. The jiffies clocksource is the first
>> >> > clocksource found, and that happens after the first call to
>> >> > synchronize_sched(), cfr. my dmesg snippet above.
>> >> >
>> >> > In a working boot:
>> >> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
>> >> > e0180000.timer jiffies
>> >> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
>> >> > e0180000.timer
>> >>
>> >> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
>> >> and friends will never return, correct?  If so, I guarantee you that
>> >> synchronize_sched() will unconditionally hang.
>> >>
>> >> So if I understand correctly, the fix is to get the jiffies clocksource
>> >> running before the first call to synchronize_sched().
>> >
>> > If so, following change would be sufficient.
>> >
>> > Thanks.
>> >
>> > ------>8-------
>> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
>> > index 555e21f..4f6471f 100644
>> > --- a/kernel/time/jiffies.c
>> > +++ b/kernel/time/jiffies.c
>> > @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
>> >         return __clocksource_register(&clocksource_jiffies);
>> >  }
>> >
>> > -core_initcall(init_jiffies_clocksource);
>> > +early_initcall(init_jiffies_clocksource);
>> >
>> >  struct clocksource * __init __weak clocksource_default_clock(void)
>> >  {
>>
>> Thanks for your patch!
>>
>> While this does move jiffies clocksource initialization before secondary CPU
>> bringup, it still hangs when calling call_rcu() or synchronize_sched():
>>
>>   CPU: Testing write buffer coherency: ok
>>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>>   Setting up static identity map for 0x40100000 - 0x40100058
>>   cnt = 36, sync
>>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
>> max_idle_ns: 19112604462750000 ns
>>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>>   Brought up 2 CPUs
>>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>>   CPU: All CPU(s) started in SVC mode.
>>   RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
>>        jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
>>   rcu_node tree layout dump
>>    0:1/0x0/0x3 ^0
>
> This is in fact the initial state for RCU grace periods.  In other words,
> all the earlier calls to synchronize_sched() likely happened while there
> was only one CPU online.
>
>>   devtmpfs: initialized
>>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>
> Could you please add the call_rcu() and timed delay as described in my
> earlier email?  That would hopefully help me see the state of the stalled
> grace period.

I already did, cfr. "it still hangs when calling call_rcu() or
synchronize_sched()".

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30 13:31                                                 ` Geert Uytterhoeven
@ 2016-06-30 15:18                                                   ` Paul E. McKenney
  2016-06-30 15:53                                                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-30 15:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 30, 2016 at 03:31:57PM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Thu, Jun 30, 2016 at 3:24 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Jun 30, 2016 at 09:58:51AM +0200, Geert Uytterhoeven wrote:
> >> On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >> > On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> >> >> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> >> >> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> >> >> > <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> >> >> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> >> >> > >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> >> >> > >
> >> >> > > [ . . . ]
> >> >> > >
> >> >> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> >> >> > >> >                         pr_info(" ");
> >> >> > >> >                         level = rnp->level;
> >> >> > >> >                 }
> >> >> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> >> >> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> >> >> > >> > +                       rnp->qsmask,
> >> >> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> >> >> > >> >         }
> >> >> > >> >         pr_cont("\n");
> >> >> > >> >  }
> >> >> > >>
> >> >> > >> For me it always crashes during the 37th call of synchronize_sched() in
> >> >> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> >> >> > >> With your and my debug code, I get:
> >> >> > >>
> >> >> > >>   CPU: Testing write buffer coherency: ok
> >> >> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >> >> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> >> >> > >>   cnt = 36, sync
> >> >> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >> >> > >>   Brought up 2 CPUs
> >> >> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> >> >> > >>   CPU: All CPU(s) started in SVC mode.
> >> >> > >>   rcu_node tree layout dump
> >> >> > >>    0:1/0x0/0x3 ^0
> >> >> > >
> >> >> > > Thank you for running this!
> >> >> > >
> >> >> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> >> >> > > grace period has seen quiescent states from both of them (the "0x0").
> >> >> > > That would indicate that your synchronize_sched() showed up when RCU was
> >> >> > > idle, so it had to start a new grace period.  It also rules out failure
> >> >> > > modes where RCU thinks that there are more CPUs than really exist.
> >> >> > > (Don't laugh, such things have really happened.)
> >> >> > >
> >> >> > >>   devtmpfs: initialized
> >> >> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> >> >> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> >> >> > >> max_idle_ns: 19112604462750000 ns
> >> >> > >>
> >> >> > >> I hope it helps. Thanks!
> >> >> > >
> >> >> > > I am going to guess that this was the first grace period since the second
> >> >> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> >> >> > > is a no-op.
> >> >> > >
> >> >> > > OK, this showed some things that aren't a problem.  What might the
> >> >> > > problem be?
> >> >> > >
> >> >> > > o       The grace-period kthread has not yet started.  It -should- start
> >> >> > >         at early_initcall() time, but who knows?  Adding code to print
> >> >> > >         out that kthread's task_struct address.
> >> >> > >
> >> >> > > o       The grace-period kthread might not be responding to wakeups.
> >> >> > >         Checking this requires that a grace period be in progress,
> >> >> > >         so please put a call_rcu_sched() just before the call to
> >> >> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> >> >> > >         to my patch to print out more GP-kthread state as well.
> >> >> > >
> >> >> > > o       One of the CPUs might not be responding to RCU.  That -should-
> >> >> > >         result in an RCU CPU stall warning, so I will ignore this
> >> >> > >         possibility for the moment.
> >> >> > >
> >> >> > >         That said, do you have some way to determine whether scheduling
> >> >> > >         clock interrupts are really happening?  Without these interrupts,
> >> >> > >         no RCU CPU stall warnings.
> >> >> >
> >> >> > I believe there are no clocksources yet. The jiffies clocksource is the first
> >> >> > clocksource found, and that happens after the first call to
> >> >> > synchronize_sched(), cfr. my dmesg snippet above.
> >> >> >
> >> >> > In a working boot:
> >> >> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> >> >> > e0180000.timer jiffies
> >> >> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> >> >> > e0180000.timer
> >> >>
> >> >> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> >> >> and friends will never return, correct?  If so, I guarantee you that
> >> >> synchronize_sched() will unconditionally hang.
> >> >>
> >> >> So if I understand correctly, the fix is to get the jiffies clocksource
> >> >> running before the first call to synchronize_sched().
> >> >
> >> > If so, the following change would be sufficient.
> >> >
> >> > Thanks.
> >> >
> >> > ------>8-------
> >> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> >> > index 555e21f..4f6471f 100644
> >> > --- a/kernel/time/jiffies.c
> >> > +++ b/kernel/time/jiffies.c
> >> > @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
> >> >         return __clocksource_register(&clocksource_jiffies);
> >> >  }
> >> >
> >> > -core_initcall(init_jiffies_clocksource);
> >> > +early_initcall(init_jiffies_clocksource);
> >> >
> >> >  struct clocksource * __init __weak clocksource_default_clock(void)
> >> >  {
> >>
> >> Thanks for your patch!
> >>
> >> While this does move jiffies clocksource initialization before secondary CPU
> >> bringup, it still hangs when calling call_rcu() or synchronize_sched():
> >>
> >>   CPU: Testing write buffer coherency: ok
> >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> >>   Setting up static identity map for 0x40100000 - 0x40100058
> >>   cnt = 36, sync
> >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> >> max_idle_ns: 19112604462750000 ns
> >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >>   Brought up 2 CPUs
> >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> >>   CPU: All CPU(s) started in SVC mode.
> >>   RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
> >>        jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
> >>   rcu_node tree layout dump
> >>    0:1/0x0/0x3 ^0
> >
> > This is in fact the initial state for RCU grace periods.  In other words,
> > all the earlier calls to synchronize_sched() likely happened while there
> > was only one CPU online.
> >
> >>   devtmpfs: initialized
> >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> >
> > Could you please add the call_rcu() and timed delay as described in my
> > earlier email?  That would hopefully help me see the state of the stalled
> > grace period.
> 
> I already did, cfr. "it still hangs when calling call_rcu() or
> synchronize_sched()".

Ah, sorry for my inattention.

I am a bit surprised that it could hang when calling call_rcu(), given
that call_rcu() is callable from atomic contexts.  Could you please show
me the current test code you have?

If the hang is in call_rcu(), could you please try disabling irqs across
the call to call_rcu()?

							Thanx, Paul

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30 15:18                                                   ` Paul E. McKenney
@ 2016-06-30 15:53                                                     ` Geert Uytterhoeven
  2016-06-30 16:52                                                       ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-30 15:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 30 Jun 2016, Paul E. McKenney wrote:
> On Thu, Jun 30, 2016 at 03:31:57PM +0200, Geert Uytterhoeven wrote:
> > On Thu, Jun 30, 2016 at 3:24 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Thu, Jun 30, 2016 at 09:58:51AM +0200, Geert Uytterhoeven wrote:
> > >> On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > >> > On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> > >> >> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> > >> >> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> > >> >> > <paulmck@linux.vnet.ibm.com> wrote:
> > >> >> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> > >> >> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> > >> >> > >> <paulmck@linux.vnet.ibm.com> wrote:
> > >> >> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > >> >> > >
> > >> >> > > [ . . . ]
> > >> >> > >
> > >> >> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > >> >> > >> >                         pr_info(" ");
> > >> >> > >> >                         level = rnp->level;
> > >> >> > >> >                 }
> > >> >> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > >> >> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > >> >> > >> > +                       rnp->qsmask,
> > >> >> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> > >> >> > >> >         }
> > >> >> > >> >         pr_cont("\n");
> > >> >> > >> >  }
> > >> >> > >>
> > >> >> > >> For me it always crashes during the 37th call of synchronize_sched() in
> > >> >> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> > >> >> > >> With your and my debug code, I get:
> > >> >> > >>
> > >> >> > >>   CPU: Testing write buffer coherency: ok
> > >> >> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > >> >> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> > >> >> > >>   cnt = 36, sync
> > >> >> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > >> >> > >>   Brought up 2 CPUs
> > >> >> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > >> >> > >>   CPU: All CPU(s) started in SVC mode.
> > >> >> > >>   rcu_node tree layout dump
> > >> >> > >>    0:1/0x0/0x3 ^0
> > >> >> > >
> > >> >> > > Thank you for running this!
> > >> >> > >
> > >> >> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > >> >> > > grace period has seen quiescent states from both of them (the "0x0").
> > >> >> > > That would indicate that your synchronize_sched() showed up when RCU was
> > >> >> > > idle, so it had to start a new grace period.  It also rules out failure
> > >> >> > > modes where RCU thinks that there are more CPUs than really exist.
> > >> >> > > (Don't laugh, such things have really happened.)
> > >> >> > >
> > >> >> > >>   devtmpfs: initialized
> > >> >> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > >> >> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > >> >> > >> max_idle_ns: 19112604462750000 ns
> > >> >> > >>
> > >> >> > >> I hope it helps. Thanks!
> > >> >> > >
> > >> >> > > I am going to guess that this was the first grace period since the second
> > >> >> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> > >> >> > > is a no-op.
> > >> >> > >
> > >> >> > > OK, this showed some things that aren't a problem.  What might the
> > >> >> > > problem be?
> > >> >> > >
> > >> >> > > o       The grace-period kthread has not yet started.  It -should- start
> > >> >> > >         at early_initcall() time, but who knows?  Adding code to print
> > >> >> > >         out that kthread's task_struct address.
> > >> >> > >
> > >> >> > > o       The grace-period kthread might not be responding to wakeups.
> > >> >> > >         Checking this requires that a grace period be in progress,
> > >> >> > >         so please put a call_rcu_sched() just before the call to
> > >> >> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> > >> >> > >         to my patch to print out more GP-kthread state as well.
> > >> >> > >
> > >> >> > > o       One of the CPUs might not be responding to RCU.  That -should-
> > >> >> > >         result in an RCU CPU stall warning, so I will ignore this
> > >> >> > >         possibility for the moment.
> > >> >> > >
> > >> >> > >         That said, do you have some way to determine whether scheduling
> > >> >> > >         clock interrupts are really happening?  Without these interrupts,
> > >> >> > >         no RCU CPU stall warnings.
> > >> >> >
> > >> >> > I believe there are no clocksources yet. The jiffies clocksource is the first
> > >> >> > clocksource found, and that happens after the first call to
> > >> >> > synchronize_sched(), cfr. my dmesg snippet above.
> > >> >> >
> > >> >> > In a working boot:
> > >> >> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> > >> >> > e0180000.timer jiffies
> > >> >> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> > >> >> > e0180000.timer
> > >> >>
> > >> >> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> > >> >> and friends will never return, correct?  If so, I guarantee you that
> > >> >> synchronize_sched() will unconditionally hang.
> > >> >>
> > >> >> So if I understand correctly, the fix is to get the jiffies clocksource
> > >> >> running before the first call to synchronize_sched().
> > >> >
> > >> > If so, the following change would be sufficient.
> > >> >
> > >> > Thanks.
> > >> >
> > >> > ------>8-------
> > >> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > >> > index 555e21f..4f6471f 100644
> > >> > --- a/kernel/time/jiffies.c
> > >> > +++ b/kernel/time/jiffies.c
> > >> > @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
> > >> >         return __clocksource_register(&clocksource_jiffies);
> > >> >  }
> > >> >
> > >> > -core_initcall(init_jiffies_clocksource);
> > >> > +early_initcall(init_jiffies_clocksource);
> > >> >
> > >> >  struct clocksource * __init __weak clocksource_default_clock(void)
> > >> >  {
> > >>
> > >> Thanks for your patch!
> > >>
> > >> While this does move jiffies clocksource initialization before secondary CPU
> > >> bringup, it still hangs when calling call_rcu() or synchronize_sched():
> > >>
> > >>   CPU: Testing write buffer coherency: ok
> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> > >>   cnt = 36, sync
> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > >> max_idle_ns: 19112604462750000 ns
> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > >>   Brought up 2 CPUs
> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > >>   CPU: All CPU(s) started in SVC mode.
> > >>   RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
> > >>        jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
> > >>   rcu_node tree layout dump
> > >>    0:1/0x0/0x3 ^0
> > >
> > > This is in fact the initial state for RCU grace periods.  In other words,
> > > all the earlier calls to synchronize_sched() likely happened while there
> > > was only one CPU online.
> > >
> > >>   devtmpfs: initialized
> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > >
> > > Could you please add the call_rcu() and timed delay as described in my
> > > earlier email?  That would hopefully help me see the state of the stalled
> > > grace period.
> > 
> > I already did, cfr. "it still hangs when calling call_rcu() or
> > synchronize_sched()".
> 
> Ah, sorry for my inattention.
> 
> I am a bit surprised that it could hang when calling call_rcu(), given
> that call_rcu() is callable from atomic contexts.  Could you please show
> me the current test code you have?
> 
> If the hang is in call_rcu(), could you please try disabling irqs across
> the call to call_rcu()?

These are my local changes:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c7f1bc4f817c4a34..50bea263e510006f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4707,11 +4707,16 @@ static void __init rcu_init_geometry(void)
  * Dump out the structure of the rcu_node combining tree associated
  * with the rcu_state structure referenced by rsp.
  */
-static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
+static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 {
 	int level = 0;
 	struct rcu_node *rnp;
 
+	pr_info("RCU: %s GP kthread: %p state: %d flags: %#x g:%ld c:%ld\n",
+		rsp->name, rsp->gp_kthread, rsp->gp_state, rsp->gp_flags,
+		(long)rsp->gpnum, (long)rsp->completed);
+	pr_info("     jiffies: %#lx  GP start: %#lx Last GP activity: %#lx\n",
+		jiffies, rsp->gp_start, rsp->gp_activity);
 	pr_info("rcu_node tree layout dump\n");
 	pr_info(" ");
 	rcu_for_each_node_breadth_first(rsp, rnp) {
@@ -4720,11 +4725,32 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
 			pr_info(" ");
 			level = rnp->level;
 		}
-		pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
+		pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
+			rnp->qsmask,
+			rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
 	}
 	pr_cont("\n");
 }
 
+static void do_nothing_cb(struct rcu_head *rcu_head)
+{
+}
+
+void rcu_dump_rcu_sched_tree(void)
+{
+	struct rcu_head rh;
+	unsigned long flags;
+
+	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
+	local_irq_save(flags);
+	// call_rcu(&rh, do_nothing_cb);
+	local_irq_restore(flags);
+	// schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
+	rcu_dump_rcu_node_tree(&rcu_sched_state); /* GP state. */
+	//synchronize_sched();  /* Probably hangs. */
+	//rcu_barrier();  /* Drop RCU's references to rh before return. */
+}
+
 void __init rcu_init(void)
 {
 	int cpu;
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 555e21f7b966c789..4f6471f54f69a6fe 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
 	return __clocksource_register(&clocksource_jiffies);
 }
 
-core_initcall(init_jiffies_clocksource);
+early_initcall(init_jiffies_clocksource);
 
 struct clocksource * __init __weak clocksource_default_clock(void)
 {
diff --git a/mm/slab.c b/mm/slab.c
index cc8bbc1e6bc9b6fe..f9b2f50adc705173 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
 	return 0;
 }
 
+extern void rcu_dump_rcu_sched_tree(void);
+
 static int setup_kmem_cache_node(struct kmem_cache *cachep,
 				int node, gfp_t gfp, bool force_change)
 {
@@ -964,8 +966,19 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
 	 * guaranteed to be valid until irq is re-enabled, because it will be
 	 * freed after synchronize_sched().
 	 */
-	if (force_change)
-		synchronize_sched();
+	if (force_change) {
+		static int cnt;
+
+		if (++cnt < 37) {
+printk("cnt = %d, sync\n", cnt);
+			synchronize_sched();
+		} else if (cnt == 37) {
+printk("cnt = %d, dump\n", cnt);
+			rcu_dump_rcu_sched_tree();
+		} else {
+printk("cnt = %d\n", cnt);
+		}
+	}
 
 fail:
 	kfree(old_shared);


With this it boots fine:

    ...
    cnt = 35, sync
    CPU: Testing write buffer coherency: ok
    CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
    Setting up static identity map for 0x40100000 - 0x40100058
    cnt = 36, sync
    clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
    CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
    Brought up 2 CPUs
    SMP: Total of 2 processors activated (2132.00 BogoMIPS).
    CPU: All CPU(s) started in SVC mode.
    cnt = 37, dump
    RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
         jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
    rcu_node tree layout dump
     0:1/0x0/0x3 ^0
    RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
         jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
    rcu_node tree layout dump
     0:1/0x0/0x3 ^0
    devtmpfs: initialized
    VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
    cnt = 38
    cnt = 39
    ...

Enabling any of the four commented-out lines in rcu_dump_rcu_sched_tree()
makes it lock up.
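
As a reading aid for the two dumps above, here is a minimal userspace sketch
(not kernel code) that decodes one "rcu_node tree layout dump" entry in the
"%d:%d/%#lx/%#lx ^%d" format used by the debug patch, and compares the printed
jiffies value against the kernel's INITIAL_JIFFIES formula, (-300 * HZ)
truncated to 32 bits.  The struct and function names are made up for this
illustration, and HZ = 100 is an assumption: the thread does not state it, it
merely happens to match the log.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One dump entry: grplo:grphi/qsmask/(qsmaskinit|qsmaskinitnext) ^grpnum.
 * Hypothetical helper type, not a kernel structure. */
struct rnp_entry {
	int grplo, grphi;       /* lowest and highest CPU covered by the node */
	unsigned long qsmask;   /* CPUs still owing a quiescent state */
	unsigned long initmask; /* CPUs RCU knows about */
	int grpnum;             /* position of the node within its parent */
};

/* Returns 0 on success, -1 if the text does not match the dump format. */
int parse_rnp_entry(const char *s, struct rnp_entry *e)
{
	assert(s != NULL && e != NULL);
	/* %lx accepts an optional "0x" prefix, as printed by %#lx. */
	if (sscanf(s, " %d:%d/%lx/%lx ^%d", &e->grplo, &e->grphi,
		   &e->qsmask, &e->initmask, &e->grpnum) != 5)
		return -1;
	return 0;
}

/* Same arithmetic as the kernel's INITIAL_JIFFIES for a 32-bit counter.
 * The hz argument is an assumption supplied by the caller. */
uint32_t initial_jiffies32(int hz)
{
	return (uint32_t)(-300 * hz);
}

/* Ticks counted since boot, using wrap-around 32-bit subtraction. */
uint32_t ticks_since_boot32(uint32_t jiffies, int hz)
{
	return jiffies - initial_jiffies32(hz);
}
```

Fed the entry " 0:1/0x0/0x3 ^0", the parser yields qsmask 0 and initmask 0x3,
i.e. both CPUs are known and neither owes a quiescent state.  With hz = 100,
initial_jiffies32() gives 0xffff8ad0, the value printed in both dumps, so under
that assumption the jiffies counter had not advanced at the time of the dumps.
That is only one possible reading of the log, not a diagnosis.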

Thanks!

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30 15:53                                                     ` Geert Uytterhoeven
@ 2016-06-30 16:52                                                       ` Paul E. McKenney
  2016-06-30 17:54                                                         ` Geert Uytterhoeven
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2016-06-30 16:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 30, 2016 at 05:53:42PM +0200, Geert Uytterhoeven wrote:
> On Thu, 30 Jun 2016, Paul E. McKenney wrote:
> > On Thu, Jun 30, 2016 at 03:31:57PM +0200, Geert Uytterhoeven wrote:
> > > On Thu, Jun 30, 2016 at 3:24 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > On Thu, Jun 30, 2016 at 09:58:51AM +0200, Geert Uytterhoeven wrote:
> > > >> On Thu, Jun 30, 2016 at 9:47 AM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > >> > On Wed, Jun 29, 2016 at 11:12:08AM -0700, Paul E. McKenney wrote:
> > > >> >> On Wed, Jun 29, 2016 at 07:52:06PM +0200, Geert Uytterhoeven wrote:
> > > >> >> > On Wed, Jun 29, 2016 at 6:44 PM, Paul E. McKenney
> > > >> >> > <paulmck@linux.vnet.ibm.com> wrote:
> > > >> >> > > On Wed, Jun 29, 2016 at 04:54:44PM +0200, Geert Uytterhoeven wrote:
> > > >> >> > >> On Thu, Jun 23, 2016 at 4:53 AM, Paul E. McKenney
> > > >> >> > >> <paulmck@linux.vnet.ibm.com> wrote:
> > > >> >> > >> > On Wed, Jun 22, 2016 at 07:47:42PM -0700, Paul E. McKenney wrote:
> > > >> >> > >
> > > >> >> > > [ . . . ]
> > > >> >> > >
> > > >> >> > >> > @@ -4720,11 +4720,18 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> > > >> >> > >> >                         pr_info(" ");
> > > >> >> > >> >                         level = rnp->level;
> > > >> >> > >> >                 }
> > > >> >> > >> > -               pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> > > >> >> > >> > +               pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> > > >> >> > >> > +                       rnp->qsmask,
> > > >> >> > >> > +                       rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
> > > >> >> > >> >         }
> > > >> >> > >> >         pr_cont("\n");
> > > >> >> > >> >  }
> > > >> >> > >>
> > > >> >> > >> For me it always crashes during the 37th call of synchronize_sched() in
> > > >> >> > >> setup_kmem_cache_node(), which is the first call after secondary CPU bring up.
> > > >> >> > >> With your and my debug code, I get:
> > > >> >> > >>
> > > >> >> > >>   CPU: Testing write buffer coherency: ok
> > > >> >> > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > > >> >> > >>   Setting up static identity map for 0x40100000 - 0x40100058
> > > >> >> > >>   cnt = 36, sync
> > > >> >> > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > > >> >> > >>   Brought up 2 CPUs
> > > >> >> > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > >> >> > >>   CPU: All CPU(s) started in SVC mode.
> > > >> >> > >>   rcu_node tree layout dump
> > > >> >> > >>    0:1/0x0/0x3 ^0
> > > >> >> > >
> > > >> >> > > Thank you for running this!
> > > >> >> > >
> > > >> >> > > OK, so RCU knows about both CPUs (the "0x3"), and the previous
> > > >> >> > > grace period has seen quiescent states from both of them (the "0x0").
> > > >> >> > > That would indicate that your synchronize_sched() showed up when RCU was
> > > >> >> > > idle, so it had to start a new grace period.  It also rules out failure
> > > >> >> > > modes where RCU thinks that there are more CPUs than really exist.
> > > >> >> > > (Don't laugh, such things have really happened.)
> > > >> >> > >
> > > >> >> > >>   devtmpfs: initialized
> > > >> >> > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > > >> >> > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > > >> >> > >> max_idle_ns: 19112604462750000 ns
> > > >> >> > >>
> > > >> >> > >> I hope it helps. Thanks!
> > > >> >> > >
> > > >> >> > > I am going to guess that this was the first grace period since the second
> > > >> >> > > CPU came online.  When there is only one CPU online, synchronize_sched()
> > > >> >> > > is a no-op.
> > > >> >> > >
> > > >> >> > > OK, this showed some things that aren't a problem.  What might the
> > > >> >> > > problem be?
> > > >> >> > >
> > > >> >> > > o       The grace-period kthread has not yet started.  It -should- start
> > > >> >> > >         at early_initcall() time, but who knows?  Adding code to print
> > > >> >> > >         out that kthread's task_struct address.
> > > >> >> > >
> > > >> >> > > o       The grace-period kthread might not be responding to wakeups.
> > > >> >> > >         Checking this requires that a grace period be in progress,
> > > >> >> > >         so please put a call_rcu_sched() just before the call to
> > > >> >> > >         rcu_dump_rcu_node_tree().  (Sample code below.)  Adding code
> > > >> >> > >         to my patch to print out more GP-kthread state as well.
> > > >> >> > >
> > > >> >> > > o       One of the CPUs might not be responding to RCU.  That -should-
> > > >> >> > >         result in an RCU CPU stall warning, so I will ignore this
> > > >> >> > >         possibility for the moment.
> > > >> >> > >
> > > >> >> > >         That said, do you have some way to determine whether scheduling
> > > >> >> > >         clock interrupts are really happening?  Without these interrupts,
> > > >> >> > >         no RCU CPU stall warnings.
> > > >> >> >
> > > >> >> > I believe there are no clocksources yet. The jiffies clocksource is the first
> > > >> >> > clocksource found, and that happens after the first call to
> > > >> >> > synchronize_sched(), cfr. my dmesg snippet above.
> > > >> >> >
> > > >> >> > In a working boot:
> > > >> >> > # cat /sys/bus/clocksource/devices/clocksource0/available_clocksource
> > > >> >> > e0180000.timer jiffies
> > > >> >> > # cat /sys/bus/clocksource/devices/clocksource0/current_clocksource
> > > >> >> > e0180000.timer
> > > >> >>
> > > >> >> Ah!  But if there is no jiffies clocksource, then schedule_timeout()
> > > >> >> and friends will never return, correct?  If so, I guarantee you that
> > > >> >> synchronize_sched() will unconditionally hang.
> > > >> >>
> > > >> >> So if I understand correctly, the fix is to get the jiffies clocksource
> > > >> >> running before the first call to synchronize_sched().
> > > >> >
> > > >> > If so, the following change would be sufficient.
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >> > ------>8-------
> > > >> > diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> > > >> > index 555e21f..4f6471f 100644
> > > >> > --- a/kernel/time/jiffies.c
> > > >> > +++ b/kernel/time/jiffies.c
> > > >> > @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
> > > >> >         return __clocksource_register(&clocksource_jiffies);
> > > >> >  }
> > > >> >
> > > >> > -core_initcall(init_jiffies_clocksource);
> > > >> > +early_initcall(init_jiffies_clocksource);
> > > >> >
> > > >> >  struct clocksource * __init __weak clocksource_default_clock(void)
> > > >> >  {
> > > >>
> > > >> Thanks for your patch!
> > > >>
> > > >> While this does move jiffies clocksource initialization before secondary CPU
> > > >> bringup, it still hangs when calling call_rcu() or synchronize_sched():
> > > >>
> > > >>   CPU: Testing write buffer coherency: ok
> > > >>   CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> > > >>   Setting up static identity map for 0x40100000 - 0x40100058
> > > >>   cnt = 36, sync
> > > >>   clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> > > >> max_idle_ns: 19112604462750000 ns
> > > >>   CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > > >>   Brought up 2 CPUs
> > > >>   SMP: Total of 2 processors activated (2132.00 BogoMIPS).
> > > >>   CPU: All CPU(s) started in SVC mode.
> > > >>   RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
> > > >>        jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
> > > >>   rcu_node tree layout dump
> > > >>    0:1/0x0/0x3 ^0
> > > >
> > > > This is in fact the initial state for RCU grace periods.  In other words,
> > > > all the earlier calls to synchronize_sched() likely happened while there
> > > > was only one CPU online.
> > > >
> > > >>   devtmpfs: initialized
> > > >>   VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
> > > >
> > > > Could you please add the call_rcu() and timed delay as described in my
> > > > earlier email?  That would hopefully help me see the state of the stalled
> > > > grace period.
> > > 
> > > I already did, cfr. "it still hangs when calling call_rcu() or
> > > synchronize_sched()".
> > 
> > Ah, sorry for my inattention.
> > 
> > I am a bit surprised that it could hang when calling call_rcu(), given
> > that call_rcu() is callable from atomic contexts.  Could you please show
> > me the current test code you have?
> > 
> > If the hang is in call_rcu(), could you please try disabling irqs across
> > the call to call_rcu()?
> 
> These are my local changes:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c7f1bc4f817c4a34..50bea263e510006f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4707,11 +4707,16 @@ static void __init rcu_init_geometry(void)
>   * Dump out the structure of the rcu_node combining tree associated
>   * with the rcu_state structure referenced by rsp.
>   */
> -static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
> +static void rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>  {
>  	int level = 0;
>  	struct rcu_node *rnp;
> 
> +	pr_info("RCU: %s GP kthread: %p state: %d flags: %#x g:%ld c:%ld\n",
> +		rsp->name, rsp->gp_kthread, rsp->gp_state, rsp->gp_flags,
> +		(long)rsp->gpnum, (long)rsp->completed);
> +	pr_info("     jiffies: %#lx  GP start: %#lx Last GP activity: %#lx\n",
> +		jiffies, rsp->gp_start, rsp->gp_activity);
>  	pr_info("rcu_node tree layout dump\n");
>  	pr_info(" ");
>  	rcu_for_each_node_breadth_first(rsp, rnp) {
> @@ -4720,11 +4725,32 @@ static void __init rcu_dump_rcu_node_tree(struct rcu_state *rsp)
>  			pr_info(" ");
>  			level = rnp->level;
>  		}
> -		pr_cont("%d:%d ^%d  ", rnp->grplo, rnp->grphi, rnp->grpnum);
> +		pr_cont("%d:%d/%#lx/%#lx ^%d  ", rnp->grplo, rnp->grphi,
> +			rnp->qsmask,
> +			rnp->qsmaskinit | rnp->qsmaskinitnext, rnp->grpnum);
>  	}
>  	pr_cont("\n");
>  }
> 
> +static void do_nothing_cb(struct rcu_head *rcu_head)
> +{
> +}
> +
> +void rcu_dump_rcu_sched_tree(void)
> +{
> +	struct rcu_head rh;
> +	unsigned long flags;
> +
> +	rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
> +	local_irq_save(flags);
> +	// call_rcu(&rh, do_nothing_cb);
> +	local_irq_restore(flags);
> +	// schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
> +	rcu_dump_rcu_node_tree(&rcu_sched_state); /* GP state. */
> +	//synchronize_sched();  /* Probably hangs. */
> +	//rcu_barrier();  /* Drop RCU's references to rh before return. */
> +}
> +
>  void __init rcu_init(void)
>  {
>  	int cpu;
> diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
> index 555e21f7b966c789..4f6471f54f69a6fe 100644
> --- a/kernel/time/jiffies.c
> +++ b/kernel/time/jiffies.c
> @@ -98,7 +98,7 @@ static int __init init_jiffies_clocksource(void)
>  	return __clocksource_register(&clocksource_jiffies);
>  }
> 
> -core_initcall(init_jiffies_clocksource);
> +early_initcall(init_jiffies_clocksource);
> 
>  struct clocksource * __init __weak clocksource_default_clock(void)
>  {
> diff --git a/mm/slab.c b/mm/slab.c
> index cc8bbc1e6bc9b6fe..f9b2f50adc705173 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -909,6 +909,8 @@ static int init_cache_node_node(int node)
>  	return 0;
>  }
> 
> +extern void rcu_dump_rcu_sched_tree(void);
> +
>  static int setup_kmem_cache_node(struct kmem_cache *cachep,
>  				int node, gfp_t gfp, bool force_change)
>  {
> @@ -964,8 +966,19 @@ static int setup_kmem_cache_node(struct kmem_cache *cachep,
>  	 * guaranteed to be valid until irq is re-enabled, because it will be
>  	 * freed after synchronize_sched().
>  	 */
> -	if (force_change)
> -		synchronize_sched();
> +	if (force_change) {
> +		static int cnt;
> +
> +		if (++cnt < 37) {
> +printk("cnt = %d, sync\n", cnt);
> +			synchronize_sched();
> +		} else if (cnt == 37) {
> +printk("cnt = %d, dump\n", cnt);
> +			rcu_dump_rcu_sched_tree();
> +		} else {
> +printk("cnt = %d\n", cnt);
> +		}
> +	}
> 
>  fail:
>  	kfree(old_shared);
> 
> 
> With this it boots fine:
> 
>     ...
>     cnt = 35, sync
>     CPU: Testing write buffer coherency: ok
>     CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>     Setting up static identity map for 0x40100000 - 0x40100058
>     cnt = 36, sync
>     clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
>     CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>     Brought up 2 CPUs
>     SMP: Total of 2 processors activated (2132.00 BogoMIPS).
>     CPU: All CPU(s) started in SVC mode.
>     cnt = 37, dump
>     RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
> 	 jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
>     rcu_node tree layout dump
>      0:1/0x0/0x3 ^0  
>     RCU: rcu_sched GP kthread: c784e1c0 state: 1 flags: 0x0 g:-300 c:-300
> 	 jiffies: 0xffff8ad0  GP start: 0x0 Last GP activity: 0x0
>     rcu_node tree layout dump
>      0:1/0x0/0x3 ^0  
>     devtmpfs: initialized
>     VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 1
>     cnt = 38
>     cnt = 39
>     ...
> 
> When enabling any of the 4 commented-out lines in rcu_dump_rcu_sched_tree(),
> it will lock up.

OK, but that includes schedule_timeout_uninterruptible(5 * HZ), right?

If schedule_timeout_uninterruptible() hangs, RCU grace periods have no
chance of completing.

I am still surprised that call_rcu() hangs, but timed wait not working
will definitely cause RCU grace-period hangs.  Perhaps fixing timed
waits will also prevent the call_rcu() hang.
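
As a rough illustration of the dependency described above, here is a toy
user-space model in C. It is not kernel code and does not reflect the real
RCU implementation: struct sleeper, step() and grace_period_completes() are
hypothetical names made up for this sketch, and the only assumption modelled
is that the grace-period kthread needs one timed wakeup to make progress.

```c
#include <stdbool.h>

/*
 * Hypothetical toy model, not kernel code: a sleeper that can only be
 * woken once the tick counter reaches its expiry value.
 */
struct sleeper {
	unsigned long expires;	/* tick count at which the timeout fires */
	bool woken;		/* set once the timeout has fired */
};

/*
 * Advance the model by one step.  The tick counter only moves when the
 * timer interrupt is being delivered; the sleeper is woken only when the
 * counter has reached its expiry.
 */
static void step(unsigned long *ticks, bool timer_irq_works,
		 struct sleeper *s)
{
	if (timer_irq_works)
		(*ticks)++;
	if (!s->woken && *ticks >= s->expires)
		s->woken = true;
}

/*
 * Return true if a "grace period" that depends on one timed wakeup
 * completes within max_steps steps of the model.
 */
static bool grace_period_completes(bool timer_irq_works, int max_steps)
{
	unsigned long ticks = 0;
	struct sleeper gp_kthread = { .expires = 5, .woken = false };
	int i;

	for (i = 0; i < max_steps && !gp_kthread.woken; i++)
		step(&ticks, timer_irq_works, &gp_kthread);

	return gp_kthread.woken;
}
```

In this model, grace_period_completes(true, 1000) returns true, while
grace_period_completes(false, 1000) returns false no matter how large
max_steps is, which mirrors the point that a stuck timed wait leaves the
grace period with no way to finish.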

							Thanx, Paul


* Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache)
  2016-06-30 16:52                                                       ` Paul E. McKenney
@ 2016-06-30 17:54                                                         ` Geert Uytterhoeven
  0 siblings, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2016-06-30 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Paul,

On Thu, Jun 30, 2016 at 6:52 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Thu, Jun 30, 2016 at 05:53:42PM +0200, Geert Uytterhoeven wrote:
>> +void rcu_dump_rcu_sched_tree(void)
>> +{
>> +     struct rcu_head rh;
>> +     unsigned long flags;
>> +
>> +     rcu_dump_rcu_node_tree(&rcu_sched_state);  /* Initial state. */
>> +     local_irq_save(flags);
>> +     // call_rcu(&rh, do_nothing_cb);
>> +     local_irq_restore(flags);
>> +     // schedule_timeout_uninterruptible(5 * HZ);  /* Or whatever delay. */
>> +     rcu_dump_rcu_node_tree(&rcu_sched_state); /* GP state. */
>> +     //synchronize_sched();  /* Probably hangs. */
>> +     //rcu_barrier();  /* Drop RCU's references to rh before return. */
>> +}

>>
>> When enabling any of the 4 commented-out lines in rcu_dump_rcu_sched_tree(),
>> it will lock up.
>
> OK, but that includes schedule_timeout_uninterruptible(5 * HZ), right?

Yes it does.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds



Thread overview: 33+ messages
     [not found] <CAMuHMdXC=zEjbZADE5wELjOq_kBiFNewpdUrMCe8d3Utu98h8A@mail.gmail.com>
2016-06-14  6:24 ` Boot failure on emev2/kzm9d (was: Re: [PATCH v2 11/11] mm/slab: lockless decision to grow cache) Joonsoo Kim
2016-06-14  7:31   ` Geert Uytterhoeven
2016-06-14  8:11     ` Joonsoo Kim
2016-06-14 10:45       ` Geert Uytterhoeven
2016-06-15  2:23         ` Joonsoo Kim
2016-06-15  8:39           ` Geert Uytterhoeven
2016-06-20  6:39             ` Joonsoo Kim
2016-06-20 13:12               ` Paul E. McKenney
2016-06-21  6:43                 ` Joonsoo Kim
2016-06-21 12:54                   ` Paul E. McKenney
2016-06-22  0:52                     ` Joonsoo Kim
2016-06-22  3:15                       ` Paul E. McKenney
2016-06-22 15:01                       ` Geert Uytterhoeven
2016-06-22 19:08                         ` Paul E. McKenney
2016-06-23  0:49                           ` Paul E. McKenney
2016-06-23  2:37                             ` Joonsoo Kim
2016-06-23  2:47                               ` Paul E. McKenney
2016-06-23  2:53                                 ` Paul E. McKenney
2016-06-28  0:12                                   ` Paul E. McKenney
2016-06-28  8:33                                     ` Joonsoo Kim
2016-06-29 14:54                                   ` Geert Uytterhoeven
2016-06-29 16:44                                     ` Paul E. McKenney
2016-06-29 17:52                                       ` Geert Uytterhoeven
2016-06-29 18:12                                         ` Paul E. McKenney
2016-06-30  7:47                                           ` Joonsoo Kim
2016-06-30  7:58                                             ` Geert Uytterhoeven
2016-06-30 13:24                                               ` Paul E. McKenney
2016-06-30 13:31                                                 ` Geert Uytterhoeven
2016-06-30 15:18                                                   ` Paul E. McKenney
2016-06-30 15:53                                                     ` Geert Uytterhoeven
2016-06-30 16:52                                                       ` Paul E. McKenney
2016-06-30 17:54                                                         ` Geert Uytterhoeven
2016-06-14 13:10 ` Geert Uytterhoeven
