From: Veronika Kabatova <vkabatov@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: CKI Project <cki-project@redhat.com>,
	linux-block@vger.kernel.org, Changhui Zhong <czhong@redhat.com>,
	Rachel Sibley <rasibley@redhat.com>
Subject: Re: 💥 PANICKED: Test report for kernel 5.9.0-rc3-020ad03.cki (block)
Date: Thu, 3 Sep 2020 15:58:10 -0400 (EDT)
Message-ID: <1300213431.10047993.1599163090152.JavaMail.zimbra@redhat.com>
In-Reply-To: <ad1bf306-6f23-9b7c-842f-766a6efbda3e@redhat.com>



----- Original Message -----
> From: "Rachel Sibley" <rasibley@redhat.com>
> To: "Jens Axboe" <axboe@kernel.dk>, "CKI Project" <cki-project@redhat.com>, linux-block@vger.kernel.org
> Cc: "Changhui Zhong" <czhong@redhat.com>
> Sent: Thursday, September 3, 2020 8:59:48 PM
> Subject: Re: 💥 PANICKED: Test report for kernel 5.9.0-rc3-020ad03.cki (block)
> 
> 
> 
> On 9/3/20 1:46 PM, Jens Axboe wrote:
> > On 9/3/20 11:10 AM, Rachel Sibley wrote:
> >>
> >> On 9/3/20 1:07 PM, CKI Project wrote:
> >>>
> >>> Hello,
> >>>
> >>> We ran automated tests on a recent commit from this kernel tree:
> >>>
> >>>          Kernel repo:
> >>>          https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
> >>>               Commit: 020ad0333b03 - Merge branch 'for-5.10/block' into
> >>>               for-next
> >>>
> >>> The results of these automated tests are provided below.
> >>>
> >>>       Overall result: FAILED (see details below)
> >>>                Merge: OK
> >>>              Compile: OK
> >>>                Tests: PANICKED
> >>>
> >>> All kernel binaries, config files, and logs are available for download
> >>> here:
> >>>
> >>>     https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=datawarehouse/2020/09/02/613166
> >>>
> >>> One or more kernel tests failed:
> >>>
> >>>       ppc64le:
> >>>        💥 storage: software RAID testing
> >>>
> >>>       aarch64:
> >>>        💥 storage: software RAID testing
> >>>
> >>>       x86_64:
> >>>        💥 storage: software RAID testing
> >>
> >> Hello,
> >>
> >> We're seeing a panic on all non-s390x arches, triggered by the swraid test.
> >> It seems to be reproducible for all succeeding pipelines after this one,
> >> and we haven't yet seen it in mainline or yesterday's block tree results.
> >>
> >> Thank you,
> >> Rachel
> >>
> >> https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_aarch64_redhat%3A968098/tests/8757835_aarch64_3_console.log
> >>
> >> [ 8394.609219] Internal error: Oops: 96000004 [#1] SMP
> >> [ 8394.614070] Modules linked in: raid0 loop raid456 async_raid6_recov
> >> async_memcpy async_pq async_xor async_tx dm_log_writes dm_flakey
> >> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
> >> rfkill sunrpc vfat fat xgene_hwmon xgene_enet at803x mdio_xgene xgene_rng
> >> xgene_edac mailbox_xgene_slimpro drm ip_tables xfs sdhci_of_arasan
> >> sdhci_pltfm i2c_xgene_slimpro crct10dif_ce sdhci gpio_dwapb cqhci
> >> xhci_plat_hcd
> >> gpio_xgene_sb gpio_keys aes_neon_bs
> >> [ 8394.654298] CPU: 3 PID: 471427 Comm: kworker/3:2 Kdump: loaded Not
> >> tainted 5.9.0-rc3-020ad03.cki #1
> >> [ 8394.663299] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene
> >> Mustang Board, BIOS 3.06.25 Oct 17 2016
> >> [ 8394.672999] Workqueue: md_misc mddev_delayed_delete
> >> [ 8394.677853] pstate: 40400085 (nZcv daIf +PAN -UAO BTYPE=--)
> >> [ 8394.683399] pc : percpu_ref_exit+0x5c/0xc8
> >> [ 8394.687473] lr : percpu_ref_exit+0x20/0xc8
> >> [ 8394.691547] sp : ffff800019f33d00
> >> [ 8394.694843] x29: ffff800019f33d00 x28: 0000000000000000
> >> [ 8394.700129] x27: ffff0003c63ae000 x26: ffff8000120b6228
> >> [ 8394.705414] x25: 0000000000000001 x24: ffff0003d8322a80
> >> [ 8394.710698] x23: 0000000000000000 x22: 0000000000000000
> >> [ 8394.715983] x21: 0000000000000000 x20: ffff8000121d2000
> >> [ 8394.721266] x19: ffff0003d8322af0 x18: 0000000000000000
> >> [ 8394.726550] x17: 0000000000000000 x16: 0000000000000000
> >> [ 8394.731834] x15: 0000000000000007 x14: 0000000000000003
> >> [ 8394.737119] x13: 0000000000000000 x12: ffff0003888a1978
> >> [ 8394.742403] x11: ffff0003888a1918 x10: 0000000000000001
> >> [ 8394.747688] x9 : 0000000000000000 x8 : 0000000000000000
> >> [ 8394.752972] x7 : 0000000000000400 x6 : 0000000000000001
> >> [ 8394.758257] x5 : ffff800010423030 x4 : ffff8000121d2e40
> >> [ 8394.763540] x3 : 0000000000000000 x2 : 0000000000000000
> >> [ 8394.768825] x1 : 0000000000000000 x0 : 0000000000000000
> >> [ 8394.774110] Call trace:
> >> [ 8394.776544]  percpu_ref_exit+0x5c/0xc8
> >> [ 8394.780273]  md_free+0x64/0xa0
> >> [ 8394.783311]  kobject_put+0x7c/0x218
> >> [ 8394.786781]  mddev_delayed_delete+0x3c/0x50
> >> [ 8394.790944]  process_one_work+0x1c4/0x450
> >> [ 8394.794932]  worker_thread+0x164/0x4a8
> >> [ 8394.798662]  kthread+0xf4/0x120
> >> [ 8394.801787]  ret_from_fork+0x10/0x18
> >> [ 8394.805344] Code: 2a0403e0 350002c0 a9400262 52800001 (f9400000)
> >> [ 8394.811407] ---[ end trace 481cab6e1ad73da1 ]---
> > 
> > Ming, I wonder if this is:
> > 
> > commit d0c567d60f3730b97050347ea806e1ee06445c78
> > Author: Ming Lei <ming.lei@redhat.com>
> > Date:   Wed Sep 2 20:26:42 2020 +0800
> > 
> >      percpu_ref: reduce memory footprint of percpu_ref in fast path
> > 
> > Rachel, any chance you can do a run with that commit reverted?
> 
> Hi Jens, yes we're working on it and will share our findings as soon as the
> job finishes.
> 

Hi Jens, we can confirm that there are no panics and the test passes
with the patch reverted.


We also realized that this patch is a likely cause of serious problems
on ppc64le during LTP testing as well, specifically in msgstress04. Both
issues started occurring at the same time; we just didn't notice because
the test was crashing.


[ 5682.999169] msgstress04 invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0 
[ 5682.999981] CPU: 1 PID: 170909 Comm: msgstress04 Kdump: loaded Not tainted 5.9.0-rc3-020ad03.cki #1 
[ 5683.000048] Call Trace: 
[ 5683.000098] [c00000023de972e0] [c000000000927e00] dump_stack+0xc4/0x114 (unreliable) 
[ 5683.000161] [c00000023de97330] [c000000000386958] dump_header+0x64/0x274 
[ 5683.000205] [c00000023de973c0] [c000000000385534] oom_kill_process+0x284/0x290 
[ 5683.000259] [c00000023de97400] [c0000000003862b0] out_of_memory+0x220/0x790 
[ 5683.000307] [c00000023de974a0] [c000000000408890] __alloc_pages_slowpath.constprop.0+0xd60/0xeb0 
[ 5683.000370] [c00000023de97670] [c000000000408d20] __alloc_pages_nodemask+0x340/0x400 
[ 5683.000426] [c00000023de97700] [c000000000434dec] alloc_pages_current+0xac/0x130 
[ 5683.000479] [c00000023de97750] [c000000000442fc4] allocate_slab+0x584/0x810 
[ 5683.000525] [c00000023de977c0] [c000000000447e7c] ___slab_alloc+0x44c/0xa30 
[ 5683.000571] [c00000023de978b0] [c000000000448494] __slab_alloc+0x34/0x60 
[ 5683.000615] [c00000023de978e0] [c000000000448b48] kmem_cache_alloc+0x688/0x700 
[ 5683.000671] [c00000023de97940] [c0000000003d9c80] __pud_alloc+0x70/0x1e0 
[ 5683.000717] [c00000023de97990] [c0000000003ddbb4] copy_page_range+0x1204/0x1490 
[ 5683.000779] [c00000023de97b20] [c00000000013b7c0] dup_mm+0x370/0x6e0 
[ 5683.000826] [c00000023de97bd0] [c00000000013ce10] copy_process+0xd20/0x1950 
[ 5683.000870] [c00000023de97c90] [c00000000013dc64] _do_fork+0xa4/0x560 
[ 5683.000915] [c00000023de97d00] [c00000000013e24c] __do_sys_clone+0x7c/0xa0 
[ 5683.000965] [c00000023de97dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0 
[ 5683.001019] [c00000023de97e20] [c00000000000d140] system_call_common+0xf0/0x27c 

The test then manages to fill the console log with a good 4G of dump...
this is actually visible in the ppc64le console log from the linked
artifacts (warning, it's a huge file!):

https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_ppc64le_redhat%3A968099/tests/8757368_ppc64le_3_console.log


There are also more ppc64le traces in the other log (of reasonable size):
https://cki-artifacts.s3.us-east-2.amazonaws.com/datawarehouse/2020/09/02/613166/build_ppc64le_redhat%3A968099/tests/8757337_ppc64le_2_console.log


Veronika
