linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Baokun Li <libaokun1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Yi Zhang <yi.zhang@redhat.com>, Ming Lei <ming.lei@redhat.com>,
	mark.rutland@arm.com, Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	Changhui Zhong <czhong@redhat.com>,
	yangerkun <yangerkun@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	Kees Cook <keescook@chromium.org>,
	chengzhihao <chengzhihao1@huawei.com>
Subject: Re: [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278]
Date: Mon, 18 Sep 2023 11:42:00 -0700	[thread overview]
Message-ID: <20230918184200.GA347993@frogsfrogsfrogs> (raw)
In-Reply-To: <a6b10684-39ee-960a-10ab-663746800f85@huawei.com>

On Mon, Sep 18, 2023 at 09:52:28AM +0800, Baokun Li wrote:
> On 2023/9/17 17:26, Peter Zijlstra wrote:
> > On Sun, Sep 17, 2023 at 11:10:32AM +0200, Peter Zijlstra wrote:
> > > On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> > > > On 2023/9/13 16:59, Yi Zhang wrote:
> > > > > The issue still can be reproduced on the latest linux tree[2].
> > > > > To reproduce I need to run about 1000 times blktests block/001, and
> > > > > bisect shows it was introduced with commit[1], as it was not 100%
> > > > > reproduced, not sure if it's the culprit?
> > > > > 
> > > > > 
> > > > > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
> > > > Hello, everyone!
> > > > 
> > > > We have confirmed that the merge-in of this patch caused hlist_bl_lock
> > > > (aka, bit_spin_lock) to fail, which in turn triggered the issue above.
> > > > [root@localhost ~]# insmod mymod.ko
> > > > [   37.994787][  T621] >>> a = 725, b = 724
> > > > [   37.995313][  T621] ------------[ cut here ]------------
> > > > [   37.995951][  T621] kernel BUG at fs/mymod/mymod.c:42!
> > > > [r[  oo 3t7@.l996o4c61al]h[o s T6t21] ~ ]#Int ernal error: Oops - BUG:
> > > > 00000000f2000800 [#1] SMP
> > > > [   37.997420][  T621] Modules linked in: mymod(E)
> > > > [   37.997891][  T621] CPU: 9 PID: 621 Comm: bl_lock_thread2 Tainted:
> > > > G            E      6.4.0-rc2-00034-g9257959a6e5b-dirty #117
> > > > [   37.999038][  T621] Hardware name: linux,dummy-virt (DT)
> > > > [   37.999571][  T621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
> > > > BTYPE=--)
> > > > [   38.000344][  T621] pc : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.000882][  T621] lr : increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.001416][  T621] sp : ffff800008b4be40
> > > > [   38.001822][  T621] x29: ffff800008b4be40 x28: 0000000000000000 x27:
> > > > 0000000000000000
> > > > [   38.002605][  T621] x26: 0000000000000000 x25: 0000000000000000 x24:
> > > > 0000000000000000
> > > > [   38.003385][  T621] x23: ffffd9930c698190 x22: ffff800008a0ba38 x21:
> > > > 0000000000000001
> > > > [   38.004174][  T621] x20: ffffffffffffefff x19: ffffd9930c69a580 x18:
> > > > 0000000000000000
> > > > [   38.004955][  T621] x17: 0000000000000000 x16: ffffd9933011bd38 x15:
> > > > ffffffffffffffff
> > > > [   38.005754][  T621] x14: 0000000000000000 x13: 205d313236542020 x12:
> > > > ffffd99332175b80
> > > > [   38.006538][  T621] x11: 0000000000000003 x10: 0000000000000001 x9 :
> > > > ffffd9933022a9d8
> > > > [   38.007325][  T621] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 :
> > > > ffffd993320b5b40
> > > > [   38.008124][  T621] x5 : ffff0001f7d1c708 x4 : 0000000000000000 x3 :
> > > > 0000000000000000
> > > > [   38.008912][  T621] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> > > > 0000000000000015
> > > > [   38.009709][  T621] Call trace:
> > > > [   38.010035][  T621]  increase_ab+0xcc/0xe70 [mymod]
> > > > [   38.010539][  T621]  kthread+0xdc/0xf0
> > > > [   38.010927][  T621]  ret_from_fork+0x10/0x20
> > > > [   38.011370][  T621] Code: 17ffffe0 90000020 91044000 9400000d (d4210000)
> > > > [   38.012067][  T621] ---[ end trace 0000000000000000 ]---
> > > Is this arm64 or something? You seem to have forgotten to mention what
> > > platform you're using.
> > Is that an LSE or LLSC arm64 ?
> 
> I'm not sure how to distinguish if it's LSE or LLSC, here's some info on the
> cpu:
> 
> $ cat /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
> 0x00000000481fd010
> 
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        4
> Vendor ID:           HiSilicon
> BIOS Vendor ID:      HiSilicon
> Model:               0
> Model name:          Kunpeng-920
> BIOS Model name:     Kunpeng 920-4826
> Stepping:            0x1
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            49152K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
> 
> > Anyway, it seems that ARM64 shouldn't be using the fallback as it does
> > everything itself.
> > 
> > Mark, can you have a look please? At first glance the
> > atomic64_fetch_or_acquire() that's being used by generic bitops/lock.h
> > seems in order..
> > 
> We also suspect some implicit mechanism change in
> raw_atomic64_fetch_or_acquire. You can reproduce the problem with the
> above mod that can reproduce the problem to make it easier to locate.
> I can help reproduce it and grab some information if you can't reproduce
> it on your end.

FWIW this looks a lot like the crash I reported last week:
https://lore.kernel.org/linux-fsdevel/ZQep0OR0uMmR%2Fwg3@dread.disaster.area/T/#t

Also arm64, but virtualized.  I /think/ the host is some Ampere box,
though I have no idea what kind since it's just some Oracle Cloud A1
instance.  The internet claims "Ampere Altra" processors[1].

# lscpu
Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              ARM
  Model name:           Neoverse-N1
    Model:              1
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           r3p1
    BogoMIPS:           50.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32
atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
NUMA:                   
  NUMA node(s):         1
  NUMA node0 CPU(s):    0,1
Vulnerabilities:        
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; CSV2, but not BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

[1] https://www.oracle.com/cloud/compute/arm/ 

--D

> -- 
> With Best Regards,
> Baokun Li
> .

  reply	other threads:[~2023-09-18 18:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-23  4:06 [czhong@redhat.com: [bug report] WARNING: CPU: 121 PID: 93233 at fs/dcache.c:365 __dentry_kill+0x214/0x278] Ming Lei
2023-08-23  8:47 ` Christian Brauner
2023-08-28 10:43   ` Ming Lei
2023-09-13  8:59     ` Yi Zhang
2023-09-16  6:55       ` Baokun Li
2023-09-17  9:10         ` Peter Zijlstra
2023-09-17  9:26           ` Peter Zijlstra
2023-09-18  1:52             ` Baokun Li
2023-09-18 18:42               ` Darrick J. Wong [this message]
2023-09-18  1:10           ` Baokun Li
2023-09-18 10:20             ` Yi Zhang
2023-09-19 15:10         ` Mark Rutland
2023-09-17  0:35       ` Bagas Sanjaya
2023-09-29 13:24         ` Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230918184200.GA347993@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=brauner@kernel.org \
    --cc=chengzhihao1@huawei.com \
    --cc=czhong@redhat.com \
    --cc=keescook@chromium.org \
    --cc=libaokun1@huawei.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=ming.lei@redhat.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yi.zhang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).