From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF505C04AAF for ; Sat, 18 May 2019 08:27:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8816F20848 for ; Sat, 18 May 2019 08:27:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728112AbfERI1u (ORCPT ); Sat, 18 May 2019 04:27:50 -0400 Received: from mx2.suse.de ([195.135.220.15]:53010 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727156AbfERI1t (ORCPT ); Sat, 18 May 2019 04:27:49 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6E4ECAE6D; Sat, 18 May 2019 08:27:48 +0000 (UTC) Subject: Re: storm-of-soft-lockups: spinlocks running on all cores, preventing forward progress (4.14- to 5.0+) To: Zygo Blaxell Cc: linux-btrfs@vger.kernel.org References: <20190515213650.GG20359@hungrycats.org> <0480104e-db25-4e2f-08e5-0236ffd5c1c2@suse.com> <20190518044411.GH20359@hungrycats.org> From: Nikolay Borisov Openpgp: preference=signencrypt Autocrypt: addr=nborisov@suse.com; prefer-encrypt=mutual; keydata= mQINBFiKBz4BEADNHZmqwhuN6EAzXj9SpPpH/nSSP8YgfwoOqwrP+JR4pIqRK0AWWeWCSwmZ T7g+RbfPFlmQp+EwFWOtABXlKC54zgSf+uulGwx5JAUFVUIRBmnHOYi/lUiE0yhpnb1KCA7f u/W+DkwGerXqhhe9TvQoGwgCKNfzFPZoM+gZrm+kWv03QLUCr210n4cwaCPJ0Nr9Z3c582xc bCUVbsjt7BN0CFa2BByulrx5xD9sDAYIqfLCcZetAqsTRGxM7LD0kh5WlKzOeAXj5r8DOrU2 GdZS33uKZI/kZJZVytSmZpswDsKhnGzRN1BANGP8sC+WD4eRXajOmNh2HL4P+meO1TlM3GLl EQd2shHFY0qjEo7wxKZI1RyZZ5AgJnSmehrPCyuIyVY210CbMaIKHUIsTqRgY5GaNME24w7h TyyVCy2qAM8fLJ4Vw5bycM/u5xfWm7gyTb9V1TkZ3o1MTrEsrcqFiRrBY94Rs0oQkZvunqia c+NprYSaOG1Cta14o94eMH271Kka/reEwSZkC7T+o9hZ4zi2CcLcY0DXj0qdId7vUKSJjEep c++s8ncFekh1MPhkOgNj8pk17OAESanmDwksmzh1j12lgA5lTFPrJeRNu6/isC2zyZhTwMWs k3LkcTa8ZXxh0RfWAqgx/ogKPk4ZxOXQEZetkEyTFghbRH2BIwARAQABtCNOaWtvbGF5IEJv cmlzb3YgPG5ib3Jpc292QHN1c2UuY29tPokCOAQTAQIAIgUCWIo48QIbAwYLCQgHAwIGFQgC CQoLBBYCAwECHgECF4AACgkQcb6CRuU/KFc0eg/9GLD3wTQz9iZHMFbjiqTCitD7B6dTLV1C ddZVlC8Hm/TophPts1bWZORAmYIihHHI1EIF19+bfIr46pvfTu0yFrJDLOADMDH+Ufzsfy2v HSqqWV/nOSWGXzh8bgg/ncLwrIdEwBQBN9SDS6aqsglagvwFD91UCg/TshLlRxD5BOnuzfzI Leyx2c6YmH7Oa1R4MX9Jo79SaKwdHt2yRN3SochVtxCyafDlZsE/efp21pMiaK1HoCOZTBp5 VzrIP85GATh18pN7YR9CuPxxN0V6IzT7IlhS4Jgj0NXh6vi1DlmKspr+FOevu4RVXqqcNTSS E2rycB2v6cttH21UUdu/0FtMBKh+rv8+yD49FxMYnTi1jwVzr208vDdRU2v7Ij/TxYt/v4O8 V+jNRKy5Fevca/1xroQBICXsNoFLr10X5IjmhAhqIH8Atpz/89ItS3+HWuE4BHB6RRLM0gy8 T7rN6ja+KegOGikp/VTwBlszhvfLhyoyjXI44Tf3oLSFM+8+qG3B7MNBHOt60CQlMkq0fGXd mm4xENl/SSeHsiomdveeq7cNGpHi6i6ntZK33XJLwvyf00PD7tip/GUj0Dic/ZUsoPSTF/mG EpuQiUZs8X2xjK/AS/l3wa4Kz2tlcOKSKpIpna7V1+CMNkNzaCOlbv7QwprAerKYywPCoOSC 7P25Ag0EWIoHPgEQAMiUqvRBZNvPvki34O/dcTodvLSyOmK/MMBDrzN8Cnk302XfnGlW/YAQ csMWISKKSpStc6tmD+2Y0z9WjyRqFr3EGfH1RXSv9Z1vmfPzU42jsdZn667UxrRcVQXUgoKg QYx055Q2FdUeaZSaivoIBD9WtJq/66UPXRRr4H/+Y5FaUZx+gWNGmBT6a0S/GQnHb9g3nonD jmDKGw+YO4P6aEMxyy3k9PstaoiyBXnzQASzdOi39BgWQuZfIQjN0aW+Dm8kOAfT5i/yk59h VV6v3NLHBjHVw9kHli3jwvsizIX9X2W8tb1SefaVxqvqO1132AO8V9CbE1DcVT8fzICvGi42 FoV/k0QOGwq+LmLf0t04Q0csEl+h69ZcqeBSQcIMm/Ir+NorfCr6HjrB6lW7giBkQl6hhomn l1mtDP6MTdbyYzEiBFcwQD4terc7S/8ELRRybWQHQp7sxQM/Lnuhs77MgY/e6c5AVWnMKd/z MKm4ru7A8+8gdHeydrRQSWDaVbfy3Hup0Ia76J9FaolnjB8YLUOJPdhI2vbvNCQ2ipxw3Y3c KhVIpGYqwdvFIiz0Fej7wnJICIrpJs/+XLQHyqcmERn3s/iWwBpeogrx2Lf8AGezqnv9woq7 OSoWlwXDJiUdaqPEB/HmGfqoRRN20jx+OOvuaBMPAPb+aKJyle8zABEBAAGJAh8EGAECAAkF AliKBz4CGwwACgkQcb6CRuU/KFdacg/+M3V3Ti9JYZEiIyVhqs+yHb6NMI1R0kkAmzsGQ1jU zSQUz9AVMR6T7v2fIETTT/f5Oout0+Hi9cY8uLpk8CWno9V9eR/B7Ifs2pAA8lh2nW43FFwp IDiSuDbH6oTLmiGCB206IvSuaQCp1fed8U6yuqGFcnf0ZpJm/sILG2ECdFK9RYnMIaeqlNQm iZicBY2lmlYFBEaMXHoy+K7nbOuizPWdUKoKHq+tmZ3iA+qL5s6Qlm4trH28/fPpFuOmgP8P K+7LpYLNSl1oQUr+WlqilPAuLcCo5Vdl7M7VFLMq4xxY/dY99aZx0ZJQYFx0w/6UkbDdFLzN upT7NIN68lZRucImffiWyN7CjH23X3Tni8bS9ubo7OON68NbPz1YIaYaHmnVQCjDyDXkQoKC R82Vf9mf5slj0Vlpf+/Wpsv/TH8X32ajva37oEQTkWNMsDxyw3aPSps6MaMafcN7k60y2Wk/ TCiLsRHFfMHFY6/lq/c0ZdOsGjgpIK0G0z6et9YU6MaPuKwNY4kBdjPNBwHreucrQVUdqRRm RcxmGC6ohvpqVGfhT48ZPZKZEWM+tZky0mO7bhZYxMXyVjBn4EoNTsXy1et9Y1dU3HVJ8fod 5UqrNrzIQFbdeM0/JqSLrtlTcXKJ7cYFa9ZM2AP7UIN9n1UWxq+OPY9YMOewVfYtL8M= Message-ID: <349ec27f-7eb4-45be-7cd4-29c2cd56d05b@suse.com> Date: Sat, 18 May 2019 11:27:46 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: <20190518044411.GH20359@hungrycats.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On 18.05.19 г. 7:44 ч., Zygo Blaxell wrote: > On Thu, May 16, 2019 at 09:57:01AM +0300, Nikolay Borisov wrote: >> >> >> On 16.05.19 г. 0:36 ч., Zygo Blaxell wrote: >>> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the >>> CPU cores in kernel functions that are unable to make forward progress, >>> but also unwilling to release their respective CPU cores. This is >>> usually accompanied by a lot of CPU usage (detectable as either kvm CPU >>> usage or just a lot of CPU fan noise) though I don't know if all cores >>> are spinning or only some of them. >>> >>> The kernel console presents a continual stream of "BUG: soft lockup" >>> warnings for some days. None of the call traces change during this time. >>> The only way out is to reboot. >>> >>> You can reproduce this by writing a bunch of data to a filesystem while >>> bees is running on all cores. It takes a few days to occur naturally. >>> It can probably be sped up by just doing a bunch of random LOGICAL_INO >>> ioctls in a tight loop on each core. >>> >>> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction >>> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO >>> ioctl (btrfs_tree_read_lock_atomic): >> >> >> Provide output of all sleeping threads when this occur via >> echo w > /proc/sysrq-trigger. > > I fixed my SysRq configuration. The results are...kind of interesting. > I guess the four threads that are running loops on all cores don't count > as "blocked"... > > > All the blocked threads are waiting for a snapshot, which is here: > > [42829.412010] btrfs R running task 0 11306 6447 0x00000000 > [42829.413392] Call Trace: > [42829.413895] __schedule+0x3d4/0xb70 > [42829.414598] preempt_schedule_common+0x1f/0x30 > [42829.415480] _cond_resched+0x22/0x30 > [42829.416198] wait_for_common_io.constprop.2+0x47/0x1b0 > [42829.417212] ? submit_bio+0x73/0x140 > [42829.417932] wait_for_completion_io+0x18/0x20 > [42829.418791] submit_bio_wait+0x68/0x90 > [42829.419546] blkdev_issue_discard+0x80/0xd0 > [42829.420381] btrfs_issue_discard+0xc7/0x160 > [42829.421215] ? btrfs_issue_discard+0xc7/0x160 > [42829.422088] btrfs_discard_extent+0xcc/0x160 > [42829.423191] btrfs_finish_extent_commit+0x118/0x280 > [42829.424310] btrfs_commit_transaction+0x7f9/0xab0 > [42829.425231] ? wait_woken+0xa0/0xa0 > [42829.425907] btrfs_mksubvol+0x5b4/0x630 > [42829.426766] ? mnt_want_write_file+0x28/0x60 > [42829.427597] btrfs_ioctl_snap_create_transid+0x16b/0x1a0 > [42829.428614] btrfs_ioctl_snap_create_v2+0x125/0x180 > [42829.429548] btrfs_ioctl+0x1351/0x2cb0 > [42829.430272] ? __handle_mm_fault+0x110c/0x1950 > [42829.431124] ? do_raw_spin_unlock+0x4d/0xc0 > [42829.431934] ? _raw_spin_unlock+0x27/0x40 > [42829.432704] ? __handle_mm_fault+0x110c/0x1950 > [42829.433556] ? kvm_sched_clock_read+0x18/0x30 > [42829.434437] do_vfs_ioctl+0xa6/0x6e0 > [42829.435345] ? btrfs_ioctl_get_supported_features+0x30/0x30 > [42829.436803] ? do_vfs_ioctl+0xa6/0x6e0 > [42829.437528] ? up_read+0x1f/0xa0 > [42829.438159] ksys_ioctl+0x75/0x80 > [42829.438798] __x64_sys_ioctl+0x1a/0x20 > [42829.439565] do_syscall_64+0x65/0x1a0 > [42829.440510] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [42829.441463] RIP: 0033:0x7f1faa8f5777 > [42829.442133] Code: Bad RIP value. > [42829.442736] RSP: 002b:00007ffec1520458 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 > [42829.444352] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1faa8f5777 > [42829.445710] RDX: 00007ffec1520498 RSI: 0000000050009417 RDI: 0000000000000003 > [42829.447030] RBP: 00005621c22ef260 R08: 0000000000000008 R09: 00007f1faa97fe80 > [42829.448372] R10: fffffffffffffa4a R11: 0000000000000202 R12: 00005621c22ef290 > [42829.449723] R13: 00005621c22ef290 R14: 0000000000000003 R15: 0000000000000004 So you are discards are hung or there is a lost wakeup (which is rather unlikely, since this is the block layer). What is the underlying block device i.e model/make > > Also I just noticed there's sometimes (but not always!) a BUG in > fs/btrfs/ctree.c just before all the soft lockups start: > > [26101.008546] ------------[ cut here ]------------ > [26101.016090] kernel BUG at fs/btrfs/ctree.c:1227! > [26101.018285] irq event stamp: 36913 > [26101.018287] invalid opcode: 0000 [#1] SMP PTI > [26101.018293] CPU: 1 PID: 4823 Comm: crawl_5268 Not tainted 5.0.16-zb64-9b948ea3083a+ #1 > [26101.019115] hardirqs last enabled at (36913): [] get_page_from_freelist+0x40c/0x19e0 > [26101.019118] hardirqs last disabled at (36912): [] get_page_from_freelist+0x350/0x19e0 > [26101.020820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 > [26101.022456] softirqs last enabled at (36478): [] __do_softirq+0x3a0/0x45a > [26101.022461] softirqs last disabled at (36459): [] irq_exit+0xe9/0xf0 > [26101.024212] RIP: 0010:__tree_mod_log_rewind+0x239/0x240 > [26101.039031] Code: c0 48 89 df 48 89 d6 48 c1 e6 05 48 8d 74 32 65 ba 19 00 00 00 e8 87 02 05 00 e9 88 fe ff ff 49 63 57 2c e9 16 ff ff ff 0f 0b <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 83 c2 01 48 63 d2 48 89 e5 > [26101.042382] RSP: 0018:ffffb3f401613820 EFLAGS: 00010206 > [26101.043512] RAX: ffff9f8068690180 RBX: ffff9f7ebf2ba660 RCX: ffff9f8003b1eb80 > [26101.044719] RDX: 000000000000015d RSI: 000000000000007e RDI: 0000018aff0a8000 > [26101.046157] RBP: ffffb3f401613850 R08: ffffb3f4016137c8 R09: ffffb3f4016137d0 > [26101.047474] R10: 0000000000007af3 R11: 000000000000007e R12: 0000000000000008 > [26101.048779] R13: ffff9f7ea7d77d00 R14: 0000000000000a49 R15: ffff9f8068690180 > [26101.049939] FS: 00007f9064d7a700(0000) GS:ffff9f80b6000000(0000) knlGS:0000000000000000 > [26101.051415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [26101.052330] CR2: 00007fcc462fd8a0 CR3: 000000020e002006 CR4: 00000000001606e0 > [26101.053615] Call Trace: > [26101.054103] btrfs_search_old_slot+0xfe/0x800 > [26101.054900] resolve_indirect_refs+0x1c5/0x910 > [26101.055734] ? prelim_ref_insert+0x10a/0x320 > [26101.056474] find_parent_nodes+0x443/0x1340 > [26101.057153] btrfs_find_all_roots_safe+0xc5/0x140 > [26101.057890] ? btrfs_inode_flags_to_fsflags+0x90/0x90 > [26101.058813] iterate_extent_inodes+0x198/0x3e0 > [26101.059608] iterate_inodes_from_logical+0xa1/0xd0 > [26101.060377] ? iterate_inodes_from_logical+0xa1/0xd0 > [26101.061142] ? btrfs_inode_flags_to_fsflags+0x90/0x90 > [26101.061975] btrfs_ioctl_logical_to_ino+0xf3/0x1a0 > [26101.062710] btrfs_ioctl+0xcf7/0x2cb0 > [26101.063318] ? __lock_acquire+0x477/0x1b50 > [26101.063988] ? kvm_sched_clock_read+0x18/0x30 > [26101.065010] ? kvm_sched_clock_read+0x18/0x30 > [26101.065781] ? sched_clock+0x9/0x10 > [26101.066402] do_vfs_ioctl+0xa6/0x6e0 > [26101.067072] ? btrfs_ioctl_get_supported_features+0x30/0x30 > [26101.068102] ? do_vfs_ioctl+0xa6/0x6e0 > [26101.068765] ? __fget+0x119/0x200 > [26101.069381] ksys_ioctl+0x75/0x80 > [26101.069937] __x64_sys_ioctl+0x1a/0x20 > [26101.070627] do_syscall_64+0x65/0x1a0 > [26101.071299] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [26101.072125] RIP: 0033:0x7f9067675777 > [26101.072732] Code: 00 00 90 48 8b 05 19 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 a6 0c 00 f7 d8 64 89 01 48 > [26101.075884] RSP: 002b:00007f9064d77458 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [26101.077193] RAX: ffffffffffffffda RBX: 00007f9064d77780 RCX: 00007f9067675777 > [26101.078362] RDX: 00007f9064d77788 RSI: 00000000c038943b RDI: 0000000000000004 > [26101.079543] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f9064d77960 > [26101.080837] R10: 0000565444959c40 R11: 0000000000000246 R12: 0000000000000004 > [26101.082724] R13: 00007f9064d77788 R14: 00007f9064d77668 R15: 00007f9064d77890 > [26101.084083] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data snd_pcm crct10dif_pclmul crc32_pclmul dm_bio_prison crc32c_intel ghash_clmulni_intel dm_bufio sr_mod snd_timer cdrom snd sg aesni_intel ppdev joydev aes_x86_64 dm_mod soundcore crypto_simd ide_pci_generic cryptd piix glue_helper psmouse input_leds parport_pc ide_core rtc_cmos pcspkr serio_raw bochs_drm evbug parport evdev floppy i2c_piix4 qemu_fw_cfg ip_tables x_tables ipv6 crc_ccitt autofs4 > [26101.091346] ---[ end trace d327561dc44a663d ]--- That looks like genuine bug, first I've ever seen the rewind code hitting it. The rest that follows doesn't seem relevant. > > This doesn't quite match the traces I previously posted. Maybe the > storm-of-soft-lockups is a symptom of multiple bugs (at least one which > happens after a bug in ctree.c and one that happens at other times)? > I'll run this a few more times and see if different cases come up. > > The one thing that is common to all the storm-of-soft-lockups I've seen > so far is the involvement of multiple crawl_* threads, and those spend > ~60% of their time running logical_to_ino. > >> Also do you have this patch on the affected machine: >> >> 38e3eebff643 ("btrfs: honor path->skip_locking in backref code") can you >> try and test with it applied ? >> >> >>