From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f170.google.com (mail-pf1-f170.google.com [209.85.210.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1A29433D6 for ; Sun, 22 Sep 2024 23:15:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727046925; cv=none; b=Ge70V4kbwzxiTd+6dpdwOhuGxEHjHWxWvHqDQEtB5LoWcneZB6muE/bDjFDMadrn9MFKQdjcAmpjcc9XymYW5zSyrFUtbrpm+HJwfdqRPGRtu0Idncehzd1pG4YCVJeALR757bVkRuygbUAQnE/qEKzEor3u7dBVC0ruVkef51o= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727046925; c=relaxed/simple; bh=IOKFUjrmEw81LaaNRTH/ux2RuIwuKk4fePPJpmJ/xQo=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BsxgUJZnDCp4azjZAAZLqQd7nDpNOJ0gXqkwmDiLtLIwhhB6WHpyP9Va01LXct9djfdZjPWaYwg0TGAC8mdC4JX9FFEf4ANXprWY71/nyyPtfPoXrLKtZxY9fXI01MecJ2AoeVOSE42pDaXzMyYpr1mS3xJNtrHkbNy1bcYE4g0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com; spf=pass smtp.mailfrom=fromorbit.com; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b=Xo95JDKo; arc=none smtp.client-ip=209.85.210.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=fromorbit.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=fromorbit-com.20230601.gappssmtp.com header.i=@fromorbit-com.20230601.gappssmtp.com header.b="Xo95JDKo" Received: by mail-pf1-f170.google.com with SMTP id d2e1a72fcca58-7198cb6bb02so2654215b3a.3 for ; Sun, 22 Sep 2024 16:15:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1727046923; x=1727651723; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=UlsXTJCziqLDA1kEeYpppJidhLEUsbWQlazGKgEoouE=; b=Xo95JDKo5sA84jZJM63vo79qX7S8Rk0oOR2Kp7ITmtxDERy4+PqDyEl/hsPN+o2pvD 2/RLK26ypWwNLId99pjzQ7/AN6W19NPpEA+ZC8WCnxaf2FnuOKpliByR4nL7zG6onaW0 /08k7Oam45liXPawKT2ZwcSSWLye+uj7Q2zijLWm0TFGI6tpOIiFAo96siCVu9rt1wI3 CQ5a7SbkcvMI3FjsjsCBfMHU7XLgNwO4Vox7gI87LPWYPK9H3tfP/KEtFHJpCPQpraZm UT8t7gZWwcIE7OHw33C4qahxqgCLAOo7sLVK8dR5xET9r4f3zGqFenxNvIjIygvWzxAN cZ/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727046923; x=1727651723; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=UlsXTJCziqLDA1kEeYpppJidhLEUsbWQlazGKgEoouE=; b=AGp4CCJ6empLVTfdSYlJy2TX6dPMVpGRmhd1UAtrQXiDpX6io0D6jl6aOC1EZ0Hv2K snfeVvRPr8BSwf3oAV66pjNMNIxc0y3BJMGYiFg40nJn5susAknrgec5t4ZxW9Y2cgcZ kr5E2kBtGiV4jU/kdmRsgnq2qyhCyTkKrOgq5DmkV32IaagX1L07ON/FCAviWTb8xWIh u17moeISsmYiKFXz3of50j7aCZYSMut0BBNGrT/5dIR0RW3CxI0tVBWioc7KMTjtvLJH PANL5+E5eUQbi/O3YzTAngjez8wlKs9Hpcwfp2ckDezOe9SYX8P270SRPzPbF+4cC3du Oxpg== X-Forwarded-Encrypted: i=1; AJvYcCXdObO2kYEapnO211BD88GxCvNeiuUizSxnmi6ralxlJZ2R8kAauprkeV3gnlkFYAMiE6ZZupoZoHNSvOc=@vger.kernel.org X-Gm-Message-State: AOJu0YxHQyL8dadNS2DIl3ZfGqXFdvoDArNIL0CdPffypN33wrM3WQKs i1+P1nu1LotjFVvoCpZjHzXH3qvNYmPil9fpGgOnD/1bqt1/LopFUdvA5aWTfvsPzGU5DSexCv9 M X-Google-Smtp-Source: AGHT+IFLGsoYS4nkygjOYQ6T6MjfnuxG4sxP2bIE6luDvx3JVi00ybPnZU93vr6k1Rs5zWvJ3g1fxw== X-Received: by 2002:a05:6a21:398a:b0:1d2:e807:7854 with SMTP id adf61e73a8af0-1d30ca2035emr11435044637.22.1727046922936; Sun, 22 Sep 2024 16:15:22 -0700 (PDT) Received: from dread.disaster.area (pa49-179-78-197.pa.nsw.optusnet.com.au. [49.179.78.197]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71944b7b127sm12782575b3a.125.2024.09.22.16.15.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 22 Sep 2024 16:15:22 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1ssVnT-008k5e-0I; Mon, 23 Sep 2024 09:15:19 +1000 Date: Mon, 23 Sep 2024 09:15:19 +1000 From: Dave Chinner To: syzbot Cc: chandan.babu@oracle.com, djwong@kernel.org, linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, syzkaller-bugs@googlegroups.com Subject: Re: [syzbot] [xfs?] INFO: task hung in xfs_ail_push_all_sync Message-ID: References: <66eebc5f.050a0220.3195df.0053.GAE@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <66eebc5f.050a0220.3195df.0053.GAE@google.com> On Sat, Sep 21, 2024 at 05:30:23AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: a430d95c5efa Merge tag 'lsm-pr-20240911' of git://git.kern.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=15569d00580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=d9ab5893ec5191eb > dashboard link: https://syzkaller.appspot.com/bug?extid=bf0143df12877cb1a21b > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=109e6a77980000 > > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/f8549392ace4/disk-a430d95c.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/9b866ef0c06c/vmlinux-a430d95c.xz > kernel image: https://storage.googleapis.com/syzbot-assets/9d6691641029/bzImage-a430d95c.xz > mounted in repro: https://storage.googleapis.com/syzbot-assets/dd50a033ca89/mount_2.gz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+bf0143df12877cb1a21b@syzkaller.appspotmail.com > > INFO: task syz.0.75:6468 blocked for more than 143 seconds. > Not tainted 6.11.0-syzkaller-02574-ga430d95c5efa #0 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > task:syz.0.75 state:D stack:24864 pid:6468 tgid:6446 ppid:6378 flags:0x00004004 > Call Trace: > > context_switch kernel/sched/core.c:5188 [inline] > __schedule+0x17ae/0x4a10 kernel/sched/core.c:6529 > __schedule_loop kernel/sched/core.c:6606 [inline] > schedule+0x14b/0x320 kernel/sched/core.c:6621 > xfs_ail_push_all_sync+0x236/0x310 fs/xfs/xfs_trans_ail.c:726 > xfs_log_quiesce+0xdf/0x5b0 fs/xfs/xfs_log.c:1018 > xfs_fs_freeze+0x8d/0x1a0 fs/xfs/xfs_super.c:936 > freeze_super+0x81b/0xee0 fs/super.c:2107 > fs_bdev_freeze+0x1ac/0x320 fs/super.c:1484 > bdev_freeze+0xd6/0x220 block/bdev.c:257 > xfs_fs_goingdown+0xa9/0x160 fs/xfs/xfs_fsops.c:446 > xfs_file_ioctl+0x12d4/0x19e0 fs/xfs/xfs_ioctl.c:1473 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:907 [inline] > __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > RIP: 0033:0x7fc862d7def9 > RSP: 002b:00007fc863b4a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fc862f36058 RCX: 00007fc862d7def9 > RDX: 0000000020000080 RSI: 000000008004587d RDI: 0000000000000006 > RBP: 00007fc862df0b76 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 0000000000000000 R14: 00007fc862f36058 R15: 00007ffd1aceae18 > Not a bug. The reproducer is mounting the filesystem, then some other thread is immediately resizing the block device underneath the filesystem to zero length. In the cases where the test "passes", this happens: [ 215.851387][ T6299] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791 [ 215.895695][ T6299] XFS (loop0): Ending clean mount [ 215.953257][ T6299] loop0: detected capacity change from 32768 to 0 [ 215.963786][ T6299] syz.0.70: attempt to access beyond end of device [ 215.963786][ T6299] loop0: rw=432129, sector=768, nr_sectors = 128 limit=0 [ 215.978206][ T45] XFS (loop0): log I/O error -5 [ 215.984613][ T45] XFS (loop0): Filesystem has been shut down due to log error (0x2). [ 215.992720][ T45] XFS (loop0): Please unmount the filesystem and rectify the problem(s). [ 216.011609][ T6230] XFS (loop0): Unmounting Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791o The loop device change of size happens about 600ms after the mount finishes, and the next IO that is issued is a log IO. That gets errored out and the filesystem shuts itseld down. In the case where it has hung: 231.577098][ T6447] loop0: detected capacity change from 0 to 32768 [ 231.590389][ T6447] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791 [ 231.642021][ T6447] XFS (loop0): Ending clean mount [ 231.713195][ T6447] loop0: detected capacity change from 32768 to 0 [ 231.727969][ T6467] xfsaild/loop0: attempt to access beyond end of device [ 231.727969][ T6467] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0 [ 231.742236][ T6467] xfsaild/loop0: attempt to access beyond end of device [ 231.742236][ T6467] loop0: rw=4097, sector=12, nr_sectors = 4 limit=0 .... The loop dev changes size about 700ms afte the mount, and the log IO has already completed. Hence the next IOs are from the user driven shutdown (not sure where that is coming from - nothing in syzbot logs, and no C reproducer was provided) freezing the device before shutting down the filesystem. These error out, but don't cause a shutdown because the default XFS config for async metadata writeback failures is "retry forever": ..... [ 231.849797][ T52] kworker/1:1: attempt to access beyond end of device [ 231.849797][ T52] loop0: rw=4097, sector=18560, nr_sectors = 64 limit=0 [ 231.933061][ T6467] XFS (loop0): Failing async write on buffer block 0x2. Retrying async write. [ 231.942187][ T6467] XFS (loop0): Failing async write on buffer block 0x10. Retrying async write. [ 231.951196][ T6467] XFS (loop0): Failing async write on buffer block 0xc. Retrying async write. [ 231.960108][ T6467] xfsaild/loop0: attempt to access beyond end of device [ 231.960108][ T6467] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0 [ 231.973536][ T6467] xfsaild/loop0: attempt to access beyond end of device [ 231.973536][ T6467] loop0: rw=4097, sector=12, nr_sectors = 4 limit=0 [ 232.053094][ T6467] XFS (loop0): Failing async write on buffer block 0x2. Retrying async write. [ 232.061978][ T6467] XFS (loop0): Failing async write on buffer block 0x10. Retrying async write. [ 232.070960][ T6467] XFS (loop0): Failing async write on buffer block 0xc. Retrying async write. [ 232.143152][ T6467] XFS (loop0): Failing async write on buffer block 0x2. Retrying async write. [ 232.152027][ T6467] XFS (loop0): Failing async write on buffer block 0x10. Retrying async write. ..... And so the user driven shutdown hangs at this point waiting for metadata to flush. This is expected behaviour. If the user wants metadata writes to fail immediately in this situation, then they need to do: # echo 1 > /sys/fs/xfs//error/metadata/EIO/max_retries So on IO error the writeback code will only retry once before failing and shutting down the filesystem immediately. So, no bug here, just syzbot not understanding how metadata IO errors are handled by different filesystems. Hence: #syz invalid -Dave. -- Dave Chinner david@fromorbit.com