From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F448C432BE for ; Wed, 1 Sep 2021 23:48:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 62202610A4 for ; Wed, 1 Sep 2021 23:48:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232519AbhIAXth (ORCPT ); Wed, 1 Sep 2021 19:49:37 -0400 Received: from mail.kernel.org ([198.145.29.99]:41664 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230143AbhIAXtf (ORCPT ); Wed, 1 Sep 2021 19:49:35 -0400 Received: by mail.kernel.org (Postfix) with ESMTPS id D7612610A1 for ; Wed, 1 Sep 2021 23:48:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1630540117; bh=le36FUgFKh7mwdHstRfQev7kaDSlL/XsUQdfHVKEIPA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=XbkXKnB0s/rzSMmDKaV9h5PODg5xp+BQWIGO179CuZ6d4w5Y+lkdp+ihOMr6NB8T/ L0vwjzLLftohJo65C/vCMQQaIwBhN6vwT3oGyhXBEXAg/2JQpuRj5Nj3oPnGpkp1vK XWnziPQ+FCrQWg/LjWoosht5rcjmJYak1BS/YVYQzdqJqDzgIU21U8SoD9C96FI6JV DZPrrDgQWbD4z1JXTf5XJW56BP0r/zeBJgdNEOl0pbfWOrtH8Dx6PWh8Hu3DpMVnGS E+BlfuvpxZbRp2M5mc9FImKEPj61lNjJX6qVSUblzzAtPlYUQLoF9VpT6NN8u3HH6v M0N/4WIEaz13Q== From: bugzilla-daemon@bugzilla.kernel.org To: linux-scsi@vger.kernel.org Subject: [Bug 214147] ISCSI broken in last release Date: Wed, 01 Sep 2021 23:48:37 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: IO/Storage X-Bugzilla-Component: SCSI X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: michael.christie@oracle.com X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: linux-scsi@vger.kernel.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org https://bugzilla.kernel.org/show_bug.cgi?id=3D214147 --- Comment #2 from michael.christie@oracle.com --- On 8/23/21 6:08 AM, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=3D214147 >=20 > Bug ID: 214147 > Summary: ISCSI broken in last release > Product: IO/Storage > Version: 2.5 > Kernel Version: 5.13.12 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: SCSI > Assignee: linux-scsi@vger.kernel.org > Reporter: slavon.net@gmail.com > Regression: Yes >=20 > Created attachment 298441 > --> https://bugzilla.kernel.org/attachment.cgi?id=3D298441&action=3Dedit > dmesg log >=20 > After some time iscsi go to broke and help only reboot >=20 What are you doing when you hit the issue? What does your target setup look like? What are you using for the backing store? Are you able to build your own kernels? The only major changes between 5.12 and 5.13 is some target patches to batch cmds. However, it looks like you start to hit a problem earlier than when that code comes into play. We first see you hit a data out timeout, so we don't even have all the data for the cmd, so the target changes in 5.13 don't come into play yet. [10931.107057] Unable to recover from DataOut timeout while in ERL=3D0, clo= sing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:vhost11.dev.obs.group,i,0x400001370002,iqn.2003-0= 1.org.linux-iscsi.vm2.x8664:sn.b07943625401,t,0x01 However, we do see some cmds have made it to the core target layer because we can see the target layer is waiting on cmds to complete for part of the lun reset handling: [19906.593285] INFO: task kworker/4:1:3770999 blocked for more than 122 seconds. [19906.603670] Tainted: P O 5.13.12-1.el8.elrepo.x86_6= 4 #1 [19906.613975] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables = this message. [19906.624208] task:kworker/4:1 state:D stack: 0 pid:3770999 ppid: = 2 flags:0x00004000 [19906.624212] Workqueue: events target_tmr_work [target_core_mod] [19906.624247] Call Trace: [19906.624249] __schedule+0x396/0x8a0 [19906.624252] schedule+0x3c/0xa0 [19906.624255] schedule_timeout+0x215/0x2b0 [19906.624258] ? kasprintf+0x4e/0x70 [19906.624261] wait_for_completion+0x9e/0x100 [19906.624264] target_put_cmd_and_wait+0x55/0x80 [target_core_mod] [19906.624279] core_tmr_lun_reset+0x38b/0x660 [target_core_mod] [19906.624294] target_tmr_work+0xb4/0x110 [target_core_mod] [19906.624309] process_one_work+0x230/0x3d0 [19906.624312] worker_thread+0x2d/0x3e0 [19906.624314] ? process_one_work+0x3d0/0x3d0 [19906.624316] kthread+0x118/0x140 [19906.624318] ? set_kthread_struct+0x40/0x40 [19906.624320] ret_from_fork+0x1f/0x30 and we can see iscsi layer not able to relogin because of outstanding cmds/tmfs. I can send you a patch that reverts the core target patches. If we can rule them out then it would help narrow things down. Or, because it sounds like this is easy to reproduce we can turn on some extra lio debugging. --=20 You may reply to this email to add a comment. You are receiving this mail because: You are the assignee for the bug.=