public inbox for initramfs@vger.kernel.org
 help / color / mirror / Atom feed
From: cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org
To: initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	si-wei.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
Date: Mon,  4 Dec 2017 12:14:44 -0800	[thread overview]
Message-ID: <1512418484-3948-1-git-send-email-cathy.zhou@oracle.com> (raw)

From: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>

Co-existence of iscsid and iscsistart can lead to unexpected iSCSI boot
behavior:

We ran into iSCSI netboot failure when multiple interfaces are configured
with dhcp. The problem is that iscsistart is not designed to be working
together with iscsid. When an interface gets the dhcp offer successfully,
the iscsiroot script is run which starts the iscsistart service to
establish the iSCSI session. With the existence of iscsid, the iscsistart
service's attempt to setup its own mgmt ipc fails. Instead, the request
to login to the iscsi target is handled by the mgmt ipc of iscsid. After
iscsistart finishes its login attempt, it eventually sends a
stop_event_loop request to stop the mgmt process. As the result, it
terminates iscsid.

I don't think this behavior is expected. Further, we believe the restart
of the iscsid service in iscsiroot.sh for multiple interfaces could also
interfere with each other and race. That could also cause unexpected
behavior.

Specifically, in our multiple interface iSCSI boot case, below is what
happened:

a. eth0 successfully got the dhcp offer, the iscsiroot script was run and
eventually ran "system-run" to start the "oneshot" iscsistart service.

b. Before step a succeeded, the iscsiroot script was also run for eth1.
It checks the status of the first iscsistart service instance, which was
still "activating". So the iscsistart service was restarted and that
killed the first instance. Note that in this case, the stop_event_loop
request has not been sent, and iscsid was not terminated. As the result,
the second iscsistart instance also failed because of the "existing
session" error, as the half-baked session already existed in iscsid.

Based on iscsistart(8), iscsistart's primary use is to start session
used for iSCSI root boot, and is meant to work indpependently from iscsid.
Making the two work together would need coordination between each
other, which would be complicate and unnecessary.

Therefore, to fix the issue, we'd either choose to remove the use of
iscsid and solely use iscsistart or to remove iscsistart and use iscsid
and iscsiadm to manage the iSCSI sessions.

Our fix chooses the former. We tested the change in our setup.

Signed-off-by: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>
---
 modules.d/95iscsi/cleanup-iscsi.sh   |  2 +-
 modules.d/95iscsi/iscsiroot.sh       | 25 +------------------------
 modules.d/95iscsi/module-setup.sh    | 30 ------------------------------
 modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
 4 files changed, 2 insertions(+), 65 deletions(-)

diff --git a/modules.d/95iscsi/cleanup-iscsi.sh b/modules.d/95iscsi/cleanup-iscsi.sh
index bfc8aefc..e97d65ac 100755
--- a/modules.d/95iscsi/cleanup-iscsi.sh
+++ b/modules.d/95iscsi/cleanup-iscsi.sh
@@ -1,4 +1,4 @@
 #!/bin/sh
 
-[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && killproc iscsiuio
+[ -e /sys/module/bnx2i ] && killproc iscsiuio
 
diff --git a/modules.d/95iscsi/iscsiroot.sh b/modules.d/95iscsi/iscsiroot.sh
index aefd263d..c7f1c474 100755
--- a/modules.d/95iscsi/iscsiroot.sh
+++ b/modules.d/95iscsi/iscsiroot.sh
@@ -36,7 +36,7 @@ iroot=${iroot#:}
 # figured out a way how to check whether this is built-in or not
 modprobe crc32c 2>/dev/null
 
-if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
+[ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
         iscsiuio
         > /tmp/iscsiuio-started
 fi
@@ -117,11 +117,6 @@ handle_netroot()
            mkdir -p /etc/iscsi
            ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
            > /tmp/iscsi_set_initiator
-           if [ -n "$DRACUT_SYSTEMD" ]; then
-               systemctl try-restart iscsid
-               # FIXME: iscsid is not yet ready, when the service is :-/
-               sleep 1
-           fi
     fi
 
     if [ -z "$iscsi_initiator" ]; then
@@ -138,11 +133,6 @@ handle_netroot()
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
         > /tmp/iscsi_set_initiator
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 
 
@@ -163,11 +153,6 @@ handle_netroot()
     if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 # FIXME $iscsi_protocol??
 
@@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] && getargbool 1 rd.iscsi.waitnet; then
     all_ifaces_setup || exit 0
 fi
 
-if [ "$netif" = "timeout" ] && all_ifaces_setup; then
-    # s.th. went wrong and the timeout script hits
-    # restart
-    systemctl restart iscsid
-    # damn iscsid is not ready after unit says it's ready
-    sleep 2
-fi
-
 if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
     if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
         handle_firmware
diff --git a/modules.d/95iscsi/module-setup.sh b/modules.d/95iscsi/module-setup.sh
index 04937b5b..1d185a84 100755
--- a/modules.d/95iscsi/module-setup.sh
+++ b/modules.d/95iscsi/module-setup.sh
@@ -199,36 +199,6 @@ install() {
     inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
     if ! dracut_module_included "systemd"; then
         inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
-    else
-        inst_multiple -o \
-                      $systemdsystemunitdir/iscsi.service \
-                      $systemdsystemunitdir/iscsid.service \
-                      $systemdsystemunitdir/iscsid.socket \
-                      $systemdsystemunitdir/iscsiuio.service \
-                      $systemdsystemunitdir/iscsiuio.socket \
-                      iscsiadm iscsid
-
-        mkdir -p "${initdir}/$systemdsystemunitdir/sockets.target.wants"
-        for i in \
-                iscsiuio.socket \
-            ; do
-            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/sockets.target.wants/${i}"
-        done
-
-        mkdir -p "${initdir}/$systemdsystemunitdir/basic.target.wants"
-        for i in \
-                iscsid.service \
-            ; do
-            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/basic.target.wants/${i}"
-        done
-
-        # Make sure iscsid is started after dracut-cmdline and ready for the initqueue
-        mkdir -p "${initdir}/$systemdsystemunitdir/iscsid.service.d"
-        (
-            echo "[Unit]"
-            echo "After=dracut-cmdline.service"
-            echo "Before=dracut-initqueue.service"
-        ) > "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
     fi
     inst_dir /var/lib/iscsi
     dracut_need_initqueue
diff --git a/modules.d/95iscsi/parse-iscsiroot.sh b/modules.d/95iscsi/parse-iscsiroot.sh
index 43b2e088..c8c66ccf 100755
--- a/modules.d/95iscsi/parse-iscsiroot.sh
+++ b/modules.d/95iscsi/parse-iscsiroot.sh
@@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d iscsi_initiator=) && [ -n "$arg" ] && ! [
     if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 fi
 
@@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f /sys/firmware/ibft/initiator/initiator-name ]
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
         > /tmp/iscsi_set_initiator
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 fi
 
-- 
2.11.1

             reply	other threads:[~2017-12-04 20:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-04 20:14 cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA [this message]
     [not found] ` <1512418484-3948-1-git-send-email-cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-12-08 19:09   ` [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces Cathy Zhou
     [not found] ` <9326cbf7-b533-4cf2-baac-6699ee5dc4d6@googlegroups.com>
     [not found]   ` <a759d3d3-5c21-4a28-a205-1390561abc2a@googlegroups.com>
     [not found]     ` <213a0c37-587e-7375-341b-680faa752cde@oracle.com>
     [not found]       ` <B00ADBFE-F00B-4BCD-A7FE-0FEFE2A64F9D@gmail.com>
     [not found]         ` <B00ADBFE-F00B-4BCD-A7FE-0FEFE2A64F9D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-21 18:54           ` Cathy Zhou
     [not found]             ` <29030bdf-e65a-dc01-ee1e-48498dafdc8f-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-12-22  1:03               ` Lee Duncan
     [not found]                 ` <C94AC86B-C41E-4B3A-B375-3405E25BD45D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-22  6:20                   ` si-wei liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1512418484-3948-1-git-send-email-cathy.zhou@oracle.com \
    --to=cathy.zhou-qhclzuegtsvqt0dzr+alfa@public.gmane.org \
    --cc=initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=open-iscsi-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org \
    --cc=si-wei.liu-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox