Date: Mon, 7 Aug 2023 19:13:39 -0700
From: Jakub Kicinski
To: Nick Child
Cc: netdev@vger.kernel.org, haren@linux.ibm.com, ricklind@us.ibm.com, danymadden@us.ibm.com, tlfalcon@linux.ibm.com, bjking1@linux.ibm.com
Subject: Re: [PATCH net 5/5] ibmvnic: Ensure login failure recovery is safe from other resets
Message-ID: <20230807191339.709dc247@kernel.org>
In-Reply-To: <20230803202010.37149-5-nnac123@linux.ibm.com>
References: <20230803202010.37149-1-nnac123@linux.ibm.com> <20230803202010.37149-5-nnac123@linux.ibm.com>

On Thu, 3 Aug 2023 15:20:10 -0500 Nick Child wrote:
> +	do {
> +		reinit_init_done(adapter);
> +		/* Clear any failovers we got in the previous
> +		 * pass since we are re-initializing the CRQ
> +		 */
> +		adapter->failover_pending = false;
> +		release_crq_queue(adapter);
> +		/* If we don't sleep here then we risk an
> +		 * unnecessary failover event from the VIOS.
> +		 * This is a known VIOS issue caused by a vnic
> +		 * device freeing and registering a CRQ too
> +		 * quickly.
> +		 */
> +		msleep(1500);
> +		/* Avoid any resets, since we are currently
> +		 * resetting.
> +		 */
> +		spin_lock_irqsave(&adapter->rwi_lock, flags);
> +		flush_reset_queue(adapter);
> +		spin_unlock_irqrestore(&adapter->rwi_lock,
> +				       flags);
> +
> +		rc = init_crq_queue(adapter);
> +		if (rc) {
> +			netdev_err(netdev, "login recovery: init CRQ failed %d\n",
> +				   rc);
> +			return -EIO;
> +		}
>
> -	rc = ibmvnic_reset_init(adapter, false);
> -	if (rc) {
> -		netdev_err(netdev, "login recovery: Reset init failed %d\n",
> -			   rc);
> -		return -EIO;
> -	}
> +		rc = ibmvnic_reset_init(adapter, false);
> +		if (rc)
> +			netdev_err(netdev, "login recovery: Reset init failed %d\n",
> +				   rc);
> +		/* IBMVNIC_CRQ_INIT will return EAGAIN if it
> +		 * fails, since ibmvnic_reset_init will free
> +		 * irq's in failure, we won't be able to receive
> +		 * new CRQs so we need to keep trying. probe()
> +		 * handles this similarly.
> +		 */
> +	} while (rc == -EAGAIN);

Isn't this potentially an infinite loop? Can we limit the max number of
iterations here, or does something already make this loop safe?
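
To illustrate what limiting the iterations could look like, here is a minimal standalone sketch of a capped retry loop. It is a userspace model, not driver code: `MAX_LOGIN_RECOVERY_RETRIES`, `struct fake_adapter`, `try_reset_init()` and `recover_with_retry_cap()` are hypothetical names invented for the example, and the cap of 5 is an arbitrary assumption rather than a value taken from the patch or the driver.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical cap; the right value for the driver is an open question. */
#define MAX_LOGIN_RECOVERY_RETRIES 5

/* Stand-in for the adapter state. try_reset_init() below models an init
 * step that returns -EAGAIN for the first fail_count calls, then 0.
 */
struct fake_adapter {
	int fail_count;
	int calls;
};

/* Hypothetical model of the step that may return -EAGAIN. */
static int try_reset_init(struct fake_adapter *adapter)
{
	adapter->calls++;
	if (adapter->calls <= adapter->fail_count)
		return -EAGAIN;
	return 0;
}

/* Retry while the init step reports -EAGAIN, but give up after
 * MAX_LOGIN_RECOVERY_RETRIES attempts instead of looping forever.
 */
static int recover_with_retry_cap(struct fake_adapter *adapter)
{
	int retries = 0;
	int rc;

	do {
		rc = try_reset_init(adapter);
	} while (rc == -EAGAIN && ++retries < MAX_LOGIN_RECOVERY_RETRIES);

	if (rc) {
		fprintf(stderr,
			"login recovery: giving up after %d attempts (rc=%d)\n",
			adapter->calls, rc);
		return -EIO;
	}
	return 0;
}
```

With `fail_count` set to 2 the model succeeds on the third call. With a `fail_count` larger than the cap it stops after exactly `MAX_LOGIN_RECOVERY_RETRIES` calls and returns -EIO, the error the quoted code already returns when init_crq_queue() fails.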