From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 868D91F868B; Wed, 8 Jan 2025 10:25:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736331943; cv=none; b=OAplFsDi0AG45BZSoO+u9/Z78jrhXTyGgVIet6UpM8bCEveChs7X3YEOUMuWMDHYaw112a6UixzOp85mXmcwJfpfcONhBDwg4/yJBDn0YWo0vSMBX26NZe1yXuuOHUvT5BSZbIvLA/cJ1EHoHh8GG4ItuMfwq3RrdKK1VCRE8/Q= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736331943; c=relaxed/simple; bh=V79sy8TwmY7YaNY2CtlOZFjZmfOzive83fCVC9e2RJY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=T5WoCRsjnJVv3su3X0m4v4xl3L6GpYUGhRMQXmYFXmUFhx+bAmambB3yaoDKG7o/GWIP90RNxF1qzmRN95qKacr8HLMBGnoUcX8NYwVjon60KL6d/8aVJKaDpEd2YBSFCAim8/FT8sEvCaYIEOw6qi85TDU1H+Cdgillz9NAfFw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=fLHf7FbT; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=d+X1qumv; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="fLHf7FbT"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="d+X1qumv" Date: Wed, 8 Jan 2025 11:25:32 +0100 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1736331934; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=wDnajRGAKSDjhPvwubZBuXDZ5LOIoIGh7ZTlin0DXQc=; b=fLHf7FbT3jWUtCXukcFugRf7ux0GvEBZXQ34F2GvVNuc6jEJ6rXMIk/0gbPcnh0QRsBBik ZH3ffOAdXx+jQw2l6IeTrNWgg4FU5MHoOEwT287dgYnqqSJWIUTD6cRPVO8UKO9Ufn9J5w ehxwrnAXOzEB4s0xNBLeeZ+okyETfb/JnSQV19TJaCce5Bq5qkWXtx2N/HoXaQVPExCWYd DUXQZDkYJeGLmv5kowwshjtQfXIdKuSNEVCe4Ous4tlx5uwjYq7uwCF4OeMMgd/JaA+Tgk YgLrZEhseffjCsdUyTy2iZPW94HHqYtGQXu6L9dZUOuHXkgk5UdJ8XYS5SEBxg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1736331934; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=wDnajRGAKSDjhPvwubZBuXDZ5LOIoIGh7ZTlin0DXQc=; b=d+X1qumv8hLSPmGBE0V0HlJlSOVx7RABGGcSIKPjRWBa0lKe6iTmFw0d4lqxcYHkZ+W+MA ZAWAW8hRoCIhoXDg== From: Sebastian Andrzej Siewior To: Wander Lairson Costa Cc: Tony Nguyen , Przemek Kitszel , Andrew Lunn , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Clark Williams , Steven Rostedt , Jeff Garzik , Auke Kok , "moderated list:INTEL ETHERNET DRIVERS" , "open list:NETWORKING DRIVERS" , open list , "open list:Real-time Linux (PREEMPT_RT):Keyword:PREEMPT_RT" Subject: Re: [PATCH iwl-net 0/4] igb: fix igb_msix_other() handling for PREEMPT_RT Message-ID: <20250108102532.VWnKWvoo@linutronix.de> References: <20241204114229.21452-1-wander@redhat.com> <20250107135106.WWrtBMXY@linutronix.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: On 2025-01-07 15:52:47 [-0300], Wander Lairson Costa wrote: > On Tue, Jan 07, 2025 at 02:51:06PM +0100, Sebastian Andrzej Siewior wrote: > > On 2024-12-04 08:42:23 [-0300], Wander Lairson Costa wrote: > > > This is the second attempt at fixing the behavior of igb_msix_other() > > > for PREEMPT_RT. The previous attempt [1] was reverted [2] following > > > concerns raised by Sebastian [3]. > > > > > > The initial approach proposed converting vfs_lock to a raw_spinlock, > > > a minor change intended to make it safe. However, it became evident > > > that igb_rcv_msg_from_vf() invokes kcalloc with GFP_ATOMIC, > > > which is unsafe in interrupt context on PREEMPT_RT systems. > > > > > > To address this, the solution involves splitting igb_msg_task() > > > into two parts: > > > > > > * One part invoked from the IRQ context. > > > * Another part called from the threaded interrupt handler. > > > > > > To accommodate this, vfs_lock has been restructured into a double > > > lock: a spinlock_t and a raw_spinlock_t. In the revised design: > > > > > > * igb_disable_sriov() locks both spinlocks. > > > * Each part of igb_msg_task() locks the appropriate spinlock for > > > its execution context. > > > > - Is this limited to PREEMPT_RT or does it also occur on PREEMPT systems > > with threadirqs? And if this is PREEMPT_RT only, why? > > PREEMPT systems configured to use threadirqs should be affected as well, > although I never tested with this configuration. Honestly, until now I wasn't > aware of the possibility of a non PREEMPT_RT kernel with threaded IRQs by default. If the issue is indeed the use of threaded interrupts then the fix should not be limited to be PREEMPT_RT only. > > - What causes the failure? I see you reworked into two parts to behave > > similar to what happens without threaded interrupts. There is still no > > explanation for it. Is there a timing limit or was there another > > register operation which removed the mailbox message? > > > > I explained the root cause of the issue in the last commit. Maybe I should > have added the explanation to the cover letter as well. Anyway, here is a > partial verbatim copy of it: > > "During testing of SR-IOV, Red Hat QE encountered an issue where the > ip link up command intermittently fails for the igbvf interfaces when > using the PREEMPT_RT variant. Investigation revealed that > e1000_write_posted_mbx returns an error due to the lack of an ACK > from e1000_poll_for_ack. That ACK would have come if it would poll longer? > The underlying issue arises from the fact that IRQs are threaded by > default under PREEMPT_RT. While the exact hardware details are not > available, it appears that the IRQ handled by igb_msix_other must > be processed before e1000_poll_for_ack times out. However, > e1000_write_posted_mbx is called with preemption disabled, leading > to a scenario where the IRQ is serviced only after the failure of > e1000_write_posted_mbx." Where is this disabled preemption coming from? This should be one of the ops.write_posted() calls, right? I've been looking around and don't see anything obvious. Couldn't you wait for an event instead of polling? > The call chain from igb_msg_task(): > > igb_msg_task > igb_rcv_msg_from_vf > igb_set_vf_multicasts > igb_set_rx_mode > igb_write_mc_addr_list > kmalloc > > Cannot happen from interrupt context under PREEMPT_RT. So this part of > the interrupt handler is deferred to a threaded IRQ handler. > > > > Cheers, > > > Wander Sebastian