From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f42.google.com (mail-wm1-f42.google.com [209.85.128.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4BE1C8F6C; Sun, 2 Feb 2025 20:44:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.42 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738529097; cv=none; b=eXa8eYv31cNvh6DS7qKQu4J3bw+t3a85kdF/7mwpxX1I8H/lGVdxyoRtRiQE6Il5bb1e4uUFbw5czc80GYmMWtlzEqxLtIalGfUFBtKB8KW8PFCtjxOEjOO5CrBh/pNgKfmhZKJ80wChDMzGipzZkYGjg+M0CULCaGLXT2CygtU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738529097; c=relaxed/simple; bh=ByIT6O2MQ8AeBF3aaiyNh/HTBIpkXFi2GmOxu/ewkjI=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=CRjPqBK52iAfNAJ9toSlnA/XbbHL2+b7uyWkvr5BF2NVFp6sUWxStdoHw8NBPXYeMr3wiCqAMQJX8awB6j7S1rc5sxVMt2UtCM/xrVPEm9d0TVB44PXYk9YD3m3jkmt8g07nig4Ddj6QsPPcY3Ay0RX6n6k8F8vS2nPDTO0HqNE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Pw1lMt8M; arc=none smtp.client-ip=209.85.128.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Pw1lMt8M" Received: by mail-wm1-f42.google.com with SMTP id 5b1f17b1804b1-436345cc17bso27501185e9.0; Sun, 02 Feb 2025 12:44:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738529093; x=1739133893; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=cbSx/ISaCz+L+0YMvLHQK84mnVS9XArcMwXBbJKLocA=; b=Pw1lMt8MBAg/XEOr0QY98X3tsMAmix8xhZ5oyFJCYppt/p/o11hnE45ZZEABNv+Uas e8IJfhmRFemZHAvo9Iii3dTd6s4HlYbeLh/aaTldO8gD5oyyeIBsDMG+m1v3fjOnPSGO KwoIC35kKnnPrgPUpq1WPLsmuSARqp3HazTiDmn+emCeCrtHFacZJHKK+N6IxQJpLLob tHXQzYIigCXfIGIkAUiD0KSd1TRFoSn3R8WQxsfhtPRzkJOSApjD6hbxxKhpT8t2T8Es 5Rgy7s+IKSJEBMPDQPuoSRO6m3nD0L1y4Yf7TglbgWgcpGh+CHI9k74DG6ykVxcg3Ktb 48kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738529093; x=1739133893; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cbSx/ISaCz+L+0YMvLHQK84mnVS9XArcMwXBbJKLocA=; b=RmrekS4elt9eknXVLo/ThCRR27p6ac+H6RynLMUpCdeHLg3/5VfCjDBFmkYOeIB3uL c00748FLBKKXUsmx82u8ptzp4ensBm+qxX7jF8MhllHZXgdWquKlnF6TUwAXMaE56VwP BCG2B2fo0H+lEyDMMmji3Mt2y174Qk6wDxLi9cugTKmZCGjTThCu9Wand2BSICz/twqx 1WWFSjl7aT5X1tFUKRDgAh7et3k1coCHVjwtm2PLv7b0IcqvjiR868qUv85YJIUgdlDE ZGP1RdaPWibtGR+e+3HxXannXSz8Gn/BcfeNBVPlKRuAg/MMi7TxBxsqgeriRZyuf5sc VlNw== X-Forwarded-Encrypted: i=1; AJvYcCUKQKyXRDnxFTBx2+2gTyHseATzwGLEULONvo40yWoRg1CHAss/MSb600Dy4eqsaphc9jhuM/Z/vt+dNOg=@vger.kernel.org, AJvYcCW7RaRiRv4/Q/zlyLu95a/3BEjAHmDbVL8nq9FXYoJna5t67NqBhBYW7H4QVh1EZXj6LJ9RXzz5@vger.kernel.org X-Gm-Message-State: AOJu0YyInnQF/hIhLRZypiFY78RfjNQTLzJL5zZRv3jr1ao3VfPWX1WR ZCaojyyk/BjcW85d9ypj/aulbRMh9aH+hEUEC6Oy2aXkIVFBeBmc X-Gm-Gg: ASbGncuI63JaPJFbI7oYXJHb9nq2EAA68o7AoxsQeGF0Jvt77No3GDbh91TiRXuRipn 4XiEvnBBshNw5riY1Poq3MM4tHV3Lzv0szcks1Xo0P7cKZ/9r1P9O78cgjtdsrgAqLdt6jVpR5S ZJ2lKAITLKmrhPjpfosgPXQHQOIPArQPiG4D2j2lWtobzhAgI3ioZ47HyrCvEj2Cp9USmTpW7qM 67yuo8x3DbK+e46x40qsDW6/Hc/a9k1pXgvIAQL3Q/+WphxMmre4sU3sWgwKgfcAdl79FL0+qUc lAjSwPP4OrPT8cQJhHJ56FJ/XB/Y2AVHky+Dea0LCmekYVXDEf/H6Q== X-Google-Smtp-Source: AGHT+IHLsWh/YGpljPw48TJZyadtyPAWHtbeX9ie2f3Bzm4IzJqabYq5/PboSRajCtTjNtqAWvC6/w== X-Received: by 2002:a05:6000:1561:b0:38c:5b52:3a47 with SMTP id ffacd0b85a97d-38c5b523e1dmr9683706f8f.4.1738529093086; Sun, 02 Feb 2025 12:44:53 -0800 (PST) Received: from pumpkin (82-69-66-36.dsl.in-addr.zen.co.uk. [82.69.66.36]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-38c5c1b595asm10875448f8f.66.2025.02.02.12.44.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 02 Feb 2025 12:44:52 -0800 (PST) Date: Sun, 2 Feb 2025 20:44:49 +0000 From: David Laight To: Mateusz Guzik Cc: Matthew Wilcox , ebiederm@xmission.com, oleg@redhat.com, brauner@kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org Subject: Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock Message-ID: <20250202204449.77cab5e5@pumpkin> In-Reply-To: References: <20250201163106.28912-1-mjguzik@gmail.com> <20250201163106.28912-7-mjguzik@gmail.com> <20250201181933.07a3e7e2@pumpkin> <20250201215105.55c0319a@pumpkin> <20250202135503.4e377fb0@pumpkin> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; arm-unknown-linux-gnueabihf) Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Sun, 2 Feb 2025 20:34:48 +0100 Mateusz Guzik wrote: > On Sun, Feb 2, 2025 at 2:55=E2=80=AFPM David Laight > wrote: > > > > On Sat, 1 Feb 2025 22:00:06 +0000 > > Matthew Wilcox wrote: > > =20 > > > On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote: =20 > > > > I'm not sure what you mean. > > > > Disabling interrupts isn't as cheap as it ought to be, but probably= isn't > > > > that bad. =20 > > > > > > Time it. You'll see. =20 > > > > The best scheme I've seen is to just increment a per-cpu value. > > Let the interrupt happen, notice it isn't allowed and return with > > interrupts disabled. > > Then re-issue the interrupt when the count is decremented to zero. > > Easy with level sensitive interrupts. > > But I don't think Linux ever uses that scheme. > > =20 >=20 > I presume you are talking about the splhigh/splx set of primivitives > from Unix kernels. >=20 > While "entering" is indeed cheap, undoing the work still needs to be > atomic vs interrupts. >=20 > I see NetBSD uses local cmpxchg8b on the interrupt level and interrupt > mask, while the rest takes the irq trip. >=20 > The NetBSD solution is still going to be visibly slower than not > messing with any of it as spin_unlock on amd64 is merely a store of 0 > and cmpxchg even without the lock prefix costs several clocks. >=20 > Maybe there is other hackery which could be done, but see below. I was thinking it might be possible to merge an 'interrupts disabled' count with the existing 'pre-emption disabled' count. IIRC (on x86 at least) this is just a per-cpu variabled accessed from %fs/%= gs. So you add one to disable pre-emption and (say) 1<<16 to disable interrupts. If an interrupt happens while the count is 'big' the count value is changed so the last decrement of 1<<16 will set carry (or overflow), and a return from interrupt is done that leaves interrupts disabled (traditionally easy). The interrupt enable call just subtracts the 1<<16 and checks for carry (or overflow), if not set all is fine, it set it needs to call something to re-issue the interrupt - that is probably the hard bit. >=20 > > =20 > > > > > So while this is indeed a tradeoff, as I understand the sane defa= ult > > > > > is to *not* disable interrupts unless necessary. =20 > > > > > > > > I bet to differ. =20 > > > > > > You're wrong. It is utterly standard to take spinlocks without > > > disabling IRQs. We do it all over the kernel. If you think that nee= ds > > > to change, then make your case, don't throw a driveby review. > > > > > > And I don't mean by arguing. Make a change, measure the difference. = =20 > > > > The analysis was done on some userspace code that basically does: > > for (;;) { > > pthread_mutex_enter(lock); > > item =3D get_head(list); > > if (!item) > > break; > > pthead_mutex_exit(lock); > > process(item); > > } > > For the test there were about 10000 items on the list and 30 threads > > processing it (that was the target of the tests). > > The entire list needs to be processed in 10ms (RTP audio). > > There was a bit more code with the mutex held, but only 100 or so > > instructions. > > Mostly it works fine, some threads get delayed by interrupts (etc) but > > the other threads carry on working and all the items get processed. > > > > However sometimes an interrupt happens while the mutex is held. > > In that case the other 29 threads get stuck waiting for the mutex. > > No progress is made until the interrupt completes and it overruns > > the 10ms period. > > > > While this is a userspace test, the same thing will happen with > > spin locks in the kernel. > > > > In userspace you can't disable interrupts, but for kernel spinlocks > > you can. > > > > The problem is likely to show up as unexpected latency affecting > > code with a hot mutex that is only held for short periods while > > running a lot of network traffic. > > That is also latency that affects all cpu at the same time. > > The interrupt itself will always cause latency to one cpu. > > =20 >=20 > Nobody is denying there is potential that lock hold time will get > significantly extended if you get unlucky enough vs interrupts. It is > questioned whether defaulting to irqs off around lock-protected areas > is the right call. I really commented because you were changing one lock which could easily be 'hot' enough for there to be side effects, without even a comment about any pitfalls. David >=20 > As I noted in my previous e-mail the spin_lock_irq stuff disables > interrupts upfront and does not touch them afterwards even when > waiting for the lock to become free. Patching that up with queued > locks may be non-trivial, if at all possible. Thus contention on > irq-disabled locks *will* add latency to their handling unless this > gets addressed. Note maintaining forward progress guarantee in the > locking mechanism is non-negotiable, so punting to TAS or similar > unfair locks does not cut it. >=20 > This is on top of having to solve the overhead problem for taking the > trips (see earlier in the e-mail). >=20 > I would argue if the network stuff specifically is known to add > visible latency, then perhaps that's something to investigate. >=20 > Anyhow, as Willy said, you are welcome to code this up and demonstrate > it is better overall. >=20