From: Ævar Arnfjörð Bjarmason
To: Alejandro Colomar
Cc: Linus Torvalds, linux-man@vger.kernel.org, Alejandro Colomar, Tejun Heo, Craig Small, Alexey Dobriyan, Michael Kerrisk
Subject: Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise
Date: Wed, 04 Jan 2023 21:59:38 +0100
Message-ID: <230104.86r0wat28y.gmgdl@evledraar.gmail.com>

On Wed, Dec 28 2022, Alejandro Colomar wrote:

> Hi,
>
> On 12/23/22 19:12, Linus Torvalds wrote:
>> On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason
>> wrote:
>>>
>>> Whereas the fix here is a fix for a promise we're currently making
>>> which hasn't
>>> been true since v4.18.
>> Hah. Ack. Did anybody ever actually notice?
>> I wonder if the newer limit of 64 characters for kworkers shouldn't
>> even be mentioned at all, and if the 16-byte truncation for user space
>> should also be just removed.
>> Those limits should never have been some documented API, they were
>> always just implementation details, after all.
>>               Linus
> Sorry about the late reply, holidays.
> I agree. A variable implementation detail like this doesn't provide
> anything valuable to users; especially since there's no stability
> promise at all. I'd rewrite to just remove the (16) implementation
> detail.
>
> Ævar, would you send a v2 that removes implementation details, rather
> than fixing the details?

Maybe, because I'm not sure I'm qualified to document this anymore. My current patch just extends the description to cover the 4.18 divergence.

Let's separate a few things here:

A. The long-standing docs promise that it's limited to 16 bytes.
B. Since 4.18 that hasn't been true for (some of) the kernel's own processes, where the limit's been 64.
C. Was the part of "A" where a limit was documented at all a good idea in retrospect?
D. If "C"'s a "no" (which seems to be the consensus), what should the docs say?
E. I hadn't mentioned this before, but the docs for prctl()'s PR_SET_NAME document the same 16-byte limit.

I think the current behavior since 4.18 is a broken userspace promise, although admittedly a minor/obscure one.

I think even if going forward the documentation is deliberately ambiguous about it, it would make sense to briefly document the 16- and 64-byte limits as past limits, to at least help explain why current code parsing "/proc/*/stat" seems to be confident in those (or, more commonly, in the 16 bytes).

The code I wrote was rather anal about that promise, but e.g. looking at htop(1)'s source code, they've got a total limit of 2048 for this sort of line (MAX_READ).
I'm sure if I went fishing I could find other similar cases (and probably some lower ones).

I don't think it would be good to just leave it ambiguous for those trying to use this interface. They might assume any of 16 bytes (from finding the prctl() PR_SET_NAME docs), 64 bytes (from reading kernel sources), 255 bytes (the maximum filename length), etc.

Wouldn't the least bad thing be to:

 * Cover "A" and "B" in passing, i.e. explain past promised / implemented limits.
 * Note that this is no guarantee, but that...
 * ...we might use up to N, where N is some sane limit (1024? 2048? 4096?).

So programs that parse this now could just increase their fixed buffers, rather than having to use some getc()/realloc() loop, as they might if the interface makes no promises about an upper bound and they're being paranoid about future-proofing the parser.

If so, I have no opinion on what value of "N" would be sane, other than that it seems best to pick something.

?

> On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote:
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index 115c8592e..b23dd1479 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -2092,9 +2092,13 @@ The filename of the executable, in parentheses. Tools such as
>>  may alternatively (or additionally) use
>>  .IR /proc/ pid /cmdline.
>>  .IP
>> -Strings longer than
>> +For userspace, strings longer than
>>  .B TASK_COMM_LEN
>>  (16) characters (including the terminating null byte) are silently truncated.
>> +Since Linux version 4.18.0 a longer limit of 64 (including the
>> +terminating null byte) has applied to the kernel's own workqueue
>> +workers (whose names start with "kworker/").
>> +.IP
>>  This is visible whether or not the executable is swapped out.
>>  .TP
>>  (3) \fIstate\fP \ %c
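On the getc()/realloc() loop mentioned above: a minimal sketch, in C, of what a parser that assumes no upper bound on comm might look like. The helpers read_all() and parse_comm() are hypothetical, not taken from htop or any other tool. The sketch reads the whole stat line into a buffer that grows with realloc(), then assumes comm runs from the first "(" to the last ")", since comm can itself contain spaces and ")" characters. Neither the 16- nor the 64-byte figure is relied on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole stream into a NUL-terminated heap buffer, doubling it with
 * realloc() as needed, so no upper bound on the line (and hence on comm)
 * is assumed.  Returns NULL on allocation failure; caller must free(). */
static char *read_all(FILE *fp)
{
    size_t cap = 256, len = 0;
    char *buf;
    int c;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {           /* keep room for the final NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}

/* Extract field (2), comm, from a /proc/[pid]/stat line.  comm may contain
 * spaces and ')' itself, so it is taken to run from the first '(' to the
 * LAST ')'.  On success returns a malloc()ed copy of comm and stores field
 * (3), the state character, in *state; returns NULL on malformed input. */
static char *parse_comm(const char *line, char *state)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    char *comm;
    size_t n;

    if (open == NULL || close == NULL || close < open)
        return NULL;
    /* Field (3) must follow as ") X"; close[1] is checked first so that
     * close[2] is never read past the terminating NUL. */
    if (close[1] != ' ' || close[2] == '\0')
        return NULL;
    n = (size_t)(close - open - 1);
    comm = malloc(n + 1);
    if (comm == NULL)
        return NULL;
    memcpy(comm, open + 1, n);
    comm[n] = '\0';
    *state = close[2];
    return comm;
}
```

Typical use would be to fopen() "/proc/self/stat", hand the stream to read_all(), then call parse_comm() on the result and free() both strings. For example, parse_comm("1234 (a) b) S 1 1234", &st) yields the comm "a) b" with st set to 'S'.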
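To make point "E" concrete: a minimal, Linux-only sketch of the userspace 16-byte limit, using prctl() with PR_SET_NAME and PR_GET_NAME. The wrapper set_and_get_name() is hypothetical. It sets the calling thread's name and reads back what the kernel kept. Per prctl(2), the name is silently truncated to 16 bytes including the terminating null byte, so at most 15 characters survive. This only exercises the userspace limit; it doesn't show the 64-byte kworker case.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: set the calling thread's name via PR_SET_NAME and
 * read back what the kernel stored via PR_GET_NAME.  'out' must hold at
 * least 16 bytes (TASK_COMM_LEN).  Returns the length of the stored name,
 * or -1 if either prctl() call fails. */
static int set_and_get_name(const char *name, char out[16])
{
    assert(name != NULL && out != NULL);
    if (prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL) == -1)
        return -1;
    /* Names longer than 15 characters were silently truncated by the
     * kernel, so PR_GET_NAME returns at most 15 characters plus the NUL. */
    if (prctl(PR_GET_NAME, (unsigned long)out, 0UL, 0UL, 0UL) == -1)
        return -1;
    return (int)strlen(out);
}
```

For example, set_and_get_name("a-rather-long-thread-name", buf) returns 15, with buf holding "a-rather-long-t".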