From: Aurelien DESBRIERES
To: tglx@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] lib/reed_solomon: document rs_control concurrency contract
Date: Thu, 14 May 2026 00:17:23 +0200

Hi Thomas,

While building a Reed-Solomon-protected filesystem on top of
lib/reed_solomon, I ran into a subtle, latent pitfall in using the
library that took some time to pin down.
The library itself is correct -- this series proposes nothing more
than a documentation patch to spare future users that diagnostic time.

Problem in short
================

struct rs_control carries internal scratch buffers (lambda, syn, b, t,
omega, root, reg, loc) in its flexible @buffers array, which
decode_rs.c and encode_rs.c use during each call. Two concurrent calls
to decode_rs8() / decode_rs16() (or the encode counterparts) on the
same rs_control instance race on those scratch buffers and corrupt
each other's intermediate state, producing spurious "uncorrectable"
returns on otherwise valid codewords.

This behaviour is consistent with the in-tree code: nothing in the
library explicitly disclaims thread safety, but the buffers are clearly
per-instance, not per-call. It is also consistent with the original
Phil Karn (KA9Q) implementation from 2002 from which the library is
derived.

No existing in-tree user appears to decode concurrently on a shared
rs_control (most callers in drivers/{mtd,dvb,...} use the library from
a single producer context), which is why the contract has stayed
implicit.
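
To make the failure mode concrete without booting a kernel, here is a
deterministic userspace model. It is a sketch under stated assumptions,
not lib/reed_solomon code: struct toy_ctl, toy_phase1(), toy_phase2(),
run_sequential() and run_interleaved() are hypothetical stand-ins for
an rs_control and for the two halves of a decode call (staging
intermediate state in the per-instance scratch, then consuming it).
Replaying the interleaving that two CPUs can produce shows the second
call's state overwriting the first's when both share one instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT lib/reed_solomon: like struct rs_control,
 * each instance owns a scratch buffer that every call reuses. */
struct toy_ctl {
	uint8_t scratch[16];
};

/* First half of a "call": stage intermediate state in the instance. */
static void toy_phase1(struct toy_ctl *c, const uint8_t *in, size_t len)
{
	assert(len <= sizeof(c->scratch));
	for (size_t i = 0; i < len; i++)
		c->scratch[i] = in[i] ^ 0x5a;
}

/* Second half: consume the staged state. Returns the sum of the
 * original input bytes only if nobody touched the scratch in between. */
static unsigned toy_phase2(const struct toy_ctl *c, size_t len)
{
	unsigned sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += c->scratch[i] ^ 0x5a;
	return sum;
}

/* Call A runs to completion before call B starts. */
static void run_sequential(struct toy_ctl *c, const uint8_t *a,
			   const uint8_t *b, size_t len,
			   unsigned *ra, unsigned *rb)
{
	toy_phase1(c, a, len);
	*ra = toy_phase2(c, len);
	toy_phase1(c, b, len);
	*rb = toy_phase2(c, len);
}

/* B's first half lands between A's two halves, as two CPUs may do.
 * When ca == cb, B overwrites the state A staged. */
static void run_interleaved(struct toy_ctl *ca, struct toy_ctl *cb,
			    const uint8_t *a, const uint8_t *b, size_t len,
			    unsigned *ra, unsigned *rb)
{
	toy_phase1(ca, a, len);
	toy_phase1(cb, b, len);
	*ra = toy_phase2(ca, len);
	*rb = toy_phase2(cb, len);
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the sequential run on
one instance yields 10 and 100, the interleaved run on one shared
instance yields 100 for both calls, and the interleaved run on two
separate instances yields 10 and 100 again. Call A's silently wrong
result is the toy analogue of a spurious "uncorrectable" return.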

Empirical reproducer (kernel 7.0.3, cluster of qemu-aarch64 VMs):

- the filesystem stores 16x RS(255,239) shortened subblocks per 4 KiB
  disk block, decoded on read via a single shared rs_control
  initialised in module_init() with init_rs(8, 0x187, 0, 1, 16)
- 8 parallel sha256sum invocations across a read-only mount of a
  pristine image: 150-240 of 758 files report wrong hashes per run,
  and dmesg shows ~160 RS "uncorrectable" entries per batch
- same image, sequential single-process reads: 0 errors, hashes
  stable byte-for-byte across hundreds of iterations
- same image, parallel reads after switching to one rs_control per
  possible CPU (alloc_percpu() plus init_rs() for each possible CPU,
  with get_cpu_ptr() / put_cpu_ptr() around encode_rs8() /
  decode_rs8()): 0 uncorrectable, 8 runs byte-identical to the
  baseline, and total wall-clock time ~25% faster than sequential
  thanks to real parallelism

The diagnostic chain that pointed at the library (and not at our
filesystem code or the page cache) relied on CRC32 probes around the
buffer-head reads. They showed that bh->b_data was byte-stable before
and after the memcpy() into a private scratch buffer, and that only
the decode_rs8() return value diverged across concurrent calls.

The fix on our side is the per-CPU rs_control allocation described
above. It lives in our out-of-tree code, so this series proposes no
change to lib/.

What this patch does
====================

Just documentation. It extends the kdoc of struct rs_control in
include/linux/rslib.h with a "Locking and concurrency" paragraph that
states explicitly that:

- one rs_control is NOT safe to share across concurrent
  encode_rs*() / decode_rs*() calls
- callers needing concurrent codec use must allocate one rs_control
  per concurrent caller (one per CPU, one per worker, etc.)
- the underlying rs_codec is refcounted across rs_control instances
  initialised with the same parameters, so the per-instance overhead
  is limited to sizeof(struct rs_control) plus the scratch arrays

No code change, no ABI change.
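
The "one rs_control per concurrent caller" rule can be sketched in
userspace with POSIX threads. This is a model under stated
assumptions, not kernel code: struct toy_inst, toy_call(), struct
worker and run_private_instances() are hypothetical names, and the
array of instances indexed by worker stands in for the per-CPU slots
the reproducer allocated with alloc_percpu(). In the kernel,
get_cpu_ptr() additionally disables preemption, so the task stays on
the CPU whose instance it is using until put_cpu_ptr(); that does not
by itself exclude a user in interrupt or softirq context on the same
CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SCRATCH 64

/* Hypothetical codec instance, NOT struct rs_control: it owns scratch
 * space that every call reuses, so it must not be shared. */
struct toy_inst {
	uint32_t scratch[TOY_SCRATCH];
};

/* One "call": stage values in the instance's scratch, then read them
 * back. With a private instance the result is always
 * TOY_SCRATCH * seed + 2016 (2016 = 0 + 1 + ... + 63), mod 2^32. */
static uint32_t toy_call(struct toy_inst *c, uint32_t seed)
{
	uint32_t sum = 0;

	for (uint32_t i = 0; i < TOY_SCRATCH; i++)
		c->scratch[i] = seed + i;
	for (uint32_t i = 0; i < TOY_SCRATCH; i++)
		sum += c->scratch[i];
	return sum;
}

struct worker {
	struct toy_inst *inst;	/* this worker's private instance */
	uint32_t id;
	unsigned iterations;
	unsigned mismatches;
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (unsigned n = 0; n < w->iterations; n++) {
		uint32_t seed = w->id * 100000u + n;

		if (toy_call(w->inst, seed) != TOY_SCRATCH * seed + 2016u)
			w->mismatches++;
	}
	return NULL;
}

/* Allocate one instance per worker up front (the userspace analogue of
 * one rs_control per CPU), run all workers concurrently and return the
 * total number of wrong results, or -1 if setup failed. */
static long run_private_instances(unsigned nworkers, unsigned iterations)
{
	struct toy_inst *insts;
	struct worker *ws;
	pthread_t *tids;
	unsigned started = 0;
	long total = -1;

	assert(nworkers > 0);
	insts = calloc(nworkers, sizeof(*insts));
	ws = calloc(nworkers, sizeof(*ws));
	tids = calloc(nworkers, sizeof(*tids));
	if (insts && ws && tids) {
		for (; started < nworkers; started++) {
			ws[started].inst = &insts[started];
			ws[started].id = started;
			ws[started].iterations = iterations;
			if (pthread_create(&tids[started], NULL, worker_fn,
					   &ws[started]) != 0)
				break;
		}
		total = 0;
		for (unsigned i = 0; i < started; i++) {
			pthread_join(tids[i], NULL);
			total += ws[i].mismatches;
		}
		if (started != nworkers)
			total = -1;
	}
	free(tids);
	free(ws);
	free(insts);
	return total;
}
```

Each worker only ever touches its own instance, so
run_private_instances(8, 10000) is expected to report 0 mismatches on
every run. Pointing every worker's inst at one shared instance would
reintroduce the race described above, and the mismatch count would then
depend on scheduling.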

The intent is purely to make the existing contract discoverable
without having to read decode_rs.c to notice the rsc->buffers indexing.

Open questions / follow-ups (not in this series)
================================================

Two non-trivial follow-ups are possible, but I did not want to bundle
them with a doc patch:

1. Move the scratch arrays out of struct rs_control, either onto the
   stack of each encode/decode call or into per-CPU storage inside the
   library, so that a single rs_control becomes safe to share. Neither
   the public API nor the ABI (struct rs_control is opaque to callers)
   would change, but this would touch the hot paths of decode_rs.c and
   encode_rs.c and is larger than a doc patch should be.

2. Add an rs_control_get_for_cpu() helper that wraps alloc_percpu(),
   init_rs() for each possible CPU and get_cpu_ptr() for callers that
   want concurrent decode without rolling their own. This is a narrow,
   additive change that could ship after the doc patch.

I am happy to follow up on either if you think they are worth it.

I tested this patch on master at e1914add2799 ("Merge tag 'for-linus'
of git://git.kernel.org/pub/scm/virt/kvm/kvm") with an x86_64 build and
an arm64 cross-build. The kdoc renders correctly under
scripts/kernel-doc -rst on include/linux/rslib.h, and the "Locking and
concurrency" paragraph appears in the rendered Description section of
struct rs_control.

Disclosure: this patch and its cover letter were drafted interactively
with the help of an AI coding assistant (Anthropic Claude). I have
reviewed every line, verified all technical claims empirically on the
test cluster described above, and am the sole signatory of the DCO.

Thanks for your time.

Aurelien

Aurelien DESBRIERES (1):
  lib/reed_solomon: document rs_control concurrency contract

 include/linux/rslib.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

-- 
2.53.0