The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Aurelien DESBRIERES <aurelien@hackers.camp>
To: tglx@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] lib/reed_solomon: document rs_control concurrency contract
Date: Thu, 14 May 2026 00:17:23 +0200	[thread overview]
Message-ID: <cover.1778710307.git.aurelien@hackers.camp> (raw)

Hi Thomas,

While building a Reed-Solomon-protected filesystem on top of
lib/reed_solomon, I ran into a subtle latent issue in the library that
took some time to pin down. The library itself is correct -- this
series proposes nothing more than a documentation patch to spare
future users that diagnostic time.

Problem in short
================

struct rs_control carries internal scratch buffers (lambda, syn, b, t,
omega, root, reg, loc) in its flexible @buffers array, used by
decode_rs.c and encode_rs.c during each call. Two concurrent calls to
decode_rs8() / decode_rs16() (or the encode counterparts) on the same
rs_control instance race on those scratch buffers and corrupt each
other's intermediate state, producing spurious uncorrectable returns
on otherwise valid codewords.

This is consistent with the in-tree code -- nothing in the library
disclaims thread safety, but the buffers are clearly per-instance, not
per-call -- and consistent with the original Phil Karn KA9Q
implementation from 2002 that the library is derived from. No existing
in-tree user appears to decode concurrently on a shared rs_control
(most callers in drivers/{mtd,dvb,...} use the library from a single
producer context), which is why the contract has stayed implicit.

Empirical reproducer (kernel 7.0.3, cluster of qemu-aarch64 VMs):

  - filesystem stores 16x RS(255,239) shortened subblocks per 4 KiB
    disk block, decoded on read via a single shared rs_control
    initialised in module_init() with init_rs(8, 0x187, 0, 1, 16)
  - 8 parallel sha256sum invocations across a read-only mount of a
    pristine image: 150-240 of 758 files report wrong hashes per run,
    ~160 RS uncorrectable entries in dmesg per batch
  - same image, sequential single-process reads: 0 errors, hashes
    stable byte-for-byte across hundreds of iterations
  - same image, parallel reads after switching to one rs_control per
    possible CPU (alloc_percpu + for_each_possible_cpu init_rs, with
    get_cpu_ptr / put_cpu_ptr around encode_rs8 / decode_rs8): 0
    uncorrectable, 8 runs byte-identical to baseline, total wallclock
    ~25% faster than sequential thanks to real parallelism

The diagnostic chain that pointed at the library (and not at our
filesystem code or the page cache) involved CRC32 probes around the
buffer-head reads showing that bh->b_data was byte-stable before and
after the memcpy into a private scratch, and only the decode_rs8()
return value diverged across concurrent calls. The fix on our side is
the per-CPU rs_control allocation described above; it lives in our
out-of-tree code, so this patch series proposes nothing in lib/.

What this patch does
====================

Just documentation. It extends the kdoc of struct rs_control in
include/linux/rslib.h with a "Locking and concurrency" paragraph that
states explicitly:

  - one rs_control is NOT safe to share across concurrent
    encode_rs*() / decode_rs*() calls
  - callers needing concurrent codec use must allocate one rs_control
    per concurrent caller (one per CPU, one per worker, etc.)
  - the underlying rs_codec is refcounted across rs_control instances
    initialised with the same parameters, so the per-instance overhead
    is bounded to sizeof(struct rs_control) + scratch arrays

No code change, no ABI change. The intent is purely to make the
existing contract discoverable without having to read decode_rs.c to
notice the rsc->buffers indexing.

Open questions / follow-ups (not in this series)
================================================

If you think they are worth doing, two non-trivial follow-ups are
possible but I did not want to bundle them with a doc patch:

  1. Move the scratch arrays out of struct rs_control to caller stack
     or onto a per-CPU helper inside the library, so a single
     rs_control becomes safe to share. Neither the public API nor the
     ABI (struct rs_control is opaque to callers) would change, but
     this would touch the hot path of decode_rs.c and encode_rs.c and
     is larger than a doc patch should be.

  2. Add an rs_control_get_for_cpu() helper that wraps alloc_percpu +
     for_each_possible_cpu init_rs / get_cpu_ptr for callers that want
     concurrent decode without rolling their own. Additive change,
     narrow, could ship after the doc patch.

I am happy to follow up on either if you think they are worth it.

I tested this patch on master at e1914add2799 ("Merge tag 'for-linus'
of git://git.kernel.org/pub/scm/virt/kvm/kvm") with an x86_64 build and
an arm64 cross-build. The kdoc renders correctly under
scripts/kernel-doc -rst on include/linux/rslib.h and the "Locking and
concurrency" paragraph appears in the rendered Description section of
struct rs_control.

Disclosure: this patch and its cover letter were drafted interactively
with the help of an AI coding assistant (Anthropic Claude). I have
reviewed every line, verified all technical claims empirically on the
test cluster described above, and am the sole signatory of the DCO.

Thanks for your time.

Aurelien

Aurelien DESBRIERES (1):
  lib/reed_solomon: document rs_control concurrency contract

 include/linux/rslib.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

-- 
2.53.0


             reply	other threads:[~2026-05-13 22:17 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-13 22:17 Aurelien DESBRIERES [this message]
2026-05-13 22:17 ` [PATCH 1/1] lib/reed_solomon: document rs_control concurrency contract Aurelien DESBRIERES

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1778710307.git.aurelien@hackers.camp \
    --to=aurelien@hackers.camp \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox