Re: [RFC PATCH] HID: BPF: add keyboard behavioral anomaly detection

Linux Input/HID development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: krishgulati7@gmail.com
Cc: dmitry.torokhov@gmail.com, linux-input@vger.kernel.org
Subject: Re: [RFC PATCH] HID: BPF: add keyboard behavioral anomaly detection
Date: Fri, 19 Jun 2026 07:45:42 +0000	[thread overview]
Message-ID: <20260619074543.281E81F00A3A@smtp.kernel.org> (raw)
In-Reply-To: <20260619073607.393248-1-krishgulati7@gmail.com>

Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [Critical] Evasion of detector via large HID descriptors with padded payloads.
- [High] Missing Report ID validation causes false positives and state corruption on composite devices.
- [High] Mismatched report size rounding causes silent event dropping and logical out-of-bounds reads.
- [High] LRU map eviction permanently disables detection for a device instead of resetting its state.
- [High] Uninitialized padding bytes in the stack-allocated `struct dev_info` will cause the BPF verifier to reject the program.
- [High] Keystroke heuristic fails to detect keys when no empty report is sent between strokes (morphed keys).
- [Medium] Missing `bpf_spin_lock` protection allows data races between concurrent `kdb_hook` or `probe` invocations.
--

commit 103b2fc551543479bbac0ff3ea588c3508776a4c
Author: Krish Gulati <krishgulati7@gmail.com>
HID: BPF: add keyboard behavioral anomaly detection

This patch implements a HID-BPF struct_ops program that detects automated
HID injection attacks. It does this by measuring post-enumeration delay
and tracking inter-keystroke timing using Welford's online variance algorithm.
State is stored in an LRU hash map keyed by the HID ID.

Link: https://lore.kernel.org/linux-input/adSxXidgeWF0-Ewn@beelink/

> diff --git a/src/bpf/testing/0010-Generic__keyboard.bpf.c b/src/bpf/testing/0010-Generic__keyboard.bpf.c
> index 0000000000000..91145870768f6
> --- /dev/null
> +++ b/src/bpf/testing/0010-Generic__keyboard.bpf.c

[ ... ]

> +static __always_inline void welford(struct dev_info *dev_state,
> +				    __u64 interval_ms)
> +{
> +	__s64 x = (__s64)interval_ms * HID_GUARD_WELFORD_SCALE;
> +	__s64 delta, delta2, delta_abs, signed_div_res;
> +	__u64 div_res;
> +
> +	dev_state->count++;

[Severity: Medium]
Will this lockless compound update be corrupted by concurrent executions?

The dev_details LRU hash map stores per-device state which kdb_hook() reads
and modifies directly via a map value pointer locklessly. Updating Welford
variance (count, mean, M2) is a compound operation.

If kdb_hook() executes concurrently for the same device, such as across
multiple endpoints on a composite device, or if the userspace probe()
program is executed simultaneously, torn reads and writes will occur.

Does this structure require a bpf_spin_lock to prevent data races?

> +	delta = x - dev_state->mean;

[ ... ]

> +SEC(HID_BPF_DEVICE_EVENT)
> +int BPF_PROG(kdb_hook, struct hid_bpf_ctx *hctx)
> +{
> +	__u32 hid_id = hctx->hid->id;
> +
> +	struct dev_info *info;
> +
> +	info = bpf_map_lookup_elem(&dev_details, &hid_id);
> +
> +	if (!info)
> +		return 0;

[Severity: High]
Does an LRU map eviction permanently disable detection for a device instead
of resetting its state?

The dev_details map uses BPF_MAP_TYPE_LRU_HASH with 128 max entries.
When a device is evicted, bpf_map_lookup_elem() returns NULL, and
kdb_hook() immediately returns 0, completely ignoring the event.

Because elements are only inserted by the userspace probe() program, an
evicted device is never automatically re-inserted by the kernel hook.

Could an attacker deliberately present 128 dummy HID interfaces to flood
the map, permanently evict the payload device, and bypass the guard entirely?

> +
> +	__u32 size = info->report_size;
> +	__u32 fetch_size;
> +	__u8 *data;
> +
> +	if (size <= 8)
> +		fetch_size = 8;
> +	else if (size <= 16)
> +		fetch_size = 16;
> +	else
> +		fetch_size = 32;

[Severity: Critical]
Does hardcapping the analysis size to 32 bytes allow attackers to evade
detection by padding their malicious HID descriptor?

An attacker could craft a custom USB HID descriptor that defines dummy
fields or padding for the first 32 bytes, placing the actual Keyboard Usage
page array at offset 32 or higher.

The loop will only iterate over the static padding and completely ignore
the keystrokes. Meanwhile, the Linux HID subsystem correctly parses the full
report and processes the hidden keycodes, allowing arbitrary keystroke
injection that bypasses this guard.

> +
> +	data = hid_bpf_get_data(hctx, 0, fetch_size);
> +
> +	if (!data)
> +		return 0;

[Severity: High]
Will this size rounding cause silent event dropping and logical out-of-bounds
reads?

The BPF program rounds info->report_size up to a power-of-two fetch_size
(8, 16, or 32). If a keyboard has a 9-byte report, fetch_size becomes 16.

If the kernel's ctx->allocated_size for this device is exactly 9 bytes,
hid_bpf_get_data(hctx, 0, 16) will exceed the bounds and return NULL.
The program then returns 0, silently skipping analysis for the report
while the kernel continues processing the keystrokes.

If allocated_size is large enough, the loop uses fetch_size as the upper
bound, reading past valid report data into padding or uninitialized memory,
polluting the active key tracking.

> +	int now_active_ks = 0, was_active_ks = 0;
> +#pragma unroll
> +	for (int i = 0; i < 32; i++) {
> +		if (i >= fetch_size)
> +			break;
> +		now_active_ks += ((__u8)data[i] + 255) >> 8;

[Severity: High]
Will this logic indiscriminately process all incoming raw reports, even
those not belonging to the keyboard?

Many HID keyboards are composite devices, such as wireless keyboard and
mouse receivers, that send multiple report types over the same interface.

Since the report ID is observed in probe() but not stored in dev_info,
this hook counts all non-zero bytes across all input reports as keystrokes.
Mouse movements, which send reports with changing non-zero bytes, will be
incorrectly interpreted as rapid keystrokes.

Will this corrupt the Welford variance tracking and cause false positive
anomaly detections?

> +		was_active_ks += ((__u8)info->prev_report[i] + 255) >> 8;
> +
> +		info->prev_report[i] = data[i];
> +	}
> +
> +	__u64 current_ms = bpf_ktime_get_ns() / HID_GUARD_NSEC_PER_MSEC;
> +
> +	if (now_active_ks > was_active_ks) {

[Severity: High]
Does this heuristic fail to detect keys when no empty report is sent
between strokes?

The detection logic relies strictly on an increase in the total number of
non-zero bytes to identify a keypress.

Automated injection tools routinely optimize injections by morphing
keycodes directly without sending an intervening empty (all-zeros) report.
If an attacker's tool changes key 'A' to key 'B' in consecutive reports,
both reports have exactly one non-zero byte.

The condition 1 > 1 evaluates to false, causing the keystroke to be missed
and bypassing the variance check entirely.

[ ... ]

> +SEC("syscall")
> +int probe(struct hid_bpf_probe_args *ctx)
> +{
> +	struct hid_rdesc_report *input;
> +	struct hid_rdesc_field *field;
> +	struct hid_rdesc_collection *col;
> +
> +	hid_bpf_for_each_input_report(&HID_REPORT_DESCRIPTOR, input) {
> +		__u32 size_in_bytes = (input->size_in_bits + 7) / 8;
> +
> +		bpf_printk("Report size: %d\n", size_in_bytes);
> +		if (input->report_id != 0)
> +			size_in_bytes += 1;

[Severity: High]
Is the report ID needed here to validate incoming reports later?

The report ID is observed here but not stored in the map, making it
impossible for kdb_hook() to distinguish between keyboard reports and
other HID reports on a composite device.

> +
> +		bpf_printk("Report size after report_id: %d\n", size_in_bytes);
> +
> +		hid_bpf_for_each_field(input, field) {
> +			hid_bpf_for_each_collection(field, col) {
> +				if (col->usage_page ==
> +					    HidUsagePage_GenericDesktop &&
> +				    col->usage_id == HidUsage_GD_Keyboard) {
> +					__u32 key = ctx->hid;
> +					struct dev_info info = {
> +						.connection_time =
> +							bpf_ktime_get_ns(),
> +						.count = 0,
> +						.report_size = size_in_bytes
> +					};

[Severity: High]
Will the uninitialized padding bytes in this struct cause the BPF verifier
to reject the program?

The dev_info object is initialized using a designated initializer list.
While this zero-initializes unspecified members, it does not guarantee that
the compiler will zero the 4 bytes of padding between report_size and
prev_report.

The BPF verifier strictly tracks initialization of stack memory at the byte
level. When bpf_map_update_elem() reads the struct from the stack, the
verifier will detect the uninitialized padding and reject the program with
an 'invalid indirect read from stack' error.

Should this structure be explicitly zeroed using __builtin_memset?

> +					bpf_map_update_elem(&dev_details, &key,
> +							    &info, BPF_ANY);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260619073607.393248-1-krishgulati7@gmail.com?part=1

next prev parent reply	other threads:[~2026-06-19  7:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 13:37 [PATCH v2 0/1] HID: add malicious HID device detection driver Zubeyr Almaho
2026-04-04 13:37 ` [PATCH v2 1/1] " Zubeyr Almaho
2026-04-07  7:59   ` Benjamin Tissoires
2026-06-19  7:36     ` [RFC PATCH] HID: BPF: add keyboard behavioral anomaly detection krishgulati7
2026-06-19  7:45       ` sashiko-bot [this message]
2026-04-05  5:31 ` [PATCH v2 0/1] HID: add malicious HID device detection driver Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260619074543.281E81F00A3A@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=krishgulati7@gmail.com \
    --cc=linux-input@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox