* [RFC] doc: add AGENTS.md for AI-powered code review tools
@ 2026-01-09 1:41 Stephen Hemminger
2026-01-09 9:46 ` Bruce Richardson
` (11 more replies)
0 siblings, 12 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-09 1:41 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Add a structured reference document that enables AI code review tools
to validate DPDK contributions against project standards. This document
consolidates requirements from multiple sources into a machine-readable
format optimized for automated validation workflows.
The AGENTS.md file synthesizes guidelines from:
- DPDK Contributing Code documentation (patches.rst)
- DPDK Coding Style guidelines (coding_style.rst)
- Linux kernel patch submission process (submitting-patches.rst)
- SPDX License Identifier specification (spdx.org)
Key sections include:
- SPDX license identifier requirements
- Commit message format and tag ordering
- C coding style rules and naming conventions
- Patch validation checklists with severity levels
- Meson build file style requirements
The document provides actionable checklists and concrete examples to
support integration with CI/CD pipelines and automated review systems.
Severity levels (error/warning/info) help tools prioritize feedback
appropriately.
This supports the broader goal of maintaining code quality and
consistency as the project scales, while reducing manual review burden
for maintainers on mechanical style issues.
References:
- https://doc.dpdk.org/guides/contributing/patches.html
- https://doc.dpdk.org/guides/contributing/coding_style.html
- https://www.kernel.org/doc/html/latest/process/submitting-patches.html
- https://spdx.org/licenses/
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
AGENTS.md | 410 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 410 insertions(+)
create mode 100644 AGENTS.md
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000000..cfaaa81af3
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,410 @@
+# AGENTS.md - DPDK Code Review Guidelines for AI Tools
+
+This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines.
+
+## Overview
+
+DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below.
+
+---
+
+## Source License Requirements
+
+### SPDX License Identifiers
+
+- **Every file must begin with an SPDX license identifier** on the first line (or second line for `#!` scripts)
+- Core libraries and drivers use `BSD-3-Clause`
+- Kernel components use `GPL-2.0`
+- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)`
+
+```c
+/* Correct */
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+/* Incorrect - no boilerplate license text should follow */
+```
+
+- A blank line must follow the license header before any other content
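
As a combined illustration, a complete file header might look like the
following sketch (the copyright holder and the helper function are invented
for this example, not taken from the DPDK tree):

```c
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2026 Example, Inc.
 */

#include <stdint.h>

/* Real file content starts only after the blank line above. */
static inline uint32_t
example_add(uint32_t a, uint32_t b)
{
	return a + b;
}
```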
+
+---
+
+## Commit Message Requirements
+
+### Subject Line (First Line)
+
+- **Must capture the area and impact** of the change
+- **~50 characters** recommended length
+- **Lowercase** except for acronyms
+- **Prefixed with component name** (check `git log` for existing components)
+- Use **imperative mood** (instructions to the codebase)
+- **No trailing period** (causes double periods in patch filenames)
+
+```
+# Good examples
+ixgbe: fix offload config option name
+config: increase max queues per port
+net/mlx5: add support for flow counters
+
+# Bad examples
+Fixed the offload config option. # past tense, has period
+IXGBE: Fix Offload Config # uppercase
+```
+
+### Commit Body
+
+- Describe the issue being fixed or feature being added
+- Provide enough context for reviewers to understand the purpose
+- Wrap text at **72 characters**
+- **Must end with** `Signed-off-by:` line (real name, not alias)
+- When fixing regressions, include:
+ ```
+  Fixes: 1234567890ab ("original commit subject")
+ Cc: original_author@example.com
+ ```
+
+### Required Tags
+
+```
+# For Coverity issues:
+Coverity issue: 12345
+
+# For Bugzilla issues:
+Bugzilla ID: 12345
+
+# For stable release backport candidates:
+Cc: stable@dpdk.org
+
+# For patch dependencies:
+Depends-on: series-NNNNN ("Title of the series")
+```
+
+### Tag Order
+
+```
+Coverity issue:
+Bugzilla ID:
+Fixes:
+Cc:
+
+Reported-by:
+Suggested-by:
+Signed-off-by:
+Acked-by:
+Reviewed-by:
+Tested-by:
+```
+
+Note: Empty line between the first group and `Reported-by:`
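
Putting the pieces together, a hypothetical fix commit could look like this
(every component name, ID, hash, and person below is invented):

```
net/example: fix queue release crash

Releasing a queue twice dereferenced freed memory.
Check the queue state before freeing it.

Bugzilla ID: 12345
Fixes: 1234567890ab ("net/example: add queue release")
Cc: stable@dpdk.org

Reported-by: Jane Doe <jane.doe@example.com>
Signed-off-by: John Smith <john.smith@example.com>
```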
+
+---
+
+## C Coding Style
+
+### General Formatting
+
+- **Line length**: Recommended ≤80 characters, acceptable up to 100
+- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment)
+- **No trailing whitespace** on lines or at end of files
+- Code style must be consistent within each file
+
+### Comments
+
+```c
+/* Most single-line comments look like this. */
+
+/*
+ * VERY important single-line comments look like this.
+ */
+
+/*
+ * Multi-line comments look like this. Make them real sentences. Fill
+ * them so they look like real paragraphs.
+ */
+```
+
+### Header File Organization
+
+Include order (each group separated by blank line):
+1. System/libc includes
+2. DPDK EAL includes
+3. DPDK misc library includes
+4. Application-specific includes
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <rte_eal.h>
+
+#include <rte_ring.h>
+#include <rte_mempool.h>
+
+#include "application.h"
+```
+
+### Header Guards
+
+```c
+#ifndef _FILE_H_
+#define _FILE_H_
+
+/* Code */
+
+#endif /* _FILE_H_ */
+```
+
+### Naming Conventions
+
+- **All external symbols** must have `RTE_` or `rte_` prefix
+- **Macros**: ALL_UPPERCASE
+- **Functions**: lowercase with underscores only (no CamelCase)
+- **Variables**: lowercase with underscores only
+- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix
+- **Struct types**: prefer `struct name` over typedefs
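
The conventions above can be sketched in one example; none of these symbols
exist in DPDK, they are invented purely for illustration:

```c
#include <stddef.h>

#define RTE_EXAMPLE_MAX_QUEUES 8	/* macro: ALL_UPPERCASE */

enum rte_example_state {		/* enum values: RTE_<ENUM>_ prefix */
	RTE_EXAMPLE_STATE_IDLE,
	RTE_EXAMPLE_STATE_RUNNING,
};

struct rte_example_dev {		/* struct tag preferred over a typedef */
	enum rte_example_state state;
	unsigned int nb_queues;		/* variables: lowercase, underscores */
};

/* functions: lowercase with rte_ prefix, no CamelCase */
static int
rte_example_dev_start(struct rte_example_dev *dev)
{
	if (dev == NULL || dev->nb_queues > RTE_EXAMPLE_MAX_QUEUES)
		return -1;
	dev->state = RTE_EXAMPLE_STATE_RUNNING;
	return 0;
}
```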
+
+#### Prohibited Terminology
+
+Do not use:
+- `master/slave` → Use: primary/secondary, controller/worker, leader/follower
+- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist
+
+### Indentation and Braces
+
+```c
+/* Control statements - no braces for single statements */
+if (val != NULL)
+ val = realloc(val, newsize);
+
+/* Braces on same line as else */
+if (test)
+ stmt;
+else if (bar) {
+ stmt;
+ stmt;
+} else
+ stmt;
+
+/* Switch statements - don't indent case */
+switch (ch) {
+case 'a':
+ aflag = 1;
+ /* FALLTHROUGH */
+case 'b':
+ bflag = 1;
+ break;
+default:
+ usage();
+}
+
+/* Long conditions - double indent continuation */
+if (really_long_variable_name_1 == really_long_variable_name_2 &&
+ var3 == var4) {
+ x = y + z;
+}
+```
+
+### Function Definitions
+
+```c
+/* Return type on its own line, opening brace on its own line */
+static char *
+function(int a1, int a2, float fl, int a4)
+{
+ /* body */
+}
+```
+
+### Variable Declarations
+
+```c
+int *x; /* no space after asterisk */
+int * const x; /* space before type qualifier */
+
+/* Multiple initializers - one per line */
+char a = 0;
+char b = 0;
+
+/* Or only last variable initialized */
+float x, y = 0.0;
+```
+
+### Pointer and NULL Comparisons
+
+```c
+/* Good */
+if (p == NULL)
+if (*p == '\0')
+
+/* Bad */
+if (!p) /* don't use ! on pointers */
+```
+
+### Return Values
+
+- Object creation/allocation: return pointer, NULL on error, set `rte_errno`
+- Packet burst functions: return number of packets handled
+- Other int-returning functions: 0 on success, -1 on error (or `-errno`)
+- No-error functions: use `void` return type
+- Don't cast `void *` return values
+- Don't parenthesize return values
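
A minimal sketch of the first and third conventions; the function names are
invented, and plain `errno` stands in for `rte_errno` to keep the example
self-contained:

```c
#include <errno.h>
#include <stdlib.h>

/* Allocation-style API: return a pointer, NULL on error.
 * (A real DPDK API would set rte_errno; plain errno is used here.)
 */
static int *
example_array_create(size_t n)
{
	if (n == 0) {
		errno = EINVAL;
		return NULL;
	}
	return calloc(n, sizeof(int));
}

/* Other int-returning API: 0 on success, negative errno on failure. */
static int
example_array_set(int *arr, size_t n, size_t idx, int value)
{
	if (arr == NULL || idx >= n)
		return -EINVAL;
	arr[idx] = value;
	return 0;
}
```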
+
+### Macros
+
+```c
+/* Wrap compound statements in do-while(0) */
+#define MACRO(x, y) do { \
+ variable = (x) + (y); \
+ (y) += 2; \
+} while (0)
+```
+
+Prefer enums and inline functions over macros when possible.
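
For instance, the compound-statement macro above could instead be written as
an inline function (an invented example): it is type-checked and evaluates
each argument exactly once.

```c
/* Inline-function equivalent of: variable = (x) + (y); (y) += 2; */
static inline int
example_accumulate(int x, int *y)
{
	int v = x + *y;

	*y += 2;
	return v;
}
```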
+
+### Structure Layout
+
+- Order members by: use, then size (largest to smallest), then alphabetically
+- New additions to existing structures go at the end (ABI compatibility)
+- Align member names with spaces
+- Avoid `bool` in structures (unclear size, wastes space)
+
+```c
+struct foo {
+ struct foo *next; /* List of active foo. */
+ struct mumble amumble; /* Comment for mumble. */
+ int bar; /* Try to align the comments. */
+};
+```
+
+---
+
+## Code Quality Requirements
+
+### Compilation
+
+- Each commit must compile independently (for `git bisect`)
+- No forward dependencies within a patchset
+- Test with multiple targets, compilers, and options
+- Use `devtools/test-meson-builds.sh`
+
+### Testing
+
+- Add tests to `app/test` unit test framework
+- New API functions must be exercised by tests under the `app/` directory
+- New device APIs require at least one driver implementation
+
+### Documentation
+
+- Add Doxygen comments for public APIs
+- Update release notes in `doc/guides/rel_notes/` for important changes
+- Code and documentation must be updated atomically in same patch
+
+### ABI Compatibility
+
+- New external functions must be exported properly
+- Follow ABI policy and versioning guidelines
+- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable
+
+---
+
+## Patch Validation Checklist
+
+AI review tools should verify:
+
+### Commit Message
+- [ ] Subject line ~50 chars, lowercase (except acronyms)
+- [ ] Component prefix present and valid
+- [ ] Imperative mood used
+- [ ] No trailing period on subject
+- [ ] Body wrapped at 72 characters
+- [ ] `Signed-off-by:` present with real name
+- [ ] `Fixes:` tag present for bug fixes with 12-char SHA
+- [ ] Tags in correct order
+
+### License
+- [ ] SPDX identifier on first line (or second for scripts)
+- [ ] Appropriate license for file type
+- [ ] Blank line after license header
+
+### Code Style
+- [ ] Lines ≤100 characters (prefer ≤80)
+- [ ] Hard tabs for indentation
+- [ ] No trailing whitespace
+- [ ] Proper include order
+- [ ] Header guards present
+- [ ] `rte_`/`RTE_` prefix on external symbols
+- [ ] No prohibited terminology
+- [ ] Proper brace style
+- [ ] Function return type on own line
+- [ ] NULL comparisons use `== NULL`
+
+### Structure
+- [ ] Each commit compiles independently
+- [ ] Code and docs updated together
+- [ ] Tests added/updated as needed
+- [ ] Release notes updated for significant changes
+
+---
+
+## Meson Build Files
+
+### Style Requirements
+
+- 4-space indentation (no tabs)
+- Line continuations double-indented
+- Lists alphabetically ordered
+- Short lists (≤3 items): single line, no trailing comma
+- Long lists: one item per line, trailing comma on last item
+
+```python
+# Short list
+sources = files('file1.c', 'file2.c')
+
+# Long list
+headers = files(
+ 'header1.h',
+ 'header2.h',
+ 'header3.h',
+)
+```
+
+---
+
+## Python Code
+
+- Must comply with PEP8
+- Line length acceptable up to 100 characters
+- Use `pep8` tool for validation
+
+---
+
+## Review Process Notes
+
+### For AI Review Tools
+
+When providing feedback:
+- Reference specific line numbers
+- Cite the relevant guideline section
+- Suggest concrete fixes
+- Prioritize: errors > warnings > style suggestions
+- Flag potential ABI breaks
+- Check for missing documentation updates
+- Verify test coverage for new functionality
+
+### Severity Levels
+
+**Error** (must fix):
+- Missing SPDX license
+- Missing Signed-off-by
+- Compilation failures
+- ABI breaks without proper versioning
+
+**Warning** (should fix):
+- Style violations
+- Missing Fixes tag for bug fixes
+- Documentation gaps
+- Missing tests
+
+**Info** (consider):
+- Minor style preferences
+- Optimization suggestions
+- Alternative approaches
--
2.51.0
* Re: [RFC] doc: add AGENTS.md for AI-powered code review tools
  2026-01-09  1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger
@ 2026-01-09  9:46 ` Bruce Richardson
  2026-01-10 17:34   ` Stephen Hemminger
  2026-01-09  9:58 ` Marat Khalili
  ` (10 subsequent siblings)
  11 siblings, 1 reply; 63+ messages in thread
From: Bruce Richardson @ 2026-01-09 9:46 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev

On Thu, Jan 08, 2026 at 05:41:04PM -0800, Stephen Hemminger wrote:
> Add a structured reference document that enables AI code review tools
> to validate DPDK contributions against project standards.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---

Good to see this being added.

Feedback inline below.

Thanks,
/Bruce

> +- A blank line must follow the license header before any other content

Not strictly true, the blank line follows the copyright line immediately
after the SPDX header. The copyright line should be included in
descriptions and examples above.

> +- **~50 characters** recommended length

In practice I think we allow up to 60 chars. So 50 chars recommended, 60
char max.

> +# Good examples
> +ixgbe: fix offload config option name

Should be "net/ixgbe".
Since you have a driver example below, I'd suggest picking an example for a
library e.g. hash, to include as well.

> +  Cc: original_author@example.com

I know we *should* CC original authors, but in practice I don't think we
do. So many original authors have moved on, I suspect that this guidance
just causes too many email bounces to be useful. I would update this to
Fixes and CC stable for all fixes. Let's not burden the contributor with
determining if a patch needs backport or not - I assume the stable tree
maintainers have tooling to pull out only relevant fixes for their trees.

> +# For stable release backport candidates:
> +Cc: stable@dpdk.org

I'd make this an always-add for bugfixes, unless it causes us other issues.

> +- **Line length**: Recommended ≤80 characters, acceptable up to 100

I'd drop the 80 reference. If we can use 100 chars, let's do so, rather
than having lines split.

> +- **Struct types**: prefer `struct name` over typedefs

We can strengthen and clarify this, I think: for new definitions, typedefs
only for function pointer types.

> +- Align member names with spaces

Nice to have, ignored in practice AFAIK. Maybe not worth including.

> +- Avoid `bool` in structures (unclear size, wastes space)

Disagree on this one. Unless it's a datapath struct that must be small,
using bool is a lot clearer and easier to manage than bit fields.

> +- [ ] Subject line ~50 chars, lowercase (except acronyms)

60 chars

> +- [ ] NULL comparisons use `== NULL`

- [ ] Integer comparisons use `== 0`

> +- Use `pep8` tool for validation

I think we are switching to using "black" as the python formatting tool of
choice.
* Re: [RFC] doc: add AGENTS.md for AI-powered code review tools
  2026-01-09  9:46 ` Bruce Richardson
@ 2026-01-10 17:34   ` Stephen Hemminger
  0 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-10 17:34 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev

On Fri, 9 Jan 2026 09:46:00 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Thu, Jan 08, 2026 at 05:41:04PM -0800, Stephen Hemminger wrote:
>
> > +- A blank line must follow the license header before any other content
>
> Not strictly true, the blank line follows the copyright line immediately
> after the SPDX header. The copyright line should be included in
> descriptions and examples above.

I think that was something Linux kernel wanted.

> > +ixgbe: fix offload config option name
>
> Should be "net/ixgbe".
> Since you have a driver example below, I'd suggest picking an example for a
> library e.g. hash, to include as well.

I suspect AI got this from old examples.

> I know we *should* CC original authors, but in practice I don't think we
> do. So many original authors have moved on, I suspect that this guidance
> just causes too many email bounces to be useful.

Would prefer that get_maintainer script used instead.

> > +Cc: stable@dpdk.org
>
> I'd make this an always-add for bugfixes, unless it causes us other issues.

Should be checkpatch rule?

> > +- **Struct types**: prefer `struct name` over typedefs
>
> We can strengthen and clarify this, I think: for new definitions, typedefs
> only for function pointer types.

Should also add exception for base type code.

> > +- Avoid `bool` in structures (unclear size, wastes space)
>
> Disagree on this one. Unless it's a datapath struct that must be small,
> using bool is a lot clearer and easier to manage than bit fields.

I think this was inherited from kernel style. Lots of bool flags when
bitfield could be used.
+- [ ] Function return type on own line > > +- [ ] NULL comparisons use `== NULL` > - [ ] Integer comparisons use `== 0` > > > + > > +### Structure > > +- [ ] Each commit compiles independently > > +- [ ] Code and docs updated together > > +- [ ] Tests added/updated as needed > > +- [ ] Release notes updated for significant changes > > + > > +--- > > + > > +## Meson Build Files > > + > > +### Style Requirements > > + > > +- 4-space indentation (no tabs) > > +- Line continuations double-indented > > +- Lists alphabetically ordered > > +- Short lists (≤3 items): single line, no trailing comma > > +- Long lists: one item per line, trailing comma on last item > > + > > +```python > > +# Short list > > +sources = files('file1.c', 'file2.c') > > + > > +# Long list > > +headers = files( > > + 'header1.h', > > + 'header2.h', > > + 'header3.h', > > +) > > +``` > > + > > +--- > > + > > +## Python Code > > + > > +- Must comply with PEP8 > > +- Line length acceptable up to 100 characters > > +- Use `pep8` tool for validation > > + > > I think we are switching to using "black" as the python formatting tool of > choice. 
> > > +--- > > + > > +## Review Process Notes > > + > > +### For AI Review Tools > > + > > +When providing feedback: > > +- Reference specific line numbers > > +- Cite the relevant guideline section > > +- Suggest concrete fixes > > +- Prioritize: errors > warnings > style suggestions > > +- Flag potential ABI breaks > > +- Check for missing documentation updates > > +- Verify test coverage for new functionality > > + > > +### Severity Levels > > + > > +**Error** (must fix): > > +- Missing SPDX license > > +- Missing Signed-off-by > > +- Compilation failures > > +- ABI breaks without proper versioning > > + > > +**Warning** (should fix): > > +- Style violations > > +- Missing Fixes tag for bug fixes > > +- Documentation gaps > > +- Missing tests > > + > > +**Info** (consider): > > +- Minor style preferences > > +- Optimization suggestions > > +- Alternative approaches > > -- > > 2.51.0 > > I will try and make a prompt to get a better result. Often good stuff happens with better prompt rather than manual edits. ^ permalink raw reply [flat|nested] 63+ messages in thread
* RE: [RFC] doc: add AGENTS.md for AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger 2026-01-09 9:46 ` Bruce Richardson @ 2026-01-09 9:58 ` Marat Khalili 2026-01-10 17:28 ` Stephen Hemminger 2026-01-10 18:00 ` [RFC v2] " Stephen Hemminger ` (9 subsequent siblings) 11 siblings, 1 reply; 63+ messages in thread From: Marat Khalili @ 2026-01-09 9:58 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev@dpdk.org In my very limited experience it helps to tell the model that all commits (both patch and commit message) MUST pass devtools/checkpatches.sh, ideally configured (and verified) as a git hook; clear justification must be provided in rare exception cases. On general programming principles I'm unsure about duplicating information already available in doc/guides/contributing in the same tree. Perhaps we could just instruct the model to read patches.rst and coding_style.rst, they are not very big (and models typically know Linux and PEP8 rules already). Otherwise we risk one of the documents becoming outdated, or rules/nuances omitted in the summarized version being ignored. In instructions specifically for the model we could focus on things that are a persistent source of problems for it despite instructions for humans being clear. Just my 2c, great that someone is working on this. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC] doc: add AGENTS.md for AI-powered code review tools 2026-01-09 9:58 ` Marat Khalili @ 2026-01-10 17:28 ` Stephen Hemminger 0 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-10 17:28 UTC (permalink / raw) To: Marat Khalili; +Cc: dev@dpdk.org On Fri, 9 Jan 2026 09:58:38 +0000 Marat Khalili <marat.khalili@huawei.com> wrote: > In my very limited experience it helps to tell the model that all commits (both > patch and commit message) MUST pass devtools/checkpatches.sh, ideally > configured (and verified) as a git hook, clear justification must be > provided in rare exception cases. Models don't really run scripts. Would like the auto code review tools to do a better job of pre-parsing for checkpatch rules and explaining better. > From the general programming principles I'm unsure about duplicating > information already available in doc/guides/contributing in the same tree. > Perhaps we could just instruct the model to read patches.rst and > coding_style.rst, they are not very big (and models typically know Linux and > PIP8 rules already). Otherwise we risk one of the documents becoming outdated, > or rules/nuances omitted in the summarized version ignored. In instructions > specifically for the model we could focus on things that are persistent source > of problems for it despite instructions for humans being clear. > > Just my 2c, great that someone is working on this. I am not sure that models will always interpret our documentation the same and this looks clearer. Also, there is lots of overlap with other projects which is not covered in DPDK alone. ^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC v2] doc: add AGENTS.md for AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger 2026-01-09 9:46 ` Bruce Richardson 2026-01-09 9:58 ` Marat Khalili @ 2026-01-10 18:00 ` Stephen Hemminger 2026-01-13 15:28 ` [PATCH v3] " Stephen Hemminger ` (8 subsequent siblings) 11 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-10 18:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 693 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 693 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..9d5a2e1de9 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,693 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below. + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. 
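A review tool can verify the header placement mechanically. Below is a minimal sketch of such a check (a hypothetical shell helper written for illustration, not one of the official `devtools` scripts), which only looks at the first line of a C source:

```shell
# Hypothetical SPDX presence check for C sources; DPDK's real
# validation tooling lives under devtools/ and should be preferred.
check_spdx() {
    if head -n1 "$1" | grep -q 'SPDX-License-Identifier:'; then
        echo "$1: ok"
    else
        echo "$1: missing SPDX tag"
    fi
}

# Demo against a conforming header
cat > /tmp/spdx_demo.c <<'EOF'
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2024 ExampleCorp
 */
EOF
check_spdx /tmp/spdx_demo.c
```

Note that scripts carry the tag on the second line (after `#!`), so a real checker has to special-case that.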
+ +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do 
not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: 1234567890ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order: + +``` +Coverity issue: +Bugzilla ID: +Fixes: +Cc: + <-- blank line required here +Reported-by: +Suggested-by: +Signed-off-by: +Acked-by: +Reviewed-by: +Tested-by: +``` + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Code style must be consistent within each file + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs.
+ */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different naming conventions when sharing code across platforms or with upstream vendor code. + +#### Prohibited Terminology + +Do not use: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! 
on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + var3 == var4) { + x = y + z; +} +``` + +### Function Definitions + +```c +/* Return type on its own line, opening brace on its own line */ +static char * +function(int a1, int a2, float fl, int a4) +{ + /* body */ +} +``` + +### Variable Declarations + +```c +int *x; /* no space after asterisk */ +int * const x; /* space before type qualifier */ + +/* Multiple initializers - one per line */ +char a = 0; +char b = 0; + +/* Or only last variable initialized */ +float x, y = 0.0; +``` + +### Return Values + +- Object creation/allocation: return pointer, NULL on error, set `rte_errno` +- Packet burst functions: return number of packets handled +- Other int-returning functions: 0 on success, -1 on error (or `-errno`) +- No-error functions: use `void` return type +- Don't cast `void *` return values +- Don't parenthesize return values + +### Macros + +```c +/* Wrap compound statements in do-while(0) */ +#define MACRO(x, y) do { \ + variable = (x) + (y); \ + (y) += 2; \ +} while (0) +``` + +Prefer enums 
and inline functions over macros when possible. + +### Structure Layout + +- Order members by: use, then size (largest to smallest), then alphabetically +- New additions to existing structures go at the end (ABI compatibility) +- Align member names with spaces + +```c +struct foo { + struct foo *next; /* List of active foo. */ + struct mumble amumble; /* Comment for mumble. */ + int bar; /* Try to align the comments. */ + bool is_valid; /* Boolean field is acceptable. */ +}; +``` + +--- + +## Forbidden Tokens (from checkpatches.sh) + +### Logging + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `RTE_LOG()` | `RTE_LOG_LINE()` | lib/, drivers/ | +| `RTE_LOG_DP()` | `RTE_LOG_DP_LINE()` | drivers/ | +| `rte_log()` | `RTE_LOG_LINE()` | drivers/ | +| `printf()` | Use RTE logging | lib/, drivers/ | +| `fprintf(stdout,...)` | Use RTE logging | lib/, drivers/ | +| `fprintf(stderr,...)` | Use RTE logging | lib/, drivers/ | +| `RTE_LOG_REGISTER` | `RTE_LOG_REGISTER_DEFAULT` or `RTE_LOG_REGISTER_SUFFIX` | lib/, drivers/ | + +### Error Handling + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: 
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... 
*/ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file + +### ABI Compatibility + +- New external functions must be exported properly +- Follow ABI policy and versioning guidelines +- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable + +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for 
indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Tests added/updated as needed +- [ ] Current release notes updated for significant changes + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters 
+- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Missing tests +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v3] doc: add AGENTS.md for AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (2 preceding siblings ...) 2026-01-10 18:00 ` [RFC v2] " Stephen Hemminger @ 2026-01-13 15:28 ` Stephen Hemminger 2026-01-16 17:46 ` [PATCH v4] " Stephen Hemminger ` (7 subsequent siblings) 11 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-13 15:28 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- v3 - move to devtools - revise based on results of use of this criteria on patches. use AI to revise its prompt to reduce false positives. devtools/AGENTS.md | 823 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 823 insertions(+) create mode 100644 devtools/AGENTS.md diff --git a/devtools/AGENTS.md b/devtools/AGENTS.md new file mode 100644 index 0000000000..54051a48e4 --- /dev/null +++ b/devtools/AGENTS.md @@ -0,0 +1,823 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below. + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed by the copyright notice, then a blank line before other content. 
+ +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? 
& |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: 0123456789ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly.
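This tag format is mechanical enough for a review tool to verify directly. A minimal sketch, assuming a simple line-based checker (the regex and function name are illustrative, not taken from the DPDK scripts):

```python
import re

# Expected shape: 12 lowercase hex digits, then the original subject quoted
# inside parentheses. Illustrative only - not derived from checkpatches.sh.
FIXES_RE = re.compile(r'Fixes: [0-9a-f]{12} \("[^"]+"\)')

def check_fixes_line(line: str) -> bool:
    """Return True if a Fixes: tag line matches the expected format."""
    return FIXES_RE.fullmatch(line.strip()) is not None
```

A real checker would also confirm that the hash names a commit on the current branch and that the quoted subject matches that commit, per the rules above.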
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order: + +``` +Coverity issue: +Bugzilla ID: +Fixes: +Cc: + <-- blank line required here +Reported-by: +Suggested-by: +Signed-off-by: +Acked-by: +Reviewed-by: +Tested-by: +``` + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different naming conventions when sharing code across platforms or with upstream vendor code. + +#### Prohibited Terminology + +Do not use: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! 
on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Declare variables at the beginning of blocks +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - separate declaration and assignment */ +int ret; +ret = some_function(); + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. 
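Some of these patterns are simple enough for a tool to flag with line-oriented heuristics rather than full parsing. A sketch for one case, the redundant NULL check before `free()` (the regex and function name are illustrative assumptions, not part of the DPDK tooling):

```python
import re

# Matches "if (x != NULL) free(x)" or "if (x != NULL) rte_free(x)" where the
# guarded variable and the freed variable are the same. Illustrative only.
NULL_GUARD_RE = re.compile(
    r'if\s*\(\s*(\w+)\s*!=\s*NULL\s*\)\s*(?:rte_)?free\s*\(\s*\1\s*\)')

def find_redundant_null_guards(source: str) -> list:
    """Return the variable name for each redundant NULL-then-free guard."""
    return [m.group(1) for m in NULL_GUARD_RE.finditer(source)]
```

Heuristics like this trade completeness for simplicity; the Coccinelle scripts under `devtools/cocci/` are the more robust way to express such rewrites.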
+ +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. + +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. 
The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` (used as `"%" PRIu64` in the format string) | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `::
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... 
*/ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes + +### ABI Compatibility + +- New external functions must be exported properly +- Follow ABI policy and versioning guidelines +- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable + +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 
12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long 
lists: one item per line, trailing comma on last item + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters +- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` 
+ +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches -- 2.51.0
* [PATCH v4] doc: add AGENTS.md for AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (3 preceding siblings ...) 2026-01-13 15:28 ` [PATCH v3] " Stephen Hemminger @ 2026-01-16 17:46 ` Stephen Hemminger 2026-01-20 14:21 ` Bruce Richardson 2026-01-20 17:35 ` Bruce Richardson 2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger ` (6 subsequent siblings) 11 siblings, 2 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-16 17:46 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- v4 - revise based on use in reviewing patches; get rid of false positives from variable declarations; add checks for rte_malloc and rte_memcpy devtools/AGENTS.md | 939 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 939 insertions(+) create mode 100644 devtools/AGENTS.md diff --git a/devtools/AGENTS.md b/devtools/AGENTS.md new file mode 100644 index 0000000000..1ec197bdf7 --- /dev/null +++ b/devtools/AGENTS.md @@ -0,0 +1,939 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below. + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed by the copyright notice, then a blank line before other content.
+ +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? 
& |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: 0123456789ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly.
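The tag can also be constructed programmatically. A minimal sketch (the helper name is illustrative; in a real workflow `git log -1 --abbrev=12 --format='Fixes: %h ("%s")'` emits the same line directly from the commit being fixed):

```python
def format_fixes_tag(commit_hash: str, subject: str) -> str:
    """Build a Fixes: tag from a full commit hash and its subject line."""
    short = commit_hash[:12]
    if len(short) != 12 or any(c not in '0123456789abcdef' for c in short):
        raise ValueError('expected a full lowercase hex commit hash')
    return 'Fixes: %s ("%s")' % (short, subject)
```

Taking the hash and subject from git, rather than typing them by hand, avoids the mismatched-subject errors that `check-git-log.sh` flags.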
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order: + +``` +Coverity issue: +Bugzilla ID: +Fixes: +Cc: + <-- blank line required here +Reported-by: +Suggested-by: +Signed-off-by: +Acked-by: +Reviewed-by: +Tested-by: +``` + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could clash when statically linked with other DPDK components or applications. Use consistent and unique prefixes for all exported symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == 
really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. 
+ +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. + +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. 
+ +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: 
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. 
+ +```c +#include <rte_test.h> + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, true, true, test_feature); +``` + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional 
test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes + +### ABI Compatibility + +- New external functions must be exported properly +- Follow ABI policy and versioning guidelines +- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable + +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- 
[ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters 
+- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH v4] doc: add AGENTS.md for AI-powered code review tools 2026-01-16 17:46 ` [PATCH v4] " Stephen Hemminger @ 2026-01-20 14:21 ` Bruce Richardson 2026-01-20 17:35 ` Bruce Richardson 1 sibling, 0 replies; 63+ messages in thread From: Bruce Richardson @ 2026-01-20 14:21 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev On Fri, Jan 16, 2026 at 09:46:41AM -0800, Stephen Hemminger wrote: > Add a structured reference document that enables AI code review tools > to validate DPDK contributions against project standards. This document > consolidates requirements from multiple sources into a machine-readable > format optimized for automated validation workflows. > > The AGENTS.md file synthesizes guidelines from: > - DPDK Contributing Code documentation (patches.rst) > - DPDK Coding Style guidelines (coding_style.rst) > - DPDK validation scripts (check-git-log.sh, checkpatches.sh) > - Linux kernel patch submission process > - SPDX License Identifier specification > - DPDK Coccinelle scripts (cocci) > - common items spotted on mailing list review > > Key sections include: > - SPDX license and copyright header requirements > - Commit message format with precise limits (60 char subject, > 75 char body) and tag ordering rules > - C coding style including explicit comparison requirements > - Forbidden tokens table derived from checkpatches.sh > - API tag placement rules for experimental and internal APIs > - Patch validation checklists with severity levels > > The forbidden tokens section documents restrictions on deprecated > atomics, logging functions, threading APIs, and compiler built-ins > that are checked by the existing checkpatches.sh infrastructure. > > Severity levels (error/warning/info) align with the exit codes and > messaging from check-git-log.sh and checkpatches.sh to help automated > tools prioritize feedback appropriately. 
> > References: > - https://doc.dpdk.org/guides/contributing/patches.html > - https://doc.dpdk.org/guides/contributing/coding_style.html > - devtools/check-git-log.sh > - devtools/checkpatches.sh > - devtools/cocci/ > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- > > v4 - revise based on use in reviewing patches > get rid of false positives from variable declarations > add checks for rte_malloc and rte_memcpy > > > devtools/AGENTS.md | 939 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 939 insertions(+) > create mode 100644 devtools/AGENTS.md > Acked-by: Bruce Richardson <bruce.richardson@intel.com> Even if not perfect, I'd like to see this in main repo, to make it easier for all to have AI review of patches - especially maintainers merging to subtrees! :-) ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v4] doc: add AGENTS.md for AI-powered code review tools 2026-01-16 17:46 ` [PATCH v4] " Stephen Hemminger 2026-01-20 14:21 ` Bruce Richardson @ 2026-01-20 17:35 ` Bruce Richardson 1 sibling, 0 replies; 63+ messages in thread From: Bruce Richardson @ 2026-01-20 17:35 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev On Fri, Jan 16, 2026 at 09:46:41AM -0800, Stephen Hemminger wrote: > Add a structured reference document that enables AI code review tools > to validate DPDK contributions against project standards. This document > consolidates requirements from multiple sources into a machine-readable > format optimized for automated validation workflows. > > The AGENTS.md file synthesizes guidelines from: > - DPDK Contributing Code documentation (patches.rst) > - DPDK Coding Style guidelines (coding_style.rst) > - DPDK validation scripts (check-git-log.sh, checkpatches.sh) > - Linux kernel patch submission process > - SPDX License Identifier specification > - DPDK Coccinelle scripts (cocci) > - common items spotted on mailing list review > > Key sections include: > - SPDX license and copyright header requirements > - Commit message format with precise limits (60 char subject, > 75 char body) and tag ordering rules > - C coding style including explicit comparison requirements > - Forbidden tokens table derived from checkpatches.sh > - API tag placement rules for experimental and internal APIs > - Patch validation checklists with severity levels > > The forbidden tokens section documents restrictions on deprecated > atomics, logging functions, threading APIs, and compiler built-ins > that are checked by the existing checkpatches.sh infrastructure. > > Severity levels (error/warning/info) align with the exit codes and > messaging from check-git-log.sh and checkpatches.sh to help automated > tools prioritize feedback appropriately. 
> > References: > - https://doc.dpdk.org/guides/contributing/patches.html > - https://doc.dpdk.org/guides/contributing/coding_style.html > - devtools/check-git-log.sh > - devtools/checkpatches.sh > - devtools/cocci/ > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- > > v4 - revise based on use in reviewing patches > get rid of false positives from variable declarations > add checks for rte_malloc and rte_memcpy > > > devtools/AGENTS.md | 939 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 939 insertions(+) > create mode 100644 devtools/AGENTS.md > I see you are putting this in the devtools directory. To have it picked up automatically, should this not be put in the root directory? ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v5 0/4] devtools: add AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (4 preceding siblings ...) 2026-01-16 17:46 ` [PATCH v4] " Stephen Hemminger @ 2026-01-23 0:42 ` Stephen Hemminger 2026-01-23 0:42 ` [PATCH v5 1/4] doc: add AGENTS.md for " Stephen Hemminger ` (4 more replies) 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger ` (5 subsequent siblings) 11 siblings, 5 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-23 0:42 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds tools to enable AI-assisted code review for DPDK contributions. The goal is to help contributors validate their patches against DPDK coding standards before submission, reducing review cycles and improving patch quality. The series consists of: 1. AGENTS.md - A structured reference document that consolidates DPDK coding standards, commit message requirements, and validation rules into a machine-readable format. This serves as the knowledge base for AI review tools. 2. analyze-patch.sh - Reviews patches against AGENTS.md guidelines. Supports multiple AI providers (Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini) for flexibility and comparison. 3. compare-reviews.sh - Utility script to run reviews across all configured AI providers simultaneously for comparison. 4. review-doc.sh - Reviews documentation files for spelling, grammar, and technical correctness. Produces ready-to-apply diffs and compliant commit messages. Usage requires an API key from at least one supported provider: export ANTHROPIC_API_KEY=... 
# or: OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY ./devtools/analyze-patch.sh my-patch.patch ./devtools/review-doc.sh doc/guides/prog_guide/mempool_lib.rst The multi-provider design allows users to: - Choose their preferred AI service - Compare feedback quality across providers - Avoid vendor lock-in These tools complement existing validation infrastructure (checkpatches.sh, check-git-log.sh) by providing semantic review capabilities that static analysis cannot offer. Future work could include: - Integration with CI pipelines - Mailing list integration for automated review comments - Fine-tuning prompts based on common review feedback - Better updating and error handling. AI providers are not stable. - Integration into contributing documentation Stephen Hemminger (4): doc: add AGENTS.md for AI-powered code review tools devtools: add multi-provider AI patch review script devtools: add script to compare AI reviews across providers devtools: add multi-provider AI documentation review script AGENTS.md | 939 ++++++++++++++++++++ devtools/analyze-patch.sh | 534 ++++++++++++ devtools/compare-reviews.sh | 153 ++++ devtools/review-doc.sh | 734 ++++++++++++++++ 4 files changed, 2360 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.sh create mode 100755 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.sh -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v5 1/4] doc: add AGENTS.md for AI-powered code review tools 2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger @ 2026-01-23 0:42 ` Stephen Hemminger 2026-01-23 0:42 ` [PATCH v5 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-23 0:42 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 939 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 939 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..1ec197bdf7 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,939 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below. + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. 
+ +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough 
context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA (hexadecimal): + +``` +Fixes: 1234567890ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order: + +``` +Coverity issue: +Bugzilla ID: +Fixes: +Cc: + <-- blank line required here +Reported-by: +Suggested-by: +Signed-off-by: +Acked-by: +Reviewed-by: +Tested-by: +``` + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a newline +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. 
Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could clash when statically linked with other DPDK components or applications. Use consistent and unique prefixes for all exported symbols to avoid namespace collisions. 
+ +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... }; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! 
on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- 
Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. 
Do not add redundant NULL checks. + +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, key_len); +rte_free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory such as a descriptor ring; + * note mbufs come from rte_pktmbuf_pool_create(), not rte_malloc() */ +struct hw_desc *ring = rte_malloc(NULL, n * sizeof(*ring), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` from `<inttypes.h>` (used as `"%" PRId64`) | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: 
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. 
+ +```c +#include "test.h" /* app/test framework: TEST_ASSERT, unit_test_suite_runner */ + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, true, true, test_feature); +``` + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional 
test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes + +### ABI Compatibility + +- New external functions must be exported properly +- Follow ABI policy and versioning guidelines +- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable + +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- 
[ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters 
+- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v5 2/4] devtools: add multi-provider AI patch review script 2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger 2026-01-23 0:42 ` [PATCH v5 1/4] doc: add AGENTS.md for " Stephen Hemminger @ 2026-01-23 0:42 ` Stephen Hemminger 2026-01-23 0:42 ` [PATCH v5 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-23 0:42 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This is an AI generated script to review DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.sh 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.sh -p xai my-patch.patch ./devtools/analyze-patch.sh -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.sh | 534 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 534 insertions(+) create mode 100755 devtools/analyze-patch.sh diff --git a/devtools/analyze-patch.sh b/devtools/analyze-patch.sh new file mode 100755 index 0000000000..eec28ecebb --- /dev/null +++ b/devtools/analyze-patch.sh @@ -0,0 +1,534 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Analyze DPDK patches using various AI providers +# Supported providers: anthropic (Claude), openai (ChatGPT), xai (Grok), google (Gemini) + +set -e + +# Default paths and settings +AGENTS_FILE="AGENTS.md" +MAX_TOKENS=4096 +PROVIDER="anthropic" + +# Temporary file for API requests (cleaned up on exit) +REQUEST_FILE="" +cleanup() { + [[ -n "$REQUEST_FILE" && -f "$REQUEST_FILE" ]] && rm -f "$REQUEST_FILE" +} +trap cleanup EXIT + +# Default models per provider +declare -A DEFAULT_MODELS=( + ["anthropic"]="claude-sonnet-4-5-20250929" + ["openai"]="gpt-4o" + ["xai"]="grok-3" + ["google"]="gemini-2.0-flash" +) + +# API endpoints per provider +declare -A API_ENDPOINTS=( + ["anthropic"]="https://api.anthropic.com/v1/messages" + ["openai"]="https://api.openai.com/v1/chat/completions" + ["xai"]="https://api.x.ai/v1/chat/completions" + ["google"]="https://generativelanguage.googleapis.com/v1beta/models" +) + +# Environment variable names for API keys +declare -A API_KEY_VARS=( + ["anthropic"]="ANTHROPIC_API_KEY" + ["openai"]="OPENAI_API_KEY" + ["xai"]="XAI_API_KEY" + ["google"]="GOOGLE_API_KEY" +) + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Analyze a DPDK patch file against AGENTS.md review guidelines using AI. 
+
+Options:
+  -p, --provider NAME   AI provider: anthropic, openai, xai, google
+                        (default: anthropic)
+  -a, --agents FILE     Path to AGENTS.md file (default: AGENTS.md)
+  -m, --model MODEL     Model to use (provider-specific, see defaults below)
+  -t, --tokens NUM      Max tokens for response (default: $MAX_TOKENS)
+  -v, --verbose         Show API request details
+  -l, --list-providers  List available providers and their defaults
+  -h, --help            Show this help message
+
+Providers and Default Models:
+  anthropic   Claude (claude-sonnet-4-5-20250929)
+  openai      ChatGPT (gpt-4o)
+  xai         Grok (grok-3)
+  google      Gemini (gemini-2.0-flash)
+
+Environment Variables (set the one for your chosen provider):
+  ANTHROPIC_API_KEY   For Anthropic Claude
+  OPENAI_API_KEY      For OpenAI ChatGPT
+  XAI_API_KEY         For xAI Grok
+  GOOGLE_API_KEY      For Google Gemini
+
+Examples:
+  $(basename "$0") 0001-net-ixgbe-fix-something.patch
+  $(basename "$0") -p openai my-patch.patch
+  $(basename "$0") -p xai -m grok-3 my-patch.patch
+  $(basename "$0") -p google --verbose *.patch
+EOF
+	exit "${1:-0}"
+}
+
+list_providers() {
+	echo "Available AI Providers:"
+	echo ""
+	printf "%-12s %-35s %s\n" "Provider" "Default Model" "API Key Variable"
+	printf "%-12s %-35s %s\n" "--------" "-------------" "----------------"
+	for provider in anthropic openai xai google; do
+		printf "%-12s %-35s %s\n" "$provider" "${DEFAULT_MODELS[$provider]}" "${API_KEY_VARS[$provider]}"
+	done
+	exit 0
+}
+
+error() {
+	echo "Error: $1" >&2
+	exit 1
+}
+
+# Read file contents, escaping for JSON
+read_json_escaped() {
+	local file="$1"
+	python3 -c "
+import json
+with open('$file', 'r') as f:
+    print(json.dumps(f.read()))
+"
+}
+
+# Build request for Anthropic Claude API
+build_anthropic_request() {
+	local model="$1"
+	local max_tokens="$2"
+	local agents_content="$3"
+	local patch_content="$4"
+	local patch_name="$5"
+
+	cat <<EOF
+{
+  "model": "$model",
+  "max_tokens": $max_tokens,
+  "system": [
+    {
+      "type": "text",
+      "text": "You are an expert DPDK code reviewer. Analyze patches for compliance with DPDK coding standards and contribution guidelines. Provide clear, actionable feedback organized by severity (Error, Warning, Info) as defined in the guidelines."
+    },
+    {
+      "type": "text",
+      "text": $agents_content,
+      "cache_control": {"type": "ephemeral"}
+    }
+  ],
+  "messages": [
+    {
+      "role": "user",
+      "content": "Please review the following DPDK patch file '$patch_name' against the AGENTS.md guidelines. Check for:\n\n1. Commit message format (subject line, body, tags)\n2. License/copyright compliance\n3. C coding style issues\n4. API and documentation requirements\n5. Any other guideline violations\n\nProvide feedback organized by severity level.\n\n--- PATCH CONTENT ---\n"
+    },
+    {
+      "role": "user",
+      "content": $patch_content
+    }
+  ]
+}
+EOF
+}
+
+# Build request for OpenAI-compatible APIs (OpenAI, xAI)
+build_openai_request() {
+	local model="$1"
+	local max_tokens="$2"
+	local agents_content="$3"
+	local patch_content="$4"
+	local patch_name="$5"
+
+	cat <<EOF
+{
+  "model": "$model",
+  "max_tokens": $max_tokens,
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are an expert DPDK code reviewer. Analyze patches for compliance with DPDK coding standards and contribution guidelines. Provide clear, actionable feedback organized by severity (Error, Warning, Info) as defined in the guidelines.\n\nHere are the DPDK review guidelines:\n\n"
+    },
+    {
+      "role": "system",
+      "content": $agents_content
+    },
+    {
+      "role": "user",
+      "content": "Please review the following DPDK patch file '$patch_name' against the AGENTS.md guidelines. Check for:\n\n1. Commit message format (subject line, body, tags)\n2. License/copyright compliance\n3. C coding style issues\n4. API and documentation requirements\n5. Any other guideline violations\n\nProvide feedback organized by severity level.\n\n--- PATCH CONTENT ---\n"
+    },
+    {
+      "role": "user",
+      "content": $patch_content
+    }
+  ]
+}
+EOF
+}
+
+# Build request for Google Gemini API
+build_google_request() {
+	local agents_content="$1"
+	local patch_content="$2"
+	local patch_name="$3"
+
+	cat <<EOF
+{
+  "contents": [
+    {
+      "role": "user",
+      "parts": [
+        {
+          "text": "You are an expert DPDK code reviewer. Analyze patches for compliance with DPDK coding standards and contribution guidelines. Provide clear, actionable feedback organized by severity (Error, Warning, Info) as defined in the guidelines.\n\nHere are the DPDK review guidelines:\n\n"
+        }
+      ]
+    },
+    {
+      "role": "user",
+      "parts": [
+        {
+          "text": $agents_content
+        }
+      ]
+    },
+    {
+      "role": "user",
+      "parts": [
+        {
+          "text": "Please review the following DPDK patch file '$patch_name' against the AGENTS.md guidelines. Check for:\n\n1. Commit message format (subject line, body, tags)\n2. License/copyright compliance\n3. C coding style issues\n4. API and documentation requirements\n5. Any other guideline violations\n\nProvide feedback organized by severity level.\n\n--- PATCH CONTENT ---\n"
+        }
+      ]
+    },
+    {
+      "role": "user",
+      "parts": [
+        {
+          "text": $patch_content
+        }
+      ]
+    }
+  ],
+  "generationConfig": {
+    "maxOutputTokens": $MAX_TOKENS
+  }
+}
+EOF
+}
+
+# Make API request for Anthropic
+call_anthropic_api() {
+	local request="$1"
+	local api_key="$2"
+
+	REQUEST_FILE=$(mktemp)
+	echo "$request" > "$REQUEST_FILE"
+
+	curl -s "${API_ENDPOINTS[anthropic]}" \
+		-H "content-type: application/json" \
+		-H "x-api-key: $api_key" \
+		-H "anthropic-version: 2023-06-01" \
+		-d "@$REQUEST_FILE"
+}
+
+# Make API request for OpenAI
+call_openai_api() {
+	local request="$1"
+	local api_key="$2"
+
+	REQUEST_FILE=$(mktemp)
+	echo "$request" > "$REQUEST_FILE"
+
+	curl -s "${API_ENDPOINTS[openai]}" \
+		-H "Content-Type: application/json" \
+		-H "Authorization: Bearer $api_key" \
+		-d "@$REQUEST_FILE"
+}
+
+# Make API request for xAI (Grok)
+call_xai_api() {
+	local request="$1"
+	local api_key="$2"
+
+	REQUEST_FILE=$(mktemp)
+	echo "$request" > "$REQUEST_FILE"
+
+	curl -s "${API_ENDPOINTS[xai]}" \
+		-H "Content-Type: application/json" \
+		-H "Authorization: Bearer $api_key" \
+		-d "@$REQUEST_FILE"
+}
+
+# Make API request for Google Gemini
+call_google_api() {
+	local request="$1"
+	local api_key="$2"
+	local model="$3"
+
+	REQUEST_FILE=$(mktemp)
+	echo "$request" > "$REQUEST_FILE"
+
+	curl -s "${API_ENDPOINTS[google]}/${model}:generateContent?key=${api_key}" \
+		-H "Content-Type: application/json" \
+		-d "@$REQUEST_FILE"
+}
+
+# Extract response text from Anthropic response
+extract_anthropic_response() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+if 'error' in data:
+    err = data['error']
+    print(f\"Error type: {err.get('type', 'unknown')}\", file=sys.stderr)
+    print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr)
+    sys.exit(1)
+for block in data.get('content', []):
+    if block.get('type') == 'text':
+        print(block.get('text', ''))
+"
+}
+
+# Extract response text from OpenAI/xAI response
+extract_openai_response() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+if 'error' in data:
+    err = data['error']
+    print(f\"Error type: {err.get('type', 'unknown')}\", file=sys.stderr)
+    print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr)
+    print(f\"Error code: {err.get('code', 'unknown')}\", file=sys.stderr)
+    sys.exit(1)
+choices = data.get('choices', [])
+if choices:
+    message = choices[0].get('message', {})
+    print(message.get('content', ''))
+else:
+    print('No response content received', file=sys.stderr)
+"
+}
+
+# Extract response text from Google Gemini response
+extract_google_response() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+if 'error' in data:
+    err = data['error']
+    print(f\"Error code: {err.get('code', 'unknown')}\", file=sys.stderr)
+    print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr)
+    print(f\"Error status: {err.get('status', 'unknown')}\", file=sys.stderr)
+    sys.exit(1)
+candidates = data.get('candidates', [])
+if candidates:
+    content = candidates[0].get('content', {})
+    parts = content.get('parts', [])
+    for part in parts:
+        print(part.get('text', ''))
+else:
+    print('No response candidates received', file=sys.stderr)
+"
+}
+
+# Show verbose info for Anthropic
+show_anthropic_verbose() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+usage = data.get('usage', {})
+print(f\"Input tokens: {usage.get('input_tokens', 'N/A')}\")
+print(f\"Cache creation: {usage.get('cache_creation_input_tokens', 0)}\")
+print(f\"Cache read: {usage.get('cache_read_input_tokens', 0)}\")
+print(f\"Output tokens: {usage.get('output_tokens', 'N/A')}\")
+"
+}
+
+# Show verbose info for OpenAI/xAI
+show_openai_verbose() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+usage = data.get('usage', {})
+print(f\"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}\")
+print(f\"Completion tokens: {usage.get('completion_tokens', 'N/A')}\")
+print(f\"Total tokens: {usage.get('total_tokens', 'N/A')}\")
+"
+}
+
+# Show verbose info for Google
+show_google_verbose() {
+	python3 -c "
+import json
+import sys
+data = json.load(sys.stdin)
+usage = data.get('usageMetadata', {})
+print(f\"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}\")
+print(f\"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}\")
+print(f\"Total tokens: {usage.get('totalTokenCount', 'N/A')}\")
+"
+}
+
+# Parse command line options
+VERBOSE=0
+MODEL=""
+
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+	-p|--provider)
+		PROVIDER="$2"
+		shift 2
+		;;
+	-a|--agents)
+		AGENTS_FILE="$2"
+		shift 2
+		;;
+	-m|--model)
+		MODEL="$2"
+		shift 2
+		;;
+	-t|--tokens)
+		MAX_TOKENS="$2"
+		shift 2
+		;;
+	-v|--verbose)
+		VERBOSE=1
+		shift
+		;;
+	-l|--list-providers)
+		list_providers
+		;;
+	-h|--help)
+		usage 0
+		;;
+	-*)
+		error "Unknown option: $1"
+		;;
+	*)
+		break
+		;;
+	esac
+done
+
+# Validate provider
+if [[ ! -v "DEFAULT_MODELS[$PROVIDER]" ]]; then
+	error "Unknown provider: $PROVIDER. Use -l to list available providers."
+fi
+
+# Set default model if not specified
+if [[ -z "$MODEL" ]]; then
+	MODEL="${DEFAULT_MODELS[$PROVIDER]}"
+fi
+
+# Check for required arguments
+if [[ $# -lt 1 ]]; then
+	echo "Error: No patch file specified" >&2
+	usage 1
+fi
+
+PATCH_FILE="$1"
+
+# Get the API key variable name and check it's set
+API_KEY_VAR="${API_KEY_VARS[$PROVIDER]}"
+API_KEY="${!API_KEY_VAR}"
+
+if [[ -z "$API_KEY" ]]; then
+	error "$API_KEY_VAR environment variable not set"
+fi
+
+# Validate files exist
+if [[ ! -f "$AGENTS_FILE" ]]; then
+	error "AGENTS.md not found: $AGENTS_FILE"
+fi
+
+if [[ ! -f "$PATCH_FILE" ]]; then
+	error "Patch file not found: $PATCH_FILE"
+fi
+
+# Read file contents
+AGENTS_CONTENT=$(read_json_escaped "$AGENTS_FILE")
+PATCH_CONTENT=$(read_json_escaped "$PATCH_FILE")
+PATCH_BASENAME=$(basename "$PATCH_FILE")
+
+if [[ $VERBOSE -eq 1 ]]; then
+	echo "=== Request ===" >&2
+	echo "Provider: $PROVIDER" >&2
+	echo "Model: $MODEL" >&2
+	echo "AGENTS file: $AGENTS_FILE" >&2
+	echo "Patch file: $PATCH_FILE" >&2
+	echo "===============" >&2
+fi
+
+# Build and send request based on provider
+case "$PROVIDER" in
+anthropic)
+	REQUEST=$(build_anthropic_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$PATCH_CONTENT" "$PATCH_BASENAME")
+	RESPONSE=$(call_anthropic_api "$REQUEST" "$API_KEY")
+
+	if [[ $VERBOSE -eq 1 ]]; then
+		echo "=== Token Usage ===" >&2
+		echo "$RESPONSE" | show_anthropic_verbose >&2
+		echo "===================" >&2
+	fi
+
+	echo "=== Patch Review: $PATCH_BASENAME (via Claude) ==="
+	echo "$RESPONSE" | extract_anthropic_response
+	;;
+
+openai)
+	REQUEST=$(build_openai_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$PATCH_CONTENT" "$PATCH_BASENAME")
+	RESPONSE=$(call_openai_api "$REQUEST" "$API_KEY")
+
+	if [[ $VERBOSE -eq 1 ]]; then
+		echo "=== Token Usage ===" >&2
+		echo "$RESPONSE" | show_openai_verbose >&2
+		echo "===================" >&2
+	fi
+
+	echo "=== Patch Review: $PATCH_BASENAME (via ChatGPT) ==="
+	echo "$RESPONSE" | extract_openai_response
+	;;
+
+xai)
+	REQUEST=$(build_openai_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$PATCH_CONTENT" "$PATCH_BASENAME")
+	RESPONSE=$(call_xai_api "$REQUEST" "$API_KEY")
+
+	if [[ $VERBOSE -eq 1 ]]; then
+		echo "=== Token Usage ===" >&2
+		echo "$RESPONSE" | show_openai_verbose >&2
+		echo "===================" >&2
+	fi
+
+	echo "=== Patch Review: $PATCH_BASENAME (via Grok) ==="
+	echo "$RESPONSE" | extract_openai_response
+	;;
+
+google)
+	REQUEST=$(build_google_request "$AGENTS_CONTENT" "$PATCH_CONTENT" "$PATCH_BASENAME")
+	RESPONSE=$(call_google_api "$REQUEST" "$API_KEY" "$MODEL")
+
+	if [[ $VERBOSE -eq 1 ]]; then
+		echo "=== Token Usage ===" >&2
+		echo "$RESPONSE" | show_google_verbose >&2
+		echo "===================" >&2
+	fi
+
+	echo "=== Patch Review: $PATCH_BASENAME (via Gemini) ==="
+	echo "$RESPONSE" | extract_google_response
+	;;
+esac
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
* [PATCH v5 3/4] devtools: add script to compare AI reviews across providers
  2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger
  2026-01-23 0:42 ` [PATCH v5 1/4] doc: add AGENTS.md for " Stephen Hemminger
  2026-01-23 0:42 ` [PATCH v5 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
@ 2026-01-23 0:42 ` Stephen Hemminger
  2026-01-23 0:42 ` [PATCH v5 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
  2026-01-23 10:55 ` [PATCH v5 0/4] devtools: add AI-powered code review tools Marat Khalili
  4 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-23 0:42 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add an AI-generated compare-reviews.sh script that runs patch reviews
across multiple AI providers for comparison purposes.

The script automatically detects which providers have API keys
configured and runs analyze-patch.sh for each one. This allows users
to compare review quality and feedback across different AI models.

Features:
- Auto-detects available providers based on environment variables
- Optional provider selection via -p/--providers option
- Saves individual reviews to separate files with -o/--output
- Verbose mode passes through to underlying analyze-patch.sh

Usage:
  ./devtools/compare-reviews.sh my-patch.patch
  ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch
  ./devtools/compare-reviews.sh -o ./reviews my-patch.patch

Output files are named <patch>-<provider>.txt when using the output
directory option.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/compare-reviews.sh | 153 ++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100755 devtools/compare-reviews.sh

diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh
new file mode 100755
index 0000000000..41a05a66c5
--- /dev/null
+++ b/devtools/compare-reviews.sh
@@ -0,0 +1,153 @@
+#!/bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+# Compare DPDK patch reviews across multiple AI providers
+# Runs analyze-patch.sh with each available provider
+
+set -e
+
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.sh"
+AGENTS_FILE="AGENTS.md"
+OUTPUT_DIR=""
+PROVIDERS=""
+
+usage() {
+	cat <<EOF
+Usage: $(basename "$0") [OPTIONS] <patch-file>
+
+Compare DPDK patch reviews across multiple AI providers.
+
+Options:
+  -a, --agents FILE     Path to AGENTS.md file (default: AGENTS.md)
+  -o, --output DIR      Save individual reviews to directory
+  -p, --providers LIST  Comma-separated list of providers to use
+                        (default: all providers with API keys set)
+  -v, --verbose         Show verbose output from each provider
+  -h, --help            Show this help message
+
+Environment Variables:
+  Set API keys for providers you want to use:
+  ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY
+
+Examples:
+  $(basename "$0") my-patch.patch
+  $(basename "$0") -p anthropic,openai my-patch.patch
+  $(basename "$0") -o ./reviews my-patch.patch
+EOF
+	exit "${1:-0}"
+}
+
+error() {
+	echo "Error: $1" >&2
+	exit 1
+}
+
+# Check which providers have API keys configured
+get_available_providers() {
+	local available=""
+
+	[[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic,"
+	[[ -n "$OPENAI_API_KEY" ]] && available="${available}openai,"
+	[[ -n "$XAI_API_KEY" ]] && available="${available}xai,"
+	[[ -n "$GOOGLE_API_KEY" ]] && available="${available}google,"
+
+	# Remove trailing comma
+	echo "${available%,}"
+}
+
+# Parse command line options
+VERBOSE=""
+
+while [[ $# -gt 0 ]]; do
+	case "$1" in
+	-a|--agents)
+		AGENTS_FILE="$2"
+		shift 2
+		;;
+	-o|--output)
+		OUTPUT_DIR="$2"
+		shift 2
+		;;
+	-p|--providers)
+		PROVIDERS="$2"
+		shift 2
+		;;
+	-v|--verbose)
+		VERBOSE="-v"
+		shift
+		;;
+	-h|--help)
+		usage 0
+		;;
+	-*)
+		error "Unknown option: $1"
+		;;
+	*)
+		break
+		;;
+	esac
+done
+
+# Check for required arguments
+if [[ $# -lt 1 ]]; then
+	echo "Error: No patch file specified" >&2
+	usage 1
+fi
+
+PATCH_FILE="$1"
+
+if [[ ! -f "$PATCH_FILE" ]]; then
+	error "Patch file not found: $PATCH_FILE"
+fi
+
+if [[ ! -f "$ANALYZE_SCRIPT" ]]; then
+	error "analyze-patch.sh not found: $ANALYZE_SCRIPT"
+fi
+
+# Get providers to use
+if [[ -z "$PROVIDERS" ]]; then
+	PROVIDERS=$(get_available_providers)
+fi
+
+if [[ -z "$PROVIDERS" ]]; then
+	error "No API keys configured. Set at least one of: ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY"
+fi
+
+# Create output directory if specified
+if [[ -n "$OUTPUT_DIR" ]]; then
+	mkdir -p "$OUTPUT_DIR"
+fi
+
+PATCH_BASENAME=$(basename "$PATCH_FILE")
+echo "Reviewing patch: $PATCH_BASENAME"
+echo "Providers: $PROVIDERS"
+echo "========================================"
+echo ""
+
+# Run review for each provider
+IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS"
+for provider in "${PROVIDER_LIST[@]}"; do
+	echo ">>> Running review with: $provider"
+	echo ""
+
+	if [[ -n "$OUTPUT_DIR" ]]; then
+		OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_BASENAME%.patch}-${provider}.txt"
+		"$ANALYZE_SCRIPT" -p "$provider" -a "$AGENTS_FILE" $VERBOSE "$PATCH_FILE" | tee "$OUTPUT_FILE"
+		echo ""
+		echo "Saved to: $OUTPUT_FILE"
+	else
+		"$ANALYZE_SCRIPT" -p "$provider" -a "$AGENTS_FILE" $VERBOSE "$PATCH_FILE"
+	fi
+
+	echo ""
+	echo "========================================"
+	echo ""
+done
+
+echo "Review comparison complete."
+
+if [[ -n "$OUTPUT_DIR" ]]; then
+	echo "All reviews saved to: $OUTPUT_DIR"
+fi
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
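The compare-reviews.sh patch above builds a comma-separated provider string (with a trailing comma stripped by a parameter expansion) and later splits it back into a bash array with read -ra. A minimal sketch of those two idioms, with a hard-coded list standing in for the auto-detected providers:

```shell
#!/bin/bash
# Hypothetical list, as get_available_providers would build it
# (each detected provider appended with a trailing comma).
available="anthropic,xai,google,"

# Strip the trailing comma, as the script does with ${available%,}.
PROVIDERS="${available%,}"

# Split on commas into an array; IFS is changed only for this read.
IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS"
for provider in "${PROVIDER_LIST[@]}"; do
	echo "would review with: $provider"
done
```

Scoping IFS to the read command avoids perturbing word splitting for the rest of the script, which is why the patch uses this form rather than assigning IFS globally.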
* [PATCH v5 4/4] devtools: add multi-provider AI documentation review script
  2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger
                  ` (2 preceding siblings ...)
  2026-01-23 0:42 ` [PATCH v5 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger
@ 2026-01-23 0:42 ` Stephen Hemminger
  2026-01-23 10:55 ` [PATCH v5 0/4] devtools: add AI-powered code review tools Marat Khalili
  4 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-23 0:42 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add review-doc.sh script that reviews DPDK documentation files for
spelling, grammar, technical correctness, and clarity using AI
language models.

Supported AI providers:
- Anthropic Claude (default)
- OpenAI ChatGPT
- xAI Grok
- Google Gemini

The script produces two output files:
- A unified diff with suggested changes
- A commit message following DPDK standards

The commit message prefix is automatically determined from the file
path (e.g., doc/guides/prog_guide: for programmer's guide files).

Features:
- Provider selection via -p/--provider option
- Custom model selection via -m/--model option
- Configurable output directory via -o/--output option
- Verbose mode shows token usage statistics
- Uses temporary files for API requests to handle large documents
- Prompt caching support for Anthropic to reduce costs

Usage:
  ./devtools/review-doc.sh doc/guides/prog_guide/mempool_lib.rst
  ./devtools/review-doc.sh -p xai doc/guides/nics/ixgbe.rst
  git apply mempool_lib.diff && git commit -sF mempool_lib.msg

Requires the appropriate API key environment variable to be set for
the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY,
or GOOGLE_API_KEY).
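The path-based prefix derivation described above is a plain case statement over the file path. A reduced sketch of the idea (only two example paths here, unlike the full get_commit_prefix table in the script):

```shell
#!/bin/bash
# Map a documentation file path to a commit subject prefix.
# Abbreviated illustration; the script's real table covers
# every doc/guides subdirectory.
get_prefix() {
	case "$1" in
	doc/guides/prog_guide/*) echo "doc/guides/prog_guide:" ;;
	doc/guides/nics/*)       echo "doc/guides/nics:" ;;
	*)                       echo "doc:" ;;
	esac
}

get_prefix doc/guides/prog_guide/mempool_lib.rst   # doc/guides/prog_guide:
```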
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.sh | 734 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 734 insertions(+) create mode 100755 devtools/review-doc.sh diff --git a/devtools/review-doc.sh b/devtools/review-doc.sh new file mode 100755 index 0000000000..c3b7cefca6 --- /dev/null +++ b/devtools/review-doc.sh @@ -0,0 +1,734 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Review DPDK documentation files using AI providers with prompt caching +# Outputs a diff file and commit message compliant with DPDK standards +# Supported providers: anthropic (Claude), openai (ChatGPT), xai (Grok), google (Gemini) + +set -e + +# Default paths and settings +AGENTS_FILE="AGENTS.md" +MAX_TOKENS=8192 +OUTPUT_DIR="." +PROVIDER="anthropic" + +# Temporary file for API requests (cleaned up on exit) +REQUEST_FILE="" +cleanup() { + [[ -n "$REQUEST_FILE" && -f "$REQUEST_FILE" ]] && rm -f "$REQUEST_FILE" +} +trap cleanup EXIT + +# Default models per provider +declare -A DEFAULT_MODELS=( + ["anthropic"]="claude-sonnet-4-5-20250929" + ["openai"]="gpt-4o" + ["xai"]="grok-3" + ["google"]="gemini-2.0-flash" +) + +# API endpoints per provider +declare -A API_ENDPOINTS=( + ["anthropic"]="https://api.anthropic.com/v1/messages" + ["openai"]="https://api.openai.com/v1/chat/completions" + ["xai"]="https://api.x.ai/v1/chat/completions" + ["google"]="https://generativelanguage.googleapis.com/v1beta/models" +) + +# Environment variable names for API keys +declare -A API_KEY_VARS=( + ["anthropic"]="ANTHROPIC_API_KEY" + ["openai"]="OPENAI_API_KEY" + ["xai"]="XAI_API_KEY" + ["google"]="GOOGLE_API_KEY" +) + +usage() { + cat <<- END_OF_HELP + usage: $(basename "$0") [OPTIONS] <doc-file> + + Review a DPDK documentation file for spelling, grammar, correctness and clarity. + Produces a diff file and commit message compliant with DPDK standards. 
+ + Options: + -p, --provider NAME AI provider: anthropic, openai, xai, google + (default: anthropic) + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -m, --model MODEL Model to use (provider-specific, see defaults below) + -t, --tokens NUM Max tokens for response (default: $MAX_TOKENS) + -o, --output DIR Output directory for diff and commit msg (default: .) + -v, --verbose Show API request details + -l, --list-providers List available providers and their defaults + -h, --help Show this help message + + Providers and Default Models: + anthropic Claude (claude-sonnet-4-5-20250929) + openai ChatGPT (gpt-4o) + xai Grok (grok-3) + google Gemini (gemini-2.0-flash) + + Environment Variables (set the one for your chosen provider): + ANTHROPIC_API_KEY For Anthropic Claude + OPENAI_API_KEY For OpenAI ChatGPT + XAI_API_KEY For xAI Grok + GOOGLE_API_KEY For Google Gemini + + Output files: + <basename>.diff Unified diff with suggested changes + <basename>.msg Commit message (without Signed-off-by) + + Examples: + $(basename "$0") doc/guides/prog_guide/mempool_lib.rst + $(basename "$0") -p openai -o /tmp doc/guides/nics/ixgbe.rst + $(basename "$0") -p xai doc/guides/cryptodevs/qat.rst + git apply <basename>.diff && git commit -sF <basename>.msg + END_OF_HELP + exit "${1:-0}" +} + +list_providers() { + echo "Available AI Providers:" + echo "" + printf "%-12s %-35s %s\n" "Provider" "Default Model" "API Key Variable" + printf "%-12s %-35s %s\n" "--------" "-------------" "----------------" + for provider in anthropic openai xai google; do + printf "%-12s %-35s %s\n" "$provider" "${DEFAULT_MODELS[$provider]}" "${API_KEY_VARS[$provider]}" + done + exit 0 +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Read file contents, escaping for JSON +read_json_escaped() { + local file="$1" + python3 -c " +import json +with open('$file', 'r') as f: + print(json.dumps(f.read())) +" +} + +# Determine commit message prefix from file path +get_commit_prefix() { + local 
filepath="$1" + case "$filepath" in + doc/guides/prog_guide/*) + echo "doc/guides/prog_guide:" ;; + doc/guides/sample_app_ug/*) + echo "doc/guides/sample_app:" ;; + doc/guides/nics/*) + echo "doc/guides/nics:" ;; + doc/guides/cryptodevs/*) + echo "doc/guides/cryptodevs:" ;; + doc/guides/compressdevs/*) + echo "doc/guides/compressdevs:" ;; + doc/guides/eventdevs/*) + echo "doc/guides/eventdevs:" ;; + doc/guides/rawdevs/*) + echo "doc/guides/rawdevs:" ;; + doc/guides/bbdevs/*) + echo "doc/guides/bbdevs:" ;; + doc/guides/gpus/*) + echo "doc/guides/gpus:" ;; + doc/guides/dmadevs/*) + echo "doc/guides/dmadevs:" ;; + doc/guides/regexdevs/*) + echo "doc/guides/regexdevs:" ;; + doc/guides/mldevs/*) + echo "doc/guides/mldevs:" ;; + doc/guides/rel_notes/*) + echo "doc/guides/rel_notes:" ;; + doc/guides/linux_gsg/*) + echo "doc/guides/linux_gsg:" ;; + doc/guides/freebsd_gsg/*) + echo "doc/guides/freebsd_gsg:" ;; + doc/guides/windows_gsg/*) + echo "doc/guides/windows_gsg:" ;; + doc/guides/tools/*) + echo "doc/guides/tools:" ;; + doc/guides/testpmd_app_ug/*) + echo "doc/guides/testpmd:" ;; + doc/guides/howto/*) + echo "doc/guides/howto:" ;; + doc/guides/contributing/*) + echo "doc/guides/contributing:" ;; + doc/guides/platform/*) + echo "doc/guides/platform:" ;; + doc/guides/*) + echo "doc:" ;; + doc/api/*) + echo "doc/api:" ;; + doc/*) + echo "doc:" ;; + *) + echo "doc:" ;; + esac +} + +# System prompt for documentation review +SYSTEM_PROMPT='You are an expert technical documentation reviewer for DPDK (Data Plane Development Kit). 
+Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? & | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag if you can identify the commit. + +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". +If no changes are needed, output empty sections with a note.' 
+ +# Build request for Anthropic Claude API +build_anthropic_request() { + local model="$1" + local max_tokens="$2" + local agents_content="$3" + local doc_content="$4" + local doc_file="$5" + local commit_prefix="$6" + local system_prompt_escaped + system_prompt_escaped=$(echo "$SYSTEM_PROMPT" | python3 -c "import json,sys; print(json.dumps(sys.stdin.read()))") + + cat <<EOF +{ + "model": "$model", + "max_tokens": $max_tokens, + "system": [ + { + "type": "text", + "text": $system_prompt_escaped + }, + { + "type": "text", + "text": $agents_content, + "cache_control": {"type": "ephemeral"} + } + ], + "messages": [ + { + "role": "user", + "content": "Review the following DPDK documentation file and provide improvements.\n\nFile path: $doc_file\nCommit message prefix to use: $commit_prefix\n\nProvide a unified diff and commit message following DPDK standards.\n\n---DOCUMENT CONTENT---" + }, + { + "role": "user", + "content": $doc_content + } + ] +} +EOF +} + +# Build request for OpenAI-compatible APIs (OpenAI, xAI) +build_openai_request() { + local model="$1" + local max_tokens="$2" + local agents_content="$3" + local doc_content="$4" + local doc_file="$5" + local commit_prefix="$6" + local system_prompt_escaped + system_prompt_escaped=$(echo "$SYSTEM_PROMPT" | python3 -c "import json,sys; print(json.dumps(sys.stdin.read()))") + + cat <<EOF +{ + "model": "$model", + "max_tokens": $max_tokens, + "messages": [ + { + "role": "system", + "content": $system_prompt_escaped + }, + { + "role": "system", + "content": $agents_content + }, + { + "role": "user", + "content": "Review the following DPDK documentation file and provide improvements.\n\nFile path: $doc_file\nCommit message prefix to use: $commit_prefix\n\nProvide a unified diff and commit message following DPDK standards.\n\n---DOCUMENT CONTENT---" + }, + { + "role": "user", + "content": $doc_content + } + ] +} +EOF +} + +# Build request for Google Gemini API +build_google_request() { + local max_tokens="$1" + local 
agents_content="$2" + local doc_content="$3" + local doc_file="$4" + local commit_prefix="$5" + local system_prompt_escaped + system_prompt_escaped=$(echo "$SYSTEM_PROMPT" | python3 -c "import json,sys; print(json.dumps(sys.stdin.read()))") + + cat <<EOF +{ + "contents": [ + { + "role": "user", + "parts": [ + { + "text": $system_prompt_escaped + } + ] + }, + { + "role": "user", + "parts": [ + { + "text": $agents_content + } + ] + }, + { + "role": "user", + "parts": [ + { + "text": "Review the following DPDK documentation file and provide improvements.\n\nFile path: $doc_file\nCommit message prefix to use: $commit_prefix\n\nProvide a unified diff and commit message following DPDK standards.\n\n---DOCUMENT CONTENT---" + } + ] + }, + { + "role": "user", + "parts": [ + { + "text": $doc_content + } + ] + } + ], + "generationConfig": { + "maxOutputTokens": $max_tokens + } +} +EOF +} + +# Make API request for Anthropic +call_anthropic_api() { + local request="$1" + local api_key="$2" + + REQUEST_FILE=$(mktemp) + echo "$request" > "$REQUEST_FILE" + + curl -s "${API_ENDPOINTS[anthropic]}" \ + -H "content-type: application/json" \ + -H "x-api-key: $api_key" \ + -H "anthropic-version: 2023-06-01" \ + -d "@$REQUEST_FILE" +} + +# Make API request for OpenAI +call_openai_api() { + local request="$1" + local api_key="$2" + + REQUEST_FILE=$(mktemp) + echo "$request" > "$REQUEST_FILE" + + curl -s "${API_ENDPOINTS[openai]}" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $api_key" \ + -d "@$REQUEST_FILE" +} + +# Make API request for xAI (Grok) +call_xai_api() { + local request="$1" + local api_key="$2" + + REQUEST_FILE=$(mktemp) + echo "$request" > "$REQUEST_FILE" + + curl -s "${API_ENDPOINTS[xai]}" \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $api_key" \ + -d "@$REQUEST_FILE" +} + +# Make API request for Google Gemini +call_google_api() { + local request="$1" + local api_key="$2" + local model="$3" + + REQUEST_FILE=$(mktemp) + echo 
"$request" > "$REQUEST_FILE" + + curl -s "${API_ENDPOINTS[google]}/${model}:generateContent?key=${api_key}" \ + -H "Content-Type: application/json" \ + -d "@$REQUEST_FILE" +} + +# Extract response text from Anthropic response +extract_anthropic_response() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +if 'error' in data: + err = data['error'] + print(f\"Error type: {err.get('type', 'unknown')}\", file=sys.stderr) + print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr) + sys.exit(1) +text = '' +for block in data.get('content', []): + if block.get('type') == 'text': + text += block.get('text', '') +print(text) +" +} + +# Extract response text from OpenAI/xAI response +extract_openai_response() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +if 'error' in data: + err = data['error'] + print(f\"Error type: {err.get('type', 'unknown')}\", file=sys.stderr) + print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr) + print(f\"Error code: {err.get('code', 'unknown')}\", file=sys.stderr) + sys.exit(1) +choices = data.get('choices', []) +if choices: + message = choices[0].get('message', {}) + print(message.get('content', '')) +else: + print('No response content received', file=sys.stderr) +" +} + +# Extract response text from Google Gemini response +extract_google_response() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +if 'error' in data: + err = data['error'] + print(f\"Error code: {err.get('code', 'unknown')}\", file=sys.stderr) + print(f\"Error message: {err.get('message', 'unknown')}\", file=sys.stderr) + print(f\"Error status: {err.get('status', 'unknown')}\", file=sys.stderr) + sys.exit(1) +candidates = data.get('candidates', []) +if candidates: + content = candidates[0].get('content', {}) + parts = content.get('parts', []) + for part in parts: + print(part.get('text', '')) +else: + print('No response candidates received', file=sys.stderr) +" +} + +# 
Show verbose info for Anthropic +show_anthropic_verbose() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +usage = data.get('usage', {}) +print(f\"Input tokens: {usage.get('input_tokens', 'N/A')}\") +print(f\"Cache creation: {usage.get('cache_creation_input_tokens', 0)}\") +print(f\"Cache read: {usage.get('cache_read_input_tokens', 0)}\") +print(f\"Output tokens: {usage.get('output_tokens', 'N/A')}\") +" +} + +# Show verbose info for OpenAI/xAI +show_openai_verbose() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +usage = data.get('usage', {}) +print(f\"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}\") +print(f\"Completion tokens: {usage.get('completion_tokens', 'N/A')}\") +print(f\"Total tokens: {usage.get('total_tokens', 'N/A')}\") +" +} + +# Show verbose info for Google +show_google_verbose() { + python3 -c " +import json +import sys +data = json.load(sys.stdin) +usage = data.get('usageMetadata', {}) +print(f\"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}\") +print(f\"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}\") +print(f\"Total tokens: {usage.get('totalTokenCount', 'N/A')}\") +" +} + +# Parse the response and write output files +parse_and_write_output() { + local text="$1" + local msg_file="$2" + local diff_file="$3" + + python3 -c " +import re +import sys + +text = '''$text''' + +# Extract commit message +msg_match = re.search(r'---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---', text, re.DOTALL) +if msg_match: + msg = msg_match.group(1).strip() + with open('$msg_file', 'w') as f: + f.write(msg + '\n') + print(f'Commit message written to: $msg_file', file=sys.stderr) +else: + print('Warning: Could not extract commit message', file=sys.stderr) + with open('$msg_file', 'w') as f: + f.write('# No commit message generated\n') + +# Extract unified diff +diff_match = re.search(r'---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---', text, re.DOTALL) +if diff_match: + diff = 
diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r'^\`\`\`diff\s*\n?', '', diff) + diff = re.sub(r'\n?\`\`\`\s*$', '', diff) + with open('$diff_file', 'w') as f: + f.write(diff + '\n') + print(f'Diff written to: $diff_file', file=sys.stderr) +else: + print('Warning: Could not extract diff', file=sys.stderr) + with open('$diff_file', 'w') as f: + f.write('# No changes suggested\n') + +# Also print full response for review +print('\n=== Full Review ===') +print(text) +" +} + +# Parse command line options +VERBOSE=0 +MODEL="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -p|--provider) + PROVIDER="$2" + shift 2 + ;; + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -m|--model) + MODEL="$2" + shift 2 + ;; + -t|--tokens) + MAX_TOKENS="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE=1 + shift + ;; + -l|--list-providers) + list_providers + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Validate provider +if [[ ! -v "DEFAULT_MODELS[$PROVIDER]" ]]; then + error "Unknown provider: $PROVIDER. Use -l to list available providers." +fi + +# Set default model if not specified +if [[ -z "$MODEL" ]]; then + MODEL="${DEFAULT_MODELS[$PROVIDER]}" +fi + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No documentation file specified" >&2 + usage 1 +fi + +DOC_FILE="$1" + +# Get the API key variable name and check it's set +API_KEY_VAR="${API_KEY_VARS[$PROVIDER]}" +API_KEY="${!API_KEY_VAR}" + +if [[ -z "$API_KEY" ]]; then + error "$API_KEY_VAR environment variable not set" +fi + +# Validate files exist +if [[ ! -f "$AGENTS_FILE" ]]; then + error "AGENTS.md not found: $AGENTS_FILE" +fi + +if [[ ! 
-f "$DOC_FILE" ]]; then + error "Documentation file not found: $DOC_FILE" +fi + +# Create output directory if needed +mkdir -p "$OUTPUT_DIR" + +# Derive output filenames +DOC_BASENAME=$(basename "$DOC_FILE" | sed 's/\.[^.]*$//') +DIFF_FILE="$OUTPUT_DIR/${DOC_BASENAME}.diff" +MSG_FILE="$OUTPUT_DIR/${DOC_BASENAME}.msg" + +# Get commit prefix +COMMIT_PREFIX=$(get_commit_prefix "$DOC_FILE") + +# Read file contents +AGENTS_CONTENT=$(read_json_escaped "$AGENTS_FILE") +DOC_CONTENT=$(read_json_escaped "$DOC_FILE") + +if [[ $VERBOSE -eq 1 ]]; then + echo "=== Request ===" >&2 + echo "Provider: $PROVIDER" >&2 + echo "Model: $MODEL" >&2 + echo "AGENTS file: $AGENTS_FILE" >&2 + echo "Doc file: $DOC_FILE" >&2 + echo "Commit prefix: $COMMIT_PREFIX" >&2 + echo "Output dir: $OUTPUT_DIR" >&2 + echo "===============" >&2 +fi + +# Build and send request based on provider +case "$PROVIDER" in + anthropic) + REQUEST=$(build_anthropic_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$DOC_CONTENT" "$DOC_FILE" "$COMMIT_PREFIX") + RESPONSE=$(call_anthropic_api "$REQUEST" "$API_KEY") + + if [[ $VERBOSE -eq 1 ]]; then + echo "=== Token Usage ===" >&2 + echo "$RESPONSE" | show_anthropic_verbose >&2 + echo "===================" >&2 + fi + + RESPONSE_TEXT=$(echo "$RESPONSE" | extract_anthropic_response) + ;; + + openai) + REQUEST=$(build_openai_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$DOC_CONTENT" "$DOC_FILE" "$COMMIT_PREFIX") + RESPONSE=$(call_openai_api "$REQUEST" "$API_KEY") + + if [[ $VERBOSE -eq 1 ]]; then + echo "=== Token Usage ===" >&2 + echo "$RESPONSE" | show_openai_verbose >&2 + echo "===================" >&2 + fi + + RESPONSE_TEXT=$(echo "$RESPONSE" | extract_openai_response) + ;; + + xai) + REQUEST=$(build_openai_request "$MODEL" "$MAX_TOKENS" "$AGENTS_CONTENT" "$DOC_CONTENT" "$DOC_FILE" "$COMMIT_PREFIX") + RESPONSE=$(call_xai_api "$REQUEST" "$API_KEY") + + if [[ $VERBOSE -eq 1 ]]; then + echo "=== Token Usage ===" >&2 + echo "$RESPONSE" | show_openai_verbose >&2 + echo 
"===================" >&2 + fi + + RESPONSE_TEXT=$(echo "$RESPONSE" | extract_openai_response) + ;; + + google) + REQUEST=$(build_google_request "$MAX_TOKENS" "$AGENTS_CONTENT" "$DOC_CONTENT" "$DOC_FILE" "$COMMIT_PREFIX") + RESPONSE=$(call_google_api "$REQUEST" "$API_KEY" "$MODEL") + + if [[ $VERBOSE -eq 1 ]]; then + echo "=== Token Usage ===" >&2 + echo "$RESPONSE" | show_google_verbose >&2 + echo "===================" >&2 + fi + + RESPONSE_TEXT=$(echo "$RESPONSE" | extract_google_response) + ;; +esac + +# Check if we got a response +if [[ -z "$RESPONSE_TEXT" ]]; then + error "No response received from $PROVIDER" +fi + +# Parse response and write output files +parse_and_write_output "$RESPONSE_TEXT" "$MSG_FILE" "$DIFF_FILE" + +echo "" +echo "=== Output Files ===" +echo "Commit message: $MSG_FILE" +echo "Diff file: $DIFF_FILE" +echo "" +echo "To apply changes:" +echo " git apply $DIFF_FILE" +echo " git commit -sF $MSG_FILE" -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
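A note on the `parse_and_write_output()` helper above: it interpolates the model response into the inline Python source as `text = '''$text'''`, which breaks (and can effectively execute response content as Python) whenever the response contains triple quotes or backslashes. A minimal sketch of the safer pattern, passing the text via stdin instead of source interpolation; this is an illustration, not a drop-in replacement for the function as posted:

```shell
# Sketch: feed untrusted text to inline Python via stdin rather than
# embedding it in the program text with text = '''$text'''.
response="a reply containing '''triple quotes''' and \$variables"
printf '%s' "$response" | python3 -c '
import sys
text = sys.stdin.read()   # arrives byte-for-byte, no quoting hazards
sys.stdout.write(text)
'
```

The same applies to `$msg_file` and `$diff_file`, which are interpolated into `open()` calls inside the Python snippet; passing them as `sys.argv` arguments avoids surprises with unusual filenames.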
* RE: [PATCH v5 0/4] devtools: add AI-powered code review tools 2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger ` (3 preceding siblings ...) 2026-01-23 0:42 ` [PATCH v5 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger @ 2026-01-23 10:55 ` Marat Khalili 2026-01-24 17:15 ` Stephen Hemminger 4 siblings, 1 reply; 63+ messages in thread From: Marat Khalili @ 2026-01-23 10:55 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev@dpdk.org Apologies if not the right venue for it, but I have an RFE. Can mostly AI-generated emails be clearly marked in some way detectable by standard email filters? They create quite a lot of traffic in the mailing list recently, and it would really help to manage this traffic (tricks such as detecting emojis are not 100% bulletproof). ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v5 0/4] devtools: add AI-powered code review tools 2026-01-23 10:55 ` [PATCH v5 0/4] devtools: add AI-powered code review tools Marat Khalili @ 2026-01-24 17:15 ` Stephen Hemminger 0 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 17:15 UTC (permalink / raw) To: Marat Khalili; +Cc: dev@dpdk.org On Fri, 23 Jan 2026 10:55:25 +0000 Marat Khalili <marat.khalili@huawei.com> wrote: > Apologies if not the right venue for it, but I have an RFE. Can mostly > AI-generated emails be clearly marked in some way detectable by standard email > filters? They create quite a lot of traffic in the mailing list recently, and > it would really help to manage this traffic (tricks such as detecting emojis > are not 100% bulletproof). There are a couple of things about the current AI tools. First, they generate markdown format, so you might detect that. My plan is to update the analyze and review scripts to have an option to send mail (like git send-email) and use something like "[REVIEW]" as a subject prefix. Also looking into whether there is a working content type for markdown since it would look better in most mail agents then. ^ permalink raw reply [flat|nested] 63+ messages in thread
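[Editor's note] The filtering idea above can be sketched with only the Python standard library. The `[REVIEW]` subject prefix matches the plan described in the reply; the `X-AI-Generated` header is a hypothetical marker, shown only as the kind of machine-detectable flag that standard mail filters could match on:

```python
# Sketch: client-side detection of AI-generated review mail.
# "[REVIEW]" prefix per the plan above; X-AI-Generated is hypothetical.
from email.message import EmailMessage

def is_ai_review(msg: EmailMessage) -> bool:
    if msg.get("X-AI-Generated") is not None:
        return True
    return msg.get("Subject", "").startswith("[REVIEW]")

msg = EmailMessage()
msg["Subject"] = "[REVIEW] net/ixgbe: fix offload config option name"
print(is_ai_review(msg))  # True
```

An equivalent rule is easy to express in Sieve or procmail once a stable prefix or header exists.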
* [PATCH v6 0/4] devtools: add AI-powered code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (5 preceding siblings ...) 2026-01-23 0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger @ 2026-01-24 19:00 ` Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 1/4] doc: add AGENTS.md for " Stephen Hemminger ` (3 more replies) 2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger ` (4 subsequent siblings) 11 siblings, 4 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 19:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds AI-powered code review tools to assist contributors and maintainers in validating patches against DPDK coding standards before submission. The tools leverage large language models from multiple providers to analyze patches and documentation for compliance with DPDK guidelines. By catching common issues early, these tools can reduce review cycles and improve overall patch quality. Key features: - Multi-provider support: Anthropic Claude, OpenAI ChatGPT, xAI Grok, and Google Gemini - Review patches against comprehensive AGENTS.md guidelines - Review documentation for spelling, grammar, and clarity - Multiple output formats: text, markdown, html, json - Email integration via SMTP using git sendemail configuration - Cross-provider comparison for evaluating review consistency The AGENTS.md document consolidates DPDK contribution requirements from multiple sources (patches.rst, coding_style.rst, check-git-log.sh, checkpatches.sh) into a structured format optimized for AI review tools. Severity levels align with existing validation scripts. All scripts are implemented in Python using only standard library modules, requiring no external dependencies beyond the AI provider API keys. 
Usage examples: # Review a patch ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch # Use a different provider ./devtools/analyze-patch.py -p openai my-patch.patch # Compare reviews across all configured providers ./devtools/compare-reviews.sh -o ./reviews my-patch.patch # Review documentation and generate diff ./devtools/review-doc.py doc/guides/nics/ixgbe.rst git apply ixgbe.diff && git commit -sF ixgbe.msg # Send review to mailing list ./devtools/analyze-patch.py --send-email --to dev@dpdk.org patch.mbox v6 - add output format options - add send-email option - use python rather than shell Stephen Hemminger (4): doc: add AGENTS.md for AI-powered code review tools devtools: add multi-provider AI patch review script devtools: add script to compare AI reviews across providers devtools: add multi-provider AI documentation review script AGENTS.md | 939 ++++++++++++++++++++++++++++++++++++ devtools/analyze-patch.py | 733 ++++++++++++++++++++++++++++ devtools/compare-reviews.sh | 192 ++++++++ devtools/review-doc.py | 892 ++++++++++++++++++++++++++++++++++ 4 files changed, 2756 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100644 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v6 1/4] doc: add AGENTS.md for AI-powered code review tools 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger @ 2026-01-24 19:00 ` Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 19:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 939 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 939 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..1ec197bdf7 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,939 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools when reviewing contributions to the Data Plane Development Kit (DPDK). It is derived from the official DPDK contributor guidelines and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All patches are reviewed publicly on the mailing list before being merged. AI review tools should verify compliance with the standards outlined below. + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. 
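The SPDX layout rules above can be validated mechanically. A minimal sketch of such a check (a hypothetical helper for illustration, not one of DPDK's devtools scripts):

```python
# Sketch: SPDX tag on line 1 (line 2 for "#!" scripts), copyright next,
# then a blank line before other content. Hypothetical helper.
def check_spdx_header(lines: list[str]) -> bool:
    idx = 1 if lines and lines[0].startswith("#!") else 0
    if len(lines) <= idx + 2:
        return False
    if "SPDX-License-Identifier:" not in lines[idx]:
        return False
    if "Copyright" not in lines[idx + 1]:
        return False
    # Allow the C comment terminator " */" before the required blank line.
    return any(l.strip() == "" for l in lines[idx + 2:idx + 4])

c_file = ["/* SPDX-License-Identifier: BSD-3-Clause",
          " * Copyright(c) 2024 ExampleCorp",
          " */",
          "",
          "#include <stdio.h>"]
print(check_spdx_header(c_file))  # True
```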
+ +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough 
context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA (hexadecimal digits only): + +``` +Fixes: 0123456789ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order: + +``` +Coverity issue: +Bugzilla ID: +Fixes: +Cc: + <-- blank line required here +Reported-by: +Suggested-by: +Signed-off-by: +Acked-by: +Reviewed-by: +Tested-by: +``` + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. 
Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could clash when statically linked with other DPDK components or applications. Use consistent and unique prefixes for all exported symbols to avoid namespace collisions. 
+ +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... }; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! 
on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- 
Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. 
Do not add redundant NULL checks. + +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: 
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. 
+ +```c +#include <rte_test.h> + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, true, true, test_feature); +``` + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional 
test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes + +### ABI Compatibility + +- New external functions must be exported properly +- Follow ABI policy and versioning guidelines +- Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable + +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- 
[ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters 
+- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v6 2/4] devtools: add multi-provider AI patch review script 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 1/4] doc: add AGENTS.md for " Stephen Hemminger @ 2026-01-24 19:00 ` Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 19:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This is an AI-generated script to review DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.py -p xai my-patch.patch ./devtools/analyze-patch.py -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 733 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 733 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..032e91bf31 --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,733 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +SYSTEM_PROMPT = """You are an expert DPDK code reviewer. Analyze patches for \ +compliance with DPDK coding standards and contribution guidelines. 
Provide \ +clear, actionable feedback organized by severity (Error, Warning, Info) as \ +defined in the guidelines.""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. 
Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def build_anthropic_request( + model, max_tokens, agents_content, patch_content, patch_name, output_format="text" +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, max_tokens, agents_content, patch_content, patch_name, output_format="text" +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, agents_content, patch_content, patch_name, output_format="text" +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = 
USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, max_tokens, agents_content, patch_content, patch_name, output_format + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, agents_content, patch_content, patch_name, output_format + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, max_tokens, agents_content, patch_content, patch_name, output_format + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + 
error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def 
get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if get_git_config("sendemail.smtpserver"): + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr 
in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. 
Configure git send-email, sendmail, " "or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", 
+ metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + patch_path = Path(args.patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {args.patch_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + 
from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read files + agents_content = agents_path.read_text() + patch_content = patch_path.read_text() + patch_name = patch_path.name + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {args.patch_file}", file=sys.stderr) + if args.output: + print(f"Output file: {args.output}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + patch_content, + patch_name, + args.output_format, + args.verbose, + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Format output based on requested format + provider_name = config["name"] + + if args.output_format == "json": + # For JSON, try to parse and add metadata + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + # If AI didn't return valid JSON, wrap the text + review_data = {"raw_review": review_text} + + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "html": + # Wrap HTML content with header + output_text = f"""<!-- AI-generated review of {patch_name} --> +<!-- Reviewed using {provider_name} ({model}) --> 
+<div class="patch-review"> +<h1>Patch Review: {patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +{review_text} +</div> +""" + elif args.output_format == "markdown": + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model})* + +{review_text} +""" + else: # text + output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n\n" + output_text += review_text + + # Write output + if args.output: + Path(args.output).write_text(output_text) + print(f"Review written to: {args.output}", file=sys.stderr) + else: + print(output_text) + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. 
+ +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v6 3/4] devtools: add script to compare AI reviews across providers 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 1/4] doc: add AGENTS.md for " Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger @ 2026-01-24 19:00 ` Stephen Hemminger 2026-01-24 19:00 ` [PATCH v6 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 19:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add an AI-generated compare-reviews.sh script that runs patch reviews across multiple AI providers for comparison purposes. The script automatically detects which providers have API keys configured and runs analyze-patch.py for each one. This allows users to compare review quality and feedback across different AI models. Features: - Auto-detects available providers based on environment variables - Optional provider selection via -p/--providers option - Saves individual reviews to separate files with -o/--output - Verbose mode passes through to the underlying analyze-patch.py Usage: ./devtools/compare-reviews.sh my-patch.patch ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch ./devtools/compare-reviews.sh -o ./reviews my-patch.patch Output files are named <patch>-<provider>.txt when using the output directory option. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100644 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
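The API-key scan in get_available_providers() above is an accumulate-then-strip-trailing-comma idiom. A self-contained sketch of it follows; the key values are dummies for illustration, and if-statements replace the script's `&&` one-liners so the function also behaves under `set -e`:

```shell
#!/bin/bash
# Sketch of the provider-detection idiom from compare-reviews.sh: append
# "name," for each provider whose API-key variable is set, then strip the
# trailing comma with ${var%,}.
get_available_providers() {
    local available=""
    if [[ -n "$ANTHROPIC_API_KEY" ]]; then available="${available}anthropic,"; fi
    if [[ -n "$OPENAI_API_KEY" ]]; then available="${available}openai,"; fi
    if [[ -n "$XAI_API_KEY" ]]; then available="${available}xai,"; fi
    if [[ -n "$GOOGLE_API_KEY" ]]; then available="${available}google,"; fi
    echo "${available%,}"
}

# Pretend exactly two keys are configured (dummy values, illustration only).
export ANTHROPIC_API_KEY="dummy" XAI_API_KEY="dummy"
unset OPENAI_API_KEY GOOGLE_API_KEY

get_available_providers    # prints: anthropic,xai
```

The same list then drives the `IFS=',' read -ra PROVIDER_LIST` loop in the script.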
* [PATCH v6 4/4] devtools: add multi-provider AI documentation review script 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger ` (2 preceding siblings ...) 2026-01-24 19:00 ` [PATCH v6 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger @ 2026-01-24 19:00 ` Stephen Hemminger 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-24 19:00 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add review-doc.py script that reviews DPDK documentation files for spelling, grammar, technical correctness, and clarity using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script produces two output files: - A unified diff with suggested changes - A commit message following DPDK standards The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide files). Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Verbose mode shows token usage statistics - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py -p xai doc/guides/nics/ixgbe.rst git apply mempool_lib.diff && git commit -sF mempool_lib.msg Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 892 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 892 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..747af925e5 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,892 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", 
"doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", "doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request(model, max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format="text"): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": 
"user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request(model, max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format="text"): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request(max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format="text"): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api(provider, api_key, model, max_tokens, agents_content, + doc_content, doc_file, commit_prefix, output_format="text", + verbose=False): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + 
max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, max_tokens, agents_content, doc_content, + doc_file, commit_prefix, output_format + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", + file=sys.stderr) + print(f"Cache creation: " + f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr) + print(f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr) + print(f"Output tokens: {usage.get('output_tokens', 'N/A')}", + file=sys.stderr) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print(f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr) + print(f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr) + else: # openai, xai + usage = result.get("usage", {}) + print(f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", + file=sys.stderr) + print(f"Completion tokens: " + f"{usage.get('completion_tokens', 'N/A')}", 
file=sys.stderr) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content + if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r'---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---', + review_text, re.DOTALL + ) + if msg_match: + commit_msg = msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r'---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---', + review_text, re.DOTALL + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r'^```diff\s*\n?', '', diff) + diff = re.sub(r'\n?```\s*$', '', diff) + + return commit_msg, diff + + +def send_email(to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, + dry_run=False, verbose=False): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = 
subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except 
smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst + %(prog)s -f markdown doc/guides/cryptodevs/qat.rst + %(prog)s -f json -O review.json doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument("doc_file", nargs="?", help="Documentation file") + parser.add_argument( + "-p", "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: 
AGENTS.md)", + ) + parser.add_argument( + "-m", "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", "--output-dir", + default=".", + help="Output directory for .diff and .msg files (default: .)", + ) + parser.add_argument( + "-v", "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-O", "--output-file", + metavar="FILE", + help="Write full review to file (in addition to .diff/.msg)", + ) + parser.add_argument( + "-l", "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check doc file is provided + if not args.doc_file: + parser.error("doc_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + 
api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + doc_path = Path(args.doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {args.doc_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Determine output filenames + doc_basename = doc_path.stem + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(args.doc_file) + + # Read files + agents_content = agents_path.read_text() + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {args.doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + 
print("===============", file=sys.stderr) + + # Call API + provider_name = config["name"] + review_text = call_api( + args.provider, api_key, model, args.tokens, + agents_content, doc_content, args.doc_file, commit_prefix, + args.output_format, args.verbose + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Process based on output format + if args.output_format == "text": + # Parse and write diff/msg files + commit_msg, diff = parse_review_text(review_text) + + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", file=sys.stderr) + + # Print full review + print(f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===") + print(review_text) + + elif args.output_format == "json": + # Try to parse JSON and extract diff/msg + try: + review_data = json.loads(review_text) + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", + file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) + review_data = {"raw_response": review_text} + + # Add metadata + output_data = { + "metadata": { + "doc_file": args.doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) 
+ print(output_text) + + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{review_text} +""" + print(output_text) + + elif args.output_format == "html": + output_text = f"""<!-- Documentation review of {doc_path.name} --> +<!-- Reviewed using {provider_name} ({model}) --> +<div class="doc-review"> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +{review_text} +</div> +""" + print(output_text) + + # Write to output file if requested + if args.output_file: + if args.output_format == "json": + Path(args.output_file).write_text(output_text) + elif args.output_format in ("markdown", "html"): + Path(args.output_file).write_text(output_text) + else: + Path(args.output_file).write_text(review_text) + print(f"Full review written to: {args.output_file}", file=sys.stderr) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print(f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", file=sys.stderr) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {args.doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. 
+ +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, args.cc_addrs, from_addr, + review_subject, None, email_body, + args.dry_run, args.verbose + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", + file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
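The text-format output contract above (the ---COMMIT_MESSAGE_START--- and ---UNIFIED_DIFF_START--- markers) can be exercised standalone. Below is a trimmed copy of parse_review_text() from the patch, fed a synthetic model response; the optional markdown-fence cleanup step is omitted for brevity:

```python
import re

def parse_review_text(review_text):
    """Extract commit message and diff from a marker-delimited response."""
    commit_msg = ""
    diff = ""
    m = re.search(
        r'---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---',
        review_text, re.DOTALL)
    if m:
        commit_msg = m.group(1).strip()
    m = re.search(
        r'---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---',
        review_text, re.DOTALL)
    if m:
        diff = m.group(1).strip()
    return commit_msg, diff

# Synthetic response, shaped like the FORMAT_INSTRUCTIONS["text"] contract.
SAMPLE = """\
Some preamble the model emitted.
---COMMIT_MESSAGE_START---
doc/guides/nics: fix typo in ixgbe guide
---COMMIT_MESSAGE_END---

---UNIFIED_DIFF_START---
--- a/doc/guides/nics/ixgbe.rst
+++ b/doc/guides/nics/ixgbe.rst
@@ -1 +1 @@
-teh driver
+the driver
---UNIFIED_DIFF_END---
"""

msg, diff = parse_review_text(SAMPLE)
print(msg)                   # doc/guides/nics: fix typo in ixgbe guide
print(diff.splitlines()[0])  # --- a/doc/guides/nics/ixgbe.rst
```

Note that the non-greedy `(.*?)` stops at the first end marker, so diff body lines that themselves start with `---` (file headers) do not terminate the match early.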
* [PATCH v8 0/4] devtools: add AI-assisted code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (6 preceding siblings ...) 2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger @ 2026-01-28 18:21 ` Stephen Hemminger 2026-01-28 18:21 ` [PATCH v8 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger ` (3 more replies) 2026-01-29 20:38 ` [PATCH v9 0/4] devtools: add AI-assisted code review tools Stephen Hemminger ` (3 subsequent siblings) 11 siblings, 4 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-28 18:21 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds infrastructure for AI-assisted code review of DPDK patches and documentation. It provides guidelines for AI tools and scripts to interface with multiple AI providers. The AGENTS.md file consolidates DPDK coding standards, commit message requirements, and common review feedback into a format optimized for AI code review tools. The accompanying scripts enable automated patch analysis using Claude, ChatGPT, Grok, or Gemini. Patches: 1. AGENTS.md - Guidelines document for AI review tools 2. analyze-patch.py - Multi-provider patch review script 3. compare-reviews.sh - Compare reviews across providers 4. 
review-doc.py - Documentation review with diff output Changes in v8: - Batch processing: both scripts now accept multiple files and generate separate output for each - Changed -o/--output to -o/--output-dir for organized file output - Added -d/--diff option to force .diff and .msg file generation - Added -q/--quiet option to suppress stdout when processing batches - Added strip_diff_markers() to clean output in non-text formats - Expanded AGENTS.md with additional guidance and examples - Improved help text with better examples and documentation Changes in v7: - Add "Review Philosophy" section to AGENTS.md with guidance on confidence levels and concise feedback - Add "Priority Areas" section covering security, correctness, and architecture concerns that reviewers should focus on - Minor code cleanups in analyze-patch.py Changes in v6: - Expanded AGENTS.md with "Unnecessary Code Patterns" section - Added guidance on rte_malloc() and rte_memcpy() appropriate use - Added symbol naming guidelines for static linking - Improved email sending in review-doc.py with native SMTP support Changes in v5: - Added review-doc.py for documentation review - Added email sending capability to scripts - Expanded forbidden tokens documentation Changes in v4: - Added compare-reviews.sh script - Multiple output formats (text, markdown, html, json) - Improved error handling Changes in v3: - Added support for OpenAI, xAI Grok, and Google Gemini - Added prompt caching for Anthropic to reduce costs - Restructured AGENTS.md for better machine parsing Changes in v2: - Split into separate patches - Added verbose mode with token usage statistics Stephen Hemminger (4): doc: add AGENTS.md for AI-powered code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI documentation review script AGENTS.md | 1066 +++++++++++++++++++++++++++++++++++ devtools/analyze-patch.py | 1006 
+++++++++++++++++++++++++++++++++ devtools/compare-reviews.sh | 192 +++++++ devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++ 4 files changed, 3362 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100644 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v8 1/4] doc: add AGENTS.md for AI-powered code review tools 2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger @ 2026-01-28 18:21 ` Stephen Hemminger 2026-01-28 18:21 ` [PATCH v8 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-28 18:21 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1066 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1066 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..019a83a2a6 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1066 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +## Review Philosophy +- Only comment when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. 
+ +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- Changes to API without release notes +- Changes to ABI on non-LTS release +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. 
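The placement rules above can be sketched as a small standalone checker. This is an illustrative sketch only — it is not one of DPDK's devtools, and the function name and messages are invented for the example:

```python
def check_spdx_header(lines):
    """Validate SPDX header placement per the rules above.

    lines: list of a file's lines (without trailing newlines).
    Returns a list of issue strings; an empty list means the header is fine.
    """
    issues = []
    # '#!' scripts carry the SPDX tag on the second line.
    idx = 1 if lines and lines[0].startswith("#!") else 0
    if idx >= len(lines) or "SPDX-License-Identifier:" not in lines[idx]:
        issues.append("error: SPDX tag must be on line %d" % (idx + 1))
        return issues
    # The copyright notice must immediately follow the SPDX tag.
    if idx + 1 >= len(lines) or "Copyright" not in lines[idx + 1]:
        issues.append("error: copyright line must immediately follow the SPDX tag")
    return issues
```

An automated tool would call `check_spdx_header(path.read_text().splitlines())` per touched file and map any returned strings to error-level findings.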
+ +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms + +These terms must use exact capitalization (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough 
context for reviewers
+- **Do not start the commit message body with "It"**
+- **Must end with** `Signed-off-by:` line (real name, not alias)
+
+### Fixes Tag
+
+When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA:
+
+```
+Fixes: 0123456789ab ("original commit subject")
+```
+
+The hash must reference a commit in the current branch, and the subject must match exactly.
+
+**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author:
+
+```bash
+git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch
+```
+
+### Required Tags
+
+```
+# For Coverity issues (required if "coverity" mentioned in body):
+Coverity issue: 12345
+
+# For Bugzilla issues (required if "bugzilla" mentioned in body):
+Bugzilla ID: 12345
+
+# For stable release backport candidates:
+Cc: stable@dpdk.org
+
+# For patch dependencies (in commit notes after ---):
+Depends-on: series-NNNNN ("Title of the series")
+```
+
+### Tag Order
+
+Tags must appear in this order:
+
+```
+Coverity issue:
+Bugzilla ID:
+Fixes:
+Cc:
+        <-- blank line required here
+Reported-by:
+Suggested-by:
+Signed-off-by:
+Acked-by:
+Reviewed-by:
+Tested-by:
+```
+
+**Tag format**: `Tag-name: Full Name <email@domain.com>`
+
+---
+
+## C Coding Style
+
+### Line Length
+
+| Context | Limit |
+|---------|-------|
+| Source code | **100 characters** |
+| Commit body | **75 characters** |
+
+### General Formatting
+
+- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment)
+- **No trailing whitespace** on lines or at end of files
+- Files must end with a newline
+- Code style should be consistent within each file
+
+
+### Comments
+
+```c
+/* Most single-line comments look like this. */
+
+/*
+ * VERY important single-line comments look like this.
+ */
+
+/*
+ * Multi-line comments look like this. Make them real sentences.
Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. 
+
+**Good practice**: Use a driver-specific or library-specific prefix for all global variables:
+
+```c
+/* Good - virtio driver uses consistent "virtio_" prefix */
+const struct virtio_ops virtio_legacy_ops = {
+	.read = virtio_legacy_read,
+	.write = virtio_legacy_write,
+	.configure = virtio_legacy_configure,
+};
+
+const struct virtio_ops virtio_modern_ops = {
+	.read = virtio_modern_read,
+	.write = virtio_modern_write,
+	.configure = virtio_modern_configure,
+};
+
+/* Good - mlx5 driver uses consistent "mlx5_" prefix */
+struct mlx5_flow_driver_ops mlx5_flow_dv_ops;
+```
+
+**Bad practice**: Generic names that may clash:
+
+```c
+/* Bad - "ops" is too generic, will clash with other drivers */
+const struct virtio_ops ops = { ... };
+
+/* Bad - "legacy_ops" could clash with other legacy implementations */
+const struct virtio_ops legacy_ops = { ... };
+
+/* Bad - "driver_config" is not unique */
+struct driver_config config;
+```
+
+**Guidelines**:
+- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`)
+- Prefix all global functions similarly unless they use the `rte_` namespace
+- Internal static variables do not require prefixes as they have file scope
+- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API
+
+#### Prohibited Terminology
+
+Do not use non-inclusive terminology, including:
+- `master/slave` → Use: primary/secondary, controller/worker, leader/follower
+- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist
+- `cripple` → Use: impacted, degraded, restricted, immobilized
+- `tribe` → Use: team, squad
+- `sanity check` → Use: coherence check, test, verification
+
+
+### Comparisons and Boolean Logic
+
+```c
+/* Pointers - compare explicitly with NULL */
+if (p == NULL)	/* Good */
+if (p != NULL)	/* Good */
+if (!p)		/* Bad - don't use !
on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (!a) /* Bad - don't use ! on integers */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats 
compiler warnings */
+int ret = 0;
+ret = some_function(); /* Compiler won't warn if assignment removed */
+```
+
+### Function Format
+
+- Return type on its own line
+- Opening brace on its own line
+- Place an empty line between declarations and statements
+
+```c
+static char *
+function(int a1, int b1)
+{
+	char *p;
+
+	p = do_something(a1, b1);
+	return p;
+}
+```
+
+---
+
+## Unnecessary Code Patterns
+
+The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them.
+
+### Unnecessary Variable Initialization
+
+Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs.
+
+```c
+/* Bad - initialization defeats -Wuninitialized */
+int ret = 0;
+if (condition)
+	ret = func_a();
+else
+	ret = func_b();
+
+/* Good - compiler will warn if any path misses assignment */
+int ret;
+if (condition)
+	ret = func_a();
+else
+	ret = func_b();
+
+/* Good - meaningful initial value */
+int count = 0;
+for (i = 0; i < n; i++)
+	if (test(i))
+		count++;
+```
+
+### Unnecessary Casts of void *
+
+In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`.
+
+```c
+/* Bad - unnecessary cast */
+struct foo *p = (struct foo *)malloc(sizeof(*p));
+struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0);
+
+/* Good - no cast needed in C */
+struct foo *p = malloc(sizeof(*p));
+struct bar *q = rte_malloc(NULL, sizeof(*q), 0);
+```
+
+Note: Casts are required in C++, but DPDK is a C project.
+
+### Zero-Length Arrays vs Flexible Array Members
+
+Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead.
+ +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. + +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` |
+| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` |
+
+### Threading
+
+| Forbidden | Preferred |
+|-----------|-----------|
+| `pthread_create()` | `rte_thread_create()` |
+| `pthread_join()` | `rte_thread_join()` |
+| `pthread_detach()` | EAL thread functions |
+| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` |
+| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` |
+| `rte_thread_create_control()` | `rte_thread_create_internal_control()` |
+
+### Compiler Built-ins and Attributes
+
+| Forbidden | Preferred | Notes |
+|-----------|-----------|-------|
+| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` |
+| `__alignof__` | C11 `alignof` | |
+| `__typeof__` | `typeof` | |
+| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` |
+| `__reserved` | Different name | Reserved in Windows headers |
+| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` |
+
+### Format Specifiers
+
+| Forbidden | Preferred |
+|-----------|-----------|
+| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` (e.g. `"%" PRIu64`) |
+
+### Headers and Build
+
+| Forbidden | Preferred | Context |
+|-----------|-----------|---------|
+| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | |
+| `install_headers()` | Meson `headers` variable | meson.build |
+| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags |
+| `allow_experimental_apis` | Not in lib/drivers/app | Meson |
+| `#undef XXX` | `// XXX is not set` | config/rte_config.h |
+| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ |
+
+### Testing
+
+| Forbidden | Preferred |
+|-----------|-----------|
+| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` |
+
+### Documentation
+
+| Forbidden | Preferred |
+|-----------|-----------|
+| `http://...dpdk.org` | `https://...dpdk.org` |
+| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references |
+| `:: file.svg` | `::
file.*` (wildcard extension) | + +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. 
+ +```c +#include <rte_test.h> + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, true, true, test_feature); +``` + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional 
test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... */ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... 
*/ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. 
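For illustration, versioning a changed function might look like the fragment below. This is a hedged sketch only: `rte_foo_do` and `do_work` are made-up names, and the exact macro argument layout (ABI version, return type, name, argument list) should be checked against `rte_function_versioning.h` rather than copied from here.

```c
#include <rte_function_versioning.h>

/* Hypothetical shared implementation. */
static int
do_work(int val, int flags)
{
	return val + flags;
}

/* Old ABI: applications linked against the previous ABI keep this symbol. */
RTE_VERSION_SYMBOL(24, int, rte_foo_do, (int val))
{
	/* Preserve the old behavior (no flags). */
	return do_work(val, 0);
}

/* New default: newly linked applications resolve to this version. */
RTE_DEFAULT_SYMBOL(25, int, rte_foo_do, (int val, int flags))
{
	return do_work(val, flags);
}
```

Reviewers should check that both versions stay behaviorally compatible for existing callers, not merely that the macros are present.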
+
+---
+
+## Patch Validation Checklist
+
+### Commit Message
+
+- [ ] Subject line ≤60 characters
+- [ ] Subject is lowercase (except acronyms from words-case.txt)
+- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`)
+- [ ] No `lib/` prefix for libraries
+- [ ] Imperative mood, no trailing period
+- [ ] No tabs, leading/trailing spaces, or punctuation marks
+- [ ] Body wrapped at 75 characters
+- [ ] Body does not start with "It"
+- [ ] `Signed-off-by:` present with real name and valid email
+- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject
+- [ ] `Coverity issue:` tag present if Coverity mentioned
+- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned
+- [ ] `Cc: stable@dpdk.org` for stable backport candidates
+- [ ] Tags in correct order with blank line separator
+
+### License
+
+- [ ] SPDX identifier on first line (or second for scripts)
+- [ ] Copyright line follows SPDX
+- [ ] Blank line after copyright before code
+- [ ] Appropriate license for file type
+
+### Code Style
+
+- [ ] Lines ≤100 characters
+- [ ] Hard tabs for indentation, spaces for alignment
+- [ ] No trailing whitespace
+- [ ] Proper include order
+- [ ] Header guards present
+- [ ] `rte_`/`RTE_` prefix on external symbols
+- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`)
+- [ ] No prohibited terminology
+- [ ] Proper brace style
+- [ ] Function return type on own line
+- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0`
+- [ ] No forbidden tokens (see table above)
+- [ ] No unnecessary code patterns (see section above)
+
+### API Tags
+
+- [ ] `__rte_experimental` alone on line, only in headers
+- [ ] `__rte_internal` alone on line, only in headers
+- [ ] Alignment attributes only on struct/union
+- [ ] Packed attributes properly paired
+- [ ] New public functions have `RTE_EXPORT_*` macro in `.c` file
+- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)`
+
+### Structure
+
+- [ ] Each commit compiles independently
+- [ ] Code and docs updated together
+- [ ] Documentation matches code behavior
+- [ ] PMD features match `doc/guides/nics/features/` matrix
+- [ ] Device operations match documentation (per `features.rst` mappings)
+- [ ] Tests added/updated as needed
+- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner
+- [ ] New APIs marked as `__rte_experimental`
+- [ ] New APIs have testpmd hooks and functional tests
+- [ ] Current release notes updated for significant changes
+- [ ] Release notes updated for API changes
+- [ ] Release notes updated for new drivers or subsystems
+
+---
+
+## Meson Build Files
+
+### Style Requirements
+
+- 4-space indentation (no tabs)
+- Line continuations double-indented
+- Lists alphabetically ordered
+- Short lists (≤3 items): single line, no trailing comma
+- Long lists: one item per line, trailing comma on last item
+
+```python
+# Short list
+sources = files('file1.c', 'file2.c')
+
+# Long list
+headers = files(
+        'header1.h',
+        'header2.h',
+        'header3.h',
+)
+```
+
+---
+
+## Python Code
+
+- Must comply with formatting standards
+- Use **`black`** for code formatting validation
+- Line length acceptable up to 100 characters
+
+---
+
+## Validation Tools
+
+Run these before submitting:
+
+```bash
+# Check commit messages
+devtools/check-git-log.sh -n1
+
+# Check patch format and forbidden tokens
+devtools/checkpatches.sh -n1
+
+# Check maintainers coverage
+devtools/check-maintainers.sh
+
+# Build validation
+devtools/test-meson-builds.sh
+
+# Find maintainers for your patch
+devtools/get-maintainer.sh <patch-file>
+```
+
+---
+
+## Severity Levels for AI Review
+
+**Error** (must fix):
+- Missing or malformed SPDX license
+- Missing Signed-off-by
+- Subject line over 60 characters
+- Body lines over 75 characters
+- Wrong tag order or format
+- Missing required tags (Fixes, Coverity issue, Bugzilla ID)
+- Forbidden tokens in code
+- `__rte_experimental`/`__rte_internal` in .c files or not alone on line
+- Compilation failures
+- ABI breaks without proper versioning
+
+**Warning** (should fix):
+- Subject line style issues (case, punctuation)
+- Wrong component prefix
+- Missing Cc: stable@dpdk.org for fixes
+- Documentation gaps
+- Documentation does not match code behavior
+- PMD features missing from `doc/guides/nics/features/` matrix
+- Device operations not documented per `features.rst` mappings
+- Missing tests
+- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner
+- New API not marked as `__rte_experimental`
+- New API without testpmd hooks or functional tests
+- New public function missing `RTE_EXPORT_*` macro
+- API changes without release notes
+- New drivers or subsystems without release notes
+- Implicit comparisons (`!ptr` instead of `ptr == NULL`)
+- Unnecessary variable initialization
+- Unnecessary casts of `void *`
+- Unnecessary NULL checks before free
+- Inappropriate use of `rte_malloc()` or `rte_memcpy()`
+- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code)
+- Driver/library global variables without unique prefixes (static linking clash risk)
+
+**Do NOT flag** (common false positives):
+- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros)
+- Suggesting manual edits to any `version.map` file
+
+**Info** (consider):
+- Minor style preferences
+- Optimization suggestions
+- Alternative approaches
+
+---
+
+# Response Format
+
+When you identify an issue:
+1. **State the problem** (1 sentence)
+2. **Why it matters** (1 sentence, only if not obvious)
+3. **Suggested fix** (code snippet or specific action)
+
+Example:
+This could panic if the string is NULL.
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
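[Editorial illustration — not part of the patch series.] Several of the commit-message checks in the checklist above are purely mechanical. As a hedged sketch only (the function names, finding messages, and regex below are invented for illustration; the 60-character limit and the 12-char-SHA `Fixes:` shape come from the guidelines), a review tool might encode two of those rules like this:

```python
import re

# Thresholds and tag shape taken from the AGENTS.md checklist;
# everything else in this sketch is illustrative.
SUBJECT_MAX = 60
# `Fixes:` tag: 12-char SHA followed by the quoted original subject.
FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{12} \(".+"\)$')


def check_subject(subject):
    """Return (severity, message) findings for a commit subject line."""
    findings = []
    if len(subject) > SUBJECT_MAX:
        findings.append(("error", "subject line over 60 characters"))
    if subject.rstrip().endswith("."):
        findings.append(("warning", "subject has trailing period"))
    if ":" not in subject:
        findings.append(("warning", "missing component prefix"))
    return findings


def check_fixes_tag(line):
    """True if a Fixes: tag has a 12-char SHA and a quoted subject."""
    return bool(FIXES_RE.match(line))
```

For example, `check_subject("net/ixgbe: fix Tx hang on queue restart")` yields no findings, while a 70-character subject produces an error-level finding and an unquoted `Fixes:` line fails `check_fixes_tag()`.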
* [PATCH v8 2/4] devtools: add multi-provider AI patch review script
  2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger
@ 2026-01-28 18:21 ` Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
  3 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-28 18:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is an AI-generated script to review DPDK patches against the
AGENTS.md coding guidelines using AI language models.

Supported AI providers:
  - Anthropic Claude (default)
  - OpenAI ChatGPT
  - xAI Grok
  - Google Gemini

The script reads a patch file and the AGENTS.md guidelines, then
submits them to the selected AI provider for review. Results are
organized by severity level (Error, Warning, Info) as defined in
the guidelines.

Features:
  - Provider selection via -p/--provider option
  - Custom model selection via -m/--model option
  - Verbose mode shows token usage statistics
  - Uses temporary files for API requests to handle large patches
  - Prompt caching support for Anthropic to reduce costs

Usage:
  ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch
  ./devtools/analyze-patch.py -p xai my-patch.patch
  ./devtools/analyze-patch.py -l   # list providers

Requires the appropriate API key environment variable to be set for
the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY,
or GOOGLE_API_KEY).
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 1006 +++++++++++++++++++++++++++++++++++++ 1 file changed, 1006 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..a4f8c1b3f1 --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,1006 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Accepts multiple patch files and generates output for each. +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(corrected commit message if changes needed, or original if acceptable) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff showing suggested changes to the patch, or "No changes needed" if acceptable) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": 
"https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +SYSTEM_PROMPT = """You are an expert DPDK code reviewer. Analyze patches for \ +compliance with DPDK coding standards and contribution guidelines. Provide \ +clear, actionable feedback organized by severity (Error, Warning, Info) as \ +defined in the guidelines.""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL", + "corrected_commit_message": "corrected commit message if needed", + "suggested_diff": "unified diff of suggested changes" +} 
+Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, 
+ "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + patch_content, + patch_name, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format="text", + include_diff_markers=False, + verbose=False, +): + 
"""Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + patch_content, + patch_name, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + patch_content, + patch_name, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: 
{usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + 
r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] 
= ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP 
authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s *.patch # Review multiple patches + %(prog)s -p openai 0001-*.patch # Use OpenAI ChatGPT + %(prog)s -f markdown -o /tmp patch.patch # Output as Markdown + %(prog)s -f json -o /tmp *.patch # Save JSON reviews to directory + %(prog)s -d -o /tmp patch.patch # Generate diff and msg files + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Suggested changes (with --diff) + <basename>.msg Corrected commit message (with --diff) + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + """, + ) + + parser.add_argument( + "patch_files", + nargs="+", + metavar="patch_file", + help="Patch file(s) to analyze", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + 
default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Produce .diff and .msg files with suggested corrections", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without 
sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate AGENTS.md exists + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all patch files exist before processing + patch_paths = [] + for patch_file in args.patch_files: + patch_path = Path(patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {patch_file}") + patch_paths.append((patch_file, patch_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(patch_paths) + for file_idx, (patch_file, patch_path) in enumerate(patch_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {patch_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + patch_basename = patch_path.stem + diff_file = output_dir / f"{patch_basename}.diff" + msg_file = output_dir / f"{patch_basename}.msg" + + # Read patch content + patch_content = patch_path.read_text() + 
patch_name = patch_path.name + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {patch_file}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.diff: + print("Generate diff: yes", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + patch_content, + patch_name, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {patch_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{patch_basename}{format_ext}" + + # Extract commit message and diff if requested + commit_msg, diff = "", "" + if args.diff: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = ( + f"=== Patch Review: {patch_name} (via {provider_name}) ===\n\n" + ) + output_text += review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + 
review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) + review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if args.diff: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("corrected_commit_message", "") + diff = review_data.get("suggested_diff", "") + + # Add metadata + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {patch_name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Patch Review: {patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files if requested + if args.diff: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message corrections suggested\n") + print("Note: No commit message corrections extracted", file=sys.stderr) + + if diff and diff.lower() != "no changes needed": + diff_file.write_text(diff + "\n") + 
print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Note: No code changes suggested", file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print(output_text) + + # Print usage instructions if diff files were generated + if args.diff: + print("\n=== Output Files ===") + print(f"Review: {review_file}") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. 
+ +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
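[Editorial illustration — not part of the patch series.] The `---COMMIT_MESSAGE_START---`/`---UNIFIED_DIFF_START---` markers that the script asks the model to emit are extracted with plain regexes in `parse_review_text()`. Below is a minimal standalone sketch of that extraction; the helper name `extract_section` and the sample response are invented, but the marker format matches the script:

```python
import re


def extract_section(text, start, end):
    """Return the text between ---start---/---end--- markers, or ''."""
    m = re.search(rf"---{start}---\s*\n(.*?)\n---{end}---", text, re.DOTALL)
    return m.group(1).strip() if m else ""


# Invented sample of a model response using the script's markers.
sample = """Review body here.
---COMMIT_MESSAGE_START---
net/foo: fix null dereference
---COMMIT_MESSAGE_END---
---UNIFIED_DIFF_START---
No changes needed
---UNIFIED_DIFF_END---
"""

msg = extract_section(sample, "COMMIT_MESSAGE_START", "COMMIT_MESSAGE_END")
diff = extract_section(sample, "UNIFIED_DIFF_START", "UNIFIED_DIFF_END")
```

As in the script, a diff section reading "No changes needed" signals that no `.diff` output file is warranted.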
* [PATCH v8 3/4] devtools: add compare-reviews.sh for multi-provider analysis
  2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
@ 2026-01-28 18:21 ` Stephen Hemminger
  2026-01-28 18:21 ` [PATCH v8 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
  3 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-28 18:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add a script to run patch reviews across multiple AI providers for
comparison purposes. The script automatically detects which providers
have API keys configured and runs analyze-patch.py for each one. This
allows users to compare review quality and feedback across different
AI models.

Features:
  - Auto-detects available providers based on environment variables
  - Optional provider selection via -p/--providers option
  - Saves individual reviews to separate files with -o/--output
  - Verbose mode passes through to underlying analyze-patch.py

Usage:
  ./devtools/compare-reviews.sh my-patch.patch
  ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch
  ./devtools/compare-reviews.sh -o ./reviews my-patch.patch

Output files are named <patch>-<provider>.txt when using the output
directory option.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100644 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
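The provider auto-detection in `get_available_providers()` above (select every provider whose API key environment variable is set) can be sketched in Python as well. The environment variable names are the ones the shell script checks; the `fake_env` dictionary is a stand-in for `os.environ`, used here so the example does not depend on real keys.

```python
import os

# Env-var names match those checked by get_available_providers() in
# compare-reviews.sh above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

def available_providers(environ=os.environ):
    """Return the providers whose API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if environ.get(var)]

# With only one key set, only that provider is selected.
fake_env = {"XAI_API_KEY": "sk-test"}
print(available_providers(fake_env))  # ['xai']
```

As in the shell version, an empty result means no keys are configured, which the caller treats as a fatal error.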
* [PATCH v8 4/4] devtools: add multi-provider AI documentation review script 2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger ` (2 preceding siblings ...) 2026-01-28 18:21 ` [PATCH v8 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger @ 2026-01-28 18:21 ` Stephen Hemminger 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-28 18:21 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add review-doc.py script that reviews DPDK documentation files for spelling, grammar, technical correctness, and clarity using AI language models. Supports batch processing of multiple files. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini Output formats (-f/--format): - text: plain text with extractable diff/msg markers (default) - markdown: formatted review document - html: complete HTML document with styling - json: structured data with metadata For each input file, the script produces: - <basename>.{txt,md,html,json}: review in selected format - <basename>.diff: unified diff (text/json, or with -d flag) - <basename>.msg: commit message (text/json, or with -d flag) The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide). 
Features: - Multiple file processing with glob support - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Output format selection via -f/--format option - Force diff/msg generation via -d/--diff option - Quiet mode (-q) suppresses stdout output - Verbose mode (-v) shows token usage and API details - Email integration using git sendemail configuration - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py doc/guides/nics/*.rst ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..1366aa0f85 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,1098 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Accepts multiple documentation files and generates output for each. 
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(same commit message as above) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(same unified diff as above) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", "doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", 
"doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": 
"text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + 
include_diff_markers=False, + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " 
f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = 
msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. 
Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} 
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v9 0/4] devtools: add AI-assisted code review tools 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (7 preceding siblings ...) 2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger @ 2026-01-29 20:38 ` Stephen Hemminger 2026-01-29 20:38 ` [PATCH v9 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger ` (3 more replies) 2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger ` (2 subsequent siblings) 11 siblings, 4 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-29 20:38 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds infrastructure for AI-assisted code review of DPDK patches and documentation. It provides guidelines for AI tools and scripts to interface with multiple AI providers. The AGENTS.md file consolidates DPDK coding standards, commit message requirements, and common review feedback into a format optimized for AI code review tools. The accompanying scripts enable automated patch analysis using Claude, ChatGPT, Grok, or Gemini. Patches: 1. AGENTS.md - Guidelines document for AI review tools 2. analyze-patch.py - Multi-provider patch review script 3. compare-reviews.sh - Compare reviews across providers 4. 
review-doc.py - Documentation review with diff output Changes in v9: - Revised prompt to clean up false positives - Added --lts flag and review date option so analyze-patch works for LTS releases Changes in v8: - Batch processing: both scripts now accept multiple files and generate separate output for each - Changed -o/--output to -o/--output-dir for organized file output - Added -d/--diff option to force .diff and .msg file generation - Added -q/--quiet option to suppress stdout when processing batches - Added strip_diff_markers() to clean output in non-text formats - Expanded AGENTS.md with additional guidance and examples - Improved help text with better examples and documentation Changes in v7: - Add "Review Philosophy" section to AGENTS.md with guidance on confidence levels and concise feedback - Add "Priority Areas" section covering security, correctness, and architecture concerns that reviewers should focus on - Minor code cleanups in analyze-patch.py Changes in v6: - Expanded AGENTS.md with "Unnecessary Code Patterns" section - Added guidance on rte_malloc() and rte_memcpy() appropriate use - Added symbol naming guidelines for static linking - Improved email sending in review-doc.py with native SMTP support Changes in v5: - Added review-doc.py for documentation review - Added email sending capability to scripts - Expanded forbidden tokens documentation Changes in v4: - Added compare-reviews.sh script - Multiple output formats (text, markdown, html, json) - Improved error handling Changes in v3: - Added support for OpenAI, xAI Grok, and Google Gemini - Added prompt caching for Anthropic to reduce costs - Restructured AGENTS.md for better machine parsing Changes in v2: - Split into separate patches - Added verbose mode with token usage statistics Stephen Hemminger (4): doc: add AGENTS.md for AI-powered code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI
documentation review script AGENTS.md | 1198 +++++++++++++++++++++++++++++++++++ devtools/analyze-patch.py | 862 +++++++++++++++++++++++++ devtools/compare-reviews.sh | 192 ++++++ devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++ 4 files changed, 3350 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100644 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v9 1/4] doc: add AGENTS.md for AI-powered code review tools 2026-01-29 20:38 ` [PATCH v9 0/4] devtools: add AI-assisted code review tools Stephen Hemminger @ 2026-01-29 20:38 ` Stephen Hemminger 2026-01-29 20:38 ` [PATCH v9 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-29 20:38 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1198 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1198 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..ccaa51336d --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1198 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +## Review Philosophy +- Only comment when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. +- Do NOT comment on copyright years unless outside valid range (2013 to current year) +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" +- Do NOT mention things that are correct or "not an issue" - only report actual problems +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) +- Before adding any item to your review, ask: "Is this actually wrong?" If no, omit it entirely. 
+- VERIFY before reporting: For subject line length, COUNT the characters first. If ≤60, do not mention it. +- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely + +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- Changes to API without release notes +- Changes to ABI on non-LTS release +- Usage of deprecated APIs when replacements exist +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. 
+ +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 ExampleCorp + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 ExampleCorp + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). + +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. 
# past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms (Commit Messages Only) + +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: abcdef012345 ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly.
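The DPDK contributing guide suggests a git alias that emits a correctly formatted `Fixes:` line (12-character abbreviated hash plus quoted subject). The sketch below assumes it is run inside a DPDK checkout; `<sha>` is a placeholder for the commit being fixed, not a real hash:

```bash
# One-time setup: alias that prints 'Fixes: <12-char sha> ("subject")'
git config alias.fixline "log -1 --abbrev=12 --format='Fixes: %h (\"%s\")'"

# Usage: <sha> is the commit being fixed (placeholder)
git fixline <sha>
```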
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order, with a blank line separating the two groups: + +**Group 1** (optional tags, no blank lines within this group): +- Coverity issue: +- Bugzilla ID: +- Fixes: +- Cc: + +**Blank line required here** (only if Group 1 tags are present) + +**Group 2** (no blank lines within this group): +- Reported-by: +- Suggested-by: +- Signed-off-by: +- Acked-by: +- Reviewed-by: +- Tested-by: + +**Correct examples:** + +Simple patch with no Group 1 tags (most common): +``` +The info_get callback doesn't need to check its args +since already done by ethdev. 
+ +Signed-off-by: John Smith <john@example.com> +``` + +Patch with Fixes and Cc tags: +``` +Fixes: c743e50c475f ("null: new poll mode driver") +Cc: stable@dpdk.org + +Signed-off-by: John Smith <john@example.com> +``` + +Patch with only Fixes tag: +``` +Fixes: abcd1234abcd ("component: original commit") + +Signed-off-by: Jane Doe <jane@example.com> +``` + +**What is correct (do NOT flag):** +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT +- No blank line between `Fixes:` and `Cc:` - this is CORRECT +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT + +**What is wrong (DO flag):** +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use non-inclusive naming, including: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist +- `cripple` → Use: impacted, degraded, restricted, immobilized +- `tribe` → Use: team, squad +- `sanity check` → Use: coherence check, test, verification + + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (errno != 0) /* Good - this IS explicit */ +if (!a) /* Bad - don't use ! on integers */ +if (a) /* Bad - implicit, should be a != 0 */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`).
+ +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + 
+## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. 
+ +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: 
file.*` (wildcard extension) | + +--- + +## Deprecated API Usage + +New patches must not introduce usage of deprecated APIs, macros, or functions. +Deprecated items are marked with `RTE_DEPRECATED` or documented in the +deprecation notices section of the release notes. + +### Rules for New Code + +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` +- Do not use macros that have been superseded by newer alternatives +- Do not use data structures or enum values marked as deprecated +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations +- When a deprecated API has a replacement, use the replacement + +### Deprecating APIs + +A patch may mark an API as deprecated provided: + +- No remaining usages exist in the current DPDK codebase +- The deprecation is documented in the release notes +- A migration path or replacement API is documented +- The `RTE_DEPRECATED` macro is used to generate compiler warnings + +```c +/* Marking a function as deprecated */ +__rte_deprecated +int +rte_old_function(void); + +/* With a message pointing to the replacement */ +__rte_deprecated_msg("use rte_new_function() instead") +int +rte_old_function(void); +``` + +### Common Deprecated Patterns + +| Deprecated | Replacement | Notes | +|-----------|-------------|-------| +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | + +When reviewing patches that add new code, flag any usage of deprecated APIs +as requiring change to use the modern replacement. 
+ +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in a .c file (the tag belongs in the header only) */ +__rte_experimental +int +rte_new_feature(void) { /* ... */ } +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in `/app` test directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. 
+ +```c +#include <rte_test.h> + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); +``` + +The `REGISTER_FAST_TEST` macro parameters are: +- Test name (e.g., `feature_autotest`) +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer +- Test function name + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must 
match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) +- Release notes are NOT required for: + - Test-only changes (unit tests, functional tests) + - Internal APIs and helper functions (not exported to applications) + - Internal implementation changes that don't affect public API + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... */ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... 
*/ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. 
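The two versioning macros can be combined as in the following pseudocode-level sketch. The exact macro parameter order and the `foo_configure_v24`/`foo_configure_v25` helper names are assumptions for illustration only — verify the authoritative signatures in `rte_function_versioning.h` before use.

```c
/* Sketch only - keep the old signature alive for binaries linked
 * against the 24.x ABI... */
RTE_VERSION_SYMBOL(24, int, rte_foo_configure, (struct rte_foo *foo))
{
	return foo_configure_v24(foo); /* hypothetical helper */
}

/* ...and make the extended signature the default for newly
 * linked binaries. */
RTE_DEFAULT_SYMBOL(25, int, rte_foo_configure,
	(struct rte_foo *foo, uint32_t flags))
{
	return foo_configure_v25(foo, flags); /* hypothetical helper */
}
```

Note that function versioning only takes effect in shared library builds; the build system resolves the default symbol for static linking.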
+ +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) +- [ ] No usage of deprecated APIs, macros, or functions + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired +- [ ] New public functions have `RTE_EXPORT_*` macro in `.c` file +- [ ] Experimental functions use 
`RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item +- No strict line length limit for meson files; lines under 100 characters are acceptable + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters +- 
Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- New public function missing `RTE_EXPORT_*` macro +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) +- Usage of deprecated APIs, macros, or functions in new code + +**Do NOT flag** (common false positives): +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) +- Suggesting manual edits to any `version.map` file +- Copyright years within valid range (2013 to current year) +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) +- Meson file lines under 100 characters +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE 
explicit) +- Anything you determine is correct (do not mention non-issues or say "No issue here") +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when Fixes/Cc present) is CORRECT +- Missing release notes for test-only changes (unit tests do not require release notes) +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) +- Subject lines that are within the 60 character limit (count first, do not guess) +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches + +--- + +# Response Format + +When you identify an issue: +1. **State the problem** (1 sentence) +2. **Why it matters** (1 sentence, only if not obvious) +3. **Suggested fix** (code snippet or specific action) + +Example: +This could panic if the string is NULL. -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v9 2/4] devtools: add multi-provider AI patch review script 2026-01-29 20:38 ` [PATCH v9 0/4] devtools: add AI-assisted code review tools Stephen Hemminger 2026-01-29 20:38 ` [PATCH v9 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger @ 2026-01-29 20:38 ` Stephen Hemminger 2026-01-29 20:38 ` [PATCH v9 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger 2026-01-29 20:38 ` [PATCH v9 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-29 20:38 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This is an AI generated script to review DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.py -p xai my-patch.patch ./devtools/analyze-patch.py -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 862 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 862 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..6678438e4b --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,862 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +from datetime import date +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Known LTS releases +LTS_RELEASES = {"23.11", "22.11", "21.11", "20.11", "19.11"} + +SYSTEM_PROMPT_BASE = """\ +You are an expert DPDK code reviewer. Analyze patches for compliance with \ +DPDK coding standards and contribution guidelines. 
Provide clear, actionable \ +feedback organized by severity (Error, Warning, Info) as defined in the \ +guidelines.""" + +LTS_RULES = """ +LTS (Long Term Stable) branch rules apply: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with Fixes: tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that aren't clearly bug fixes""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. 
API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def is_lts_release(release): + """Check if a release is an LTS release.""" + if not release: + return False + # Check for explicit -lts suffix or known LTS versions + if "-lts" in release.lower(): + return True + # Extract version number (e.g., "23.11" from "23.11.1") + version = release.split("-")[0] + base_version = ".".join(version.split(".")[:2]) + return base_version in LTS_RELEASES + + +def build_system_prompt(review_date, release): + """Build system prompt with date and release context.""" + prompt = SYSTEM_PROMPT_BASE + prompt += f"\n\nCurrent date: {review_date}." + + if release: + prompt += f"\nTarget DPDK release: {release}." + if is_lts_release(release): + prompt += LTS_RULES + else: + prompt += "\nThis is a main branch or standard release." + prompt += "\nNew features and experimental APIs are allowed." 
+ + return prompt + + +def build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": system_prompt}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": system_prompt}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": system_prompt}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + 
api_key, + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: 
{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: {usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + 
return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if get_git_config("sendemail.smtpserver"): + # Build command with all arguments + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except 
(subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. Configure git send-email, sendmail, or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s -r 24.11 patch.patch # Review for specific release + %(prog)s -r 23.11-lts patch.patch # Review for LTS branch + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +LTS Releases: + Use -r/--release with LTS version (e.g., 23.11-lts, 22.11) to enable + stricter review rules: bug fixes only, no new features or APIs. 
+ """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", + metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Date and release options + parser.add_argument( + "-D", + "--date", + metavar="YYYY-MM-DD", + help="Review date context (default: today)", + ) + parser.add_argument( + "-r", + "--release", + metavar="VERSION", + help="Target DPDK release (e.g., 24.11, 23.11-lts)", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + 
metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + patch_path = Path(args.patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {args.patch_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Determine review date + review_date = args.date or date.today().isoformat() + + # Build system prompt with date and release context + system_prompt = build_system_prompt(review_date, args.release) + + # Read files + agents_content = agents_path.read_text() + patch_content = patch_path.read_text() + patch_name = patch_path.name + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Review date: {review_date}", file=sys.stderr) + if args.release: + lts_status = " (LTS)" if is_lts_release(args.release) else "" + print(f"Target 
release: {args.release}{lts_status}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {args.patch_file}", file=sys.stderr) + if args.output: + print(f"Output file: {args.output}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + args.output_format, + args.verbose, + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Format output based on requested format + provider_name = config["name"] + + if args.output_format == "json": + # For JSON, try to parse and add metadata + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + # If AI didn't return valid JSON, wrap the text + review_data = {"raw_review": review_text} + + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "review_date": review_date, + "target_release": args.release, + "is_lts": is_lts_release(args.release) if args.release else False, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "html": + # Wrap HTML content with header + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"<br>Target release: {args.release}{lts_badge}" + output_text = f"""<!-- AI-generated review of {patch_name} --> +<!-- Reviewed using {provider_name} ({model}) on {review_date} --> +<div class="patch-review"> +<h1>Patch Review: 
{patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p> +{review_text} +</div> +""" + elif args.output_format == "markdown": + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"\n*Target release: {args.release}{lts_badge}*\n" + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model}) on {review_date}* +{release_info} +{review_text} +""" + else: # text + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n" + output_text += f"Review date: {review_date}\n" + output_text += release_info + output_text += "\n" + review_text + + # Write output + if args.output: + Path(args.output).write_text(output_text) + print(f"Review written to: {args.output}", file=sys.stderr) + else: + print(output_text) + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) on {review_date} 
+{release_info} +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0
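The reply-subject rewrite used by analyze-patch.py above (strip any leading [PATCH n/m] tag, prepend [REVIEW], fall back to the patch file name) can be exercised in isolation. A minimal sketch mirroring the script's regex; the function name is illustrative only:

```python
import re

def make_review_subject(orig_subject, patch_name):
    """Mirror of the subject rewrite above: strip any leading
    [PATCH ...] tag and prepend [REVIEW]; when no Subject could be
    extracted from the mbox, fall back to the patch file name."""
    if orig_subject:
        stripped = re.sub(r"^\[PATCH[^\]]*\]\s*", "", orig_subject)
        return f"[REVIEW] {stripped}"
    return f"[REVIEW] {patch_name}"

print(make_review_subject("[PATCH v9 2/4] devtools: add review script", "series.mbox"))
# → [REVIEW] devtools: add review script
print(make_review_subject(None, "series.mbox"))
# → [REVIEW] series.mbox
```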
* [PATCH v9 3/4] devtools: add compare-reviews.sh for multi-provider analysis @ 2026-01-29 20:38 Stephen Hemminger From: Stephen Hemminger @ 2026-01-29 20:38 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add script to run patch reviews across multiple AI providers for comparison purposes. The script automatically detects which providers have API keys configured and runs analyze-patch.py for each one. This allows users to compare review quality and feedback across different AI models. Features: - Auto-detects available providers based on environment variables - Optional provider selection via -p/--providers option - Saves individual reviews to separate files with -o/--output - Verbose mode passes through to underlying analyze-patch.py Usage: ./devtools/compare-reviews.sh my-patch.patch ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch ./devtools/compare-reviews.sh -o ./reviews my-patch.patch Output files are named <patch>-<provider>.txt when using the output directory option. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100644 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0
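The per-provider file naming used when -o/--output is given (<patch-stem>-<provider>.<extension>) can be sketched on its own; the case statement mirrors get_extension() in the script above, and the file/format values here are made-up examples:

```shell
# Sketch of compare-reviews.sh output naming: <stem>-<provider>.<ext>
patch_file="my-fix.patch"
format="markdown"

stem="${patch_file%.*}"        # strip the trailing .patch
case "$format" in              # mirrors get_extension() in the script
    text) ext="txt" ;;
    markdown) ext="md" ;;
    html) ext="html" ;;
    json) ext="json" ;;
    *) ext="txt" ;;
esac

for provider in anthropic openai; do
    echo "${stem}-${provider}.${ext}"
done
# prints my-fix-anthropic.md and my-fix-openai.md
```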
* [PATCH v9 4/4] devtools: add multi-provider AI documentation review script @ 2026-01-29 20:38 Stephen Hemminger From: Stephen Hemminger @ 2026-01-29 20:38 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add review-doc.py script that reviews DPDK documentation files for spelling, grammar, technical correctness, and clarity using AI language models. Supports batch processing of multiple files. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini Output formats (-f/--format): - text: plain text with extractable diff/msg markers (default) - markdown: formatted review document - html: complete HTML document with styling - json: structured data with metadata For each input file, the script produces: - <basename>.{txt,md,html,json}: review in selected format - <basename>.diff: unified diff (text/json, or with -d flag) - <basename>.msg: commit message (text/json, or with -d flag) The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide). 
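The prefix determination described above is a first-match lookup over an ordered path table: most-specific directories come first, so they win over the generic "doc:" fallbacks. A minimal sketch (the table is abbreviated here; review-doc.py's full COMMIT_PREFIX_MAP covers every doc/guides/ subdirectory):

```python
# Abbreviated sketch of the path-to-prefix lookup used by review-doc.py.
# First matching entry wins, so specific paths must precede generic ones.
COMMIT_PREFIX_MAP = [
    ("doc/guides/prog_guide/", "doc/guides/prog_guide:"),
    ("doc/guides/nics/", "doc/guides/nics:"),
    ("doc/guides/", "doc:"),
    ("doc/", "doc:"),
]

def get_commit_prefix(filepath):
    for prefix_path, prefix in COMMIT_PREFIX_MAP:
        if filepath.startswith(prefix_path):
            return prefix
    return "doc:"

print(get_commit_prefix("doc/guides/prog_guide/mempool_lib.rst"))
# → doc/guides/prog_guide:
print(get_commit_prefix("doc/index.rst"))
# → doc:
```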
Features: - Multiple file processing with glob support - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Output format selection via -f/--format option - Force diff/msg generation via -d/--diff option - Quiet mode (-q) suppresses stdout output - Verbose mode (-v) shows token usage and API details - Email integration using git sendemail configuration - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py doc/guides/nics/*.rst ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..1366aa0f85 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,1098 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Accepts multiple documentation files and generates output for each. 
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(same commit message as above) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(same unified diff as above) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", "doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", 
"doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": 
"text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + 
include_diff_markers=False, + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " 
f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = 
msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. 
Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} 
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (8 preceding siblings ...) 2026-01-29 20:38 ` [PATCH v9 0/4] devtools: add AI-assisted code review tools Stephen Hemminger @ 2026-01-30 17:40 ` Stephen Hemminger 2026-01-30 17:41 ` [PATCH v10 1/4] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (3 more replies) 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 11 siblings, 4 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-30 17:40 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds infrastructure for AI-assisted code review of DPDK patches and documentation. It provides guidelines for AI tools and scripts to interface with multiple AI providers. The AGENTS.md file consolidates DPDK coding standards, commit message requirements, and common review feedback into a format optimized for AI code review tools. The accompanying scripts enable automated patch analysis using Claude, ChatGPT, Grok, or Gemini. Patches: 1. AGENTS.md - Guidelines document for AI review tools 2. analyze-patch.py - Multi-provider patch review script 3. compare-reviews.sh - Compare reviews across providers 4. 
review-doc.py - Documentation review with diff output Changes in v10: - Catch misuse of pthread_mutex - More language to defend against false positives - Handle large patch bundles by splitting mbox Changes in v9: - Revised prompt to clean up false positives - Added --lts and review-date flags so analyze-patch works for LTS releases Changes in v8: - Batch processing: both scripts now accept multiple files and generate separate output for each - Changed -o/--output to -o/--output-dir for organized file output - Added -d/--diff option to force .diff and .msg file generation - Added -q/--quiet option to suppress stdout when processing batches - Added strip_diff_markers() to clean output in non-text formats - Expanded AGENTS.md with additional guidance and examples - Improved help text with better examples and documentation Stephen Hemminger (4): doc: add AGENTS.md for AI-powered code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI documentation review script AGENTS.md | 1375 +++++++++++++++++++++++++++++++++++ devtools/analyze-patch.py | 1323 +++++++++++++++++++++++++++++++++ devtools/compare-reviews.sh | 192 +++++ devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++ 4 files changed, 3988 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100644 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v10 1/4] doc: add AGENTS.md for AI-powered code review tools 2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger @ 2026-01-30 17:41 ` Stephen Hemminger 2026-01-30 17:41 ` [PATCH v10 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-01-30 17:41 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a structured reference document that enables AI code review tools to validate DPDK contributions against project standards. This document consolidates requirements from multiple sources into a machine-readable format optimized for automated validation workflows. The AGENTS.md file synthesizes guidelines from: - DPDK Contributing Code documentation (patches.rst) - DPDK Coding Style guidelines (coding_style.rst) - DPDK validation scripts (check-git-log.sh, checkpatches.sh) - Linux kernel patch submission process - SPDX License Identifier specification - DPDK Coccinelle scripts (cocci) - common items spotted on mailing list review Key sections include: - SPDX license and copyright header requirements - Commit message format with precise limits (60 char subject, 75 char body) and tag ordering rules - C coding style including explicit comparison requirements - Forbidden tokens table derived from checkpatches.sh - API tag placement rules for experimental and internal APIs - Patch validation checklists with severity levels The forbidden tokens section documents restrictions on deprecated atomics, logging functions, threading APIs, and compiler built-ins that are checked by the existing checkpatches.sh infrastructure. Severity levels (error/warning/info) align with the exit codes and messaging from check-git-log.sh and checkpatches.sh to help automated tools prioritize feedback appropriately. 
References: - https://doc.dpdk.org/guides/contributing/patches.html - https://doc.dpdk.org/guides/contributing/coding_style.html - devtools/check-git-log.sh - devtools/checkpatches.sh - devtools/cocci/ Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1375 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1375 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..3fd9d786cb --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1375 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +## CRITICAL INSTRUCTION - READ FIRST + +**NEVER list an item under "Errors" or "Warnings" if you conclude it is correct.** + +Before outputting ANY error or warning, verify it is actually wrong. If your analysis concludes with phrases like: +- "there's no issue here" +- "which is fine" +- "appears correct" +- "is acceptable" +- "this is actually correct" + +Then DO NOT INCLUDE IT IN YOUR OUTPUT AT ALL. Delete it. Omit it entirely. + +The "Errors" section is ONLY for actual errors. If something is not an error, it must not appear there. + +--- + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +## Review Philosophy + +**BEFORE OUTPUTTING YOUR REVIEW**: Re-read each item. If ANY item contains phrases like "is fine", "no issue", "appears correct", "is acceptable", "actually correct" - DELETE THAT ITEM. Do not include it. 
+ +- Only comment when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. +- Do NOT comment on copyright years unless outside valid range (2013 to current year) +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" +- Do NOT mention things that are correct or "not an issue" - only report actual problems +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) +- Before adding any item to your review, ask: "Is this actually wrong?" If no, omit it entirely. +- VERIFY before reporting: For subject line length, COUNT the characters first. If ≤60, do not mention it. +- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely +- Do NOT add vague suggestions like "should be verified" or "should be checked" - either it's wrong or don't mention it +- Do NOT flag something as an Error then say "which is correct" in the same item +- Do NOT say "no issue here" or "this is actually correct" - if there's no issue, do not include it in your review +- Do NOT call the standard DPDK SPDX/copyright format "different style" - it is THE standard +- Do NOT analyze cross-patch dependencies or compilation order - you cannot reliably determine this from patch review +- Do NOT claim a patch "would cause compilation failure" based on symbols used in other patches in the series +- Review each patch individually for its own correctness; assume the patch author ordered them correctly + +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal 
vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- Changes to API without release notes +- Changes to ABI on non-LTS release +- Usage of deprecated APIs when replacements exist +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) +- **Process-shared synchronization errors** (pthread mutexes in shared memory without `PTHREAD_PROCESS_SHARED`) + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 John Smith + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 Jane Doe + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. 
Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). + +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. + +**The following SPDX/copyright format is correct** - do NOT flag it or comment on it: +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Stephen Hemminger + */ +``` +This is the standard DPDK format. Do not say it is "different" or "unusual" - it is correct. +Do not suggest the copyright year "should be verified" - if it's in the valid range (2013-current year), it's fine. +If SPDX is on line 1 and copyright follows, this is CORRECT - do not include it in Errors. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. 
# past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? & |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms (Commit Messages Only) + +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: abcdefgh1234 ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. 
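A correctly formatted tag can be generated mechanically rather than typed by hand. A sketch using standard `git log` pretty-format options (the target commit is `HEAD` here purely for illustration; substitute the commit being fixed):

```shell
# Emit a Fixes: tag with the 12-character abbreviated SHA and the
# exact original subject, quoted as DPDK expects.
git log -1 --abbrev=12 --format='Fixes: %h ("%s")' HEAD
```

The DPDK contributing guide suggests a similar `fixline` git alias so the generated tag can be pasted directly into the commit body.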
+ +**Do NOT flag Fixes tags** asking for verification that the commit "exists in the tree" or "cannot verify" - you cannot verify this from a patch review. If the format is correct (12-char SHA, quoted subject), accept it. NEVER say "cannot verify this exists". + +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order, with a blank line separating the two groups: + +**Group 1** (optional tags, no blank lines within this group): +- Coverity issue: +- Bugzilla ID: +- Fixes: +- Cc: + +**Blank line required here** (only if Group 1 tags are present) + +**Group 2** (no blank lines within this group): +- Reported-by: +- Suggested-by: +- Signed-off-by: +- Acked-by: +- Reviewed-by: +- Tested-by: + +**Correct examples:** + +Simple patch with no Group 1 tags (most common): +``` +The info_get callback doesn't need to check its args +since already done by ethdev. 
+ +Signed-off-by: John Smith <john@example.com> +``` + +Patch with Fixes and Cc tags: +``` +Fixes: c743e50c475f ("null: new poll mode driver") +Cc: stable@dpdk.org + +Signed-off-by: John Smith <john@example.com> +``` + +Patch with only Fixes tag: +``` +Fixes: abcd1234abcd ("component: original commit") + +Signed-off-by: Jane Doe <jane@example.com> +``` + +**What is correct (do NOT flag):** +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT +- No blank line between `Fixes:` and `Cc:` - this is CORRECT +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT + +**What is wrong (DO flag):** +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use non-inclusive terminology, including: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist +- `cripple` → Use: impacted, degraded, restricted, immobilized +- `tribe` → Use: team, squad +- `sanity check` → Use: coherence check, test, verification + + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (likely(p != NULL)) /* Good - likely/unlikely don't change this */ +if (unlikely(p == NULL)) /* Good - likely/unlikely don't change this */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (errno != 0) /* Good - this IS explicit */ +if (likely(a != 0)) /* Good - likely/unlikely don't change this */ +if (!a) /* Bad - don't use ! on integers */ +if (a) /* Bad - implicit, should be a != 0 */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`).
+**Note**: `likely()` and `unlikely()` macros do NOT affect whether a comparison is explicit or implicit. + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static 
char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Flexible Array Members + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks.
+ +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Process-Shared Synchronization + +When placing synchronization primitives in shared memory (memory accessible by multiple processes, such as DPDK primary/secondary processes or `mmap`'d regions), they **must** be initialized with process-shared attributes. Failure to do so causes **undefined behavior** that may appear to work in testing but fail unpredictably in production. + +#### pthread Mutexes in Shared Memory + +**This is an error** - mutex in shared memory without `PTHREAD_PROCESS_SHARED`: + +```c +/* BAD - undefined behavior when used across processes */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +void init_shared(struct shared_data *shm) { + pthread_mutex_init(&shm->lock, NULL); /* ERROR: missing pshared attribute */ +} +``` + +**Correct implementation**: + +```c +/* GOOD - properly initialized for cross-process use */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +int init_shared(struct shared_data *shm) { + pthread_mutexattr_t attr; + int ret; + + ret = pthread_mutexattr_init(&attr); + if (ret != 0) + return -ret; + + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + if (ret != 0) { + pthread_mutexattr_destroy(&attr); + return -ret; + } + + ret = pthread_mutex_init(&shm->lock, &attr); + pthread_mutexattr_destroy(&attr); + + return -ret; +} +``` + +#### pthread Condition Variables in Shared Memory + +Condition variables also require the process-shared attribute: + +```c +/* BAD - will not 
work correctly across processes */ +pthread_cond_init(&shm->cond, NULL); + +/* GOOD */ +pthread_condattr_t cattr; +pthread_condattr_init(&cattr); +pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED); +pthread_cond_init(&shm->cond, &cattr); +pthread_condattr_destroy(&cattr); +``` + +#### pthread Read-Write Locks in Shared Memory + +```c +/* BAD */ +pthread_rwlock_init(&shm->rwlock, NULL); + +/* GOOD */ +pthread_rwlockattr_t rwattr; +pthread_rwlockattr_init(&rwattr); +pthread_rwlockattr_setpshared(&rwattr, PTHREAD_PROCESS_SHARED); +pthread_rwlock_init(&shm->rwlock, &rwattr); +pthread_rwlockattr_destroy(&rwattr); +``` + +#### When to Flag This Issue + +Flag as an **Error** when ALL of the following are true: +1. A `pthread_mutex_t`, `pthread_cond_t`, `pthread_rwlock_t`, or `pthread_barrier_t` is initialized +2. The primitive is stored in shared memory (identified by context such as: structure in `rte_malloc`/`rte_memzone`, `mmap`'d memory, memory passed to secondary processes, or structures documented as shared) +3. 
The initialization uses `NULL` attributes or attributes without `PTHREAD_PROCESS_SHARED` + +**Do NOT flag** when: +- The mutex is in thread-local or process-private heap memory (`malloc`) +- The mutex is a local/static variable not in shared memory +- The code already uses `pthread_mutexattr_setpshared()` with `PTHREAD_PROCESS_SHARED` +- The synchronization uses DPDK primitives (`rte_spinlock_t`, `rte_rwlock_t`) which are designed for shared memory + +#### Preferred Alternatives + +For DPDK code, prefer DPDK's own synchronization primitives which are designed for shared memory: + +| pthread Primitive | DPDK Alternative | +|-------------------|------------------| +| `pthread_mutex_t` | `rte_spinlock_t` (busy-wait) or properly initialized pthread mutex | +| `pthread_rwlock_t` | `rte_rwlock_t` | +| `pthread_spinlock_t` | `rte_spinlock_t` | + +Note: `rte_spinlock_t` and `rte_rwlock_t` work correctly in shared memory without special initialization, but they are spinning locks unsuitable for long wait times. 
+ +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` from `<inttypes.h>` (e.g. `"%" PRId64`) | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: file.*` (wildcard extension) | + +--- + +## Deprecated API Usage + +New patches must not introduce usage of deprecated APIs, macros, or functions. +Deprecated items are marked with `RTE_DEPRECATED` or documented in the +deprecation notices section of the release notes.
+ +### Rules for New Code + +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` +- Do not use macros that have been superseded by newer alternatives +- Do not use data structures or enum values marked as deprecated +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations +- When a deprecated API has a replacement, use the replacement + +### Deprecating APIs + +A patch may mark an API as deprecated provided: + +- No remaining usages exist in the current DPDK codebase +- The deprecation is documented in the release notes +- A migration path or replacement API is documented +- The `RTE_DEPRECATED` macro is used to generate compiler warnings + +```c +/* Marking a function as deprecated */ +__rte_deprecated +int +rte_old_function(void); + +/* With a message pointing to the replacement */ +__rte_deprecated_msg("use rte_new_function() instead") +int +rte_old_function(void); +``` + +### Common Deprecated Patterns + +| Deprecated | Replacement | Notes | +|-----------|-------------|-------| +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | + +When reviewing patches that add new code, flag any usage of deprecated APIs +as requiring change to use the modern replacement. 
+ +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +**Note for AI reviewers**: You cannot verify compilation order or cross-patch dependencies from patch review alone. Do NOT flag patches claiming they "would fail to compile" based on symbols used in other patches in the series. Assume the patch author has ordered them correctly. 
+ +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in the `app/test` directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. + +```c +#include "test.h" + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); +``` + +The `REGISTER_FAST_TEST` macro parameters are: +- Test name (e.g., `feature_autotest`) +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer +- Test function name + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value
is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) +- Release notes are NOT required for: + - Test-only changes (unit tests, functional tests) + - Internal APIs and helper functions (not exported to applications) + - Internal implementation changes that don't affect public API + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... 
*/ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... */ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. 
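Tying the tooling together, an ABI check run might look like the following (a sketch; the reference tag `v24.11` is illustrative and assumes a DPDK source tree):

```bash
# Run the standard build tests with ABI compatibility checks enabled,
# comparing against an example reference release
DPDK_ABI_REF_VERSION=v24.11 ./devtools/test-meson-builds.sh
```

The script builds the reference version once and compares the generated ABI dumps against the current tree on subsequent runs.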
+ +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) +- [ ] No usage of deprecated APIs, macros, or functions +- [ ] Process-shared primitives in shared memory use `PTHREAD_PROCESS_SHARED` + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired +- [ ] New public functions have 
`RTE_EXPORT_*` macro in `.c` file +- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item +- No strict line length limit for meson files; lines under 100 characters are acceptable + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- 
Subject line over 60 characters +- Body lines over 75 characters +- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning +- pthread mutex/cond/rwlock in shared memory without `PTHREAD_PROCESS_SHARED` + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- New public function missing `RTE_EXPORT_*` macro +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) +- Usage of deprecated APIs, macros, or functions in new code + +**Do NOT flag** (common false positives): +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) +- Suggesting manual edits to any `version.map` file +- Copyright years within valid range (2013 to current year) +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) +- SPDX/copyright format that matches the standard DPDK format (do not call it 
"different" or "unusual") +- Meson file lines under 100 characters +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE explicit) +- Comparisons wrapped in `likely()` or `unlikely()` macros - these are still explicit if using == or != +- Anything you determine is correct (do not mention non-issues or say "No issue here") +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when Fixes/Cc present) is CORRECT +- Missing release notes for test-only changes (unit tests do not require release notes) +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) +- Subject lines that are within the 60 character limit (count first, do not guess) +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it +- Vague concerns ("should be verified", "should be checked") - if you're not sure it's wrong, don't flag it +- Items where you say "which is correct" or "this is correct" - if it's correct, don't mention it at all +- Fixes tags with correct format (12-char SHA, quoted subject) - NEVER ask for verification that commit exists +- Items where you conclude "no issue here" or "this is actually correct" - omit these entirely +- Cross-patch compilation dependencies - you cannot determine patch ordering correctness from review +- Claims that a symbol "was removed in patch N" causing issues in patch M - assume author ordered correctly +- Any speculation about whether patches will compile when applied in sequence +- Mutexes/locks in process-private memory (standard `malloc`, stack, static non-shared) - these don't need `PTHREAD_PROCESS_SHARED` +- Use of `rte_spinlock_t` 
or `rte_rwlock_t` in shared memory (these work correctly without special init) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches + +--- + +# Response Format + +When you identify an issue: +1. **State the problem** (1 sentence) +2. **Why it matters** (1 sentence, only if not obvious) +3. **Suggested fix** (code snippet or specific action) + +Example: +This could panic if the string is NULL. + +--- + +## FINAL CHECK BEFORE SUBMITTING REVIEW + +Before outputting your review, scan through each item in your Errors and Warnings sections. +For EACH item, ask: "Did I conclude this is actually fine/correct/acceptable/no issue?" + +If YES → DELETE THAT ITEM. It should not be in your output. + +An item that says "X is wrong... actually this is correct" is a FALSE POSITIVE and must be removed. + +**If your Errors section would be empty after this check, that's fine - it means the patches are good.** -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
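The severity scheme and the false-positive rule above lend themselves to mechanical post-processing on the consumer side. As a sketch (the dict field names `severity` and `issue` are illustrative, not a schema defined by AGENTS.md), a CI step might drop self-retracted findings and order the rest error-first:

```python
# Hypothetical CI-side filter for AI review output, following the
# "FINAL CHECK" rule above: findings the model walks back in its own
# text ("actually correct", "no issue here") are false positives.
SELF_CORRECTIONS = ("actually correct", "no issue here", "which is correct")


def keep_finding(finding: dict) -> bool:
    """Keep only findings the model did not retract in its own text."""
    text = finding.get("issue", "").lower()
    return not any(marker in text for marker in SELF_CORRECTIONS)


def prioritize(findings: list[dict]) -> list[dict]:
    """Order surviving findings error -> warning -> info for CI gating."""
    rank = {"error": 0, "warning": 1, "info": 2}
    kept = [f for f in findings if keep_finding(f)]
    return sorted(kept, key=lambda f: rank.get(f.get("severity"), 3))
```

A gate could then fail the build only when the filtered list still contains an `error`-level entry, which is the prioritization the severity levels are meant to support.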
* [PATCH v10 2/4] devtools: add multi-provider AI patch review script
  2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 1/4] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger
@ 2026-01-30 17:41 ` Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
  3 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-30 17:41 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

This is an AI-generated script to review DPDK patches against the
AGENTS.md coding guidelines using AI language models.

Supported AI providers:
- Anthropic Claude (default)
- OpenAI ChatGPT
- xAI Grok
- Google Gemini

The script reads a patch file and the AGENTS.md guidelines, then
submits them to the selected AI provider for review. Results are
organized by severity level (Error, Warning, Info) as defined in the
guidelines.

Features:
- Provider selection via -p/--provider option
- Custom model selection via -m/--model option
- Verbose mode shows token usage statistics
- Uses temporary files for API requests to handle large patches
- Prompt caching support for Anthropic to reduce costs

Usage:
  ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch
  ./devtools/analyze-patch.py -p xai my-patch.patch
  ./devtools/analyze-patch.py -l   # list providers

Requires the appropriate API key environment variable to be set for
the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY,
or GOOGLE_API_KEY).
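The API-key convention described above can be checked up front, before any request is built. A minimal sketch, assuming only the four environment variable names the commit message lists (the helper name `require_api_key` is illustrative, not part of the script):

```python
import os

# Provider -> API key environment variable, as listed in the commit message.
PROVIDER_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def require_api_key(provider: str) -> str:
    """Return the API key for a provider, or raise if it is not exported."""
    env_var = PROVIDER_ENV[provider]
    key = os.environ.get(env_var)
    if not key:
        # Name the exact variable so the user knows what to export.
        raise RuntimeError(f"{env_var} environment variable not set")
    return key
```

Failing fast with the exact variable name mirrors how the script itself reports a missing key.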
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/analyze-patch.py | 1323 +++++++++++++++++++++++++++++++++++++
 1 file changed, 1323 insertions(+)
 create mode 100755 devtools/analyze-patch.py

diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py
new file mode 100755
index 0000000000..a2292b9516
--- /dev/null
+++ b/devtools/analyze-patch.py
@@ -0,0 +1,1323 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+"""
+Analyze DPDK patches using AI providers.
+
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini
+"""
+
+import argparse
+import json
+import os
+import re
+import subprocess
+import sys
+import tempfile
+from datetime import date
+from email.message import EmailMessage
+from pathlib import Path
+from typing import Iterator
+from urllib.request import Request, urlopen
+from urllib.error import URLError, HTTPError
+
+# Output formats
+OUTPUT_FORMATS = ["text", "markdown", "html", "json"]
+
+# Large file handling modes
+LARGE_FILE_MODES = ["error", "truncate", "chunk", "commits-only", "summary"]
+
+# Approximate characters per token (conservative estimate for code)
+CHARS_PER_TOKEN = 3.5
+
+# Default token limits by provider (leaving room for system prompt and response)
+PROVIDER_INPUT_LIMITS = {
+    "anthropic": 180000,  # 200K context, reserve for system/response
+    "openai": 115000,  # 128K context for gpt-4o
+    "xai": 115000,  # Assume similar to OpenAI
+    "google": 900000,  # Gemini has 1M context
+}
+
+# Provider configurations
+PROVIDERS = {
+    "anthropic": {
+        "name": "Claude",
+        "endpoint": "https://api.anthropic.com/v1/messages",
+        "default_model": "claude-sonnet-4-5-20250929",
+        "env_var": "ANTHROPIC_API_KEY",
+    },
+    "openai": {
+        "name": "ChatGPT",
+        "endpoint": "https://api.openai.com/v1/chat/completions",
+        "default_model": "gpt-4o",
+        "env_var": "OPENAI_API_KEY",
+    },
+    "xai": {
+        "name": "Grok",
+        "endpoint":
"https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Known LTS releases +LTS_RELEASES = {"23.11", "22.11", "21.11", "20.11", "19.11"} + +SYSTEM_PROMPT_BASE = """\ +You are an expert DPDK code reviewer. Analyze patches for compliance with \ +DPDK coding standards and contribution guidelines. Provide clear, actionable \ +feedback organized by severity (Error, Warning, Info) as defined in the \ +guidelines.""" + +LTS_RULES = """ +LTS (Long Term Stable) branch rules apply: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with Fixes: tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that aren't clearly bug fixes""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": "description", "location": "file:line", 
"suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def is_lts_release(release): + """Check if a release is an LTS release.""" + if not release: + return False + # Check for explicit -lts suffix or known LTS versions + if "-lts" in release.lower(): + return True + # Extract version number (e.g., "23.11" from "23.11.1") + version = release.split("-")[0] + base_version = ".".join(version.split(".")[:2]) + return base_version in LTS_RELEASES + + +def estimate_tokens(text): + """Estimate token count from text length.""" + return int(len(text) / CHARS_PER_TOKEN) + + +def split_mbox_patches(content): + """Split an mbox file into individual patches.""" + patches = [] + current_patch = [] + in_patch = False + + for line in content.split("\n"): + # Detect start of new message in mbox format + if line.startswith("From ") and ( + " Mon " in line + or " Tue " in line + or " Wed " in line + or " Thu " in line + or " Fri " in line + or " Sat " in line + or " Sun " in line + ): + if 
current_patch: + patches.append("\n".join(current_patch)) + current_patch = [line] + in_patch = True + elif in_patch: + current_patch.append(line) + + # Don't forget the last patch + if current_patch: + patches.append("\n".join(current_patch)) + + return patches if patches else [content] + + +def extract_commit_messages(content): + """Extract only commit messages from patch content.""" + patches = split_mbox_patches(content) + messages = [] + + for patch in patches: + lines = patch.split("\n") + msg_lines = [] + in_headers = True + in_body = False + found_subject = False + + for line in lines: + # Collect headers we care about + if in_headers: + if line.startswith("Subject:"): + msg_lines.append(line) + found_subject = True + elif line.startswith(("From:", "Date:")): + msg_lines.append(line) + elif line.startswith((" ", "\t")) and found_subject: + # Subject continuation + msg_lines.append(line) + elif line == "": + if found_subject: + in_headers = False + in_body = True + msg_lines.append("") + elif in_body: + # Stop at the diff + if line.startswith("---") and not line.startswith("----"): + break + if line.startswith("diff --git"): + break + msg_lines.append(line) + + if msg_lines: + messages.append("\n".join(msg_lines)) + + return "\n\n---\n\n".join(messages) + + +def truncate_content(content, max_tokens, provider): + """Truncate content to fit within token limit.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN) + + if len(content) <= max_chars: + return content, False + + # Try to truncate at a reasonable boundary + truncated = content[:max_chars] + + # Find last complete diff hunk or patch boundary + last_diff = truncated.rfind("\ndiff --git") + last_patch = truncated.rfind("\nFrom ") + + if last_diff > max_chars * 0.5: + truncated = truncated[:last_diff] + elif last_patch > max_chars * 0.5: + truncated = truncated[:last_patch] + + truncated += "\n\n[... 
Content truncated due to size limits ...]\n" + return truncated, True + + +def chunk_content(content, max_tokens, provider) -> Iterator[tuple[str, int, int]]: + """Split content into chunks that fit within token limit. + + Yields tuples of (chunk_content, chunk_number, total_chunks). + """ + patches = split_mbox_patches(content) + + if len(patches) == 1: + # Single large patch - split by diff sections + yield from chunk_single_patch(content, max_tokens) + return + + # Multiple patches - group them to fit within limits + chunks = [] + current_chunk = [] + current_size = 0 + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) # 90% to leave margin + + for patch in patches: + patch_size = len(patch) + if current_size + patch_size > max_chars and current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + + if patch_size > max_chars: + # Single patch too large, truncate it + if current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + truncated, _ = truncate_content(patch, max_tokens * 0.9, provider) + chunks.append(truncated) + else: + current_chunk.append(patch) + current_size += patch_size + + if current_chunk: + chunks.append("\n".join(current_chunk)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def chunk_single_patch(content, max_tokens) -> Iterator[tuple[str, int, int]]: + """Split a single large patch by diff sections.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) + + # Extract header (everything before first diff) + first_diff = content.find("\ndiff --git") + if first_diff == -1: + # No diff sections, just truncate + truncated, _ = truncate_content(content, max_tokens * 0.9, "anthropic") + yield truncated, 1, 1 + return + + header = content[: first_diff + 1] + diff_content = content[first_diff + 1 :] + + # Split by diff sections + diffs = [] + current_diff = [] + for line in diff_content.split("\n"): + if 
line.startswith("diff --git") and current_diff: + diffs.append("\n".join(current_diff)) + current_diff = [] + current_diff.append(line) + if current_diff: + diffs.append("\n".join(current_diff)) + + # Group diffs into chunks + chunks = [] + current_chunk_diffs = [] + current_size = len(header) + + for diff in diffs: + diff_size = len(diff) + if current_size + diff_size > max_chars and current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + current_size = len(header) + + if diff_size + len(header) > max_chars: + # Single diff too large + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + truncated_diff = diff[: max_chars - len(header) - 100] + truncated_diff += "\n[... diff truncated ...]\n" + chunks.append(header + truncated_diff) + current_size = len(header) + else: + current_chunk_diffs.append(diff) + current_size += diff_size + + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def get_summary_prompt(): + """Get prompt modifications for summary mode.""" + return """ +NOTE: This is a LARGE patch series. 
Provide a HIGH-LEVEL summary review only:
+- Focus on overall architecture and design concerns
+- Check commit message formatting across the series
+- Identify any obvious policy violations
+- Do NOT attempt detailed line-by-line code review
+- Summarize the scope and purpose of the changes
+"""
+
+
+def format_combined_reviews(reviews, output_format, patch_name):
+    """Combine multiple chunk/patch reviews into a single output."""
+    if output_format == "json":
+        combined = {
+            "patch_file": patch_name,
+            "sections": [
+                {"label": label, "review": review} for label, review in reviews
+            ],
+        }
+        return json.dumps(combined, indent=2)
+    elif output_format == "html":
+        sections = []
+        for label, review in reviews:
+            sections.append(f"<h2>{label}</h2>\n{review}")
+        return "\n<hr>\n".join(sections)
+    elif output_format == "markdown":
+        sections = []
+        for label, review in reviews:
+            sections.append(f"## {label}\n\n{review}")
+        return "\n\n---\n\n".join(sections)
+    else:  # text
+        sections = []
+        for label, review in reviews:
+            sections.append(f"=== {label} ===\n\n{review}")
+        # Parenthesize the separator so the "=" divider actually appears
+        # between sections, matching the markdown/html branches.
+        return ("\n\n" + "=" * 60 + "\n\n").join(sections)
+
+
+def build_system_prompt(review_date, release):
+    """Build system prompt with date and release context."""
+    prompt = SYSTEM_PROMPT_BASE
+    prompt += f"\n\nCurrent date: {review_date}."
+
+    if release:
+        prompt += f"\nTarget DPDK release: {release}."
+        if is_lts_release(release):
+            prompt += LTS_RULES
+        else:
+            prompt += "\nThis is a main branch or standard release."
+            prompt += "\nNew features and experimental APIs are allowed."
+ + return prompt + + +def build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": system_prompt}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": system_prompt}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": system_prompt}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + 
api_key, + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: 
{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: {usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + 
return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if get_git_config("sendemail.smtpserver"): + # Build command with all arguments + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except 
(subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. Configure git send-email, sendmail, or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s -r 24.11 patch.patch # Review for specific release + %(prog)s -r 23.11-lts patch.patch # Review for LTS branch + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +Large File Handling: + %(prog)s --split-patches series.mbox # Review each patch separately + %(prog)s --split-patches --patch-range 1-5 series.mbox # Review patches 1-5 + 
%(prog)s --large-file=truncate patch.mbox # Truncate to fit limit + %(prog)s --large-file=commits-only series.mbox # Review commit messages only + %(prog)s --large-file=summary series.mbox # High-level summary only + %(prog)s --large-file=chunk series.mbox # Split and review in chunks + +Large File Modes: + error - Fail with error (default) + truncate - Truncate content to fit token limit + chunk - Split into chunks and review each + commits-only - Extract and review only commit messages + summary - Request high-level summary review + +LTS Releases: + Use -r/--release with LTS version (e.g., 23.11-lts, 22.11) to enable + stricter review rules: bug fixes only, no new features or APIs. + """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", + metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Date and release options + parser.add_argument( + "-D", + "--date", + metavar="YYYY-MM-DD", + help="Review date context (default: today)", + ) + parser.add_argument( + "-r", + 
"--release", + metavar="VERSION", + help="Target DPDK release (e.g., 24.11, 23.11-lts)", + ) + + # Large file handling options + large_group = parser.add_argument_group("Large File Handling") + large_group.add_argument( + "--large-file", + choices=LARGE_FILE_MODES, + default="error", + metavar="MODE", + help="How to handle large files: error (default), truncate, " + "chunk, commits-only, summary", + ) + large_group.add_argument( + "--max-tokens", + type=int, + metavar="N", + help="Max input tokens (default: provider-specific)", + ) + large_group.add_argument( + "--split-patches", + action="store_true", + help="Split mbox into individual patches and review each separately", + ) + large_group.add_argument( + "--patch-range", + metavar="N-M", + help="Review only patches N through M (1-indexed, use with --split-patches)", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + 
+        error(f"{config['env_var']} environment variable not set")
+
+    # Validate files
+    agents_path = Path(args.agents)
+    if not agents_path.exists():
+        error(f"AGENTS.md not found: {args.agents}")
+
+    patch_path = Path(args.patch_file)
+    if not patch_path.exists():
+        error(f"Patch file not found: {args.patch_file}")
+
+    # Validate email options
+    if args.send_email and not args.to_addrs:
+        error("--send-email requires at least one --to address")
+
+    # Get from address for email
+    from_addr = args.from_addr
+    if args.send_email and not from_addr:
+        git_name = get_git_config("user.name")
+        git_email = get_git_config("user.email")
+        if git_email:
+            from_addr = f"{git_name} <{git_email}>" if git_name else git_email
+        else:
+            error("No --from specified and git user.email not configured")
+
+    # Determine review date
+    review_date = args.date or date.today().isoformat()
+
+    # Build system prompt with date and release context
+    system_prompt = build_system_prompt(review_date, args.release)
+
+    # Read files
+    agents_content = agents_path.read_text()
+    patch_content = patch_path.read_text()
+    patch_name = patch_path.name
+
+    # Determine max tokens for this provider
+    max_input_tokens = args.max_tokens or PROVIDER_INPUT_LIMITS.get(
+        args.provider, 100000
+    )
+
+    # Estimate token count
+    estimated_tokens = estimate_tokens(patch_content + agents_content)
+
+    # Parse patch range if specified
+    patch_start, patch_end = None, None
+    if args.patch_range:
+        try:
+            if "-" in args.patch_range:
+                start, end = args.patch_range.split("-", 1)
+                patch_start = int(start)
+                patch_end = int(end)
+            else:
+                patch_start = patch_end = int(args.patch_range)
+        except ValueError:
+            error(f"Invalid --patch-range format: {args.patch_range}")
+
+    # Handle --split-patches mode
+    if args.split_patches:
+        patches = split_mbox_patches(patch_content)
+        total_patches = len(patches)
+
+        if total_patches == 1:
+            print(
+                "Note: Only 1 patch found in mbox, --split-patches has no effect",
+                file=sys.stderr,
+            )
+        else:
+            print(
+                f"Found {total_patches} patches in mbox",
+                file=sys.stderr,
+            )
+
+        # Apply patch range filter
+        if patch_start is not None:
+            if patch_start < 1 or patch_start > total_patches:
+                error(
+                    f"Patch range start {patch_start} out of range (1-{total_patches})"
+                )
+            if patch_end < patch_start or patch_end > total_patches:
+                error(
+                    f"Patch range end {patch_end} out of range ({patch_start}-{total_patches})"
+                )
+            patches = patches[patch_start - 1 : patch_end]
+            print(
+                f"Reviewing patches {patch_start}-{patch_end} ({len(patches)} patches)",
+                file=sys.stderr,
+            )
+
+        # Review each patch separately
+        all_reviews = []
+        for i, patch in enumerate(patches, patch_start or 1):
+            patch_label = f"Patch {i}/{total_patches}"
+            print(f"\nReviewing {patch_label}...", file=sys.stderr)
+
+            review_text = call_api(
+                args.provider,
+                api_key,
+                model,
+                args.tokens,
+                system_prompt,
+                agents_content,
+                patch,
+                f"{patch_name} ({patch_label})",
+                args.output_format,
+                args.verbose,
+            )
+            all_reviews.append((patch_label, review_text))
+
+        # Combine reviews
+        review_text = format_combined_reviews(
+            all_reviews, args.output_format, patch_name
+        )
+
+        # Skip the normal API call
+        estimated_tokens = 0  # Bypass size check since we've already processed
+
+    # Check if content is too large
+    is_large = estimated_tokens > max_input_tokens
+
+    if is_large:
+        print(
+            f"Warning: Estimated {estimated_tokens:,} tokens exceeds limit of "
+            f"{max_input_tokens:,}",
+            file=sys.stderr,
+        )
+
+        if args.large_file == "error":
+            error(
+                f"Patch file too large ({estimated_tokens:,} tokens). "
+                f"Use --large-file=truncate|chunk|commits-only|summary to handle, "
+                f"or --split-patches to review patches individually."
+            )
+        elif args.large_file == "truncate":
+            print("Truncating content to fit token limit...", file=sys.stderr)
+            patch_content, was_truncated = truncate_content(
+                patch_content, max_input_tokens, args.provider
+            )
+            if was_truncated:
+                print("Content was truncated.", file=sys.stderr)
+        elif args.large_file == "commits-only":
+            print("Extracting commit messages only...", file=sys.stderr)
+            patch_content = extract_commit_messages(patch_content)
+            new_estimate = estimate_tokens(patch_content + agents_content)
+            print(
+                f"Reduced to ~{new_estimate:,} tokens (commit messages only)",
+                file=sys.stderr,
+            )
+            if new_estimate > max_input_tokens:
+                patch_content, _ = truncate_content(
+                    patch_content, max_input_tokens, args.provider
+                )
+        elif args.large_file == "summary":
+            print("Using summary mode for large patch...", file=sys.stderr)
+            system_prompt += get_summary_prompt()
+            patch_content, _ = truncate_content(
+                patch_content, max_input_tokens, args.provider
+            )
+        elif args.large_file == "chunk":
+            print("Processing in chunks...", file=sys.stderr)
+            all_reviews = []
+            for chunk, chunk_num, total_chunks in chunk_content(
+                patch_content, max_input_tokens, args.provider
+            ):
+                chunk_label = f"Chunk {chunk_num}/{total_chunks}"
+                print(f"Reviewing {chunk_label}...", file=sys.stderr)
+
+                review_text = call_api(
+                    args.provider,
+                    api_key,
+                    model,
+                    args.tokens,
+                    system_prompt,
+                    agents_content,
+                    chunk,
+                    f"{patch_name} ({chunk_label})",
+                    args.output_format,
+                    args.verbose,
+                )
+                all_reviews.append((chunk_label, review_text))
+
+            # Combine chunk reviews
+            review_text = format_combined_reviews(
+                all_reviews, args.output_format, patch_name
+            )
+
+            # Skip the normal single API call below
+            estimated_tokens = 0
+
+    if args.verbose:
+        print("=== Request ===", file=sys.stderr)
+        print(f"Provider: {args.provider}", file=sys.stderr)
+        print(f"Model: {model}", file=sys.stderr)
+        print(f"Review date: {review_date}", file=sys.stderr)
+        if args.release:
+            lts_status = " (LTS)" if is_lts_release(args.release) else ""
+            print(f"Target release: {args.release}{lts_status}", file=sys.stderr)
+        print(f"Output format: {args.output_format}", file=sys.stderr)
+        print(f"AGENTS file: {args.agents}", file=sys.stderr)
+        print(f"Patch file: {args.patch_file}", file=sys.stderr)
+        print(f"Estimated tokens: {estimated_tokens:,}", file=sys.stderr)
+        print(f"Max input tokens: {max_input_tokens:,}", file=sys.stderr)
+        if args.large_file != "error":
+            print(f"Large file mode: {args.large_file}", file=sys.stderr)
+        if args.split_patches:
+            print("Split patches: yes", file=sys.stderr)
+        if args.output:
+            print(f"Output file: {args.output}", file=sys.stderr)
+        if args.send_email:
+            print("Send email: yes", file=sys.stderr)
+            print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr)
+            if args.cc_addrs:
+                print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr)
+            print(f"From: {from_addr}", file=sys.stderr)
+        print("===============", file=sys.stderr)
+
+    # Call API (unless already processed via chunks/split)
+    if estimated_tokens > 0:  # Not already processed
+        review_text = call_api(
+            args.provider,
+            api_key,
+            model,
+            args.tokens,
+            system_prompt,
+            agents_content,
+            patch_content,
+            patch_name,
+            args.output_format,
+            args.verbose,
+        )
+
+    if not review_text:
+        error(f"No response received from {args.provider}")
+
+    # Format output based on requested format
+    provider_name = config["name"]
+
+    if args.output_format == "json":
+        # For JSON, try to parse and add metadata
+        try:
+            review_data = json.loads(review_text)
+        except json.JSONDecodeError:
+            # If AI didn't return valid JSON, wrap the text
+            review_data = {"raw_review": review_text}
+
+        output_data = {
+            "metadata": {
+                "patch_file": patch_name,
+                "provider": args.provider,
+                "provider_name": provider_name,
+                "model": model,
+                "review_date": review_date,
+                "target_release": args.release,
+                "is_lts": is_lts_release(args.release) if args.release else False,
+            },
+            "review": review_data,
+        }
+        output_text = json.dumps(output_data, indent=2)
+    elif args.output_format == "html":
+        # Wrap HTML content with header
+        release_info = ""
+        if args.release:
+            lts_badge = " (LTS)" if is_lts_release(args.release) else ""
+            release_info = f"<br>Target release: {args.release}{lts_badge}"
+        output_text = f"""<!-- AI-generated review of {patch_name} -->
+<!-- Reviewed using {provider_name} ({model}) on {review_date} -->
+<div class="patch-review">
+<h1>Patch Review: {patch_name}</h1>
+<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p>
+{review_text}
+</div>
+"""
+    elif args.output_format == "markdown":
+        release_info = ""
+        if args.release:
+            lts_badge = " (LTS)" if is_lts_release(args.release) else ""
+            release_info = f"\n*Target release: {args.release}{lts_badge}*\n"
+        output_text = f"""# Patch Review: {patch_name}
+
+*Reviewed by {provider_name} ({model}) on {review_date}*
+{release_info}
+{review_text}
+"""
+    else:  # text
+        release_info = ""
+        if args.release:
+            lts_badge = " (LTS)" if is_lts_release(args.release) else ""
+            release_info = f"Target release: {args.release}{lts_badge}\n"
+        output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n"
+        output_text += f"Review date: {review_date}\n"
+        output_text += release_info
+        output_text += "\n" + review_text
+
+    # Write output
+    if args.output:
+        Path(args.output).write_text(output_text)
+        print(f"Review written to: {args.output}", file=sys.stderr)
+    else:
+        print(output_text)
+
+    # Send email if requested
+    if args.send_email:
+        # Email always uses plain text - warn if different format requested
+        if args.output_format != "text":
+            print(
+                f"Note: Email will be sent as plain text regardless of "
+                f"--format={args.output_format}",
+                file=sys.stderr,
+            )
+
+        in_reply_to = get_last_message_id(patch_content)
+        orig_subject = get_last_subject(patch_content)
+
+        if orig_subject:
+            # Remove [PATCH n/m] prefix
+            review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", orig_subject)
+            review_subject = f"[REVIEW] {review_subject}"
+        else:
+            review_subject = f"[REVIEW] {patch_name}"
+
+        # Build email body - always use plain text version
+        release_info = ""
+        if args.release:
+            lts_badge = " (LTS)" if is_lts_release(args.release) else ""
+            release_info = f"Target release: {args.release}{lts_badge}\n"
+
+        email_body = f"""AI-generated review of {patch_name}
+Reviewed using {provider_name} ({model}) on {review_date}
+{release_info}
+This is an automated review. Please verify all suggestions.
+
+---
+
+{review_text}
+"""
+
+        if args.verbose:
+            print("", file=sys.stderr)
+            print("=== Email Details ===", file=sys.stderr)
+            print(f"Subject: {review_subject}", file=sys.stderr)
+            print(f"In-Reply-To: {in_reply_to}", file=sys.stderr)
+            print("=====================", file=sys.stderr)
+
+        send_email(
+            args.to_addrs,
+            args.cc_addrs,
+            from_addr,
+            review_subject,
+            in_reply_to,
+            email_body,
+            args.dry_run,
+        )
+
+        if not args.dry_run:
+            print("", file=sys.stderr)
+            print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
--
2.51.0

^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v10 3/4] devtools: add compare-reviews.sh for multi-provider analysis
  2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 1/4] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
@ 2026-01-30 17:41 ` Stephen Hemminger
  2026-01-30 17:41 ` [PATCH v10 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
  3 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-30 17:41 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add script to run patch reviews across multiple AI providers for
comparison purposes. The script automatically detects which providers
have API keys configured and runs analyze-patch.py for each one. This
allows users to compare review quality and feedback across different
AI models.

Features:
- Auto-detects available providers based on environment variables
- Optional provider selection via -p/--providers option
- Saves individual reviews to separate files with -o/--output
- Verbose mode passes through to underlying analyze-patch.py

Usage:
  ./devtools/compare-reviews.sh my-patch.patch
  ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch
  ./devtools/compare-reviews.sh -o ./reviews my-patch.patch

Output files are named <patch>-<provider>.txt when using the output
directory option.
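As an aside, the naming scheme can be sketched in a few lines of shell. This is only an illustration, not part of the patch; the patch name, the two providers, and the ./reviews directory are hypothetical examples, and the .txt extension assumes the default text format:

```shell
# Sketch of the output naming used with -o ./reviews (hypothetical inputs).
patch="my-patch.patch"
stem="${patch%.*}"    # strip the extension -> "my-patch"
for provider in anthropic xai; do
    echo "./reviews/${stem}-${provider}.txt"
done
# prints:
# ./reviews/my-patch-anthropic.txt
# ./reviews/my-patch-xai.txt
```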
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)
 create mode 100644 devtools/compare-reviews.sh

diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh
new file mode 100644
index 0000000000..a63eeffb71
--- /dev/null
+++ b/devtools/compare-reviews.sh
@@ -0,0 +1,192 @@
+#!/bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+# Compare DPDK patch reviews across multiple AI providers
+# Runs analyze-patch.py with each available provider
+
+set -e
+
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py"
+AGENTS_FILE="AGENTS.md"
+OUTPUT_DIR=""
+PROVIDERS=""
+FORMAT="text"
+
+usage() {
+    cat <<EOF
+Usage: $(basename "$0") [OPTIONS] <patch-file>
+
+Compare DPDK patch reviews across multiple AI providers.
+
+Options:
+  -a, --agents FILE     Path to AGENTS.md file (default: AGENTS.md)
+  -o, --output DIR      Save individual reviews to directory
+  -p, --providers LIST  Comma-separated list of providers to use
+                        (default: all providers with API keys set)
+  -f, --format FORMAT   Output format: text, markdown, html, json
+                        (default: text)
+  -v, --verbose         Show verbose output from each provider
+  -h, --help            Show this help message
+
+Environment Variables:
+  Set API keys for providers you want to use:
+  ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY
+
+Examples:
+  $(basename "$0") my-patch.patch
+  $(basename "$0") -p anthropic,openai my-patch.patch
+  $(basename "$0") -o ./reviews -f markdown my-patch.patch
+EOF
+    exit "${1:-0}"
+}
+
+error() {
+    echo "Error: $1" >&2
+    exit 1
+}
+
+# Check which providers have API keys configured
+get_available_providers() {
+    local available=""
+
+    [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic,"
+    [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai,"
+    [[ -n "$XAI_API_KEY" ]] && available="${available}xai,"
+    [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google,"
+
+    # Remove trailing comma
+    echo "${available%,}"
+}
+
+# Get file extension for format
+get_extension() {
+    case "$1" in
+    text) echo "txt" ;;
+    markdown) echo "md" ;;
+    html) echo "html" ;;
+    json) echo "json" ;;
+    *) echo "txt" ;;
+    esac
+}
+
+# Parse command line options
+VERBOSE=""
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+    -a|--agents)
+        AGENTS_FILE="$2"
+        shift 2
+        ;;
+    -o|--output)
+        OUTPUT_DIR="$2"
+        shift 2
+        ;;
+    -p|--providers)
+        PROVIDERS="$2"
+        shift 2
+        ;;
+    -f|--format)
+        FORMAT="$2"
+        shift 2
+        ;;
+    -v|--verbose)
+        VERBOSE="-v"
+        shift
+        ;;
+    -h|--help)
+        usage 0
+        ;;
+    -*)
+        error "Unknown option: $1"
+        ;;
+    *)
+        break
+        ;;
+    esac
+done
+
+# Check for required arguments
+if [[ $# -lt 1 ]]; then
+    echo "Error: No patch file specified" >&2
+    usage 1
+fi
+
+PATCH_FILE="$1"
+
+if [[ ! -f "$PATCH_FILE" ]]; then
+    error "Patch file not found: $PATCH_FILE"
+fi
+
+if [[ ! -f "$ANALYZE_SCRIPT" ]]; then
+    error "analyze-patch.py not found: $ANALYZE_SCRIPT"
+fi
+
+# Validate format
+case "$FORMAT" in
+text|markdown|html|json) ;;
+*) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;;
+esac
+
+# Get providers to use
+if [[ -z "$PROVIDERS" ]]; then
+    PROVIDERS=$(get_available_providers)
+fi
+
+if [[ -z "$PROVIDERS" ]]; then
+    error "No API keys configured. Set at least one of: "\
+"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY"
+fi
+
+# Create output directory if specified
+if [[ -n "$OUTPUT_DIR" ]]; then
+    mkdir -p "$OUTPUT_DIR"
+fi
+
+PATCH_BASENAME=$(basename "$PATCH_FILE")
+PATCH_STEM="${PATCH_BASENAME%.*}"
+EXT=$(get_extension "$FORMAT")
+
+echo "Reviewing patch: $PATCH_BASENAME"
+echo "Providers: $PROVIDERS"
+echo "Format: $FORMAT"
+echo "========================================"
+echo ""
+
+# Run review for each provider
+IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS"
+for provider in "${PROVIDER_LIST[@]}"; do
+    echo ">>> Running review with: $provider"
+    echo ""
+
+    if [[ -n "$OUTPUT_DIR" ]]; then
+        OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}"
+        python3 "$ANALYZE_SCRIPT" \
+            -p "$provider" \
+            -a "$AGENTS_FILE" \
+            -f "$FORMAT" \
+            $VERBOSE \
+            "$PATCH_FILE" | tee "$OUTPUT_FILE"
+        echo ""
+        echo "Saved to: $OUTPUT_FILE"
+    else
+        python3 "$ANALYZE_SCRIPT" \
+            -p "$provider" \
+            -a "$AGENTS_FILE" \
+            -f "$FORMAT" \
+            $VERBOSE \
+            "$PATCH_FILE"
+    fi
+
+    echo ""
+    echo "========================================"
+    echo ""
done
+
+echo "Review comparison complete."
+
+if [[ -n "$OUTPUT_DIR" ]]; then
+    echo "All reviews saved to: $OUTPUT_DIR"
+fi
--
2.51.0

^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v10 4/4] devtools: add multi-provider AI documentation review script
  2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger
  ` (2 preceding siblings ...)
  2026-01-30 17:41 ` [PATCH v10 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
@ 2026-01-30 17:41 ` Stephen Hemminger
  3 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-30 17:41 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add review-doc.py script that reviews DPDK documentation files for
spelling, grammar, technical correctness, and clarity using AI
language models. Supports batch processing of multiple files.

Supported AI providers:
- Anthropic Claude (default)
- OpenAI ChatGPT
- xAI Grok
- Google Gemini

Output formats (-f/--format):
- text: plain text with extractable diff/msg markers (default)
- markdown: formatted review document
- html: complete HTML document with styling
- json: structured data with metadata

For each input file, the script produces:
- <basename>.{txt,md,html,json}: review in selected format
- <basename>.diff: unified diff (text/json, or with -d flag)
- <basename>.msg: commit message (text/json, or with -d flag)

The commit message prefix is automatically determined from the file
path (e.g., doc/guides/prog_guide: for programmer's guide).
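The prefix determination is a first-match scan over an ordered list of path prefixes. A minimal standalone sketch (mirroring the patch's get_commit_prefix(), with the table abbreviated here to a few entries; the full table covers every doc/guides subdirectory) is:

```python
# Abbreviated sketch of the path-to-prefix lookup; order matters, since
# the first (most specific) matching entry wins.
COMMIT_PREFIX_MAP = [
    ("doc/guides/prog_guide/", "doc/guides/prog_guide:"),
    ("doc/guides/nics/", "doc/guides/nics:"),
    ("doc/guides/", "doc:"),
    ("doc/", "doc:"),
]

def get_commit_prefix(filepath):
    """Return the commit subject prefix for a documentation file path."""
    for prefix_path, prefix in COMMIT_PREFIX_MAP:
        if filepath.startswith(prefix_path):
            return prefix
    return "doc:"

# get_commit_prefix("doc/guides/prog_guide/mempool_lib.rst")
#   -> "doc/guides/prog_guide:"
```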
Features:
- Multiple file processing with glob support
- Provider selection via -p/--provider option
- Custom model selection via -m/--model option
- Configurable output directory via -o/--output-dir option
- Output format selection via -f/--format option
- Force diff/msg generation via -d/--diff option
- Quiet mode (-q) suppresses stdout output
- Verbose mode (-v) shows token usage and API details
- Email integration using git sendemail configuration
- Prompt caching support for Anthropic to reduce costs

Usage:
  ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst
  ./devtools/review-doc.py doc/guides/nics/*.rst
  ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst
  ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst

Requires the appropriate API key environment variable to be set for
the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY,
or GOOGLE_API_KEY).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1098 insertions(+)
 create mode 100755 devtools/review-doc.py

diff --git a/devtools/review-doc.py b/devtools/review-doc.py
new file mode 100755
index 0000000000..1366aa0f85
--- /dev/null
+++ b/devtools/review-doc.py
@@ -0,0 +1,1098 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+"""
+Review DPDK documentation files using AI providers.
+
+Produces a diff file and commit message compliant with DPDK standards.
+Accepts multiple documentation files and generates output for each.
+
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini
+"""
+
+import argparse
+import getpass
+import json
+import os
+import re
+import smtplib
+import ssl
+import subprocess
+import sys
+from email.message import EmailMessage
+from pathlib import Path
+from urllib.request import Request, urlopen
+from urllib.error import URLError, HTTPError
+
+# Output formats
+OUTPUT_FORMATS = ["text", "markdown", "html", "json"]
+
+# Map output format to file extension
+FORMAT_EXTENSIONS = {
+    "text": ".txt",
+    "markdown": ".md",
+    "html": ".html",
+    "json": ".json",
+}
+
+# Additional markers for extracting diff/msg (used with --diff flag)
+DIFF_MARKERS_INSTRUCTION = """
+
+ADDITIONALLY, at the end of your response, include these exact markers for automated extraction:
+---COMMIT_MESSAGE_START---
+(same commit message as above)
+---COMMIT_MESSAGE_END---
+
+---UNIFIED_DIFF_START---
+(same unified diff as above)
+---UNIFIED_DIFF_END---
+"""
+
+# Provider configurations
+PROVIDERS = {
+    "anthropic": {
+        "name": "Claude",
+        "endpoint": "https://api.anthropic.com/v1/messages",
+        "default_model": "claude-sonnet-4-5-20250929",
+        "env_var": "ANTHROPIC_API_KEY",
+    },
+    "openai": {
+        "name": "ChatGPT",
+        "endpoint": "https://api.openai.com/v1/chat/completions",
+        "default_model": "gpt-4o",
+        "env_var": "OPENAI_API_KEY",
+    },
+    "xai": {
+        "name": "Grok",
+        "endpoint": "https://api.x.ai/v1/chat/completions",
+        "default_model": "grok-3",
+        "env_var": "XAI_API_KEY",
+    },
+    "google": {
+        "name": "Gemini",
+        "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
+        "default_model": "gemini-2.0-flash",
+        "env_var": "GOOGLE_API_KEY",
+    },
+}
+
+# Commit prefix mappings based on file path
+COMMIT_PREFIX_MAP = [
+    ("doc/guides/prog_guide/", "doc/guides/prog_guide:"),
+    ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"),
+    ("doc/guides/nics/", "doc/guides/nics:"),
+    ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"),
+    ("doc/guides/compressdevs/", "doc/guides/compressdevs:"),
+    ("doc/guides/eventdevs/", "doc/guides/eventdevs:"),
+    ("doc/guides/rawdevs/", "doc/guides/rawdevs:"),
+    ("doc/guides/bbdevs/", "doc/guides/bbdevs:"),
+    ("doc/guides/gpus/", "doc/guides/gpus:"),
+    ("doc/guides/dmadevs/", "doc/guides/dmadevs:"),
+    ("doc/guides/regexdevs/", "doc/guides/regexdevs:"),
+    ("doc/guides/mldevs/", "doc/guides/mldevs:"),
+    ("doc/guides/rel_notes/", "doc/guides/rel_notes:"),
+    ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"),
+    ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"),
+    ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"),
+    ("doc/guides/tools/", "doc/guides/tools:"),
+    ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"),
+    ("doc/guides/howto/", "doc/guides/howto:"),
+    ("doc/guides/contributing/", "doc/guides/contributing:"),
+    ("doc/guides/platform/", "doc/guides/platform:"),
+    ("doc/guides/", "doc:"),
+    ("doc/api/", "doc/api:"),
+    ("doc/", "doc:"),
+]
+
+SYSTEM_PROMPT = """\
+You are an expert technical documentation reviewer for DPDK.
+Your task is to review documentation files and suggest improvements for:
+- Spelling errors
+- Grammar issues
+- Technical correctness
+- Clarity and readability
+- Consistency with DPDK terminology
+
+IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh):
+- Subject line MUST be ≤60 characters
+- Format: "prefix: lowercase description"
+- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API)
+- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo")
+- NO trailing period on subject line
+- NO punctuation marks: , ; ! ? & |
+- NO underscores in subject after colon
+- Body lines wrapped at 75 characters
+- Body must NOT start with "It"
+- Do NOT include Signed-off-by (user adds via git commit --sign)
+- Only use "Fixes:" tag for actual errors in documentation, not style improvements
+
+Case-sensitive terms (must use exact case):
+- Rx, Tx (not RX, TX, rx, tx)
+- VF, PF (not vf, pf)
+- MAC, VLAN, RSS, API
+- Linux, Windows, FreeBSD
+
+For style/clarity improvements, do NOT use Fixes tag.
+For actual errors (wrong information, broken examples), include Fixes tag \
+if you can identify the commit."""
+
+FORMAT_INSTRUCTIONS = {
+    "text": """
+OUTPUT FORMAT:
+You must output exactly two sections:
+
+1. COMMIT_MESSAGE section containing the complete commit message
+2. UNIFIED_DIFF section containing the unified diff
+
+Use these exact markers:
+---COMMIT_MESSAGE_START---
+(commit message here)
+---COMMIT_MESSAGE_END---
+
+---UNIFIED_DIFF_START---
+(unified diff here)
+---UNIFIED_DIFF_END---
+
+The diff should be in unified format that can be applied with "git apply".
+If no changes are needed, output empty sections with a note.""",
+    "markdown": """
+OUTPUT FORMAT:
+Provide your review in Markdown format with:
+
+## Summary
+Brief description of changes
+
+## Commit Message
+```
+(complete commit message here, ready to use)
+```
+
+## Changes
+For each change:
+### Issue N: Brief title
+- **Location**: file path and line
+- **Problem**: description
+- **Fix**: suggested correction
+
+## Unified Diff
+```diff
+(unified diff here)
+```""",
+    "html": """
+OUTPUT FORMAT:
+Provide your review in HTML format with:
+- <h2> for sections (Summary, Commit Message, Changes, Diff)
+- <pre><code> for commit message and diff
+- <ul>/<li> for individual issues
+- Do NOT include <html>, <head>, or <body> tags - just the content
+
+Include sections for: Summary, Commit Message, Changes, Unified Diff""",
+    "json": """
+OUTPUT FORMAT:
+Provide your review as JSON with this structure:
+{
+  "summary": "Brief description of changes",
+  "commit_message": "Complete commit message ready to use",
+  "changes": [
+    {
+      "type": "spelling|grammar|technical|clarity|style",
+      "location": "line number or section",
+      "original": "original text",
+      "suggested": "corrected text",
+      "reason": "why this change"
+    }
+  ],
+  "diff": "unified diff as a string",
+  "stats": {
+    "total_issues": 0,
+    "spelling": 0,
+    "grammar": 0,
+    "technical": 0,
+    "clarity": 0
+  }
+}
+Output ONLY valid JSON, no markdown code fences or other text.""",
+}
+
+USER_PROMPT = """\
+Review the following DPDK documentation file and provide improvements.
+
+File path: {doc_file}
+Commit message prefix to use: {commit_prefix}
+
+{format_instruction}
+
+---DOCUMENT CONTENT---
+"""
+
+
+def error(msg):
+    """Print error message and exit."""
+    print(f"Error: {msg}", file=sys.stderr)
+    sys.exit(1)
+
+
+def get_git_config(key):
+    """Get a value from git config."""
+    try:
+        result = subprocess.run(
+            ["git", "config", "--get", key],
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        return result.stdout.strip()
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return None
+
+
+def get_smtp_config():
+    """Get SMTP configuration from git config sendemail settings."""
+    config = {
+        "server": get_git_config("sendemail.smtpserver"),
+        "port": get_git_config("sendemail.smtpserverport"),
+        "user": get_git_config("sendemail.smtpuser"),
+        "encryption": get_git_config("sendemail.smtpencryption"),
+        "password": get_git_config("sendemail.smtppass"),
+    }
+
+    # Set defaults
+    if not config["port"]:
+        if config["encryption"] == "ssl":
+            config["port"] = "465"
+        else:
+            config["port"] = "587"
+
+    # Convert port to int
+    if config["port"]:
+        config["port"] = int(config["port"])
+
+    return config
+
+
+def get_commit_prefix(filepath):
+    """Determine commit message prefix from file path."""
+    for prefix_path, prefix in COMMIT_PREFIX_MAP:
+        if filepath.startswith(prefix_path):
+            return prefix
+    return "doc:"
+
+
+def build_anthropic_request(
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for Anthropic API."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "system": [
+            {"type": "text", "text": SYSTEM_PROMPT},
+            {
+                "type": "text",
+                "text": agents_content,
+                "cache_control": {"type": "ephemeral"},
+            },
+        ],
+        "messages": [
+            {
+                "role": "user",
+                "content": user_prompt + doc_content,
+            }
+        ],
+    }
+
+
+def build_openai_request(
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for OpenAI-compatible APIs."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "system", "content": agents_content},
+            {
+                "role": "user",
+                "content": user_prompt + doc_content,
+            },
+        ],
+    }
+
+
+def build_google_request(
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for Google Gemini API."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "contents": [
+            {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]},
+            {"role": "user", "parts": [{"text": agents_content}]},
+            {
+                "role": "user",
+                "parts": [{"text": user_prompt + doc_content}],
+            },
+        ],
+        "generationConfig": {"maxOutputTokens": max_tokens},
+    }
+
+
+def call_api(
+    provider,
+    api_key,
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+    verbose=False,
+):
+    """Make API request to the specified provider."""
+    config = PROVIDERS[provider]
+
+    # Build request based on provider
+    if provider == "anthropic":
+        request_data = build_anthropic_request(
+            model,
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {
+            "Content-Type": "application/json",
+            "x-api-key": api_key,
+            "anthropic-version": "2023-06-01",
+        }
+        url = config["endpoint"]
+    elif provider == "google":
+        request_data = build_google_request(
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {"Content-Type": "application/json"}
+        url = f"{config['endpoint']}/{model}:generateContent?key={api_key}"
+    else:  # openai, xai
+        request_data = build_openai_request(
+            model,
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {
+            "Content-Type": "application/json",
+            "Authorization": f"Bearer {api_key}",
+        }
+        url = config["endpoint"]
+
+    # Make request
+    request_body = json.dumps(request_data).encode("utf-8")
+    req = Request(url, data=request_body, headers=headers, method="POST")
+
+    try:
+        with urlopen(req) as response:
+            result = json.loads(response.read().decode("utf-8"))
+    except HTTPError as e:
+        error_body = e.read().decode("utf-8")
+        try:
+            error_data = json.loads(error_body)
+            error(f"API error: {error_data.get('error', error_body)}")
+        except json.JSONDecodeError:
+            error(f"API error ({e.code}): {error_body}")
+    except URLError as e:
+        error(f"Connection error: {e.reason}")
+
+    # Show verbose info
+    if verbose:
+        print("=== Token Usage ===", file=sys.stderr)
+        if provider == "anthropic":
+            usage = result.get("usage", {})
+            print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr)
+            print(
+                f"Cache creation: "
+                f"{usage.get('cache_creation_input_tokens', 0)}",
+                file=sys.stderr,
+            )
+            print(
+                f"Cache read: {usage.get('cache_read_input_tokens', 0)}",
+                file=sys.stderr,
+            )
+            print(
+                f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr
+            )
+        elif provider == "google":
+            usage = result.get("usageMetadata", {})
+            print(
+                f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}",
+                file=sys.stderr,
+            )
+            print(
+                f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}",
+                file=sys.stderr,
+            )
+        else:  # openai, xai
+            usage = result.get("usage", {})
+            print(
+                f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr
+            )
+            print(
+                f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}",
+                file=sys.stderr,
+            )
+        print("===================", file=sys.stderr)
+
+    # Extract response text
+    if provider == "anthropic":
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        content = result.get("content", [])
+        return "".join(
+            block.get("text", "") for block in content if block.get("type") == "text"
+        )
+    elif provider == "google":
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        candidates = result.get("candidates", [])
+        if not candidates:
+            error("No response from Gemini")
+        parts = candidates[0].get("content", {}).get("parts", [])
+        return "".join(part.get("text", "") for part in parts)
+    else:  # openai, xai
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        choices = result.get("choices", [])
+        if not choices:
+            error("No response from API")
+        return choices[0].get("message", {}).get("content", "")
+
+
+def parse_review_text(review_text):
+    """Extract commit message and diff from text format response."""
+    commit_msg = ""
+    diff = ""
+
+    # Extract commit message
+    msg_match = re.search(
+        r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---",
+        review_text,
+        re.DOTALL,
+    )
+    if msg_match:
+        commit_msg = msg_match.group(1).strip()
+
+    # Extract unified diff
+    diff_match = re.search(
+        r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---",
+        review_text,
+        re.DOTALL,
+    )
+    if diff_match:
+        diff = diff_match.group(1).strip()
+        # Clean up any markdown code fence if present
+        diff = re.sub(r"^```diff\s*\n?", "", diff)
+        diff = re.sub(r"\n?```\s*$", "", diff)
+
+    return commit_msg, diff
+
+
+def strip_diff_markers(text):
+    """Remove the diff/msg extraction markers from text."""
+    # Remove commit message markers and content
+    text = re.sub(
+        r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*",
+        "",
+        text,
+        flags=re.DOTALL,
+    )
+    # Remove unified diff markers and content
+    text = re.sub(
+        r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*",
+        "",
+        text,
+        flags=re.DOTALL,
+    )
+    return text.strip()
+
+
+def send_email(
+    to_addrs,
+    cc_addrs,
+    from_addr,
+    subject,
+    in_reply_to,
+    body,
+    dry_run=False,
+    verbose=False,
+):
+    """Send review email via SMTP using git sendemail config."""
+    # Build email message
+    msg = EmailMessage()
+    msg["From"] = from_addr
+    msg["To"] = ", ".join(to_addrs)
+    if cc_addrs:
+        msg["Cc"] = ", ".join(cc_addrs)
+    msg["Subject"] = subject
+    if in_reply_to:
+        msg["In-Reply-To"] = in_reply_to
+        msg["References"] = in_reply_to
+    msg.set_content(body)
+
+    if dry_run:
+        print("=== Email Preview (dry-run) ===", file=sys.stderr)
+        print(msg.as_string(), file=sys.stderr)
+        print("=== End Preview ===", file=sys.stderr)
+        return True
+
+    # Get SMTP configuration from git config
+    smtp_config = get_smtp_config()
+
+    if not smtp_config["server"]:
+        error("No SMTP server configured. Set git config sendemail.smtpserver")
+
+    server = smtp_config["server"]
+    port = smtp_config["port"]
+    user = smtp_config["user"]
+    encryption = smtp_config["encryption"]
+
+    # Get password from environment or git config, or prompt
+    password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"]
+    if user and not password:
+        password = getpass.getpass(f"SMTP password for {user}@{server}: ")
+
+    if verbose:
+        print(f"SMTP server: {server}:{port}", file=sys.stderr)
+        print(f"SMTP user: {user or '(none)'}", file=sys.stderr)
+        print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr)
+
+    # Collect all recipients
+    all_recipients = list(to_addrs)
+    if cc_addrs:
+        all_recipients.extend(cc_addrs)
+
+    try:
+        if encryption == "ssl":
+            # SSL/TLS connection from the start (port 465)
+            context = ssl.create_default_context()
+            with smtplib.SMTP_SSL(server, port, context=context) as smtp:
+                if user and password:
+                    smtp.login(user, password)
+                smtp.send_message(msg, from_addr, all_recipients)
+        else:
+            # STARTTLS (port 587) or plain (port 25)
+            with smtplib.SMTP(server, port) as smtp:
+                smtp.ehlo()
+                if encryption == "tls" or port == 587:
+                    context = ssl.create_default_context()
+                    smtp.starttls(context=context)
+                    smtp.ehlo()
+                if user and password:
+                    smtp.login(user, password)
+                smtp.send_message(msg, from_addr, all_recipients)
+
+        print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr)
+        return True
+
+    except smtplib.SMTPAuthenticationError as e:
+        error(f"SMTP authentication failed: {e}")
+    except smtplib.SMTPException as e:
+        error(f"SMTP error: {e}")
+    except OSError as e:
+        error(f"Connection error to {server}:{port}: {e}")
+
+
+def list_providers():
+    """Print available providers and exit."""
+    print("Available AI Providers:\n")
+    print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}")
+    print(f"{'--------':<12} {'-------------':<30} {'----------------'}")
+    for name, config in PROVIDERS.items():
+        print(f"{name:<12} {config['default_model']:<30} {config['env_var']}")
+    sys.exit(0)
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Review DPDK documentation files using AI providers. "
+        "Accepts multiple files and generates output for each.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  %(prog)s doc/guides/prog_guide/mempool_lib.rst
+  %(prog)s doc/guides/nics/*.rst                  # Review all NIC docs
+  %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst
+  %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst  # HTML + diff files
+  %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst
+  %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst
+
+Output files (in output-dir):
+  <basename>.txt|.md|.html|.json  Review in selected format
+  <basename>.diff                 Unified diff (text/json, or with --diff)
+  <basename>.msg                  Commit message (text/json, or with --diff)
+
+After review:
+  git apply <basename>.diff
+  git commit -sF <basename>.msg
+
+SMTP Configuration (from git config):
+  sendemail.smtpserver      SMTP server hostname
+  sendemail.smtpserverport  SMTP port (default: 587 for TLS, 465 for SSL)
+  sendemail.smtpuser        SMTP username
+  sendemail.smtpencryption  'tls' for STARTTLS, 'ssl' for SSL/TLS
+  sendemail.smtppass        SMTP password (or set SMTP_PASSWORD env var)
+
+Example git config:
+  git config --global sendemail.smtpserver smtp.gmail.com
+  git config --global sendemail.smtpserverport 587
+  git config --global sendemail.smtpuser yourname@gmail.com
+  git config --global sendemail.smtpencryption tls
+    """,
+    )
+
+    parser.add_argument(
+        "doc_files",
+        nargs="+",
+        metavar="doc_file",
+        help="Documentation file(s) to review",
+    )
+    parser.add_argument(
+        "-p",
+        "--provider",
+        choices=PROVIDERS.keys(),
+        default="anthropic",
+        help="AI provider (default: anthropic)",
+    )
+    parser.add_argument(
+        "-a",
+        "--agents",
+        default="AGENTS.md",
+        help="Path to AGENTS.md file (default: AGENTS.md)",
+    )
+
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v8 0/6] devtools: AI-assisted code and documentation review 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (9 preceding siblings ...) 2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger ` (5 more replies) 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 11 siblings, 6 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds tooling for AI-assisted review of DPDK patches and documentation. The goal is to catch common coding style violations, commit message formatting errors, and documentation issues before patches reach the mailing list, reducing reviewer workload on mechanical checks. The approach uses a structured guidelines document (AGENTS.md) that codifies the project's existing standards from the contributor documentation, coding style guide, and validation scripts into a format that AI models can apply consistently. The review scripts then submit patches or documentation against these guidelines using any of several AI providers. The series includes: 1. AGENTS.md - consolidated review guidelines derived from patches.rst, coding_style.rst, check-git-log.sh, checkpatches.sh, and the Coccinelle scripts 2. analyze-patch.py - reviews patches against AGENTS.md using AI providers (Anthropic, OpenAI, xAI, Google), with support for patch series splitting and LTS-specific rules 3. compare-reviews.sh - runs the same patch through multiple providers for comparison, auto-detecting available API keys 4. review-doc.py - reviews documentation files for spelling, grammar, and technical accuracy, with batch processing and multiple output formats 5. 
Contributing guide updates documenting the new tools 6. MAINTAINERS entry for the new files AI review is intended as a supplement to human review, not a replacement. The tools are optional and require the contributor to have an API key for their chosen provider. Stephen Hemminger (6): doc: add AGENTS.md for AI code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI documentation review script doc: add AI-assisted patch review to contributing guide MAINTAINERS: add section for AI review tools AGENTS.md | 1392 ++++++++++++++++++++++++ MAINTAINERS | 8 + devtools/analyze-patch.py | 1334 +++++++++++++++++++++++ devtools/compare-reviews.sh | 192 ++++ devtools/review-doc.py | 1098 +++++++++++++++++++ doc/guides/contributing/new_driver.rst | 2 + doc/guides/contributing/patches.rst | 56 + 7 files changed, 4082 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100755 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v8 1/6] doc: add AGENTS.md for AI code review tools 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger ` (4 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add a reference document that enables AI code review tools to validate DPDK patches against project coding standards and submission requirements. The document consolidates guidelines from the DPDK contributor documentation, coding style guide, and validation scripts (check-git-log.sh, checkpatches.sh) into a single structured file. It covers SPDX license requirements, commit message formatting, C coding style rules, forbidden token restrictions, API tag placement, and severity levels aligned with existing validation tool output. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1392 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1392 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..e3326d1c05 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1392 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +## CRITICAL INSTRUCTION - READ FIRST + +**NEVER list an item under "Errors" or "Warnings" if you conclude it is correct.** + +Before outputting ANY error or warning, verify it is actually wrong. If your analysis concludes with phrases like: +- "there's no issue here" +- "which is fine" +- "appears correct" +- "is acceptable" +- "this is actually correct" + +Then DO NOT INCLUDE IT IN YOUR OUTPUT AT ALL. Delete it. Omit it entirely. + +The "Errors" section is ONLY for actual errors. If something is not an error, it must not appear there. 
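A review tool can enforce this instruction defensively by post-filtering the model's findings before reporting them. A minimal sketch, not part of this patch; the phrase list and function name are illustrative assumptions:

```python
# Illustrative post-filter: drop findings that declare themselves
# non-issues, per the instruction above. The phrase list is an
# assumption and is not exhaustive.
SELF_NEGATING_PHRASES = (
    "there's no issue here",
    "which is fine",
    "appears correct",
    "is acceptable",
    "this is actually correct",
)


def filter_findings(findings):
    """Keep only findings that do not contradict themselves."""
    return [
        f for f in findings
        if not any(phrase in f.lower() for phrase in SELF_NEGATING_PHRASES)
    ]
```

Anything the model hedges away as a non-issue should simply never reach the "Errors" or "Warnings" sections.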
+ +--- + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +### LTS (Long Term Stable) Releases + +Any DPDK release whose minor version is `.11` is an LTS release +(e.g., 19.11, 20.11, 21.11, 22.11, 23.11, 24.11, 25.11, …). +To determine whether a version is LTS, extract the base +`MAJOR.MINOR` (e.g., `23.11` from `23.11.1`) and check whether the +minor component is `11`. An explicit `-lts` suffix (e.g., +`23.11-lts`) also indicates an LTS release. + +LTS branch rules: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with a `Fixes:` tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that are not clearly bug fixes + +## Review Philosophy + +**BEFORE OUTPUTTING YOUR REVIEW**: Re-read each item. If ANY item contains phrases like "is fine", "no issue", "appears correct", "is acceptable", "actually correct" - DELETE THAT ITEM. Do not include it. + +- Only comment when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. 
+- Do NOT comment on copyright years unless outside valid range (2013 to current year) +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" +- Do NOT mention things that are correct or "not an issue" - only report actual problems +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) +- Before adding any item to your review, ask: "Is this actually wrong?" If no, omit it entirely. +- VERIFY before reporting: For subject line length, COUNT the characters first. If ≤60, do not mention it. +- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely +- Do NOT add vague suggestions like "should be verified" or "should be checked" - either it's wrong or don't mention it +- Do NOT flag something as an Error then say "which is correct" in the same item +- Do NOT say "no issue here" or "this is actually correct" - if there's no issue, do not include it in your review +- Do NOT call the standard DPDK SPDX/copyright format "different style" - it is THE standard +- Do NOT analyze cross-patch dependencies or compilation order - you cannot reliably determine this from patch review +- Do NOT claim a patch "would cause compilation failure" based on symbols used in other patches in the series +- Review each patch individually for its own correctness; assume the patch author ordered them correctly + +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource 
leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- Changes to API without release notes +- Changes to ABI on LTS release (see LTS rules above) +- Usage of deprecated APIs when replacements exist +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) +- **Process-shared synchronization errors** (pthread mutexes in shared memory without `PTHREAD_PROCESS_SHARED`) + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 John Smith + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 Jane Doe + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). + +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. 
Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. + +**The following SPDX/copyright format is correct** - do NOT flag it or comment on it: +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Stephen Hemminger + */ +``` +This is the standard DPDK format. Do not say it is "different" or "unusual" - it is correct. +Do not suggest the copyright year "should be verified" - if it's in the valid range (2013-current year), it's fine. +If SPDX is on line 1 and copyright follows, this is CORRECT - do not include it in Errors. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? 
& |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms (Commit Messages Only) + +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: abcdef123456 ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Do NOT flag Fixes tags** asking for verification that the commit "exists in the tree" or "cannot verify" - you cannot verify this from a patch review. If the format is correct (12-char SHA, quoted subject), accept it. NEVER say "cannot verify this exists".
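The purely mechanical rules above (subject length, trailing period, component prefix, `Fixes:` hash shape) can be pre-checked locally before any AI provider is involved. A minimal sketch, not part of this series; the function name and patterns are illustrative assumptions:

```python
import re

# Illustrative pre-check for the mechanical commit-message rules above.
# A 12-character abbreviated SHA is lowercase hex; the original subject
# follows in parentheses and double quotes.
FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{12} \(".+"\)$')


def precheck_commit(subject, body_lines):
    """Return a list of (severity, message) findings."""
    findings = []
    if len(subject) > 60:
        findings.append(("error", "subject exceeds 60 characters"))
    if subject.endswith("."):
        findings.append(("error", "subject has trailing period"))
    if ": " not in subject:
        findings.append(("error", "subject missing 'component: ' prefix"))
    for line in body_lines:
        if line.startswith("Fixes:") and not FIXES_RE.match(line):
            findings.append(("error", f"malformed Fixes tag: {line}"))
    return findings
```

Judgement-based rules (imperative mood, choosing the right component prefix) are left to the reviewer; only the countable checks are automated here.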
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order, with a blank line separating the two groups: + +**Group 1** (optional tags, no blank lines within this group): +- Coverity issue: +- Bugzilla ID: +- Fixes: +- Cc: + +**Blank line required here** (only if Group 1 tags are present) + +**Group 2** (no blank lines within this group): +- Reported-by: +- Suggested-by: +- Signed-off-by: +- Acked-by: +- Reviewed-by: +- Tested-by: + +**Correct examples:** + +Simple patch with no Group 1 tags (most common): +``` +The info_get callback doesn't need to check its args +since already done by ethdev. 
+ +Signed-off-by: John Smith <john@example.com> +``` + +Patch with Fixes and Cc tags: +``` +Fixes: c743e50c475f ("null: new poll mode driver") +Cc: stable@dpdk.org + +Signed-off-by: John Smith <john@example.com> +``` + +Patch with only Fixes tag: +``` +Fixes: abcd1234abcd ("component: original commit") + +Signed-off-by: Jane Doe <jane@example.com> +``` + +**What is correct (do NOT flag):** +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT +- No blank line between `Fixes:` and `Cc:` - this is CORRECT +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT + +**What is wrong (DO flag):** +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use non-inclusive terminology, including: +- `master/slave` → Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` → Use: denylist/allowlist, blocklist/passlist +- `cripple` → Use: impacted, degraded, restricted, immobilized +- `tribe` → Use: team, squad +- `sanity check` → Use: coherence check, test, verification + + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (likely(p != NULL)) /* Good - likely/unlikely don't change this */ +if (unlikely(p == NULL)) /* Good - likely/unlikely don't change this */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (errno != 0) /* Good - this IS explicit */ +if (likely(a != 0)) /* Good - likely/unlikely don't change this */ +if (!a) /* Bad - don't use ! on integers */ +if (a) /* Bad - implicit, should be a != 0 */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`). 
+**Note**: `likely()` and `unlikely()` macros do NOT affect whether a comparison is explicit or implicit. + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static 
char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. 
+ +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store. For security-sensitive data, use `rte_free_sensitive()` which ensures memory is cleared. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - for sensitive data, use secure free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Process-Shared Synchronization + +When placing synchronization primitives in shared memory (memory accessible by multiple processes, such as DPDK primary/secondary processes or `mmap`'d regions), they **must** be initialized with process-shared attributes. Failure to do so causes **undefined behavior** that may appear to work in testing but fail unpredictably in production. + +#### pthread Mutexes in Shared Memory + +**This is an error** - mutex in shared memory without `PTHREAD_PROCESS_SHARED`: + +```c +/* BAD - undefined behavior when used across processes */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +void init_shared(struct shared_data *shm) { + pthread_mutex_init(&shm->lock, NULL); /* ERROR: missing pshared attribute */ +} +``` + +**Correct implementation**: + +```c +/* GOOD - properly initialized for cross-process use */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +int init_shared(struct shared_data *shm) { + pthread_mutexattr_t attr; + int ret; + + ret = pthread_mutexattr_init(&attr); + if (ret != 0) + return -ret; + + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + if (ret != 0) { + pthread_mutexattr_destroy(&attr); + return -ret; + } + + ret = pthread_mutex_init(&shm->lock, &attr); + pthread_mutexattr_destroy(&attr); + + return -ret; +} +``` + +#### pthread Condition Variables in Shared Memory + +Condition variables also require the process-shared attribute: + +```c +/* BAD - will not 
work correctly across processes */ +pthread_cond_init(&shm->cond, NULL); + +/* GOOD */ +pthread_condattr_t cattr; +pthread_condattr_init(&cattr); +pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED); +pthread_cond_init(&shm->cond, &cattr); +pthread_condattr_destroy(&cattr); +``` + +#### pthread Read-Write Locks in Shared Memory + +```c +/* BAD */ +pthread_rwlock_init(&shm->rwlock, NULL); + +/* GOOD */ +pthread_rwlockattr_t rwattr; +pthread_rwlockattr_init(&rwattr); +pthread_rwlockattr_setpshared(&rwattr, PTHREAD_PROCESS_SHARED); +pthread_rwlock_init(&shm->rwlock, &rwattr); +pthread_rwlockattr_destroy(&rwattr); +``` + +#### When to Flag This Issue + +Flag as an **Error** when ALL of the following are true: +1. A `pthread_mutex_t`, `pthread_cond_t`, `pthread_rwlock_t`, or `pthread_barrier_t` is initialized +2. The primitive is stored in shared memory (identified by context such as: structure in `rte_malloc`/`rte_memzone`, `mmap`'d memory, memory passed to secondary processes, or structures documented as shared) +3. 
The initialization uses `NULL` attributes or attributes without `PTHREAD_PROCESS_SHARED` + +**Do NOT flag** when: +- The mutex is in thread-local or process-private heap memory (`malloc`) +- The mutex is a local/static variable not in shared memory +- The code already uses `pthread_mutexattr_setpshared()` with `PTHREAD_PROCESS_SHARED` +- The synchronization uses DPDK primitives (`rte_spinlock_t`, `rte_rwlock_t`) which are designed for shared memory + +#### Preferred Alternatives + +For DPDK code, prefer DPDK's own synchronization primitives which are designed for shared memory: + +| pthread Primitive | DPDK Alternative | +|-------------------|------------------| +| `pthread_mutex_t` | `rte_spinlock_t` (busy-wait) or properly initialized pthread mutex | +| `pthread_rwlock_t` | `rte_rwlock_t` | +| `pthread_spinlock_t` | `rte_spinlock_t` | + +Note: `rte_spinlock_t` and `rte_rwlock_t` work correctly in shared memory without special initialization, but they are spinning locks unsuitable for long wait times. 
+ +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` from `<inttypes.h>` (e.g., `"%" PRIu64`) | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: file.*` (wildcard extension) | + +--- + +## Deprecated API Usage + +New patches must not introduce usage of deprecated APIs, macros, or functions. +Deprecated items are marked with `RTE_DEPRECATED` or documented in the +deprecation notices section of the release notes. 
+ +### Rules for New Code + +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` +- Do not use macros that have been superseded by newer alternatives +- Do not use data structures or enum values marked as deprecated +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations +- When a deprecated API has a replacement, use the replacement + +### Deprecating APIs + +A patch may mark an API as deprecated provided: + +- No remaining usages exist in the current DPDK codebase +- The deprecation is documented in the release notes +- A migration path or replacement API is documented +- The `RTE_DEPRECATED` macro is used to generate compiler warnings + +```c +/* Marking a function as deprecated */ +__rte_deprecated +int +rte_old_function(void); + +/* With a message pointing to the replacement */ +__rte_deprecated_msg("use rte_new_function() instead") +int +rte_old_function(void); +``` + +### Common Deprecated Patterns + +| Deprecated | Replacement | Notes | +|-----------|-------------|-------| +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | + +When reviewing patches that add new code, flag any usage of deprecated APIs +as requiring change to use the modern replacement. 
+ +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +**Note for AI reviewers**: You cannot verify compilation order or cross-patch dependencies from patch review alone. Do NOT flag patches claiming they "would fail to compile" based on symbols used in other patches in the series. Assume the patch author has ordered them correctly. 
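Authors can check the per-commit build rule locally before submitting. A sketch, assuming an existing meson build directory named `build` and a series based on `origin/main`:

```shell
# Rebuild at every commit of the series; the rebase stops at the first
# commit that fails to build, i.e. the one that would break git bisect.
git rebase origin/main --exec 'ninja -C build'
```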
+ +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in the `app/test` directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. + +```c +#include "test.h" + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); +``` + +The `REGISTER_FAST_TEST` macro parameters are: +- Test name (e.g., `feature_autotest`) +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer +- Test function name + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value 
is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) +- Release notes are NOT required for: + - Test-only changes (unit tests, functional tests) + - Internal APIs and helper functions (not exported to applications) + - Internal implementation changes that don't affect public API + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... 
*/ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... */ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. 
+ +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line ≤60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines ≤100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) +- [ ] No usage of deprecated APIs, macros, or functions +- [ ] Process-shared primitives in shared memory use `PTHREAD_PROCESS_SHARED` + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired +- [ ] New public functions have 
`RTE_EXPORT_*` macro in `.c` file +- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (≤3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item +- No strict line length limit for meson files; lines under 100 characters are acceptable + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): +- Missing or malformed SPDX license +- Missing Signed-off-by +- 
Subject line over 60 characters +- Body lines over 75 characters +- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning +- pthread mutex/cond/rwlock in shared memory without `PTHREAD_PROCESS_SHARED` + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- New public function missing `RTE_EXPORT_*` macro +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) +- Usage of deprecated APIs, macros, or functions in new code + +**Do NOT flag** (common false positives): +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) +- Suggesting manual edits to any `version.map` file +- Copyright years within valid range (2013 to current year) +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) +- SPDX/copyright format that matches the standard DPDK format (do not call it 
"different" or "unusual") +- Meson file lines under 100 characters +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE explicit) +- Comparisons wrapped in `likely()` or `unlikely()` macros - these are still explicit if using == or != +- Anything you determine is correct (do not mention non-issues or say "No issue here") +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when Fixes/Cc present) is CORRECT +- Missing release notes for test-only changes (unit tests do not require release notes) +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) +- Subject lines that are within the 60 character limit (count first, do not guess) +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it +- Vague concerns ("should be verified", "should be checked") - if you're not sure it's wrong, don't flag it +- Items where you say "which is correct" or "this is correct" - if it's correct, don't mention it at all +- Fixes tags with correct format (12-char SHA, quoted subject) - NEVER ask for verification that commit exists +- Items where you conclude "no issue here" or "this is actually correct" - omit these entirely +- Cross-patch compilation dependencies - you cannot determine patch ordering correctness from review +- Claims that a symbol "was removed in patch N" causing issues in patch M - assume author ordered correctly +- Any speculation about whether patches will compile when applied in sequence +- Mutexes/locks in process-private memory (standard `malloc`, stack, static non-shared) - these don't need `PTHREAD_PROCESS_SHARED` +- Use of `rte_spinlock_t` 
or `rte_rwlock_t` in shared memory (these work correctly without special init) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches + +--- + +## Response Format + +When you identify an issue: +1. **State the problem** (1 sentence) +2. **Why it matters** (1 sentence, only if not obvious) +3. **Suggested fix** (code snippet or specific action) + +Example: +This could panic if the string is NULL. +Suggested fix: check the string against NULL before use and return an error. + +--- + +## FINAL CHECK BEFORE SUBMITTING REVIEW + +Before outputting your review, scan through each item in your Errors and Warnings sections. +For EACH item, ask: "Did I conclude this is actually fine/correct/acceptable/no issue?" + +If YES → DELETE THAT ITEM. It should not be in your output. + +An item that says "X is wrong... actually this is correct" is a FALSE POSITIVE and must be removed. + +**If your Errors section would be empty after this check, that's fine - it means the patches are good.** -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v8 2/6] devtools: add multi-provider AI patch review script 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger ` (3 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole This is an AI-generated script that reviews DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.py -p xai my-patch.patch ./devtools/analyze-patch.py -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
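As a quick pre-flight, the provider-to-variable mapping above can be checked locally; a minimal sketch (hypothetical helper, mapping abbreviated from the script's PROVIDERS table):

```python
import os

# Provider -> API key environment variable, as listed in the commit message.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

def available_providers(env=None):
    """Return providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [p for p, var in ENV_VARS.items() if env.get(var)]

print(available_providers({"XAI_API_KEY": "secret"}))  # -> ['xai']
```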
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 1334 +++++++++++++++++++++++++++++++++++++ 1 file changed, 1334 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..c77908fb3c --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,1334 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +from datetime import date +from email.message import EmailMessage +from pathlib import Path +from typing import Iterator +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Large file handling modes +LARGE_FILE_MODES = ["error", "truncate", "chunk", "commits-only", "summary"] + +# Approximate characters per token (conservative estimate for code) +CHARS_PER_TOKEN = 3.5 + +# Default token limits by provider (leaving room for system prompt and response) +PROVIDER_INPUT_LIMITS = { + "anthropic": 180000, # 200K context, reserve for system/response + "openai": 115000, # 128K context for gpt-4o + "xai": 115000, # Assume similar to OpenAI + "google": 900000, # Gemini has 1M context +} + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": 
"https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# LTS releases: any DPDK release with minor version .11 +# (e.g., 19.11, 20.11, 21.11, 22.11, 23.11, 24.11, 25.11, ...) + +SYSTEM_PROMPT_BASE = """\ +You are an expert DPDK code reviewer. Analyze patches for compliance with \ +DPDK coding standards and contribution guidelines. Provide clear, actionable \ +feedback organized by severity (Error, Warning, Info) as defined in the \ +guidelines.""" + +LTS_RULES = """ +LTS (Long Term Stable) branch rules apply: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with Fixes: tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that aren't clearly bug fixes""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": 
"description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def is_lts_release(release): + """Check if a release is an LTS release. + + Per DPDK project guidelines, any release with minor version .11 + is an LTS release (e.g., 19.11, 21.11, 23.11, 24.11, 25.11). 
+ """ + if not release: + return False + # Check for explicit -lts suffix + if "-lts" in release.lower(): + return True + # Extract base version (e.g., "23.11" from "23.11.1" or "23.11-rc1") + version = release.split("-")[0] + parts = version.split(".") + if len(parts) >= 2: + try: + minor = int(parts[1]) + return minor == 11 + except ValueError: + pass + return False + + +def estimate_tokens(text): + """Estimate token count from text length.""" + return int(len(text) / CHARS_PER_TOKEN) + + +def split_mbox_patches(content): + """Split an mbox file into individual patches.""" + patches = [] + current_patch = [] + in_patch = False + + for line in content.split("\n"): + # Detect start of new message in mbox format + if line.startswith("From ") and ( + " Mon " in line + or " Tue " in line + or " Wed " in line + or " Thu " in line + or " Fri " in line + or " Sat " in line + or " Sun " in line + ): + if current_patch: + patches.append("\n".join(current_patch)) + current_patch = [line] + in_patch = True + elif in_patch: + current_patch.append(line) + + # Don't forget the last patch + if current_patch: + patches.append("\n".join(current_patch)) + + return patches if patches else [content] + + +def extract_commit_messages(content): + """Extract only commit messages from patch content.""" + patches = split_mbox_patches(content) + messages = [] + + for patch in patches: + lines = patch.split("\n") + msg_lines = [] + in_headers = True + in_body = False + found_subject = False + + for line in lines: + # Collect headers we care about + if in_headers: + if line.startswith("Subject:"): + msg_lines.append(line) + found_subject = True + elif line.startswith(("From:", "Date:")): + msg_lines.append(line) + elif line.startswith((" ", "\t")) and found_subject: + # Subject continuation + msg_lines.append(line) + elif line == "": + if found_subject: + in_headers = False + in_body = True + msg_lines.append("") + elif in_body: + # Stop at the diff + if line.startswith("---") and not 
line.startswith("----"): + break + if line.startswith("diff --git"): + break + msg_lines.append(line) + + if msg_lines: + messages.append("\n".join(msg_lines)) + + return "\n\n---\n\n".join(messages) + + +def truncate_content(content, max_tokens, provider): + """Truncate content to fit within token limit.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN) + + if len(content) <= max_chars: + return content, False + + # Try to truncate at a reasonable boundary + truncated = content[:max_chars] + + # Find last complete diff hunk or patch boundary + last_diff = truncated.rfind("\ndiff --git") + last_patch = truncated.rfind("\nFrom ") + + if last_diff > max_chars * 0.5: + truncated = truncated[:last_diff] + elif last_patch > max_chars * 0.5: + truncated = truncated[:last_patch] + + truncated += "\n\n[... Content truncated due to size limits ...]\n" + return truncated, True + + +def chunk_content(content, max_tokens, provider) -> Iterator[tuple[str, int, int]]: + """Split content into chunks that fit within token limit. + + Yields tuples of (chunk_content, chunk_number, total_chunks). 
+ """ + patches = split_mbox_patches(content) + + if len(patches) == 1: + # Single large patch - split by diff sections + yield from chunk_single_patch(content, max_tokens) + return + + # Multiple patches - group them to fit within limits + chunks = [] + current_chunk = [] + current_size = 0 + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) # 90% to leave margin + + for patch in patches: + patch_size = len(patch) + if current_size + patch_size > max_chars and current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + + if patch_size > max_chars: + # Single patch too large, truncate it + if current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + truncated, _ = truncate_content(patch, max_tokens * 0.9, provider) + chunks.append(truncated) + else: + current_chunk.append(patch) + current_size += patch_size + + if current_chunk: + chunks.append("\n".join(current_chunk)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def chunk_single_patch(content, max_tokens) -> Iterator[tuple[str, int, int]]: + """Split a single large patch by diff sections.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) + + # Extract header (everything before first diff) + first_diff = content.find("\ndiff --git") + if first_diff == -1: + # No diff sections, just truncate + truncated, _ = truncate_content(content, max_tokens * 0.9, "anthropic") + yield truncated, 1, 1 + return + + header = content[: first_diff + 1] + diff_content = content[first_diff + 1 :] + + # Split by diff sections + diffs = [] + current_diff = [] + for line in diff_content.split("\n"): + if line.startswith("diff --git") and current_diff: + diffs.append("\n".join(current_diff)) + current_diff = [] + current_diff.append(line) + if current_diff: + diffs.append("\n".join(current_diff)) + + # Group diffs into chunks + chunks = [] + current_chunk_diffs = [] + current_size = len(header) + + for 
diff in diffs: + diff_size = len(diff) + if current_size + diff_size > max_chars and current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + current_size = len(header) + + if diff_size + len(header) > max_chars: + # Single diff too large + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + truncated_diff = diff[: max_chars - len(header) - 100] + truncated_diff += "\n[... diff truncated ...]\n" + chunks.append(header + truncated_diff) + current_size = len(header) + else: + current_chunk_diffs.append(diff) + current_size += diff_size + + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def get_summary_prompt(): + """Get prompt modifications for summary mode.""" + return """ +NOTE: This is a LARGE patch series. Provide a HIGH-LEVEL summary review only: +- Focus on overall architecture and design concerns +- Check commit message formatting across the series +- Identify any obvious policy violations +- Do NOT attempt detailed line-by-line code review +- Summarize the scope and purpose of the changes +""" + + +def format_combined_reviews(reviews, output_format, patch_name): + """Combine multiple chunk/patch reviews into a single output.""" + if output_format == "json": + combined = { + "patch_file": patch_name, + "sections": [ + {"label": label, "review": review} for label, review in reviews + ], + } + return json.dumps(combined, indent=2) + elif output_format == "html": + sections = [] + for label, review in reviews: + sections.append(f"<h2>{label}</h2>\n{review}") + return "\n<hr>\n".join(sections) + elif output_format == "markdown": + sections = [] + for label, review in reviews: + sections.append(f"## {label}\n\n{review}") + return "\n\n---\n\n".join(sections) + else: # text + sections = [] + for label, review in reviews: + 
sections.append(f"=== {label} ===\n\n{review}") + return ("\n\n" + "=" * 60 + "\n\n").join(sections) + + +def build_system_prompt(review_date, release): + """Build system prompt with date and release context.""" + prompt = SYSTEM_PROMPT_BASE + prompt += f"\n\nCurrent date: {review_date}." + + if release: + prompt += f"\nTarget DPDK release: {release}." + if is_lts_release(release): + prompt += LTS_RULES + else: + prompt += "\nThis is a main branch or standard release." + prompt += "\nNew features and experimental APIs are allowed." + + return prompt + + +def build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": system_prompt}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": system_prompt}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + 
"""Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": system_prompt}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = 
e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: {usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: {usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + 
error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if 
get_git_config("sendemail.smtpserver"): + # Build command with all arguments + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. 
Configure git send-email, sendmail, or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s -r 24.11 patch.patch # Review for specific release + %(prog)s -r 24.11-lts patch.patch # Review for LTS branch + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +Large File Handling: + %(prog)s --split-patches series.mbox # Review each patch separately + %(prog)s --split-patches --patch-range 1-5 series.mbox # Review patches 1-5 + %(prog)s --large-file=truncate patch.mbox # Truncate to fit limit + %(prog)s --large-file=commits-only series.mbox # Review commit messages only + %(prog)s --large-file=summary series.mbox # High-level summary only + %(prog)s --large-file=chunk series.mbox # Split and review in chunks + +Large File Modes: + error - Fail with error (default) + truncate - Truncate content to fit token limit + chunk - Split into chunks and review each + commits-only - Extract and review only commit messages + summary - Request high-level summary review + +LTS Releases: + Use -r/--release with LTS version (e.g., 24.11-lts, 23.11) to enable + 
stricter review rules: bug fixes only, no new features or APIs. + Any DPDK release with minor version .11 is an LTS release. + """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", + metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Date and release options + parser.add_argument( + "-D", + "--date", + metavar="YYYY-MM-DD", + help="Review date context (default: today)", + ) + parser.add_argument( + "-r", + "--release", + metavar="VERSION", + help="Target DPDK release (e.g., 24.11, 23.11-lts)", + ) + + # Large file handling options + large_group = parser.add_argument_group("Large File Handling") + large_group.add_argument( + "--large-file", + choices=LARGE_FILE_MODES, + default="error", + metavar="MODE", + help="How to handle large files: error (default), truncate, " + "chunk, commits-only, summary", + ) + large_group.add_argument( + "--max-tokens", + type=int, + metavar="N", + help="Max input tokens (default: provider-specific)", + ) + large_group.add_argument( + 
"--split-patches", + action="store_true", + help="Split mbox into individual patches and review each separately", + ) + large_group.add_argument( + "--patch-range", + metavar="N-M", + help="Review only patches N through M (1-indexed, use with --split-patches)", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + patch_path = Path(args.patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {args.patch_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = 
get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Determine review date + review_date = args.date or date.today().isoformat() + + # Build system prompt with date and release context + system_prompt = build_system_prompt(review_date, args.release) + + # Read files + agents_content = agents_path.read_text() + patch_content = patch_path.read_text() + patch_name = patch_path.name + + # Determine max tokens for this provider + max_input_tokens = args.max_tokens or PROVIDER_INPUT_LIMITS.get( + args.provider, 100000 + ) + + # Estimate token count + estimated_tokens = estimate_tokens(patch_content + agents_content) + + # Parse patch range if specified + patch_start, patch_end = None, None + if args.patch_range: + try: + if "-" in args.patch_range: + start, end = args.patch_range.split("-", 1) + patch_start = int(start) + patch_end = int(end) + else: + patch_start = patch_end = int(args.patch_range) + except ValueError: + error(f"Invalid --patch-range format: {args.patch_range}") + + # Handle --split-patches mode + if args.split_patches: + patches = split_mbox_patches(patch_content) + total_patches = len(patches) + + if total_patches == 1: + print( + f"Note: Only 1 patch found in mbox, --split-patches has no effect", + file=sys.stderr, + ) + else: + print( + f"Found {total_patches} patches in mbox", + file=sys.stderr, + ) + + # Apply patch range filter + if patch_start is not None: + if patch_start < 1 or patch_start > total_patches: + error( + f"Patch range start {patch_start} out of range (1-{total_patches})" + ) + if patch_end < patch_start or patch_end > total_patches: + error( + f"Patch range end {patch_end} out of range ({patch_start}-{total_patches})" + ) + patches = patches[patch_start - 1 : patch_end] + print( + f"Reviewing patches {patch_start}-{patch_end} ({len(patches)} patches)", 
+ file=sys.stderr, + ) + + # Review each patch separately + all_reviews = [] + for i, patch in enumerate(patches, patch_start or 1): + patch_label = f"Patch {i}/{total_patches}" + print(f"\nReviewing {patch_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch, + f"{patch_name} ({patch_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((patch_label, review_text)) + + # Combine reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal API call + estimated_tokens = 0 # Bypass size check since we've already processed + + # Check if content is too large + is_large = estimated_tokens > max_input_tokens + + if is_large: + print( + f"Warning: Estimated {estimated_tokens:,} tokens exceeds limit of " + f"{max_input_tokens:,}", + file=sys.stderr, + ) + + if args.large_file == "error": + error( + f"Patch file too large ({estimated_tokens:,} tokens). " + f"Use --large-file=truncate|chunk|commits-only|summary to handle, " + f"or --split-patches to review patches individually." 
+ ) + elif args.large_file == "truncate": + print("Truncating content to fit token limit...", file=sys.stderr) + patch_content, was_truncated = truncate_content( + patch_content, max_input_tokens, args.provider + ) + if was_truncated: + print("Content was truncated.", file=sys.stderr) + elif args.large_file == "commits-only": + print("Extracting commit messages only...", file=sys.stderr) + patch_content = extract_commit_messages(patch_content) + new_estimate = estimate_tokens(patch_content + agents_content) + print( + f"Reduced to ~{new_estimate:,} tokens (commit messages only)", + file=sys.stderr, + ) + if new_estimate > max_input_tokens: + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "summary": + print("Using summary mode for large patch...", file=sys.stderr) + system_prompt += get_summary_prompt() + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "chunk": + print("Processing in chunks...", file=sys.stderr) + all_reviews = [] + for chunk, chunk_num, total_chunks in chunk_content( + patch_content, max_input_tokens, args.provider + ): + chunk_label = f"Chunk {chunk_num}/{total_chunks}" + print(f"Reviewing {chunk_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + chunk, + f"{patch_name} ({chunk_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((chunk_label, review_text)) + + # Combine chunk reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal single API call below + estimated_tokens = 0 + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Review date: {review_date}", file=sys.stderr) + if args.release: + lts_status = " (LTS)" if 
is_lts_release(args.release) else "" + print(f"Target release: {args.release}{lts_status}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {args.patch_file}", file=sys.stderr) + print(f"Estimated tokens: {estimated_tokens:,}", file=sys.stderr) + print(f"Max input tokens: {max_input_tokens:,}", file=sys.stderr) + if args.large_file != "error": + print(f"Large file mode: {args.large_file}", file=sys.stderr) + if args.split_patches: + print("Split patches: yes", file=sys.stderr) + if args.output: + print(f"Output file: {args.output}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API (unless already processed via chunks/split) + if estimated_tokens > 0: # Not already processed + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + args.output_format, + args.verbose, + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Format output based on requested format + provider_name = config["name"] + + if args.output_format == "json": + # For JSON, try to parse and add metadata + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + # If AI didn't return valid JSON, wrap the text + review_data = {"raw_review": review_text} + + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "review_date": review_date, + "target_release": args.release, + "is_lts": is_lts_release(args.release) if args.release else False, + }, + "review": review_data, + } + output_text = 
json.dumps(output_data, indent=2) + elif args.output_format == "html": + # Wrap HTML content with header + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"<br>Target release: {args.release}{lts_badge}" + output_text = f"""<!-- AI-generated review of {patch_name} --> +<!-- Reviewed using {provider_name} ({model}) on {review_date} --> +<div class="patch-review"> +<h1>Patch Review: {patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p> +{review_text} +</div> +""" + elif args.output_format == "markdown": + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"\n*Target release: {args.release}{lts_badge}*\n" + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model}) on {review_date}* +{release_info} +{review_text} +""" + else: # text + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n" + output_text += f"Review date: {review_date}\n" + output_text += release_info + output_text += "\n" + review_text + + # Write output + if args.output: + Path(args.output).write_text(output_text) + print(f"Review written to: {args.output}", file=sys.stderr) + else: + print(output_text) + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", 
orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) on {review_date} +{release_info} +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
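[Editorial note] The large-file handling in the patch above leans on estimate_tokens() and chunk_content(), whose definitions fall outside this excerpt. A minimal sketch of how such helpers might look, assuming the common rough heuristic of about four characters per token — the constant and the line-boundary splitting are illustrative assumptions, not the patch's actual implementation, and real provider tokenizers will disagree somewhat:

```python
# Rough token estimation and line-based chunking.
# ASSUMPTION: ~4 characters per token, a common heuristic for English
# text and source code; actual tokenizers vary by provider.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token count for text."""
    return len(text) // CHARS_PER_TOKEN


def chunk_content(text: str, max_tokens: int):
    """Yield (chunk, chunk_num, total_chunks) tuples, splitting on line
    boundaries so no chunk exceeds the token budget (except a single
    line that is itself over budget)."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    total = len(chunks)
    for i, chunk in enumerate(chunks, 1):
        yield chunk, i, total
```

Chunking on line boundaries keeps diff hunks mostly intact, which matters when each chunk is reviewed independently and the results are recombined, as the patch's format_combined_reviews() does.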
* [PATCH v8 3/6] devtools: add compare-reviews.sh for multi-provider analysis 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger ` (2 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add script to run patch reviews across multiple AI providers for comparison purposes. The script automatically detects which providers have API keys configured and runs analyze-patch.py for each one. This allows users to compare review quality and feedback across different AI models. Features: - Auto-detects available providers based on environment variables - Optional provider selection via -p/--providers option - Saves individual reviews to separate files with -o/--output - Verbose mode passes through to underlying analyze-patch.py Usage: ./devtools/compare-reviews.sh my-patch.patch ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch ./devtools/compare-reviews.sh -o ./reviews my-patch.patch Output files are named <patch>-<provider>.txt when using the output directory option. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100755 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100755 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
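[Editorial note] The --split-patches mode in the analyze-patch.py patch earlier in this series depends on a split_mbox_patches() helper that falls outside this excerpt. A minimal sketch of one way to write it, assuming the git-format-patch style of mbox where each message starts with "From " at column 0; classic mbox writers escape body lines as ">From ", which this sketch deliberately does not handle:

```python
import re


def split_mbox_patches(mbox_text: str) -> list[str]:
    """Split an mbox into one string per message.

    ASSUMPTION: messages are delimited by 'From ' at the start of a
    line, as produced by git format-patch. Unescaped body lines that
    begin with 'From ' would be misdetected as message starts.
    """
    # Zero-width split: keep the 'From ' line with its message.
    parts = re.split(r"^(?=From )", mbox_text, flags=re.MULTILINE)
    return [part for part in parts if part.strip()]
```

The zero-width lookahead keeps the delimiter line attached to the message that follows it, so each returned string is a complete patch email that can be passed straight to the per-patch review loop.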
* [PATCH v8 4/6] devtools: add multi-provider AI documentation review script 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (2 preceding siblings ...) 2026-02-05 17:51 ` [PATCH v8 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add review-doc.py script that reviews DPDK documentation files for spelling, grammar, technical correctness, and clarity using AI language models. Supports batch processing of multiple files. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini Output formats (-f/--format): - text: plain text with extractable diff/msg markers (default) - markdown: formatted review document - html: complete HTML document with styling - json: structured data with metadata For each input file, the script produces: - <basename>.{txt,md,html,json}: review in selected format - <basename>.diff: unified diff (text/json, or with -d flag) - <basename>.msg: commit message (text/json, or with -d flag) The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide). 
Features: - Multiple file processing with glob support - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Output format selection via -f/--format option - Force diff/msg generation via -d/--diff option - Quiet mode (-q) suppresses stdout output - Verbose mode (-v) shows token usage and API details - Email integration using git sendemail configuration - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py doc/guides/nics/*.rst ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..1366aa0f85 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,1098 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Accepts multiple documentation files and generates output for each. 
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(same commit message as above) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(same unified diff as above) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", "doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", 
"doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": 
"text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + 
include_diff_markers=False, + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " 
f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = 
msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. 
Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} 
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
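[Editor's illustration: the marker-based extraction performed by `parse_review_text()` in the patch above can be exercised standalone. This sketch re-implements the same regexes against a made-up provider response; the function name `extract_review` and the sample text are hypothetical, not part of the script.]

```python
import re


def extract_review(review_text):
    """Standalone re-implementation of the marker extraction above."""
    commit_msg, diff = "", ""
    # Commit message lives between COMMIT_MESSAGE markers
    m = re.search(
        r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---",
        review_text,
        re.DOTALL,
    )
    if m:
        commit_msg = m.group(1).strip()
    # Unified diff lives between UNIFIED_DIFF markers
    m = re.search(
        r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---",
        review_text,
        re.DOTALL,
    )
    if m:
        diff = m.group(1).strip()
        # Strip a markdown code fence if the model added one
        diff = re.sub(r"^```diff\s*\n?", "", diff)
        diff = re.sub(r"\n?```\s*$", "", diff)
    return commit_msg, diff


sample = """Review commentary goes here.

---COMMIT_MESSAGE_START---
doc: fix typo in mempool guide
---COMMIT_MESSAGE_END---

---UNIFIED_DIFF_START---
--- a/doc.rst
+++ b/doc.rst
---UNIFIED_DIFF_END---
"""

msg, diff = extract_review(sample)
# msg is the bare commit subject; diff holds only the unified diff body
```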
* [PATCH v8 5/6] doc: add AI-assisted patch review to contributing guide 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (3 preceding siblings ...) 2026-02-05 17:51 ` [PATCH v8 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 2026-02-05 17:51 ` [PATCH v8 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a new section to the contributing guide describing the analyze-patch.py script which uses AI providers to review patches against DPDK coding standards before submission to the mailing list. The new section covers basic usage, provider selection, patch series handling, LTS release review, and output format options. A note clarifies that AI review supplements but does not replace human review. Also add a reference to the script in the new driver guide's test tools checklist. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- doc/guides/contributing/new_driver.rst | 2 + doc/guides/contributing/patches.rst | 56 ++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/doc/guides/contributing/new_driver.rst b/doc/guides/contributing/new_driver.rst index 555e875329..6c0d356cfd 100644 --- a/doc/guides/contributing/new_driver.rst +++ b/doc/guides/contributing/new_driver.rst @@ -210,3 +210,5 @@ Be sure to run the following test tools per patch in a patch series: * `check-doc-vs-code.sh` * `check-spdx-tag.sh` * Build documentation and validate how output looks +* Optionally run ``analyze-patch.py`` for AI-assisted review + (see :ref:`ai_assisted_review` in the Contributing Guide) diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst index 5f554d47e6..74fc714d16 100644 --- a/doc/guides/contributing/patches.rst +++ b/doc/guides/contributing/patches.rst @@ -503,6 +503,62 @@ Additionally, when contributing to the DTS tool, patches should also be checked the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo. To run the script, extra :ref:`Python dependencies <dts_deps>` are needed. + +.. _ai_assisted_review: + +AI-Assisted Patch Review +------------------------ + +Contributors may optionally use the ``devtools/analyze-patch.py`` script +to get an AI-assisted review of patches before submitting them to the mailing list. +The script checks patches against the DPDK coding standards and contribution +guidelines documented in ``AGENTS.md``. + +The script supports multiple AI providers (Anthropic Claude, OpenAI ChatGPT, +xAI Grok, Google Gemini). An API key for the chosen provider must be set +in the corresponding environment variable (see ``--list-providers``). 
+ +Basic usage:: + + # Review a single patch (default provider: Anthropic Claude) + devtools/analyze-patch.py my-patch.patch + + # Use a different provider + devtools/analyze-patch.py -p openai my-patch.patch + + # Review for an LTS branch (enables stricter rules) + devtools/analyze-patch.py -r 24.11 my-patch.patch + + # List available providers and their API key variables + devtools/analyze-patch.py --list-providers + +For a patch series in an mbox file, the ``--split-patches`` option reviews +each patch individually:: + + devtools/analyze-patch.py --split-patches series.mbox + + # Review only a range of patches + devtools/analyze-patch.py --split-patches --patch-range 1-5 series.mbox + +When reviewing for a Long Term Stable (LTS) release, use the ``-r`` option +with the target version. Any DPDK release with minor version ``.11`` +(e.g., 23.11, 24.11) is automatically recognized as LTS, +and the script will enforce stricter rules: bug fixes only, no new features or APIs. + +Output can be formatted as plain text (default), Markdown, HTML, or JSON:: + + devtools/analyze-patch.py -f markdown -o review.md my-patch.patch + +The review guidelines in ``AGENTS.md`` cover commit message formatting, +SPDX licensing, C coding style, forbidden API usage, symbol export rules, +and other DPDK-specific requirements. + +.. note:: + + AI-assisted review is a supplement to, not a replacement for, + human review on the mailing list. + Always verify AI suggestions before acting on them. + .. _contrib_check_compilation: Checking Compilation -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v8 6/6] MAINTAINERS: add section for AI review tools 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (4 preceding siblings ...) 2026-02-05 17:51 ` [PATCH v8 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger @ 2026-02-05 17:51 ` Stephen Hemminger 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-05 17:51 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Thomas Monjalon Add maintainer entries for the AI-assisted code review tooling: AGENTS.md, analyze-patch.py, compare-reviews.sh, and review-doc.py. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- MAINTAINERS | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 5683b87e4a..a30f08974e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -109,6 +109,14 @@ F: license/ F: .editorconfig F: .mailmap +AI review tools +M: Stephen Hemminger <stephen@networkplumber.org> +M: Aaron Conole <aconole@redhat.com> +F: AGENTS.md +F: devtools/analyze-patch.py +F: devtools/compare-reviews.sh +F: devtools/review-doc.py + Linux kernel uAPI headers M: Maxime Coquelin <maxime.coquelin@redhat.com> F: devtools/linux-uapi.sh -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v9 0/6] devtools: AI-assisted code and documentation review 2026-01-09 1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger ` (10 preceding siblings ...) 2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger ` (6 more replies) 11 siblings, 7 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds tooling for AI-assisted review of DPDK patches and documentation. The goal is to catch common coding style violations, commit message formatting errors, and documentation issues before patches reach the mailing list, reducing reviewer workload on mechanical checks. The approach uses a structured guidelines document (AGENTS.md) that codifies the project's existing standards from the contributor documentation, coding style guide, and validation scripts into a format that AI models can apply consistently. The review scripts then check patches or documentation against these guidelines using any of several AI providers. The series includes: 1. AGENTS.md - consolidated review guidelines derived from patches.rst, coding_style.rst, check-git-log.sh, checkpatches.sh, and the Coccinelle scripts 2. analyze-patch.py - reviews patches against AGENTS.md using AI providers (Anthropic, OpenAI, xAI, Google), with support for patch series splitting and LTS-specific rules 3. compare-reviews.sh - runs the same patch through multiple providers for comparison, auto-detecting available API keys 4. review-doc.py - reviews documentation files for spelling, grammar, and technical accuracy, with batch processing and multiple output formats 5. Contributing guide updates documenting the new tools 6.
MAINTAINERS entry for the new files AI review is intended as a supplement to human review, not a replacement. The tools are optional and require the contributor to have an API key for their chosen provider. v9 - update AGENTS.md to cover bugs that were found in 26.03 and previous PVS studio scans. Stephen Hemminger (6): doc: add AGENTS.md for AI code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI documentation review script doc: add AI-assisted patch review to contributing guide MAINTAINERS: add section for AI review tools AGENTS.md | 1767 ++++++++++++++++++++++++ MAINTAINERS | 8 + devtools/analyze-patch.py | 1334 ++++++++++++++++++ devtools/compare-reviews.sh | 192 +++ devtools/review-doc.py | 1098 +++++++++++++++ doc/guides/contributing/new_driver.rst | 2 + doc/guides/contributing/patches.rst | 56 + 7 files changed, 4457 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100755 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-18 14:59 ` Aaron Conole 2026-02-13 21:39 ` [PATCH v9 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger ` (5 subsequent siblings) 6 siblings, 1 reply; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add a reference document that enables AI code review tools to validate DPDK patches against project coding standards and submission requirements. The document consolidates guidelines from the DPDK contributor documentation, coding style guide, and validation scripts (check-git-log.sh, checkpatches.sh) into a single structured file. It covers SPDX license requirements, commit message formatting, C coding style rules, forbidden token restrictions, API tag placement, and severity levels aligned with existing validation tool output. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1767 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1767 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..a7fa39c863 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1767 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +## CRITICAL INSTRUCTION - READ FIRST + +This document has two categories of review rules with different +confidence thresholds: + +### 1. Correctness Bugs -- HIGHEST PRIORITY (report at >=50% confidence) + +**Always report potential correctness bugs.** These are the most +valuable findings. When in doubt, report them with a note about +your confidence level. A possible use-after-free or resource leak +is worth mentioning even if you are not certain. 
+ +Correctness bugs include: +- Use-after-free (accessing memory after `free`/`rte_free`) +- Resource leaks on error paths (memory, file descriptors, locks) +- Double-free or double-close +- NULL pointer dereference +- Buffer overflows or out-of-bounds access +- Uninitialized variable use in a reachable code path +- Race conditions (unsynchronized shared state) +- Missing error checks on functions that can fail +- Error paths that skip cleanup (goto labels, missing free/close) +- Incorrect error propagation (wrong return value, lost errno) +- Logic errors in conditionals (wrong operator, inverted test) +- Integer overflow/truncation in size calculations +- Missing bounds checks on user-supplied sizes or indices +- `mmap()` return checked against `NULL` instead of `MAP_FAILED` +- Statistics accumulation using `=` instead of `+=` +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) +- Unbounded descriptor chain traversal on guest/API-supplied data +- `1 << n` on 64-bit bitmask (must use `1ULL << n` or `RTE_BIT64()`) +- Variable assigned then overwritten before being read (dead store) +- Same variable used as loop counter in nested loops +- `memcpy`/`memcmp`/`memset` with same pointer for source and destination (no-op or undefined) + +**Do NOT self-censor correctness bugs.** If you identify a code +path where a resource could leak or memory could be used after +free, report it. Do not talk yourself out of it. + +### 2. Style, Process, and Formatting -- suppress false positives + +**NEVER list a style/process item under "Errors" or "Warnings" if +you conclude it is correct.** + +Before outputting any style, formatting, or process error/warning, +verify it is actually wrong. If your analysis concludes with +phrases like "there's no issue here", "which is fine", "appears +correct", "is acceptable", or "this is actually correct" -- then +DO NOT INCLUDE IT IN YOUR OUTPUT AT ALL. Delete it. Omit it +entirely. 
+ +This suppression rule applies to: SPDX/copyright format, commit +message formatting, tag ordering, line length, naming conventions, +code style, and process compliance. It does NOT apply to +correctness bugs listed above. + +--- + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +## Review Philosophy + +**Correctness bugs are the primary goal of AI review.** Style and +formatting checks are secondary. A review that catches a +use-after-free but misses a style nit is far more valuable than +one that catches every style issue but misses the bug. + +**BEFORE OUTPUTTING YOUR REVIEW**: Re-read each item. +- For correctness bugs: keep them. If you have reasonable doubt + that a code path is safe, report it. +- For style/process items: if ANY item contains phrases like "is + fine", "no issue", "appears correct", "is acceptable", + "actually correct" -- DELETE THAT ITEM. Do not include it. + +### Correctness review guidelines +- Trace error paths: for every function that allocates a resource + or acquires a lock, verify that ALL error paths after that point + release it +- Check every `goto error` and early `return`: does it clean up + everything allocated so far? +- Look for use-after-free: after `free(p)`, is `p` accessed again? 
+- Check that error codes are propagated, not silently dropped +- Report at >=50% confidence; note uncertainty if appropriate +- It is better to report a potential bug that turns out to be safe + than to miss a real bug + +### Style and process review guidelines +- Only comment on style/process issues when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. +- Do NOT comment on copyright years unless outside valid range (2013 to current year) +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" +- Do NOT mention things that are correct or "not an issue" - only report actual problems +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) +- Before adding any style item to your review, ask: "Is this actually wrong?" If no, omit it entirely. +- VERIFY before reporting: For subject line length, COUNT the characters first. If <=60, do not mention it. 
+- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely +- Do NOT add vague suggestions like "should be verified" or "should be checked" - either it's wrong or don't mention it +- Do NOT flag something as an Error then say "which is correct" in the same item +- Do NOT say "no issue here" or "this is actually correct" - if there's no issue, do not include it in your review +- Do NOT call the standard DPDK SPDX/copyright format "different style" - it is THE standard +- Do NOT analyze cross-patch dependencies or compilation order - you cannot reliably determine this from patch review +- Do NOT claim a patch "would cause compilation failure" based on symbols used in other patches in the series +- Review each patch individually for its own correctness; assume the patch author ordered them correctly +- When reviewing a patch series, OMIT patches that have no issues. Do not include a patch in your output just to say "no issues found" or to summarize what the patch does. Only include patches where you have actual findings to report. + +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- **Use-after-free** (any access to memory after it has been freed) +- **Error path resource leaks**: For every allocation or fd open, + trace each error path (`goto`, early `return`, conditional) to + verify the resource is released. 
Common patterns to check: + - `malloc`/`rte_malloc` followed by a failure that does `return -1` + instead of `goto cleanup` + - `open()`/`socket()` fd not closed on a later error + - Lock acquired but not released on an error branch + - Partially initialized structure where early fields are allocated + but later allocation fails without freeing the early ones +- **Double-free / double-close**: resource freed in both a normal + path and an error path, or fd closed but not set to -1 allowing + a second close +- **Missing error checks**: functions that can fail (malloc, open, + ioctl, etc.) whose return value is not checked +- Changes to API without release notes +- Changes to ABI on non-LTS release +- Usage of deprecated APIs when replacements exist +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) +- **Process-shared synchronization errors** (pthread mutexes in shared memory without `PTHREAD_PROCESS_SHARED`) +- **`mmap()` checked against NULL instead of `MAP_FAILED`**: `mmap()` returns + `MAP_FAILED` (i.e., `(void *)-1`) on failure, NOT `NULL`. Checking + `== NULL` or `!= NULL` will miss the error and use an invalid pointer. + ```c + /* BAD - mmap never returns NULL on failure */ + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); + if (p == NULL) /* WRONG - will not catch MAP_FAILED */ + return -1; + + /* GOOD */ + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); + if (p == MAP_FAILED) + return -1; + ``` +- **Statistics accumulation using `=` instead of `+=`**: When accumulating + statistics (counters, byte totals, packet counts), using `=` overwrites + the running total with only the latest value. This silently produces + wrong results. 
+ ```c + /* BAD - overwrites instead of accumulating */ + stats->rx_packets = nb_rx; + stats->rx_bytes = total_bytes; + + /* GOOD - accumulates over time */ + stats->rx_packets += nb_rx; + stats->rx_bytes += total_bytes; + ``` + Note: `=` is correct for gauge-type values (e.g., queue depth, link + status) and for initial assignment. Only flag when the context is + clearly incremental accumulation (loop bodies, per-burst counters, + callback tallies). +- **Integer multiply without widening cast**: When multiplying integers + to produce a result wider than the operands (sizes, offsets, byte + counts), the multiplication is performed at the operand width and + the upper bits are silently lost before the assignment. This applies + to any narrowing scenario: 16×16 assigned to a 32-bit variable, + 32×32 assigned to a 64-bit variable, etc. + ```c + /* BAD - 32×32 overflows before widening to 64 */ + uint64_t total_size = num_entries * entry_size; /* both are uint32_t */ + size_t offset = ring->idx * ring->desc_size; /* 32×32 → truncated */ + + /* BAD - 16×16 overflows before widening to 32 */ + uint32_t byte_count = pkt_len * nb_segs; /* both are uint16_t */ + + /* GOOD - widen before multiply */ + uint64_t total_size = (uint64_t)num_entries * entry_size; + size_t offset = (size_t)ring->idx * ring->desc_size; + uint32_t byte_count = (uint32_t)pkt_len * nb_segs; + ``` +- **Unbounded descriptor chain traversal**: When walking a chain of + descriptors (virtio, DMA, NIC Rx/Tx rings) where the chain length + or next-index comes from guest memory or an untrusted API caller, + the traversal MUST have a bounds check or loop counter to prevent + infinite loops or out-of-bounds access from malicious/corrupt data. 
+ ```c + /* BAD - guest controls desc[idx].next with no bound */ + while (desc[idx].flags & VRING_DESC_F_NEXT) { + idx = desc[idx].next; /* guest-supplied, unbounded */ + process(desc[idx]); + } + + /* GOOD - cap iterations to descriptor ring size */ + for (i = 0; i < ring_size; i++) { + if (!(desc[idx].flags & VRING_DESC_F_NEXT)) + break; + idx = desc[idx].next; + if (idx >= ring_size) /* bounds check */ + return -EINVAL; + process(desc[idx]); + } + ``` + This applies to any chain/linked-list traversal where indices or + pointers originate from untrusted input (guest VMs, user-space + callers, network packets). +- **Bitmask shift using `1` instead of `1ULL` on 64-bit masks**: The + literal `1` is `int` (32 bits). Shifting it by 32 or more is + undefined behavior; shifting it by less than 32 but assigning to a + `uint64_t` silently zeroes the upper 32 bits. Use `1ULL << n`, + `UINT64_C(1) << n`, or the DPDK `RTE_BIT64(n)` macro. + ```c + /* BAD - 1 is int, UB if n >= 32, wrong if result used as uint64_t */ + uint64_t mask = 1 << bit_pos; + if (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) /* bit 15 OK, bit 32+ UB */ + + /* GOOD */ + uint64_t mask = UINT64_C(1) << bit_pos; + uint64_t mask = 1ULL << bit_pos; + uint64_t mask = RTE_BIT64(bit_pos); /* preferred in DPDK */ + if (features & RTE_BIT64(VIRTIO_NET_F_MRG_RXBUF)) + ``` + Note: `1U << n` is acceptable when the mask is known to be 32-bit + (e.g., `uint32_t` register fields with `n < 32`). Only flag when + the result is stored in, compared against, or returned as a 64-bit + type, or when `n` could be >= 32. +- **Variable overwrite before read (dead store)**: A variable is + assigned a value that is unconditionally overwritten before it is + ever read. This usually indicates a logic error (wrong variable + name, missing `if`, copy-paste mistake) or at minimum is dead code. 
+ ```c + /* BAD - first assignment is never read */ + ret = validate_input(cfg); + ret = apply_config(cfg); /* overwrites without checking first ret */ + if (ret != 0) + return ret; + + /* GOOD - check each return value */ + ret = validate_input(cfg); + if (ret != 0) + return ret; + ret = apply_config(cfg); + if (ret != 0) + return ret; + ``` + Do NOT flag cases where the initial value is intentionally a default + that may or may not be overwritten (e.g., `int ret = 0;` followed + by a conditional assignment). Only flag unconditional overwrites + where the first value can never be observed. +- **Shared loop counter in nested loops**: Using the same variable as + the loop counter in both an outer and inner loop causes the outer + loop to malfunction because the inner loop modifies its counter. + ```c + /* BAD - inner loop clobbers outer loop counter */ + int i; + for (i = 0; i < nb_queues; i++) { + setup_queue(i); + for (i = 0; i < nb_descs; i++) /* BUG: reuses i */ + init_desc(i); + } + + /* GOOD - distinct loop counters */ + for (int i = 0; i < nb_queues; i++) { + setup_queue(i); + for (int j = 0; j < nb_descs; j++) + init_desc(j); + } + ``` +- **`memcpy`/`memcmp`/`memset` self-argument (same pointer as both + operands)**: Passing the same pointer as both source and destination + to `memcpy()` is undefined behavior per C99. Passing the same + pointer to both arguments of `memcmp()` is a no-op that always + returns 0, indicating a logic error (usually a copy-paste mistake + with the wrong variable name). The same applies to `rte_memcpy()` + and `memmove()` with identical arguments. + ```c + /* BAD - memcpy with same src and dst is undefined behavior */ + memcpy(buf, buf, len); + rte_memcpy(dst, dst, len); + + /* BAD - memcmp with same pointer always returns 0 (logic error) */ + if (memcmp(key, key, KEY_LEN) == 0) /* always true, wrong variable? 
*/ + + /* BAD - likely copy-paste: should be comparing two different MACs */ + if (memcmp(&eth->src_addr, &eth->src_addr, RTE_ETHER_ADDR_LEN) == 0) + + /* GOOD - comparing two different things */ + memcpy(dst, src, len); + if (memcmp(&eth->src_addr, &eth->dst_addr, RTE_ETHER_ADDR_LEN) == 0) + ``` + This pattern almost always indicates a copy-paste bug where one of + the arguments should be a different variable. + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 John Smith + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 Jane Doe + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). + +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. 
+ +**The following SPDX/copyright format is correct** - do NOT flag it or comment on it: +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Stephen Hemminger + */ +``` +This is the standard DPDK format. Do not say it is "different" or "unusual" - it is correct. +Do not suggest the copyright year "should be verified" - if it's in the valid range (2013-current year), it's fine. +If SPDX is on line 1 and copyright follows, this is CORRECT - do not include it in Errors. + +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? 
& |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms (Commit Messages Only) + +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA (hexadecimal digits only): + +``` +Fixes: 0123456789ab ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Do NOT flag Fixes tags** asking for verification that the commit "exists in the tree" or "cannot verify" - you cannot verify this from a patch review. If the format is correct (12-char SHA, quoted subject), accept it. NEVER say "cannot verify this exists". 
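Rather than typing the tag by hand, the line can be generated by git itself; the DPDK contributor guide suggests a `fixline` alias for this. A self-contained demo below runs in a throwaway repository (the repo setup and commit subject are only for illustration; in practice run `git fixline <SHA>` in the DPDK tree):

```shell
# Demo setup: throwaway repo with one commit to point the tag at
cd "$(mktemp -d)" && git init -q .
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "null: new poll mode driver"

# Configure once: an alias that emits a correctly formatted Fixes: tag
git config alias.fixline "log -1 --abbrev=12 --format='Fixes: %h (\"%s\")'"

# Generate the tag from the commit being fixed
git fixline HEAD
# prints e.g.: Fixes: c743e50c475f ("null: new poll mode driver")
```

Because git formats the hash and quotes the original subject, the result always passes the 12-character hex and exact-subject checks above.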
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order, with a blank line separating the two groups: + +**Group 1** (optional tags, no blank lines within this group): +- Coverity issue: +- Bugzilla ID: +- Fixes: +- Cc: + +**Blank line required here** (only if Group 1 tags are present) + +**Group 2** (no blank lines within this group): +- Reported-by: +- Suggested-by: +- Signed-off-by: +- Acked-by: +- Reviewed-by: +- Tested-by: + +**Correct examples:** + +Simple patch with no Group 1 tags (most common): +``` +The info_get callback doesn't need to check its args +since already done by ethdev. 
+ +Signed-off-by: John Smith <john@example.com> +``` + +Patch with Fixes and Cc tags: +``` +Fixes: c743e50c475f ("null: new poll mode driver") +Cc: stable@dpdk.org + +Signed-off-by: John Smith <john@example.com> +``` + +Patch with only Fixes tag: +``` +Fixes: abcd1234abcd ("component: original commit") + +Signed-off-by: Jane Doe <jane@example.com> +``` + +**What is correct (do NOT flag):** +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT +- No blank line between `Fixes:` and `Cc:` - this is CORRECT +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT + +**What is wrong (DO flag):** +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use non-inclusive naming, including: +- `master/slave` -> Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` -> Use: denylist/allowlist, blocklist/passlist +- `cripple` -> Use: impacted, degraded, restricted, immobilized +- `tribe` -> Use: team, squad +- `sanity check` -> Use: coherence check, test, verification + + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (likely(p != NULL)) /* Good - likely/unlikely don't change this */ +if (unlikely(p == NULL)) /* Good - likely/unlikely don't change this */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (errno != 0) /* Good - this IS explicit */ +if (likely(a != 0)) /* Good - likely/unlikely don't change this */ +if (!a) /* Bad - don't use ! on integers */ +if (a) /* Bad - implicit, should be a != 0 */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`). 
+**Note**: `likely()` and `unlikely()` macros do NOT affect whether a comparison is explicit or implicit. + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static 
char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. 
+ +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() (CWE-14) + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store (CWE-14: Compiler Removal of Code to Clear Buffers). For security-sensitive data, use `explicit_bzero()`, `rte_memset_sensitive()`, or `rte_free_sensitive()` which the compiler is not permitted to eliminate. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - explicit_bzero cannot be optimized away */ +explicit_bzero(secret_key, sizeof(secret_key)); +free(secret_key); + +/* Good - DPDK wrapper for clearing sensitive data */ +rte_memset_sensitive(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for rte_malloc'd sensitive data, combined clear+free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Process-Shared Synchronization + +When placing synchronization primitives in shared memory (memory accessible by multiple processes, such as DPDK primary/secondary processes or `mmap`'d regions), they **must** be initialized with process-shared attributes. Failure to do so causes **undefined behavior** that may appear to work in testing but fail unpredictably in production. + +#### pthread Mutexes in Shared Memory + +**This is an error** - mutex in shared memory without `PTHREAD_PROCESS_SHARED`: + +```c +/* BAD - undefined behavior when used across processes */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +void init_shared(struct shared_data *shm) { + pthread_mutex_init(&shm->lock, NULL); /* ERROR: missing pshared attribute */ +} +``` + +**Correct implementation**: + +```c +/* GOOD - properly initialized for cross-process use */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +int init_shared(struct shared_data *shm) { + pthread_mutexattr_t attr; + int ret; + + ret = pthread_mutexattr_init(&attr); + if (ret != 0) + return -ret; + + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + if (ret != 0) { + pthread_mutexattr_destroy(&attr); + return -ret; + } + + ret = pthread_mutex_init(&shm->lock, &attr); + pthread_mutexattr_destroy(&attr); + + return -ret; +} +``` + +#### pthread Condition Variables in Shared Memory + +Condition variables also require the process-shared attribute: + +```c +/* BAD - will not 
work correctly across processes */ +pthread_cond_init(&shm->cond, NULL); + +/* GOOD */ +pthread_condattr_t cattr; +pthread_condattr_init(&cattr); +pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED); +pthread_cond_init(&shm->cond, &cattr); +pthread_condattr_destroy(&cattr); +``` + +#### pthread Read-Write Locks in Shared Memory + +```c +/* BAD */ +pthread_rwlock_init(&shm->rwlock, NULL); + +/* GOOD */ +pthread_rwlockattr_t rwattr; +pthread_rwlockattr_init(&rwattr); +pthread_rwlockattr_setpshared(&rwattr, PTHREAD_PROCESS_SHARED); +pthread_rwlock_init(&shm->rwlock, &rwattr); +pthread_rwlockattr_destroy(&rwattr); +``` + +#### When to Flag This Issue + +Flag as an **Error** when ALL of the following are true: +1. A `pthread_mutex_t`, `pthread_cond_t`, `pthread_rwlock_t`, or `pthread_barrier_t` is initialized +2. The primitive is stored in shared memory (identified by context such as: structure in `rte_malloc`/`rte_memzone`, `mmap`'d memory, memory passed to secondary processes, or structures documented as shared) +3. 
The initialization uses `NULL` attributes or attributes without `PTHREAD_PROCESS_SHARED` + +**Do NOT flag** when: +- The mutex is in thread-local or process-private heap memory (`malloc`) +- The mutex is a local/static variable not in shared memory +- The code already uses `pthread_mutexattr_setpshared()` with `PTHREAD_PROCESS_SHARED` +- The synchronization uses DPDK primitives (`rte_spinlock_t`, `rte_rwlock_t`) which are designed for shared memory + +#### Preferred Alternatives + +For DPDK code, prefer DPDK's own synchronization primitives which are designed for shared memory: + +| pthread Primitive | DPDK Alternative | +|-------------------|------------------| +| `pthread_mutex_t` | `rte_spinlock_t` (busy-wait) or properly initialized pthread mutex | +| `pthread_rwlock_t` | `rte_rwlock_t` | +| `pthread_spinlock_t` | `rte_spinlock_t` | + +Note: `rte_spinlock_t` and `rte_rwlock_t` work correctly in shared memory without special initialization, but they are spinning locks unsuitable for long wait times. 
+ +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `PRId64`, `PRIu64`, `PRIx64` (used as `"%" PRIu64`) | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: file.*` (wildcard extension) | + +--- + +## Deprecated API Usage + +New patches must not introduce usage of deprecated APIs, macros, or functions. +Deprecated items are marked with `RTE_DEPRECATED` or documented in the +deprecation notices section of the release notes. 
+ +### Rules for New Code + +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` +- Do not use macros that have been superseded by newer alternatives +- Do not use data structures or enum values marked as deprecated +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations +- When a deprecated API has a replacement, use the replacement + +### Deprecating APIs + +A patch may mark an API as deprecated provided: + +- No remaining usages exist in the current DPDK codebase +- The deprecation is documented in the release notes +- A migration path or replacement API is documented +- The `RTE_DEPRECATED` macro is used to generate compiler warnings + +```c +/* Marking a function as deprecated */ +__rte_deprecated +int +rte_old_function(void); + +/* With a message pointing to the replacement */ +__rte_deprecated_msg("use rte_new_function() instead") +int +rte_old_function(void); +``` + +### Common Deprecated Patterns + +| Deprecated | Replacement | Notes | +|-----------|-------------|-------| +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | + +When reviewing patches that add new code, flag any usage of deprecated APIs +as requiring change to use the modern replacement. 
+ +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +**Note for AI reviewers**: You cannot verify compilation order or cross-patch dependencies from patch review alone. Do NOT flag patches claiming they "would fail to compile" based on symbols used in other patches in the series. Assume the patch author has ordered them correctly. 
+ +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be exercised in the `app/test` directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. + +```c +#include "test.h" + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); +``` + +The `REGISTER_FAST_TEST` macro parameters are: +- Test name (e.g., `feature_autotest`) +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer +- Test function name + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value
is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) +- Release notes are NOT required for: + - Test-only changes (unit tests, functional tests) + - Internal APIs and helper functions (not exported to applications) + - Internal implementation changes that don't affect public API + +### RST Documentation Style + +When reviewing `.rst` documentation files, prefer **definition lists** +over simple bullet lists where each item has a term and a description. +Definition lists produce better-structured HTML/PDF output and are +easier to scan. + +**When to suggest a definition list:** +- A bullet list where each item starts with a bold or emphasized term + followed by a dash, colon, or long explanation +- Lists of options, parameters, configuration values, or features + where each entry has a name and a description +- Glossary-style enumerations + +**When a simple list is fine (do NOT flag):** +- Short lists of items without descriptions (e.g., file names, steps) +- Lists where items are single phrases or sentences with no term/definition structure +- Enumerated steps in a procedure + +**RST definition list syntax:** + +```rst +term 1 + Description of term 1. + +term 2 + Description of term 2. + Can span multiple lines. +``` + +**Example — flag this pattern:** + +```rst +* **error** - Fail with error (default) +* **truncate** - Truncate content to fit token limit +* **summary** - Request high-level summary review +``` + +**Suggest rewriting as:** + +```rst +error + Fail with error (default). 
+ +truncate + Truncate content to fit token limit. + +summary + Request high-level summary review. +``` + +This is a **Warning**-level suggestion, not an Error. Do not flag it +when the existing list structure is appropriate (see "when a simple +list is fine" above). + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... */ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... 
*/ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. + +--- + +## LTS (Long Term Stable) Release Review + +LTS releases are DPDK versions ending in `.11` (e.g., 23.11, 22.11, +21.11, 20.11, 19.11). 
When reviewing patches targeting an LTS branch, +apply stricter criteria: + +### LTS-Specific Rules + +- **Only bug fixes allowed** -- no new features +- **No new APIs** (experimental or stable) +- **ABI must remain unchanged** -- no symbol additions, removals, + or signature changes +- Backported fixes should reference the original commit with a + `Fixes:` tag +- Copyright years should reflect when the code was originally + written +- Be conservative: reject changes that are not clearly bug fixes + +### What to Flag on LTS Branches + +**Error:** +- New feature code (new functions, new driver capabilities) +- New experimental or stable API additions +- ABI changes (new or removed symbols, changed function signatures) +- Changes that add new configuration options or parameters + +**Warning:** +- Large refactoring that goes beyond what is needed for a fix +- Missing `Fixes:` tag on a backported bug fix +- Missing `Cc: stable@dpdk.org` + +### When LTS Rules Apply + +LTS rules apply when the reviewer is told the target release is an +LTS version (via the `--release` option or equivalent). If no +release is specified, assume the patch targets the main development +branch where new features and APIs are allowed. 
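A backport candidate that satisfies the rules above might look like the following (the component, SHA, subject, and author are all hypothetical; note the `Fixes:` and `Cc:` tags are adjacent, with a blank line before `Signed-off-by:`):

```
net/foo: fix mbuf leak in Rx error path

Free the partially built mbuf chain when descriptor parsing
fails, instead of returning early and leaking it.

Fixes: 0123456789ab ("net/foo: add Rx burst function")
Cc: stable@dpdk.org

Signed-off-by: Jane Developer <jane@example.com>
```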
+ +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line <=60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines <=100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) +- [ ] No usage of deprecated APIs, macros, or functions +- [ ] Process-shared primitives in shared memory use `PTHREAD_PROCESS_SHARED` +- [ ] `mmap()` return checked against `MAP_FAILED`, not `NULL` +- [ ] Statistics use `+=` not `=` for accumulation +- [ ] Integer multiplies widened before operation when result is 64-bit +- [ ] Descriptor chain traversals bounded by ring size or loop 
counter +- [ ] 64-bit bitmasks use `1ULL <<` or `RTE_BIT64()`, not `1 <<` +- [ ] No unconditional variable overwrites before read +- [ ] Nested loops use distinct counter variables +- [ ] No `memcpy`/`memcmp` with identical source and destination pointers +- [ ] Sensitive data cleared with `explicit_bzero()`/`rte_free_sensitive()`, not `memset()` + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired +- [ ] New public functions have `RTE_EXPORT_*` macro in `.c` file +- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] RST docs use definition lists for term/description patterns +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (<=3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item +- No strict line length limit for meson files; lines under 100 characters are acceptable + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- 
Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): + +*Correctness bugs (highest value findings):* +- Use-after-free +- Resource leaks on error paths (memory, file descriptors, locks) +- Double-free or double-close +- NULL pointer dereference on reachable code path +- Buffer overflow or out-of-bounds access +- Missing error check on a function that can fail, leading to undefined behavior +- Race condition on shared mutable state without synchronization +- Error path that skips necessary cleanup +- `mmap()` return value checked against NULL instead of `MAP_FAILED` +- Statistics accumulation using `=` instead of `+=` (overwrite vs increment) +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) 
+- Unbounded descriptor chain traversal on guest/API-supplied indices +- `1 << n` used for 64-bit bitmask (undefined behavior if n >= 32) +- Variable assigned then unconditionally overwritten before read +- Same variable used as counter in nested loops +- `memcpy`/`memcmp` with same pointer as both arguments (UB or no-op logic error) + +*Process and format errors:* +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters +- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning +- pthread mutex/cond/rwlock in shared memory without `PTHREAD_PROCESS_SHARED` + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- New public function missing `RTE_EXPORT_*` macro +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void *` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) +- Usage of deprecated APIs, macros, or functions in new code +- RST 
documentation using bullet lists where definition lists would be more appropriate + +**Do NOT flag** (common false positives): +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) +- Suggesting manual edits to any `version.map` file +- Copyright years within valid range (2013 to current year) +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) +- SPDX/copyright format that matches the standard DPDK format (do not call it "different" or "unusual") +- Meson file lines under 100 characters +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE explicit) +- Comparisons wrapped in `likely()` or `unlikely()` macros - these are still explicit if using == or != +- Anything you determine is correct (do not mention non-issues or say "No issue here") +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when Fixes/Cc present) is CORRECT +- Missing release notes for test-only changes (unit tests do not require release notes) +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) +- Subject lines that are within the 60 character limit (count first, do not guess) +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it +- Vague concerns ("should be verified", "should be checked") - if you're not sure it's wrong, don't flag it +- Items where you say "which is correct" or "this is correct" - if it's correct, don't mention it at all +- Fixes tags with correct format (12-char SHA, quoted subject) - NEVER ask for verification that commit exists +- Items where you 
conclude "no issue here" or "this is actually correct" - omit these entirely +- Clean patches in a series - do not include a patch just to say "no issues" or describe what it does +- Cross-patch compilation dependencies - you cannot determine patch ordering correctness from review +- Claims that a symbol "was removed in patch N" causing issues in patch M - assume author ordered correctly +- Any speculation about whether patches will compile when applied in sequence +- Mutexes/locks in process-private memory (standard `malloc`, stack, static non-shared) - these don't need `PTHREAD_PROCESS_SHARED` +- Use of `rte_spinlock_t` or `rte_rwlock_t` in shared memory (these work correctly without special init) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches + +--- + +## Response Format + +When you identify an issue: +1. **State the problem** (1 sentence) +2. **Why it matters** (1 sentence, only if not obvious) +3. **Suggested fix** (code snippet or specific action) + +Example: +The string argument is dereferenced without a NULL check and will +crash if a caller passes NULL. Return `-EINVAL` when the argument +is NULL before using it. + +--- + +## FINAL CHECK BEFORE SUBMITTING REVIEW + +Before outputting your review, do two separate passes: + +### Pass 1: Verify correctness bugs are included + +Ask: "Did I trace every error path for resource leaks? Did I check +for use-after-free? Did I verify error codes are propagated?" + +If you identified a potential correctness bug but talked yourself +out of it, **add it back**. It is better to report a possible bug +than to miss a real one. + +### Pass 2: Remove style/process false positives + +For EACH style/process item, ask: "Did I conclude this is actually +fine/correct/acceptable/no issue?" + +If YES, DELETE THAT ITEM. It should not be in your output. + +An item that says "X is wrong... actually this is correct" is a +FALSE POSITIVE and must be removed. This applies to style, format, +and process items only.
+ +**If your Errors section would be empty after this check, that's +fine -- it means the patches are good.** -- 2.51.0
* Re: [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools 2026-02-13 21:39 ` [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger @ 2026-02-18 14:59 ` Aaron Conole 2026-02-19 17:47 ` Stephen Hemminger 2026-02-19 18:04 ` Stephen Hemminger 0 siblings, 2 replies; 63+ messages in thread From: Aaron Conole @ 2026-02-18 14:59 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Aaron Conole Stephen Hemminger <stephen@networkplumber.org> writes: > Add a reference document that enables AI code review tools to > validate DPDK patches against project coding standards and > submission requirements. > > The document consolidates guidelines from the DPDK contributor > documentation, coding style guide, and validation scripts > (check-git-log.sh, checkpatches.sh) into a single structured > file. It covers SPDX license requirements, commit message > formatting, C coding style rules, forbidden token restrictions, > API tag placement, and severity levels aligned with existing > validation tool output. > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- Hi Stephen, Thanks for working on this - the AGENTS file here is quite comprehensive, but I have some questions about it. > AGENTS.md | 1767 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 1767 insertions(+) > create mode 100644 AGENTS.md > > diff --git a/AGENTS.md b/AGENTS.md > new file mode 100644 > index 0000000000..a7fa39c863 > --- /dev/null > +++ b/AGENTS.md > @@ -0,0 +1,1767 @@ > +# AGENTS.md - DPDK Code Review Guidelines for AI Tools > + > +## CRITICAL INSTRUCTION - READ FIRST > + > +This document has two categories of review rules with different > +confidence thresholds: > + > +### 1. Correctness Bugs -- HIGHEST PRIORITY (report at >=50% confidence) > + > +**Always report potential correctness bugs.** These are the most > +valuable findings. When in doubt, report them with a note about > +your confidence level. 
A possible use-after-free or resource leak > +is worth mentioning even if you are not certain. > + > +Correctness bugs include: > +- Use-after-free (accessing memory after `free`/`rte_free`) > +- Resource leaks on error paths (memory, file descriptors, locks) > +- Double-free or double-close > +- NULL pointer dereference > +- Buffer overflows or out-of-bounds access > +- Uninitialized variable use in a reachable code path > +- Race conditions (unsynchronized shared state) > +- Missing error checks on functions that can fail > +- Error paths that skip cleanup (goto labels, missing free/close) > +- Incorrect error propagation (wrong return value, lost errno) > +- Logic errors in conditionals (wrong operator, inverted test) > +- Integer overflow/truncation in size calculations > +- Missing bounds checks on user-supplied sizes or indices > +- `mmap()` return checked against `NULL` instead of `MAP_FAILED` > +- Statistics accumulation using `=` instead of `+=` > +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) > +- Unbounded descriptor chain traversal on guest/API-supplied data > +- `1 << n` on 64-bit bitmask (must use `1ULL << n` or `RTE_BIT64()`) > +- Variable assigned then overwritten before being read (dead store) > +- Same variable used as loop counter in nested loops > +- `memcpy`/`memcmp`/`memset` with same pointer for source and destination (no-op or undefined) > + > +**Do NOT self-censor correctness bugs.** If you identify a code > +path where a resource could leak or memory could be used after > +free, report it. Do not talk yourself out of it. > + > +### 2. Style, Process, and Formatting -- suppress false positives > + > +**NEVER list a style/process item under "Errors" or "Warnings" if > +you conclude it is correct.** > + > +Before outputting any style, formatting, or process error/warning, > +verify it is actually wrong. 
If your analysis concludes with > +phrases like "there's no issue here", "which is fine", "appears > +correct", "is acceptable", or "this is actually correct" -- then > +DO NOT INCLUDE IT IN YOUR OUTPUT AT ALL. Delete it. Omit it > +entirely. > + > +This suppression rule applies to: SPDX/copyright format, commit > +message formatting, tag ordering, line length, naming conventions, > +code style, and process compliance. It does NOT apply to > +correctness bugs listed above. > + > +--- > + > +This document provides guidelines for AI-powered code review tools > +when reviewing contributions to the Data Plane Development Kit > +(DPDK). It is derived from the official DPDK contributor guidelines > +and validation scripts. > + > +## Overview > + > +DPDK follows a development process modeled on the Linux Kernel. All > +patches are reviewed publicly on the mailing list before being > +merged. AI review tools should verify compliance with the standards > +outlined below. > + > +## Review Philosophy > + > +**Correctness bugs are the primary goal of AI review.** Style and > +formatting checks are secondary. A review that catches a > +use-after-free but misses a style nit is far more valuable than > +one that catches every style issue but misses the bug. > + > +**BEFORE OUTPUTTING YOUR REVIEW**: Re-read each item. > +- For correctness bugs: keep them. If you have reasonable doubt > + that a code path is safe, report it. > +- For style/process items: if ANY item contains phrases like "is > + fine", "no issue", "appears correct", "is acceptable", > + "actually correct" -- DELETE THAT ITEM. Do not include it. > + > +### Correctness review guidelines > +- Trace error paths: for every function that allocates a resource > + or acquires a lock, verify that ALL error paths after that point > + release it > +- Check every `goto error` and early `return`: does it clean up > + everything allocated so far? > +- Look for use-after-free: after `free(p)`, is `p` accessed again? 
> +- Check that error codes are propagated, not silently dropped > +- Report at >=50% confidence; note uncertainty if appropriate > +- It is better to report a potential bug that turns out to be safe > + than to miss a real bug > + > +### Style and process review guidelines > +- Only comment on style/process issues when you have HIGH CONFIDENCE (>80%) that an issue exists > +- Be concise: one sentence per comment when possible > +- Focus on actionable feedback, not observations > +- When reviewing text, only comment on clarity issues if the text is genuinely > + confusing or could lead to errors. > +- Do NOT comment on copyright years unless outside valid range (2013 to current year) > +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all > +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" > +- Do NOT mention things that are correct or "not an issue" - only report actual problems > +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) > +- Before adding any style item to your review, ask: "Is this actually wrong?" If no, omit it entirely. > +- VERIFY before reporting: For subject line length, COUNT the characters first. If <=60, do not mention it. 
> +- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely > +- Do NOT add vague suggestions like "should be verified" or "should be checked" - either it's wrong or don't mention it > +- Do NOT flag something as an Error then say "which is correct" in the same item > +- Do NOT say "no issue here" or "this is actually correct" - if there's no issue, do not include it in your review > +- Do NOT call the standard DPDK SPDX/copyright format "different style" - it is THE standard > +- Do NOT analyze cross-patch dependencies or compilation order - you cannot reliably determine this from patch review > +- Do NOT claim a patch "would cause compilation failure" based on symbols used in other patches in the series > +- Review each patch individually for its own correctness; assume the patch author ordered them correctly > +- When reviewing a patch series, OMIT patches that have no issues. Do not include a patch in your output just to say "no issues found" or to summarize what the patch does. Only include patches where you have actual findings to report. > + > +## Priority Areas (Review These) > + > +### Security & Safety > +- Unsafe code blocks without justification > +- Command injection risks (shell commands, user input) > +- Path traversal vulnerabilities > +- Credential exposure or hard coded secrets > +- Missing input validation on external data > +- Improper error handling that could leak sensitive info > + > +### Correctness Issues > +- Logic errors that could cause panics or incorrect behavior > +- Buffer overflows > +- Race conditions > +- Resource leaks (files, connections, memory) > +- Off-by-one errors or boundary conditions > +- Incorrect error propagation > +- **Use-after-free** (any access to memory after it has been freed) > +- **Error path resource leaks**: For every allocation or fd open, > + trace each error path (`goto`, early `return`, conditional) to > + verify the resource is released. 
Common patterns to check: > + - `malloc`/`rte_malloc` followed by a failure that does `return -1` > + instead of `goto cleanup` > + - `open()`/`socket()` fd not closed on a later error > + - Lock acquired but not released on an error branch > + - Partially initialized structure where early fields are allocated > + but later allocation fails without freeing the early ones > +- **Double-free / double-close**: resource freed in both a normal > + path and an error path, or fd closed but not set to -1 allowing > + a second close > +- **Missing error checks**: functions that can fail (malloc, open, > + ioctl, etc.) whose return value is not checked > +- Changes to API without release notes > +- Changes to ABI on non-LTS release > +- Usage of deprecated APIs when replacements exist > +- Overly defensive code that adds unnecessary checks > +- Unnecessary comments that just restate what the code already shows (remove them) > +- **Process-shared synchronization errors** (pthread mutexes in shared memory without `PTHREAD_PROCESS_SHARED`) > +- **`mmap()` checked against NULL instead of `MAP_FAILED`**: `mmap()` returns > + `MAP_FAILED` (i.e., `(void *)-1`) on failure, NOT `NULL`. Checking > + `== NULL` or `!= NULL` will miss the error and use an invalid pointer. > + ```c > + /* BAD - mmap never returns NULL on failure */ > + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); > + if (p == NULL) /* WRONG - will not catch MAP_FAILED */ > + return -1; > + > + /* GOOD */ > + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); > + if (p == MAP_FAILED) > + return -1; > + ``` > +- **Statistics accumulation using `=` instead of `+=`**: When accumulating > + statistics (counters, byte totals, packet counts), using `=` overwrites > + the running total with only the latest value. This silently produces > + wrong results. 
> + ```c > + /* BAD - overwrites instead of accumulating */ > + stats->rx_packets = nb_rx; > + stats->rx_bytes = total_bytes; > + > + /* GOOD - accumulates over time */ > + stats->rx_packets += nb_rx; > + stats->rx_bytes += total_bytes; > + ``` > + Note: `=` is correct for gauge-type values (e.g., queue depth, link > + status) and for initial assignment. Only flag when the context is > + clearly incremental accumulation (loop bodies, per-burst counters, > + callback tallies). > +- **Integer multiply without widening cast**: When multiplying integers > + to produce a result wider than the operands (sizes, offsets, byte > + counts), the multiplication is performed at the operand width and > + the upper bits are silently lost before the assignment. This applies > + to any narrowing scenario: 16×16 assigned to a 32-bit variable, > + 32×32 assigned to a 64-bit variable, etc. > + ```c > + /* BAD - 32×32 overflows before widening to 64 */ > + uint64_t total_size = num_entries * entry_size; /* both are uint32_t */ > + size_t offset = ring->idx * ring->desc_size; /* 32×32 → truncated */ > + > + /* BAD - 16×16 overflows before widening to 32 */ > + uint32_t byte_count = pkt_len * nb_segs; /* both are uint16_t */ > + > + /* GOOD - widen before multiply */ > + uint64_t total_size = (uint64_t)num_entries * entry_size; > + size_t offset = (size_t)ring->idx * ring->desc_size; > + uint32_t byte_count = (uint32_t)pkt_len * nb_segs; > + ``` > +- **Unbounded descriptor chain traversal**: When walking a chain of > + descriptors (virtio, DMA, NIC Rx/Tx rings) where the chain length > + or next-index comes from guest memory or an untrusted API caller, > + the traversal MUST have a bounds check or loop counter to prevent > + infinite loops or out-of-bounds access from malicious/corrupt data. 
> + ```c > + /* BAD - guest controls desc[idx].next with no bound */ > + while (desc[idx].flags & VRING_DESC_F_NEXT) { > + idx = desc[idx].next; /* guest-supplied, unbounded */ > + process(desc[idx]); > + } > + > + /* GOOD - cap iterations to descriptor ring size */ > + for (i = 0; i < ring_size; i++) { > + if (!(desc[idx].flags & VRING_DESC_F_NEXT)) > + break; > + idx = desc[idx].next; > + if (idx >= ring_size) /* bounds check */ > + return -EINVAL; > + process(desc[idx]); > + } > + ``` > + This applies to any chain/linked-list traversal where indices or > + pointers originate from untrusted input (guest VMs, user-space > + callers, network packets). > +- **Bitmask shift using `1` instead of `1ULL` on 64-bit masks**: The > + literal `1` is `int` (32 bits). Shifting it by 32 or more is > + undefined behavior; shifting it by less than 32 but assigning to a > + `uint64_t` silently zeroes the upper 32 bits. Use `1ULL << n`, > + `UINT64_C(1) << n`, or the DPDK `RTE_BIT64(n)` macro. > + ```c > + /* BAD - 1 is int, UB if n >= 32, wrong if result used as uint64_t */ > + uint64_t mask = 1 << bit_pos; > + if (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) /* bit 15 OK, bit 32+ UB */ > + > + /* GOOD */ > + uint64_t mask = UINT64_C(1) << bit_pos; > + uint64_t mask = 1ULL << bit_pos; > + uint64_t mask = RTE_BIT64(bit_pos); /* preferred in DPDK */ > + if (features & RTE_BIT64(VIRTIO_NET_F_MRG_RXBUF)) > + ``` > + Note: `1U << n` is acceptable when the mask is known to be 32-bit > + (e.g., `uint32_t` register fields with `n < 32`). Only flag when > + the result is stored in, compared against, or returned as a 64-bit > + type, or when `n` could be >= 32. > +- **Variable overwrite before read (dead store)**: A variable is > + assigned a value that is unconditionally overwritten before it is > + ever read. This usually indicates a logic error (wrong variable > + name, missing `if`, copy-paste mistake) or at minimum is dead code. 
> + ```c > + /* BAD - first assignment is never read */ > + ret = validate_input(cfg); > + ret = apply_config(cfg); /* overwrites without checking first ret */ > + if (ret != 0) > + return ret; > + > + /* GOOD - check each return value */ > + ret = validate_input(cfg); > + if (ret != 0) > + return ret; > + ret = apply_config(cfg); > + if (ret != 0) > + return ret; > + ``` > + Do NOT flag cases where the initial value is intentionally a default > + that may or may not be overwritten (e.g., `int ret = 0;` followed > + by a conditional assignment). Only flag unconditional overwrites > + where the first value can never be observed. > +- **Shared loop counter in nested loops**: Using the same variable as > + the loop counter in both an outer and inner loop causes the outer > + loop to malfunction because the inner loop modifies its counter. > + ```c > + /* BAD - inner loop clobbers outer loop counter */ > + int i; > + for (i = 0; i < nb_queues; i++) { > + setup_queue(i); > + for (i = 0; i < nb_descs; i++) /* BUG: reuses i */ > + init_desc(i); > + } > + > + /* GOOD - distinct loop counters */ > + for (int i = 0; i < nb_queues; i++) { > + setup_queue(i); > + for (int j = 0; j < nb_descs; j++) > + init_desc(j); > + } > + ``` > +- **`memcpy`/`memcmp`/`memset` self-argument (same pointer as both > + operands)**: Passing the same pointer as both source and destination > + to `memcpy()` is undefined behavior per C99. Passing the same > + pointer to both arguments of `memcmp()` is a no-op that always > + returns 0, indicating a logic error (usually a copy-paste mistake > + with the wrong variable name). The same applies to `rte_memcpy()` > + and `memmove()` with identical arguments. > + ```c > + /* BAD - memcpy with same src and dst is undefined behavior */ > + memcpy(buf, buf, len); > + rte_memcpy(dst, dst, len); > + > + /* BAD - memcmp with same pointer always returns 0 (logic error) */ > + if (memcmp(key, key, KEY_LEN) == 0) /* always true, wrong variable? 
*/ > + > + /* BAD - likely copy-paste: should be comparing two different MACs */ > + if (memcmp(&eth->src_addr, &eth->src_addr, RTE_ETHER_ADDR_LEN) == 0) > + > + /* GOOD - comparing two different things */ > + memcpy(dst, src, len); > + if (memcmp(&eth->src_addr, &eth->dst_addr, RTE_ETHER_ADDR_LEN) == 0) > + ``` > + This pattern almost always indicates a copy-paste bug where one of > + the arguments should be a different variable. > + > +### Architecture & Patterns > +- Code that violates existing patterns in the code base > +- Missing error handling > +- Code that is not safe against signals > + > +--- > + Much of what is in the following sections can be handled by checkpatch, which means we don't need to spend compute resources with the AI on it. For example, telling the AI that source files need to begin with SPDX identifiers, line lengths, tag format order, tag parsing, etc. The downside is that if we ask the AI to *generate* code, then it won't follow these rules; but when we ask the AI to *review* code, it takes fewer tokens to submit, and we can let the AI do the thing it really shines at - recognizing subtle patterns - rather than stuff we can write a python script to do. I'm approaching this from the perspective of running CI using this AGENTS.md file - but I might be wrong on the scope of this one, as it may be intended for something else (like IDE-integrated extensions, where that stuff can make sense to guide generation). As for the CI reviewing part, napkin math for the Anthropic API shows that submitting this file costs about $0.30 per patch (~17k tokens), and that doesn't include whatever the patch context is - so I think we would exhaust any budget dedicated to this really early (probably within a few dozen patches). Maybe it's possible to split up the AGENTS.md file into stuff that's good for review and stuff that's good for generation. WDYT?
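To make that concrete: the SPDX placement rule in the next section reduces to a few lines of Python. This is only a sketch (the function name and structure are mine, not an actual checkpatch module):

```python
import re

# Sketch of a mechanical SPDX check: tag on line 1, or on line 2 when
# the file starts with a "#!" interpreter line, per the rule below.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*\S+")

def check_spdx(text):
    """Return True if the file content starts with an SPDX tag."""
    lines = text.splitlines()
    if not lines:
        return False
    first = 1 if lines[0].startswith("#!") else 0
    return first < len(lines) and bool(SPDX_RE.search(lines[first]))
```

Zero tokens spent, and the AI prompt shrinks accordingly.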
> +## Source License Requirements > + > +### SPDX License Identifiers > + > +Every source file must begin with an SPDX license identifier, followed > +by the copyright notice, then a blank line before other content. > + > +- SPDX tag on first line (or second line for `#!` scripts) > +- Copyright line immediately follows > +- Blank line after copyright before any code/includes > +- Core libraries and drivers use `BSD-3-Clause` > +- Kernel components use `GPL-2.0` > +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` > + > +```c > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2024 John Smith > + */ > + > +#include <stdio.h> > +``` > + > +For scripts: > +```python > +#!/usr/bin/env python3 > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2024 Jane Doe > + > +import sys > +``` > + > +**Do not include boilerplate license text** - the SPDX identifier is sufficient. > + > +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). > + > +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. > + > +**The following SPDX/copyright format is correct** - do NOT flag it or comment on it: > +```c > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2026 Stephen Hemminger > + */ > +``` > +This is the standard DPDK format. Do not say it is "different" or "unusual" - it is correct. > +Do not suggest the copyright year "should be verified" - if it's in the valid range (2013-current year), it's fine. 
> +If SPDX is on line 1 and copyright follows, this is CORRECT - do not include it in Errors. > + > +--- > + > +## Commit Message Requirements > + > +### Subject Line (First Line) > + > +| Rule | Limit | > +|------|-------| > +| Maximum length | **60 characters** | > +| Format | `component: lowercase description` | > +| Case | Lowercase except acronyms | > +| Mood | Imperative (instructions to codebase) | > +| Punctuation | **No trailing period** | > + > +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. > + > +``` > +# Good examples > +net/ixgbe: fix offload config option name > +config: increase max queues per port > +net/mlx5: add support for flow counters > +app/testpmd: fix memory leak in flow create > + > +# Bad examples > +Fixed the offload config option. # past tense, has period, no prefix > +net/ixgbe: Fix Offload Config # uppercase after colon > +ixgbe: fix something # wrong prefix, should be net/ixgbe > +lib/ethdev: add new feature # wrong prefix, should be ethdev: > +``` > + > +#### Headline Format Errors (from check-git-log.sh) > + > +The following are flagged as errors: > +- Tab characters in subject > +- Leading or trailing spaces > +- Trailing period (`.`) > +- Punctuation marks: `, ; ! ? 
& |` > +- Underscores after the colon (indicates code in subject) > +- Missing colon separator > +- No space after colon > +- Space before colon > + > +#### Common Prefix Mistakes > + > +| Wrong | Correct | > +|-------|---------| > +| `ixgbe:` | `net/ixgbe:` | > +| `lib/ethdev:` | `ethdev:` | > +| `example:` | `examples/foo:` | > +| `apps/` | `app/name:` | > +| `app/test:` | `test:` | > +| `testpmd:` | `app/testpmd:` | > +| `test-pmd:` | `app/testpmd:` | > +| `bond:` | `net/bonding:` | > + > +#### Case-Sensitive Terms (Commit Messages Only) > + > +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): > +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) > +- `VF`, `PF` (not `vf`, `pf`) > +- `MAC`, `VLAN`, `RSS`, `API` > +- `Linux`, `Windows`, `FreeBSD` > +- Check `devtools/words-case.txt` for complete list > + > +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. > + > +### Commit Body > + > +| Rule | Limit | > +|------|-------| > +| Line wrap | **75 characters** | > +| Exception | `Fixes:` lines may exceed 75 chars | > + > +Body guidelines: > +- Describe the issue being fixed or feature being added > +- Provide enough context for reviewers > +- **Do not start the commit message body with "It"** > +- **Must end with** `Signed-off-by:` line (real name, not alias) > + > +### Fixes Tag > + > +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA (lowercase hex): > + > +``` > +Fixes: 0123456789ab ("original commit subject") > +``` > + > +The hash must reference a commit in the current branch, and the subject must match exactly. > + > +**Do NOT flag Fixes tags** asking for verification that the commit "exists in the tree" or "cannot verify" - you cannot verify this from a patch review. If the format is correct (12-char SHA, quoted subject), accept it. NEVER say "cannot verify this exists".
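Case in point for my checkpatch comment earlier: the Fixes tag format is fully machine-checkable. A rough sketch (regex and function name are mine, not an existing devtools script):

```python
import re

# Sketch: validate 'Fixes: <12 hex chars> ("subject")' format only.
# Whether the hash exists in the tree is a separate (git) check.
FIXES_RE = re.compile(r'^Fixes: [0-9a-f]{12} \("[^"]+"\)$')

def fixes_tag_ok(line):
    return FIXES_RE.match(line) is not None
```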
> + > +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: > + > +```bash > +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch > +``` > + > +### Required Tags > + > +``` > +# For Coverity issues (required if "coverity" mentioned in body): > +Coverity issue: 12345 > + > +# For Bugzilla issues (required if "bugzilla" mentioned in body): > +Bugzilla ID: 12345 > + > +# For stable release backport candidates: > +Cc: stable@dpdk.org > + > +# For patch dependencies (in commit notes after ---): > +Depends-on: series-NNNNN ("Title of the series") > +``` > + > +### Tag Order > + > +Tags must appear in this order, with a blank line separating the two groups: > + > +**Group 1** (optional tags, no blank lines within this group): > +- Coverity issue: > +- Bugzilla ID: > +- Fixes: > +- Cc: > + > +**Blank line required here** (only if Group 1 tags are present) > + > +**Group 2** (no blank lines within this group): > +- Reported-by: > +- Suggested-by: > +- Signed-off-by: > +- Acked-by: > +- Reviewed-by: > +- Tested-by: > + > +**Correct examples:** > + > +Simple patch with no Group 1 tags (most common): > +``` > +The info_get callback doesn't need to check its args > +since already done by ethdev. 
> + > +Signed-off-by: John Smith <john@example.com> > +``` > + > +Patch with Fixes and Cc tags: > +``` > +Fixes: c743e50c475f ("null: new poll mode driver") > +Cc: stable@dpdk.org > + > +Signed-off-by: John Smith <john@example.com> > +``` > + > +Patch with only Fixes tag: > +``` > +Fixes: abcd1234abcd ("component: original commit") > + > +Signed-off-by: Jane Doe <jane@example.com> > +``` > + > +**What is correct (do NOT flag):** > +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT > +- No blank line between `Fixes:` and `Cc:` - this is CORRECT > +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT > + > +**What is wrong (DO flag):** > +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) > + > +**Tag format**: `Tag-name: Full Name <email@domain.com>` > + > +--- > + > +## C Coding Style > + > +### Line Length > + > +| Context | Limit | > +|---------|-------| > +| Source code | **100 characters** | > +| Commit body | **75 characters** | > + > +### General Formatting > + > +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) > +- **No trailing whitespace** on lines or at end of files > +- Files must end with a new line > +- Code style should be consistent within each file > + > + > +### Comments > + > +```c > +/* Most single-line comments look like this. */ > + > +/* > + * VERY important single-line comments look like this. > + */ > + > +/* > + * Multi-line comments look like this. Make them real sentences. Fill > + * them so they look like real paragraphs. > + */ > +``` > + > +### Header File Organization > + > +Include order (each group separated by blank line): > +1. System/libc includes > +2. DPDK EAL includes > +3. DPDK misc library includes > +4. 
Application-specific includes > + > +```c > +#include <stdio.h> > +#include <stdlib.h> > + > +#include <rte_eal.h> > + > +#include <rte_ring.h> > +#include <rte_mempool.h> > + > +#include "application.h" > +``` > + > +### Header Guards > + > +```c > +#ifndef _FILE_H_ > +#define _FILE_H_ > + > +/* Code */ > + > +#endif /* _FILE_H_ */ > +``` > + > +### Naming Conventions > + > +- **All external symbols** must have `RTE_` or `rte_` prefix > +- **Macros**: ALL_UPPERCASE with `RTE_` prefix > +- **Functions**: lowercase with underscores only (no CamelCase) > +- **Variables**: lowercase with underscores only > +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix > + > +**Exception**: Driver base directories (`drivers/*/base/`) may use different > +naming conventions when sharing code across platforms or with upstream vendor code. > + > +#### Symbol Naming for Static Linking > + > +Drivers and libraries must not expose global variables that could > +clash when statically linked with other DPDK components or > +applications. Use consistent and unique prefixes for all exported > +symbols to avoid namespace collisions. > + > +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: > + > +```c > +/* Good - virtio driver uses consistent "virtio_" prefix */ > +const struct virtio_ops virtio_legacy_ops = { > + .read = virtio_legacy_read, > + .write = virtio_legacy_write, > + .configure = virtio_legacy_configure, > +}; > + > +const struct virtio_ops virtio_modern_ops = { > + .read = virtio_modern_read, > + .write = virtio_modern_write, > + .configure = virtio_modern_configure, > +}; > + > +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ > +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; > +``` > + > +**Bad practice**: Generic names that may clash: > + > +```c > +/* Bad - "ops" is too generic, will clash with other drivers */ > +const struct virtio_ops ops = { ... 
}; > + > +/* Bad - "legacy_ops" could clash with other legacy implementations */ > +const struct virtio_ops legacy_ops = { ... }; > + > +/* Bad - "driver_config" is not unique */ > +struct driver_config config; > +``` > + > +**Guidelines**: > +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) > +- Prefix all global functions similarly unless they use the `rte_` namespace > +- Internal static variables do not require prefixes as they have file scope > +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API > + > +#### Prohibited Terminology > + > +Do not use non-inclusive naming, including: > +- `master/slave` -> Use: primary/secondary, controller/worker, leader/follower > +- `blacklist/whitelist` -> Use: denylist/allowlist, blocklist/passlist > +- `cripple` -> Use: impacted, degraded, restricted, immobilized > +- `tribe` -> Use: team, squad > +- `sanity check` -> Use: coherence check, test, verification > + > +### Comparisons and Boolean Logic > + > +```c > +/* Pointers - compare explicitly with NULL */ > +if (p == NULL) /* Good */ > +if (p != NULL) /* Good */ > +if (likely(p != NULL)) /* Good - likely/unlikely don't change this */ > +if (unlikely(p == NULL)) /* Good - likely/unlikely don't change this */ > +if (!p) /* Bad - don't use ! on pointers */ > + > +/* Integers - compare explicitly with zero */ > +if (a == 0) /* Good */ > +if (a != 0) /* Good */ > +if (errno != 0) /* Good - this IS explicit */ > +if (likely(a != 0)) /* Good - likely/unlikely don't change this */ > +if (!a) /* Bad - don't use !
on integers */ > +if (a) /* Bad - implicit, should be a != 0 */ > + > +/* Characters - compare with character constant */ > +if (*p == '\0') /* Good */ > + > +/* Booleans - direct test is acceptable */ > +if (flag) /* Good for actual bool types */ > +if (!flag) /* Good for actual bool types */ > +``` > + > +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). > +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`). > +**Note**: `likely()` and `unlikely()` macros do NOT affect whether a comparison is explicit or implicit. > + > +### Boolean Usage > + > +- Using `bool` type is allowed > +- Prefer `bool` over `int` when a variable or field is only used as a boolean > +- For structure fields, consider if the size/alignment impact is acceptable > + > +### Indentation and Braces > + > +```c > +/* Control statements - no braces for single statements */ > +if (val != NULL) > + val = realloc(val, newsize); > + > +/* Braces on same line as else */ > +if (test) > + stmt; > +else if (bar) { > + stmt; > + stmt; > +} else > + stmt; > + > +/* Switch statements - don't indent case */ > +switch (ch) { > +case 'a': > + aflag = 1; > + /* FALLTHROUGH */ > +case 'b': > + bflag = 1; > + break; > +default: > + usage(); > +} > + > +/* Long conditions - double indent continuation */ > +if (really_long_variable_name_1 == really_long_variable_name_2 && > + really_long_variable_name_3 == really_long_variable_name_4) > + stmt; > +``` > + > +### Variable Declarations > + > +- Prefer declaring variables inside the basic block where they are used > +- Variables may be declared either at the start of the block, or at point of first use (C99 style) > +- Both declaration styles are acceptable; consistency within a function is preferred > +- Initialize variables only when a meaningful value exists at declaration time > +- Use C99 designated initializers for structures > + > +```c > +/* Good - declaration at start of 
block */ > +int ret; > +ret = some_function(); > + > +/* Also good - declaration at point of use (C99 style) */ > +for (int i = 0; i < count; i++) > + process(i); > + > +/* Good - declaration in inner block where variable is used */ > +if (condition) { > + int local_val = compute(); > + use(local_val); > +} > + > +/* Bad - unnecessary initialization defeats compiler warnings */ > +int ret = 0; > +ret = some_function(); /* Compiler won't warn if assignment removed */ > +``` > + > +### Function Format > + > +- Return type on its own line > +- Opening brace on its own line > +- Place an empty line between declarations and statements > + > +```c > +static char * > +function(int a1, int b1) > +{ > + char *p; > + > + p = do_something(a1, b1); > + return p; > +} > +``` > + > +--- > + > +## Unnecessary Code Patterns > + > +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. > + > +### Unnecessary Variable Initialization > + > +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. > + > +```c > +/* Bad - initialization defeats -Wuninitialized */ > +int ret = 0; > +if (condition) > + ret = func_a(); > +else > + ret = func_b(); > + > +/* Good - compiler will warn if any path misses assignment */ > +int ret; > +if (condition) > + ret = func_a(); > +else > + ret = func_b(); > + > +/* Good - meaningful initial value */ > +int count = 0; > +for (i = 0; i < n; i++) > + if (test(i)) > + count++; > +``` > + > +### Unnecessary Casts of void * > + > +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. 
> + > +```c > +/* Bad - unnecessary cast */ > +struct foo *p = (struct foo *)malloc(sizeof(*p)); > +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); > + > +/* Good - no cast needed in C */ > +struct foo *p = malloc(sizeof(*p)); > +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); > +``` > + > +Note: Casts are required in C++ but DPDK is a C project. > + > +### Zero-Length Arrays vs Variable-Length Arrays > + > +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. > + > +```c > +/* Bad - GCC extension */ > +struct msg { > + int len; > + char data[0]; > +}; > + > +/* Good - C99 flexible array member */ > +struct msg { > + int len; > + char data[]; > +}; > +``` > + > +### Unnecessary NULL Checks Before free() > + > +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. > + > +```c > +/* Bad - unnecessary check */ > +if (ptr != NULL) > + free(ptr); > + > +if (rte_ptr != NULL) > + rte_free(rte_ptr); > + > +/* Good - free handles NULL */ > +free(ptr); > +rte_free(rte_ptr); > +``` > + > +### memset Before free() (CWE-14) > + > +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store (CWE-14: Compiler Removal of Code to Clear Buffers). For security-sensitive data, use `explicit_bzero()`, `rte_memset_sensitive()`, or `rte_free_sensitive()` which the compiler is not permitted to eliminate. 
> + > +```c > +/* Bad - compiler may eliminate memset */ > +memset(secret_key, 0, sizeof(secret_key)); > +free(secret_key); > + > +/* Good - for non-sensitive data, just free */ > +free(ptr); > + > +/* Good - explicit_bzero cannot be optimized away */ > +explicit_bzero(secret_key, sizeof(secret_key)); > +free(secret_key); > + > +/* Good - DPDK wrapper for clearing sensitive data */ > +rte_memset_sensitive(secret_key, 0, sizeof(secret_key)); > +free(secret_key); > + > +/* Good - for rte_malloc'd sensitive data, combined clear+free */ > +rte_free_sensitive(secret_key); > +``` > + > +### Appropriate Use of rte_malloc() > + > +`rte_malloc()` allocates from hugepage memory. Use it only when required: > + > +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) > +- Memory shared between primary and secondary DPDK processes > +- Memory requiring specific NUMA node placement > + > +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. > + > +```c > +/* Bad - rte_malloc for ordinary data structure */ > +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); > + > +/* Good - standard malloc for control structures */ > +struct config *cfg = malloc(sizeof(*cfg)); > + > +/* Good - rte_malloc for DMA-accessible memory */ > +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); > +``` > + > +### Appropriate Use of rte_memcpy() > + > +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: > + > +- Modern compilers optimize `memcpy()` effectively > +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` > +- `memcpy()` handles small fixed-size copies efficiently > + > +```c > +/* Bad - rte_memcpy in control path */ > +rte_memcpy(&config, &default_config, sizeof(config)); > + > +/* Good - standard memcpy for control path */ > +memcpy(&config, &default_config, sizeof(config)); > + > +/* Good - rte_memcpy for packet data in fast path */ > +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); > +``` > + > +--- > + > +## Forbidden Tokens > + > +### Functions > + > +| Forbidden | Preferred | Context | > +|-----------|-----------|---------| > +| `rte_panic()` | Return error codes | lib/, drivers/ | > +| `rte_exit()` | Return error codes | lib/, drivers/ | > +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | > +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | > +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | > + > +### Atomics and Memory Barriers > + > +| Forbidden | Preferred | > +|-----------|-----------| > +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | > +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | > +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | > +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | > +| `__sync_xxx()` | `rte_atomic_xxx()` | > +| `__atomic_xxx()` | `rte_atomic_xxx()` | > +| `__ATOMIC_RELAXED` etc. 
| `rte_memory_order_xxx` | > +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | > + > +### Threading > + > +| Forbidden | Preferred | > +|-----------|-----------| > +| `pthread_create()` | `rte_thread_create()` | > +| `pthread_join()` | `rte_thread_join()` | > +| `pthread_detach()` | EAL thread functions | > +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | > +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | > +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | > + > +### Process-Shared Synchronization > + > +When placing synchronization primitives in shared memory (memory accessible by multiple processes, such as DPDK primary/secondary processes or `mmap`'d regions), they **must** be initialized with process-shared attributes. Failure to do so causes **undefined behavior** that may appear to work in testing but fail unpredictably in production. > + > +#### pthread Mutexes in Shared Memory > + > +**This is an error** - mutex in shared memory without `PTHREAD_PROCESS_SHARED`: > + > +```c > +/* BAD - undefined behavior when used across processes */ > +struct shared_data { > + pthread_mutex_t lock; > + int counter; > +}; > + > +void init_shared(struct shared_data *shm) { > + pthread_mutex_init(&shm->lock, NULL); /* ERROR: missing pshared attribute */ > +} > +``` > + > +**Correct implementation**: > + > +```c > +/* GOOD - properly initialized for cross-process use */ > +struct shared_data { > + pthread_mutex_t lock; > + int counter; > +}; > + > +int init_shared(struct shared_data *shm) { > + pthread_mutexattr_t attr; > + int ret; > + > + ret = pthread_mutexattr_init(&attr); > + if (ret != 0) > + return -ret; > + > + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); > + if (ret != 0) { > + pthread_mutexattr_destroy(&attr); > + return -ret; > + } > + > + ret = pthread_mutex_init(&shm->lock, &attr); > + pthread_mutexattr_destroy(&attr); > + > + return -ret; > +} > +``` > + > +#### pthread 
Condition Variables in Shared Memory > + > +Condition variables also require the process-shared attribute: > + > +```c > +/* BAD - will not work correctly across processes */ > +pthread_cond_init(&shm->cond, NULL); > + > +/* GOOD */ > +pthread_condattr_t cattr; > +pthread_condattr_init(&cattr); > +pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED); > +pthread_cond_init(&shm->cond, &cattr); > +pthread_condattr_destroy(&cattr); > +``` > + > +#### pthread Read-Write Locks in Shared Memory > + > +```c > +/* BAD */ > +pthread_rwlock_init(&shm->rwlock, NULL); > + > +/* GOOD */ > +pthread_rwlockattr_t rwattr; > +pthread_rwlockattr_init(&rwattr); > +pthread_rwlockattr_setpshared(&rwattr, PTHREAD_PROCESS_SHARED); > +pthread_rwlock_init(&shm->rwlock, &rwattr); > +pthread_rwlockattr_destroy(&rwattr); > +``` > + > +#### When to Flag This Issue > + > +Flag as an **Error** when ALL of the following are true: > +1. A `pthread_mutex_t`, `pthread_cond_t`, `pthread_rwlock_t`, or `pthread_barrier_t` is initialized > +2. The primitive is stored in shared memory (identified by context such as: structure in `rte_malloc`/`rte_memzone`, `mmap`'d memory, memory passed to secondary processes, or structures documented as shared) > +3. 
The initialization uses `NULL` attributes or attributes without `PTHREAD_PROCESS_SHARED` > + > +**Do NOT flag** when: > +- The mutex is in thread-local or process-private heap memory (`malloc`) > +- The mutex is a local/static variable not in shared memory > +- The code already uses `pthread_mutexattr_setpshared()` with `PTHREAD_PROCESS_SHARED` > +- The synchronization uses DPDK primitives (`rte_spinlock_t`, `rte_rwlock_t`) which are designed for shared memory > + > +#### Preferred Alternatives > + > +For DPDK code, prefer DPDK's own synchronization primitives which are designed for shared memory: > + > +| pthread Primitive | DPDK Alternative | > +|-------------------|------------------| > +| `pthread_mutex_t` | `rte_spinlock_t` (busy-wait) or properly initialized pthread mutex | > +| `pthread_rwlock_t` | `rte_rwlock_t` | > +| `pthread_spinlock_t` | `rte_spinlock_t` | > + > +Note: `rte_spinlock_t` and `rte_rwlock_t` work correctly in shared memory without special initialization, but they are spinning locks unsuitable for long wait times. 
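Same theme as my checkpatch comment earlier: a dumb textual pre-filter can nominate most of these before any tokens are spent. Sketch only (regex and function name are mine; it cannot tell whether the object really lives in shared memory, so the AI or a human still confirms each candidate):

```python
import re

# Sketch: flag added diff lines that initialize a pthread primitive
# with a NULL attribute - candidates for the PTHREAD_PROCESS_SHARED
# check described above.
INIT_RE = re.compile(
    r'pthread_(?:mutex|cond|rwlock)_init\s*\([^,]+,\s*NULL\s*\)')

def null_attr_inits(diff_text):
    return [line for line in diff_text.splitlines()
            if line.startswith('+') and INIT_RE.search(line)]
```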
> + > +### Compiler Built-ins and Attributes > + > +| Forbidden | Preferred | Notes | > +|-----------|-----------|-------| > +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | > +| `__alignof__` | C11 `alignof` | | > +| `__typeof__` | `typeof` | | > +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | > +| `__reserved` | Different name | Reserved in Windows headers | > +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | > + > +### Format Specifiers > + > +| Forbidden | Preferred | > +|-----------|-----------| > +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | > + > +### Headers and Build > + > +| Forbidden | Preferred | Context | > +|-----------|-----------|---------| > +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | > +| `install_headers()` | Meson `headers` variable | meson.build | > +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | > +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | > +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | > +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | > + > +### Testing > + > +| Forbidden | Preferred | > +|-----------|-----------| > +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | > + > +### Documentation > + > +| Forbidden | Preferred | > +|-----------|-----------| > +| `http://...dpdk.org` | `https://...dpdk.org` | > +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | > +| `:: file.svg` | `:: file.*` (wildcard extension) | > + > +--- > + > +## Deprecated API Usage > + > +New patches must not introduce usage of deprecated APIs, macros, or functions. > +Deprecated items are marked with `RTE_DEPRECATED` or documented in the > +deprecation notices section of the release notes. 
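One point worth spelling out from the format-specifier table above: the `PRI*64` macros are string literals spliced into the format string, so the table's shorthand `%PRIu64` is written as `"%" PRIu64` in actual code. A minimal portable sketch (function name is illustrative):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-bit value portably: "%" PRIu64 expands to the right
 * conversion for uint64_t on every platform, replacing the
 * non-portable "%llu".
 */
static int
format_u64(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "value=%" PRIu64, v);
}
```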
> + > +### Rules for New Code > + > +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` > +- Do not use macros that have been superseded by newer alternatives > +- Do not use data structures or enum values marked as deprecated > +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations > +- When a deprecated API has a replacement, use the replacement > + > +### Deprecating APIs > + > +A patch may mark an API as deprecated provided: > + > +- No remaining usages exist in the current DPDK codebase > +- The deprecation is documented in the release notes > +- A migration path or replacement API is documented > +- The `RTE_DEPRECATED` macro is used to generate compiler warnings > + > +```c > +/* Marking a function as deprecated */ > +__rte_deprecated > +int > +rte_old_function(void); > + > +/* With a message pointing to the replacement */ > +__rte_deprecated_msg("use rte_new_function() instead") > +int > +rte_old_function(void); > +``` > + > +### Common Deprecated Patterns > + > +| Deprecated | Replacement | Notes | > +|-----------|-------------|-------| > +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | > +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | > +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | > + > +When reviewing patches that add new code, flag any usage of deprecated APIs > +as requiring change to use the modern replacement. 
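As a plain-C11 sketch of the replacement pattern in the table above (names are illustrative, not DPDK API), an old `rte_atomic32_t`-style statistics counter becomes:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative statistics counter rewritten with C11 atomics. */
static _Atomic uint32_t pkt_count;

static uint32_t
count_packet(void)
{
	/* Plain statistics increment: relaxed ordering is enough. */
	atomic_fetch_add_explicit(&pkt_count, 1, memory_order_relaxed);

	/* Where an rte_smp_wmb() barrier used to sit, a C11 fence goes;
	 * DPDK's rte_atomic_thread_fence() takes the same ordering
	 * argument.
	 */
	atomic_thread_fence(memory_order_release);

	return atomic_load_explicit(&pkt_count, memory_order_relaxed);
}
```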
> + > +--- > + > +## API Tag Requirements > + > +### `__rte_experimental` > + > +- Must appear **alone on the line** immediately preceding the return type > +- Only allowed in **header files** (not `.c` files) > + > +```c > +/* Correct */ > +__rte_experimental > +int > +rte_new_feature(void); > + > +/* Wrong - not alone on line */ > +__rte_experimental int rte_new_feature(void); > + > +/* Wrong - in .c file */ > +``` > + > +### `__rte_internal` > + > +- Must appear **alone on the line** immediately preceding the return type > +- Only allowed in **header files** (not `.c` files) > + > +```c > +/* Correct */ > +__rte_internal > +int > +internal_function(void); > +``` > + > +### Alignment Attributes > + > +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: > + > +```c > +/* Correct */ > +struct __rte_cache_aligned my_struct { > + /* ... */ > +}; > + > +/* Wrong */ > +int __rte_cache_aligned my_variable; > +``` > + > +### Packed Attributes > + > +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes > +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs > +- Cannot use `__rte_packed_begin` with `enum` > + > +```c > +/* Correct */ > +struct __rte_packed_begin my_packed_struct { > + /* ... */ > +} __rte_packed_end; > + > +/* Wrong - with enum */ > +enum __rte_packed_begin my_enum { > + /* ... */ > +}; > +``` > + > +--- > + > +## Code Quality Requirements > + > +### Compilation > + > +- Each commit must compile independently (for `git bisect`) > +- No forward dependencies within a patchset > +- Test with multiple targets, compilers, and options > +- Use `devtools/test-meson-builds.sh` > + > +**Note for AI reviewers**: You cannot verify compilation order or cross-patch dependencies from patch review alone. Do NOT flag patches claiming they "would fail to compile" based on symbols used in other patches in the series. Assume the patch author has ordered them correctly. 
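The alignment and packing rules earlier in this section can be sanity-checked with generic compiler attributes. The sketch below uses GCC/Clang spellings as stand-ins for the RTE macros (the real code must use `__rte_cache_aligned` and the `__rte_packed_begin`/`__rte_packed_end` pair), assuming a 64-byte cache line:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the RTE macros (GCC/Clang attribute spellings);
 * RTE_CACHE_LINE_SIZE is 64 on most targets.
 */
#define demo_cache_aligned __attribute__((aligned(64)))
#define demo_packed __attribute__((packed))

/* Alignment attributes may only decorate struct or union types. */
struct demo_cache_aligned rx_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Packing removes the padding the compiler would otherwise insert
 * before the 32-bit member.
 */
struct demo_packed desc_hdr {
	uint8_t flags;
	uint32_t addr;
};
```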
> + > +### Testing > + > +- Add tests to `app/test` unit test framework > +- New API functions must be used in `/app` test directory > +- New device APIs require at least one driver implementation > + > +#### Functional Test Infrastructure > + > +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. > + > +```c > +#include <rte_test.h> > + > +static int > +test_feature_basic(void) > +{ > + int ret; > + > + ret = rte_feature_init(); > + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); > + > + ret = rte_feature_operation(); > + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); > + > + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), > + "Feature pointer should not be NULL"); > + > + return TEST_SUCCESS; > +} > + > +static struct unit_test_suite feature_testsuite = { > + .suite_name = "feature_autotest", > + .setup = test_feature_setup, > + .teardown = test_feature_teardown, > + .unit_test_cases = { > + TEST_CASE(test_feature_basic), > + TEST_CASE(test_feature_advanced), > + TEST_CASES_END() > + } > +}; > + > +static int > +test_feature(void) > +{ > + return unit_test_suite_runner(&feature_testsuite); > +} > + > +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); > +``` > + > +The `REGISTER_FAST_TEST` macro parameters are: > +- Test name (e.g., `feature_autotest`) > +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages > +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer > +- Test function name > + > +Common `TEST_ASSERT` macros: > +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true > +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 > +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero > +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal > +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two 
values differ > +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL > +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value is not NULL > + > +### Documentation > + > +- Add Doxygen comments for public APIs > +- Update release notes in `doc/guides/rel_notes/` for important changes > +- Code and documentation must be updated atomically in same patch > +- Only update the **current release** notes file > +- Documentation must match the code > +- PMD features must match the features matrix in `doc/guides/nics/features/` > +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) > +- Release notes are NOT required for: > + - Test-only changes (unit tests, functional tests) > + - Internal APIs and helper functions (not exported to applications) > + - Internal implementation changes that don't affect public API > + > +### RST Documentation Style > + > +When reviewing `.rst` documentation files, prefer **definition lists** > +over simple bullet lists where each item has a term and a description. > +Definition lists produce better-structured HTML/PDF output and are > +easier to scan. > + > +**When to suggest a definition list:** > +- A bullet list where each item starts with a bold or emphasized term > + followed by a dash, colon, or long explanation > +- Lists of options, parameters, configuration values, or features > + where each entry has a name and a description > +- Glossary-style enumerations > + > +**When a simple list is fine (do NOT flag):** > +- Short lists of items without descriptions (e.g., file names, steps) > +- Lists where items are single phrases or sentences with no term/definition structure > +- Enumerated steps in a procedure > + > +**RST definition list syntax:** > + > +```rst > +term 1 > + Description of term 1. > + > +term 2 > + Description of term 2. > + Can span multiple lines. 
> +``` > + > +**Example — flag this pattern:** > + > +```rst > +* **error** - Fail with error (default) > +* **truncate** - Truncate content to fit token limit > +* **summary** - Request high-level summary review > +``` > + > +**Suggest rewriting as:** > + > +```rst > +error > + Fail with error (default). > + > +truncate > + Truncate content to fit token limit. > + > +summary > + Request high-level summary review. > +``` > + > +This is a **Warning**-level suggestion, not an Error. Do not flag it > +when the existing list structure is appropriate (see "when a simple > +list is fine" above). > + > +### API and Driver Changes > + > +- New APIs must be marked as `__rte_experimental` > +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite > +- Changes to existing APIs require release notes > +- New drivers or subsystems must have release notes > +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes > + > +### ABI Compatibility and Symbol Exports > + > +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend > +manually editing `version.map` files - they are auto-generated from source code > +annotations. > + > +#### Symbol Export Macros > + > +New public functions must be annotated with export macros (defined in > +`rte_export.h`). Place the macro on the line immediately before the function > +definition in the `.c` file: > + > +```c > +/* For stable ABI symbols */ > +RTE_EXPORT_SYMBOL(rte_foo_create) > +int > +rte_foo_create(struct rte_foo_config *config) > +{ > + /* ... */ > +} > + > +/* For experimental symbols (include version when first added) */ > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) > +__rte_experimental > +int > +rte_foo_new_feature(void) > +{ > + /* ... */ > +} > + > +/* For internal symbols (shared between DPDK components only) */ > +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) > +int > +rte_foo_internal_helper(void) > +{ > + /* ... 
*/ > +} > +``` > + > +#### Symbol Export Rules > + > +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions > +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs > + (version is the DPDK release, e.g., `25.03`) > +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers > + but not part of public API > +- Export macros go in `.c` files, not headers > +- The build system generates linker version maps automatically > + > +#### What NOT to Review > + > +- Do **NOT** flag missing `version.map` updates - maps are auto-generated > +- Do **NOT** suggest adding symbols to `lib/*/version.map` files > + > +#### ABI Versioning for Changed Functions > + > +When changing the signature of an existing stable function, use versioning macros > +from `rte_function_versioning.h`: > + > +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility > +- `RTE_DEFAULT_SYMBOL` - Mark the new default version > + > +Follow ABI policy and versioning guidelines in the contributor documentation. > +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. > + > +--- > + > +## LTS (Long Term Stable) Release Review > + > +LTS releases are DPDK versions ending in `.11` (e.g., 23.11, 22.11, > +21.11, 20.11, 19.11). 
When reviewing patches targeting an LTS branch, > +apply stricter criteria: > + > +### LTS-Specific Rules > + > +- **Only bug fixes allowed** -- no new features > +- **No new APIs** (experimental or stable) > +- **ABI must remain unchanged** -- no symbol additions, removals, > + or signature changes > +- Backported fixes should reference the original commit with a > + `Fixes:` tag > +- Copyright years should reflect when the code was originally > + written > +- Be conservative: reject changes that are not clearly bug fixes > + > +### What to Flag on LTS Branches > + > +**Error:** > +- New feature code (new functions, new driver capabilities) > +- New experimental or stable API additions > +- ABI changes (new or removed symbols, changed function signatures) > +- Changes that add new configuration options or parameters > + > +**Warning:** > +- Large refactoring that goes beyond what is needed for a fix > +- Missing `Fixes:` tag on a backported bug fix > +- Missing `Cc: stable@dpdk.org` > + > +### When LTS Rules Apply > + > +LTS rules apply when the reviewer is told the target release is an > +LTS version (via the `--release` option or equivalent). If no > +release is specified, assume the patch targets the main development > +branch where new features and APIs are allowed. 
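The LTS naming rule is mechanical, so a review tool's check reduces to testing the minor version. A hypothetical helper sketching the rule stated above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: a DPDK release is LTS iff its minor version
 * is 11 (19.11, 20.11, 21.11, 22.11, 23.11, ...).
 */
static bool
is_lts_release(const char *release)
{
	int major, minor;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;
	return minor == 11;
}
```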
> + > +--- > + > +## Patch Validation Checklist > + > +### Commit Message > + > +- [ ] Subject line <=60 characters > +- [ ] Subject is lowercase (except acronyms from words-case.txt) > +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) > +- [ ] No `lib/` prefix for libraries > +- [ ] Imperative mood, no trailing period > +- [ ] No tabs, leading/trailing spaces, or punctuation marks > +- [ ] Body wrapped at 75 characters > +- [ ] Body does not start with "It" > +- [ ] `Signed-off-by:` present with real name and valid email > +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject > +- [ ] `Coverity issue:` tag present if Coverity mentioned > +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned > +- [ ] `Cc: stable@dpdk.org` for stable backport candidates > +- [ ] Tags in correct order with blank line separator > + > +### License > + > +- [ ] SPDX identifier on first line (or second for scripts) > +- [ ] Copyright line follows SPDX > +- [ ] Blank line after copyright before code > +- [ ] Appropriate license for file type > + > +### Code Style > + > +- [ ] Lines <=100 characters > +- [ ] Hard tabs for indentation, spaces for alignment > +- [ ] No trailing whitespace > +- [ ] Proper include order > +- [ ] Header guards present > +- [ ] `rte_`/`RTE_` prefix on external symbols > +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) > +- [ ] No prohibited terminology > +- [ ] Proper brace style > +- [ ] Function return type on own line > +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` > +- [ ] No forbidden tokens (see table above) > +- [ ] No unnecessary code patterns (see section above) > +- [ ] No usage of deprecated APIs, macros, or functions > +- [ ] Process-shared primitives in shared memory use `PTHREAD_PROCESS_SHARED` > +- [ ] `mmap()` return checked against `MAP_FAILED`, not `NULL` > +- [ ] Statistics use `+=` not `=` for accumulation > +- [ ] Integer multiplies widened 
before operation when result is 64-bit > +- [ ] Descriptor chain traversals bounded by ring size or loop counter > +- [ ] 64-bit bitmasks use `1ULL <<` or `RTE_BIT64()`, not `1 <<` > +- [ ] No unconditional variable overwrites before read > +- [ ] Nested loops use distinct counter variables > +- [ ] No `memcpy`/`memcmp` with identical source and destination pointers > +- [ ] Sensitive data cleared with `explicit_bzero()`/`rte_free_sensitive()`, not `memset()` > + > +### API Tags > + > +- [ ] `__rte_experimental` alone on line, only in headers > +- [ ] `__rte_internal` alone on line, only in headers > +- [ ] Alignment attributes only on struct/union > +- [ ] Packed attributes properly paired > +- [ ] New public functions have `RTE_EXPORT_*` macro in `.c` file > +- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` > + > +### Structure > + > +- [ ] Each commit compiles independently > +- [ ] Code and docs updated together > +- [ ] Documentation matches code behavior > +- [ ] RST docs use definition lists for term/description patterns > +- [ ] PMD features match `doc/guides/nics/features/` matrix > +- [ ] Device operations match documentation (per `features.rst` mappings) > +- [ ] Tests added/updated as needed > +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner > +- [ ] New APIs marked as `__rte_experimental` > +- [ ] New APIs have testpmd hooks and functional tests > +- [ ] Current release notes updated for significant changes > +- [ ] Release notes updated for API changes > +- [ ] Release notes updated for new drivers or subsystems > + > +--- > + > +## Meson Build Files > + > +### Style Requirements > + > +- 4-space indentation (no tabs) > +- Line continuations double-indented > +- Lists alphabetically ordered > +- Short lists (<=3 items): single line, no trailing comma > +- Long lists: one item per line, trailing comma on last item > +- No strict line length limit for meson files; lines under 100 characters are 
acceptable > + > +```python > +# Short list > +sources = files('file1.c', 'file2.c') > + > +# Long list > +headers = files( > + 'header1.h', > + 'header2.h', > + 'header3.h', > +) > +``` > + > +--- > + > +## Python Code > + > +- Must comply with formatting standards > +- Use **`black`** for code formatting validation > +- Line length acceptable up to 100 characters > + > +--- > + > +## Validation Tools > + > +Run these before submitting: > + > +```bash > +# Check commit messages > +devtools/check-git-log.sh -n1 > + > +# Check patch format and forbidden tokens > +devtools/checkpatches.sh -n1 > + > +# Check maintainers coverage > +devtools/check-maintainers.sh > + > +# Build validation > +devtools/test-meson-builds.sh > + > +# Find maintainers for your patch > +devtools/get-maintainer.sh <patch-file> > +``` > + > +--- > + > +## Severity Levels for AI Review > + > +**Error** (must fix): > + > +*Correctness bugs (highest value findings):* > +- Use-after-free > +- Resource leaks on error paths (memory, file descriptors, locks) > +- Double-free or double-close > +- NULL pointer dereference on reachable code path > +- Buffer overflow or out-of-bounds access > +- Missing error check on a function that can fail, leading to undefined behavior > +- Race condition on shared mutable state without synchronization > +- Error path that skips necessary cleanup > +- `mmap()` return value checked against NULL instead of `MAP_FAILED` > +- Statistics accumulation using `=` instead of `+=` (overwrite vs increment) > +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) 
> +- Unbounded descriptor chain traversal on guest/API-supplied indices > +- `1 << n` used for 64-bit bitmask (undefined behavior if n >= 32) > +- Variable assigned then unconditionally overwritten before read > +- Same variable used as counter in nested loops > +- `memcpy`/`memcmp` with same pointer as both arguments (UB or no-op logic error) > + > +*Process and format errors:* > +- Missing or malformed SPDX license > +- Missing Signed-off-by > +- Subject line over 60 characters > +- Body lines over 75 characters > +- Wrong tag order or format > +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) > +- Forbidden tokens in code > +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line > +- Compilation failures > +- ABI breaks without proper versioning > +- pthread mutex/cond/rwlock in shared memory without `PTHREAD_PROCESS_SHARED` > + > +**Warning** (should fix): > +- Subject line style issues (case, punctuation) > +- Wrong component prefix > +- Missing Cc: stable@dpdk.org for fixes > +- Documentation gaps > +- Documentation does not match code behavior > +- PMD features missing from `doc/guides/nics/features/` matrix > +- Device operations not documented per `features.rst` mappings > +- Missing tests > +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner > +- New API not marked as `__rte_experimental` > +- New API without testpmd hooks or functional tests > +- New public function missing `RTE_EXPORT_*` macro > +- API changes without release notes > +- New drivers or subsystems without release notes > +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) > +- Unnecessary variable initialization > +- Unnecessary casts of `void *` > +- Unnecessary NULL checks before free > +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` > +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) > +- Driver/library global variables without unique prefixes (static linking 
clash risk) > +- Usage of deprecated APIs, macros, or functions in new code > +- RST documentation using bullet lists where definition lists would be more appropriate > + > +**Do NOT flag** (common false positives): > +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) > +- Suggesting manual edits to any `version.map` file > +- Copyright years within valid range (2013 to current year) > +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) > +- SPDX/copyright format that matches the standard DPDK format (do not call it "different" or "unusual") > +- Meson file lines under 100 characters > +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) > +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE explicit) > +- Comparisons wrapped in `likely()` or `unlikely()` macros - these are still explicit if using == or != > +- Anything you determine is correct (do not mention non-issues or say "No issue here") > +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) > +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT > +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when Fixes/Cc present) is CORRECT > +- Missing release notes for test-only changes (unit tests do not require release notes) > +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) > +- Subject lines that are within the 60 character limit (count first, do not guess) > +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it > +- Vague concerns ("should be verified", "should be checked") - if you're not sure it's wrong, don't flag it > +- Items where you say "which is correct" or "this is correct" - if it's correct, don't mention it at all > +- 
Fixes tags with correct format (12-char SHA, quoted subject) - NEVER ask for verification that commit exists > +- Items where you conclude "no issue here" or "this is actually correct" - omit these entirely > +- Clean patches in a series - do not include a patch just to say "no issues" or describe what it does > +- Cross-patch compilation dependencies - you cannot determine patch ordering correctness from review > +- Claims that a symbol "was removed in patch N" causing issues in patch M - assume author ordered correctly > +- Any speculation about whether patches will compile when applied in sequence > +- Mutexes/locks in process-private memory (standard `malloc`, stack, static non-shared) - these don't need `PTHREAD_PROCESS_SHARED` > +- Use of `rte_spinlock_t` or `rte_rwlock_t` in shared memory (these work correctly without special init) > + > +**Info** (consider): > +- Minor style preferences > +- Optimization suggestions > +- Alternative approaches > + > +--- > + > +# Response Format > + > +When you identify an issue: > +1. **State the problem** (1 sentence) > +2. **Why it matters** (1 sentence, only if not obvious) > +3. **Suggested fix** (code snippet or specific action) > + > +Example: > +This could panic if the string is NULL. > + > +--- > + > +## FINAL CHECK BEFORE SUBMITTING REVIEW > + > +Before outputting your review, do two separate passes: > + > +### Pass 1: Verify correctness bugs are included > + > +Ask: "Did I trace every error path for resource leaks? Did I check > +for use-after-free? Did I verify error codes are propagated?" > + > +If you identified a potential correctness bug but talked yourself > +out of it, **add it back**. It is better to report a possible bug > +than to miss a real one. > + > +### Pass 2: Remove style/process false positives > + > +For EACH style/process item, ask: "Did I conclude this is actually > +fine/correct/acceptable/no issue?" > + > +If YES, DELETE THAT ITEM. It should not be in your output. 
> + > +An item that says "X is wrong... actually this is correct" is a > +FALSE POSITIVE and must be removed. This applies to style, format, > +and process items only. > + > +**If your Errors section would be empty after this check, that's > +fine -- it means the patches are good.** ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools 2026-02-18 14:59 ` Aaron Conole @ 2026-02-19 17:47 ` Stephen Hemminger 2026-02-19 18:04 ` Stephen Hemminger 1 sibling, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:47 UTC (permalink / raw) To: Aaron Conole; +Cc: dev On Wed, 18 Feb 2026 09:59:21 -0500 Aaron Conole <aconole@redhat.com> wrote: > Much of the following sections can be written as part of checkpatch, > which means we don't need to spend compute resources with the AI on it. > For example, telling the AI that source files need to begin with SPDX > identifiers, line lengths, tag format order, tag parsing, etc. The > downside is that if we ask the AI to *generate* code, then it won't > follow these rules; but when we ask AI to *review* the code, it takes > fewer tokens to submit and we can let the AI do the thing it really > shines at - recognizing subtle patterns, rather than stuff we can write > a python script to do. Well, checkpatch is slow, misses some things, and doesn't provide any good rationale back to the developer. My experience is that the cost of AI is per input token. And AGENTS.md is also integrated into IDEs like Visual Studio, so it will get coverage there where checkpatch will not. > > I'm approaching this from the perspective of running CI using this > AGENTS.md file - but I might be wrong on the scope of this one, as it > may be intended for something else (like using some IDE integrated > extensions where that stuff can make sense to guide generation). I was starting to look at the kernel review prompts, then making sure other anti-patterns in DPDK got covered. > > As for the CI reviewing part, napkin math for Anthropic API shows this > takes up about .30c / patch (~17k tok) we would submit, and that doesn't > include whatever the patch context is - so I think we would exhaust any > budget dedicated to this really early (probably within a few dozen > patches). 
For any reasonable-size patchset I end up splitting. For me, having a more thorough review is a higher priority than keeping cost down (at this point). > > Maybe it's possible to split up the AGENTS.md file into stuff that's > good for review and stuff that's good for generation. WDYT? Yes and no. Yes, it could be split, but I use it for patch and existing code review already. This is a bigger question: what do other projects do? Being a "trail blazer" in this area is just extra wasted effort. ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools 2026-02-18 14:59 ` Aaron Conole 2026-02-19 17:47 ` Stephen Hemminger @ 2026-02-19 18:04 ` Stephen Hemminger 1 sibling, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 18:04 UTC (permalink / raw) To: Aaron Conole; +Cc: dev On Wed, 18 Feb 2026 09:59:21 -0500 Aaron Conole <aconole@redhat.com> wrote: > Much of the following sections can be written as part of checkpatch, > which means we don't need to spend compute resources with the AI on it. > For example, telling the AI that source files need to begin with SPDX > identifiers, line lengths, tag format order, tag parsing, etc. The > downside is that if we ask the AI to *generate* code, then it won't > follow these rules; but when we ask AI to *review* the code, it takes > fewer tokens to submit and we can let the AI do the thing it really > shines at - recognizing subtle patterns, rather than stuff we can write > a python script to do. Also, AGENTS.md now has many checkpatch-ish things that are not in the current script (and would bloat it). Examples are: * using const on function pointer tables * using rte_stdatomic * not using rte_smp_mb * using bool for flag values ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v9 2/6] devtools: add multi-provider AI patch review script 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger ` (4 subsequent siblings) 6 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole This is an AI generated script to review DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.py -p xai my-patch.patch ./devtools/analyze-patch.py -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 1334 +++++++++++++++++++++++++++++++++++++ 1 file changed, 1334 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..c77908fb3c --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,1334 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +from datetime import date +from email.message import EmailMessage +from pathlib import Path +from typing import Iterator +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Large file handling modes +LARGE_FILE_MODES = ["error", "truncate", "chunk", "commits-only", "summary"] + +# Approximate characters per token (conservative estimate for code) +CHARS_PER_TOKEN = 3.5 + +# Default token limits by provider (leaving room for system prompt and response) +PROVIDER_INPUT_LIMITS = { + "anthropic": 180000, # 200K context, reserve for system/response + "openai": 115000, # 128K context for gpt-4o + "xai": 115000, # Assume similar to OpenAI + "google": 900000, # Gemini has 1M context +} + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": 
"https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# LTS releases: any DPDK release with minor version .11 +# (e.g., 19.11, 20.11, 21.11, 22.11, 23.11, 24.11, 25.11, ...) + +SYSTEM_PROMPT_BASE = """\ +You are an expert DPDK code reviewer. Analyze patches for compliance with \ +DPDK coding standards and contribution guidelines. Provide clear, actionable \ +feedback organized by severity (Error, Warning, Info) as defined in the \ +guidelines.""" + +LTS_RULES = """ +LTS (Long Term Stable) branch rules apply: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with Fixes: tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that aren't clearly bug fixes""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": 
"description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def is_lts_release(release): + """Check if a release is an LTS release. + + Per DPDK project guidelines, any release with minor version .11 + is an LTS release (e.g., 19.11, 21.11, 23.11, 24.11, 25.11). 
+ """ + if not release: + return False + # Check for explicit -lts suffix + if "-lts" in release.lower(): + return True + # Extract base version (e.g., "23.11" from "23.11.1" or "23.11-rc1") + version = release.split("-")[0] + parts = version.split(".") + if len(parts) >= 2: + try: + minor = int(parts[1]) + return minor == 11 + except ValueError: + pass + return False + + +def estimate_tokens(text): + """Estimate token count from text length.""" + return int(len(text) / CHARS_PER_TOKEN) + + +def split_mbox_patches(content): + """Split an mbox file into individual patches.""" + patches = [] + current_patch = [] + in_patch = False + + for line in content.split("\n"): + # Detect start of new message in mbox format + if line.startswith("From ") and ( + " Mon " in line + or " Tue " in line + or " Wed " in line + or " Thu " in line + or " Fri " in line + or " Sat " in line + or " Sun " in line + ): + if current_patch: + patches.append("\n".join(current_patch)) + current_patch = [line] + in_patch = True + elif in_patch: + current_patch.append(line) + + # Don't forget the last patch + if current_patch: + patches.append("\n".join(current_patch)) + + return patches if patches else [content] + + +def extract_commit_messages(content): + """Extract only commit messages from patch content.""" + patches = split_mbox_patches(content) + messages = [] + + for patch in patches: + lines = patch.split("\n") + msg_lines = [] + in_headers = True + in_body = False + found_subject = False + + for line in lines: + # Collect headers we care about + if in_headers: + if line.startswith("Subject:"): + msg_lines.append(line) + found_subject = True + elif line.startswith(("From:", "Date:")): + msg_lines.append(line) + elif line.startswith((" ", "\t")) and found_subject: + # Subject continuation + msg_lines.append(line) + elif line == "": + if found_subject: + in_headers = False + in_body = True + msg_lines.append("") + elif in_body: + # Stop at the diff + if line.startswith("---") and not 
line.startswith("----"): + break + if line.startswith("diff --git"): + break + msg_lines.append(line) + + if msg_lines: + messages.append("\n".join(msg_lines)) + + return "\n\n---\n\n".join(messages) + + +def truncate_content(content, max_tokens, provider): + """Truncate content to fit within token limit.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN) + + if len(content) <= max_chars: + return content, False + + # Try to truncate at a reasonable boundary + truncated = content[:max_chars] + + # Find last complete diff hunk or patch boundary + last_diff = truncated.rfind("\ndiff --git") + last_patch = truncated.rfind("\nFrom ") + + if last_diff > max_chars * 0.5: + truncated = truncated[:last_diff] + elif last_patch > max_chars * 0.5: + truncated = truncated[:last_patch] + + truncated += "\n\n[... Content truncated due to size limits ...]\n" + return truncated, True + + +def chunk_content(content, max_tokens, provider) -> Iterator[tuple[str, int, int]]: + """Split content into chunks that fit within token limit. + + Yields tuples of (chunk_content, chunk_number, total_chunks). 
+ """ + patches = split_mbox_patches(content) + + if len(patches) == 1: + # Single large patch - split by diff sections + yield from chunk_single_patch(content, max_tokens) + return + + # Multiple patches - group them to fit within limits + chunks = [] + current_chunk = [] + current_size = 0 + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) # 90% to leave margin + + for patch in patches: + patch_size = len(patch) + if current_size + patch_size > max_chars and current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + + if patch_size > max_chars: + # Single patch too large, truncate it + if current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + truncated, _ = truncate_content(patch, max_tokens * 0.9, provider) + chunks.append(truncated) + else: + current_chunk.append(patch) + current_size += patch_size + + if current_chunk: + chunks.append("\n".join(current_chunk)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def chunk_single_patch(content, max_tokens) -> Iterator[tuple[str, int, int]]: + """Split a single large patch by diff sections.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) + + # Extract header (everything before first diff) + first_diff = content.find("\ndiff --git") + if first_diff == -1: + # No diff sections, just truncate + truncated, _ = truncate_content(content, max_tokens * 0.9, "anthropic") + yield truncated, 1, 1 + return + + header = content[: first_diff + 1] + diff_content = content[first_diff + 1 :] + + # Split by diff sections + diffs = [] + current_diff = [] + for line in diff_content.split("\n"): + if line.startswith("diff --git") and current_diff: + diffs.append("\n".join(current_diff)) + current_diff = [] + current_diff.append(line) + if current_diff: + diffs.append("\n".join(current_diff)) + + # Group diffs into chunks + chunks = [] + current_chunk_diffs = [] + current_size = len(header) + + for 
diff in diffs: + diff_size = len(diff) + if current_size + diff_size > max_chars and current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + current_size = len(header) + + if diff_size + len(header) > max_chars: + # Single diff too large + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + truncated_diff = diff[: max_chars - len(header) - 100] + truncated_diff += "\n[... diff truncated ...]\n" + chunks.append(header + truncated_diff) + current_size = len(header) + else: + current_chunk_diffs.append(diff) + current_size += diff_size + + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def get_summary_prompt(): + """Get prompt modifications for summary mode.""" + return """ +NOTE: This is a LARGE patch series. Provide a HIGH-LEVEL summary review only: +- Focus on overall architecture and design concerns +- Check commit message formatting across the series +- Identify any obvious policy violations +- Do NOT attempt detailed line-by-line code review +- Summarize the scope and purpose of the changes +""" + + +def format_combined_reviews(reviews, output_format, patch_name): + """Combine multiple chunk/patch reviews into a single output.""" + if output_format == "json": + combined = { + "patch_file": patch_name, + "sections": [ + {"label": label, "review": review} for label, review in reviews + ], + } + return json.dumps(combined, indent=2) + elif output_format == "html": + sections = [] + for label, review in reviews: + sections.append(f"<h2>{label}</h2>\n{review}") + return "\n<hr>\n".join(sections) + elif output_format == "markdown": + sections = [] + for label, review in reviews: + sections.append(f"## {label}\n\n{review}") + return "\n\n---\n\n".join(sections) + else: # text + sections = [] + for label, review in reviews: + 
sections.append(f"=== {label} ===\n\n{review}")
+        # Join with a ruler line between sections (parenthesized so the
+        # separator binds as one string rather than prefixing the output).
+        return ("\n\n" + "=" * 60 + "\n\n").join(sections)
+
+
+def build_system_prompt(review_date, release):
+    """Build system prompt with date and release context."""
+    prompt = SYSTEM_PROMPT_BASE
+    prompt += f"\n\nCurrent date: {review_date}."
+
+    if release:
+        prompt += f"\nTarget DPDK release: {release}."
+        if is_lts_release(release):
+            prompt += LTS_RULES
+    else:
+        prompt += "\nThis is a main branch or standard release."
+        prompt += "\nNew features and experimental APIs are allowed."
+
+    return prompt
+
+
+def build_anthropic_request(
+    model,
+    max_tokens,
+    system_prompt,
+    agents_content,
+    patch_content,
+    patch_name,
+    output_format="text",
+):
+    """Build request payload for Anthropic API."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    user_prompt = USER_PROMPT.format(
+        patch_name=patch_name, format_instruction=format_instruction
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "system": [
+            {"type": "text", "text": system_prompt},
+            {
+                "type": "text",
+                "text": agents_content,
+                "cache_control": {"type": "ephemeral"},
+            },
+        ],
+        "messages": [
+            {
+                "role": "user",
+                "content": user_prompt + patch_content,
+            }
+        ],
+    }
+
+
+def build_openai_request(
+    model,
+    max_tokens,
+    system_prompt,
+    agents_content,
+    patch_content,
+    patch_name,
+    output_format="text",
+):
+    """Build request payload for OpenAI-compatible APIs."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    user_prompt = USER_PROMPT.format(
+        patch_name=patch_name, format_instruction=format_instruction
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": [
+            {"role": "system", "content": system_prompt},
+            {"role": "system", "content": agents_content},
+            {
+                "role": "user",
+                "content": user_prompt + patch_content,
+            },
+        ],
+    }
+
+
+def build_google_request(
+    max_tokens,
+    system_prompt,
+    agents_content,
+    patch_content,
+    patch_name,
+    output_format="text",
+):
"""Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": system_prompt}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = 
e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: {usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: {usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + 
error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Handle continuation lines + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if 
get_git_config("sendemail.smtpserver"): + # Build command with all arguments + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. 
Configure git send-email, sendmail, or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s -r 24.11 patch.patch # Review for specific release + %(prog)s -r 24.11-lts patch.patch # Review for LTS branch + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +Large File Handling: + %(prog)s --split-patches series.mbox # Review each patch separately + %(prog)s --split-patches --patch-range 1-5 series.mbox # Review patches 1-5 + %(prog)s --large-file=truncate patch.mbox # Truncate to fit limit + %(prog)s --large-file=commits-only series.mbox # Review commit messages only + %(prog)s --large-file=summary series.mbox # High-level summary only + %(prog)s --large-file=chunk series.mbox # Split and review in chunks + +Large File Modes: + error - Fail with error (default) + truncate - Truncate content to fit token limit + chunk - Split into chunks and review each + commits-only - Extract and review only commit messages + summary - Request high-level summary review + +LTS Releases: + Use -r/--release with LTS version (e.g., 24.11-lts, 23.11) to enable + 
stricter review rules: bug fixes only, no new features or APIs. + Any DPDK release with minor version .11 is an LTS release. + """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", + metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Date and release options + parser.add_argument( + "-D", + "--date", + metavar="YYYY-MM-DD", + help="Review date context (default: today)", + ) + parser.add_argument( + "-r", + "--release", + metavar="VERSION", + help="Target DPDK release (e.g., 24.11, 23.11-lts)", + ) + + # Large file handling options + large_group = parser.add_argument_group("Large File Handling") + large_group.add_argument( + "--large-file", + choices=LARGE_FILE_MODES, + default="error", + metavar="MODE", + help="How to handle large files: error (default), truncate, " + "chunk, commits-only, summary", + ) + large_group.add_argument( + "--max-tokens", + type=int, + metavar="N", + help="Max input tokens (default: provider-specific)", + ) + large_group.add_argument( + 
"--split-patches", + action="store_true", + help="Split mbox into individual patches and review each separately", + ) + large_group.add_argument( + "--patch-range", + metavar="N-M", + help="Review only patches N through M (1-indexed, use with --split-patches)", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + patch_path = Path(args.patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {args.patch_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = 
get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Determine review date + review_date = args.date or date.today().isoformat() + + # Build system prompt with date and release context + system_prompt = build_system_prompt(review_date, args.release) + + # Read files + agents_content = agents_path.read_text() + patch_content = patch_path.read_text() + patch_name = patch_path.name + + # Determine max tokens for this provider + max_input_tokens = args.max_tokens or PROVIDER_INPUT_LIMITS.get( + args.provider, 100000 + ) + + # Estimate token count + estimated_tokens = estimate_tokens(patch_content + agents_content) + + # Parse patch range if specified + patch_start, patch_end = None, None + if args.patch_range: + try: + if "-" in args.patch_range: + start, end = args.patch_range.split("-", 1) + patch_start = int(start) + patch_end = int(end) + else: + patch_start = patch_end = int(args.patch_range) + except ValueError: + error(f"Invalid --patch-range format: {args.patch_range}") + + # Handle --split-patches mode + if args.split_patches: + patches = split_mbox_patches(patch_content) + total_patches = len(patches) + + if total_patches == 1: + print( + f"Note: Only 1 patch found in mbox, --split-patches has no effect", + file=sys.stderr, + ) + else: + print( + f"Found {total_patches} patches in mbox", + file=sys.stderr, + ) + + # Apply patch range filter + if patch_start is not None: + if patch_start < 1 or patch_start > total_patches: + error( + f"Patch range start {patch_start} out of range (1-{total_patches})" + ) + if patch_end < patch_start or patch_end > total_patches: + error( + f"Patch range end {patch_end} out of range ({patch_start}-{total_patches})" + ) + patches = patches[patch_start - 1 : patch_end] + print( + f"Reviewing patches {patch_start}-{patch_end} ({len(patches)} patches)", 
+ file=sys.stderr, + ) + + # Review each patch separately + all_reviews = [] + for i, patch in enumerate(patches, patch_start or 1): + patch_label = f"Patch {i}/{total_patches}" + print(f"\nReviewing {patch_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch, + f"{patch_name} ({patch_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((patch_label, review_text)) + + # Combine reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal API call + estimated_tokens = 0 # Bypass size check since we've already processed + + # Check if content is too large + is_large = estimated_tokens > max_input_tokens + + if is_large: + print( + f"Warning: Estimated {estimated_tokens:,} tokens exceeds limit of " + f"{max_input_tokens:,}", + file=sys.stderr, + ) + + if args.large_file == "error": + error( + f"Patch file too large ({estimated_tokens:,} tokens). " + f"Use --large-file=truncate|chunk|commits-only|summary to handle, " + f"or --split-patches to review patches individually." 
+ ) + elif args.large_file == "truncate": + print("Truncating content to fit token limit...", file=sys.stderr) + patch_content, was_truncated = truncate_content( + patch_content, max_input_tokens, args.provider + ) + if was_truncated: + print("Content was truncated.", file=sys.stderr) + elif args.large_file == "commits-only": + print("Extracting commit messages only...", file=sys.stderr) + patch_content = extract_commit_messages(patch_content) + new_estimate = estimate_tokens(patch_content + agents_content) + print( + f"Reduced to ~{new_estimate:,} tokens (commit messages only)", + file=sys.stderr, + ) + if new_estimate > max_input_tokens: + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "summary": + print("Using summary mode for large patch...", file=sys.stderr) + system_prompt += get_summary_prompt() + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "chunk": + print("Processing in chunks...", file=sys.stderr) + all_reviews = [] + for chunk, chunk_num, total_chunks in chunk_content( + patch_content, max_input_tokens, args.provider + ): + chunk_label = f"Chunk {chunk_num}/{total_chunks}" + print(f"Reviewing {chunk_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + chunk, + f"{patch_name} ({chunk_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((chunk_label, review_text)) + + # Combine chunk reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal single API call below + estimated_tokens = 0 + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Review date: {review_date}", file=sys.stderr) + if args.release: + lts_status = " (LTS)" if 
is_lts_release(args.release) else "" + print(f"Target release: {args.release}{lts_status}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {args.patch_file}", file=sys.stderr) + print(f"Estimated tokens: {estimated_tokens:,}", file=sys.stderr) + print(f"Max input tokens: {max_input_tokens:,}", file=sys.stderr) + if args.large_file != "error": + print(f"Large file mode: {args.large_file}", file=sys.stderr) + if args.split_patches: + print("Split patches: yes", file=sys.stderr) + if args.output: + print(f"Output file: {args.output}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API (unless already processed via chunks/split) + if estimated_tokens > 0: # Not already processed + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + args.output_format, + args.verbose, + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Format output based on requested format + provider_name = config["name"] + + if args.output_format == "json": + # For JSON, try to parse and add metadata + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + # If AI didn't return valid JSON, wrap the text + review_data = {"raw_review": review_text} + + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "review_date": review_date, + "target_release": args.release, + "is_lts": is_lts_release(args.release) if args.release else False, + }, + "review": review_data, + } + output_text = 
json.dumps(output_data, indent=2) + elif args.output_format == "html": + # Wrap HTML content with header + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"<br>Target release: {args.release}{lts_badge}" + output_text = f"""<!-- AI-generated review of {patch_name} --> +<!-- Reviewed using {provider_name} ({model}) on {review_date} --> +<div class="patch-review"> +<h1>Patch Review: {patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p> +{review_text} +</div> +""" + elif args.output_format == "markdown": + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"\n*Target release: {args.release}{lts_badge}*\n" + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model}) on {review_date}* +{release_info} +{review_text} +""" + else: # text + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n" + output_text += f"Review date: {review_date}\n" + output_text += release_info + output_text += "\n" + review_text + + # Write output + if args.output: + Path(args.output).write_text(output_text) + print(f"Review written to: {args.output}", file=sys.stderr) + else: + print(output_text) + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", 
orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) on {review_date} +{release_info} +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v9 3/6] devtools: add compare-reviews.sh for multi-provider analysis 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger ` (3 subsequent siblings) 6 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add a script that runs a patch review with each of several AI providers so their feedback can be compared. The script automatically detects which providers have API keys configured and runs analyze-patch.py for each one, allowing users to compare review quality and feedback across different AI models. Features: - Auto-detects available providers based on environment variables - Optional provider selection via -p/--providers option - Saves individual reviews to separate files with -o/--output - Verbose mode passes through to the underlying analyze-patch.py Usage: ./devtools/compare-reviews.sh my-patch.patch ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch ./devtools/compare-reviews.sh -o ./reviews my-patch.patch Output files are named <patch>-<provider>.txt when using the output directory option. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100755 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100755 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v9 4/6] devtools: add multi-provider AI documentation review script 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (2 preceding siblings ...) 2026-02-13 21:39 ` [PATCH v9 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-17 18:20 ` Andrew Bailey 2026-02-13 21:39 ` [PATCH v9 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger ` (2 subsequent siblings) 6 siblings, 1 reply; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Aaron Conole Add a review-doc.py script that uses AI language models to review DPDK documentation files for spelling, grammar, technical correctness, and clarity. It supports batch processing of multiple files. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini Output formats (-f/--format): - text: plain text with extractable diff/msg markers (default) - markdown: formatted review document - html: complete HTML document with styling - json: structured data with metadata For each input file, the script produces: - <basename>.{txt,md,html,json}: review in selected format - <basename>.diff: unified diff (text/json, or with -d flag) - <basename>.msg: commit message (text/json, or with -d flag) The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide). 
Features: - Multiple file processing with glob support - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Output format selection via -f/--format option - Force diff/msg generation via -d/--diff option - Quiet mode (-q) suppresses stdout output - Verbose mode (-v) shows token usage and API details - Email integration using git sendemail configuration - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py doc/guides/nics/*.rst ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..1366aa0f85 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,1098 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Accepts multiple documentation files and generates output for each. 
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(same commit message as above) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(same unified diff as above) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", "doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", 
"doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": 
"text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + 
include_diff_markers=False, + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " 
f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = 
msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. 
Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} 
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
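[Editor's note: the main loop in the script above relies on `parse_review_text()` (defined earlier in the script, outside this excerpt) to split the model's plain-text response into a commit message and a unified diff. A minimal sketch of that kind of marker-based extraction follows; the `=== COMMIT MESSAGE ===` / `=== DIFF ===` delimiter names are assumptions for illustration, not necessarily the script's actual markers.]

```python
import re


def parse_review_text(review_text: str) -> tuple[str, str]:
    """Split a marker-delimited AI response into (commit_msg, diff).

    Assumes hypothetical section headers of the form "=== NAME ===";
    returns empty strings for sections that are absent.
    """
    def section(name: str) -> str:
        # Capture everything between "=== NAME ===" and the next
        # "=== " header (or end of text), non-greedily.
        m = re.search(
            rf"=== {name} ===\n(.*?)(?:\n=== |\Z)",
            review_text,
            re.DOTALL,
        )
        return m.group(1).strip() if m else ""

    return section("COMMIT MESSAGE"), section("DIFF")
```

A caller would then write the two pieces to the `.msg` and `.diff` files shown in the epilog, falling back to placeholder content when a section is missing.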
* Re: [PATCH v9 4/6] devtools: add multi-provider AI documentation review script 2026-02-13 21:39 ` [PATCH v9 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger @ 2026-02-17 18:20 ` Andrew Bailey 0 siblings, 0 replies; 63+ messages in thread From: Andrew Bailey @ 2026-02-17 18:20 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Aaron Conole [-- Attachment #1: Type: text/plain, Size: 293 bytes --] Hi Stephen, The series looks solid overall. However, could you please add type hints to the methods in your Python scripts? This aligns with current best practices and significantly improves readability, making the code easier to review and understand. Best, Andrew Bailey ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v9 5/6] doc: add AI-assisted patch review to contributing guide 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (3 preceding siblings ...) 2026-02-13 21:39 ` [PATCH v9 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-13 21:39 ` [PATCH v9 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger 6 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a new section to the contributing guide describing the analyze-patch.py script which uses AI providers to review patches against DPDK coding standards before submission to the mailing list. The new section covers basic usage, provider selection, patch series handling, LTS release review, and output format options. A note clarifies that AI review supplements but does not replace human review. Also add a reference to the script in the new driver guide's test tools checklist. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- doc/guides/contributing/new_driver.rst | 2 + doc/guides/contributing/patches.rst | 56 ++++++++++++++++++++++++++ 2 files changed, 58 insertions(+) diff --git a/doc/guides/contributing/new_driver.rst b/doc/guides/contributing/new_driver.rst index 555e875329..6c0d356cfd 100644 --- a/doc/guides/contributing/new_driver.rst +++ b/doc/guides/contributing/new_driver.rst @@ -210,3 +210,5 @@ Be sure to run the following test tools per patch in a patch series: * `check-doc-vs-code.sh` * `check-spdx-tag.sh` * Build documentation and validate how output looks +* Optionally run ``analyze-patch.py`` for AI-assisted review + (see :ref:`ai_assisted_review` in the Contributing Guide) diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst index 5f554d47e6..74fc714d16 100644 --- a/doc/guides/contributing/patches.rst +++ b/doc/guides/contributing/patches.rst @@ -503,6 +503,62 @@ Additionally, when contributing to the DTS tool, patches should also be checked the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK repo. To run the script, extra :ref:`Python dependencies <dts_deps>` are needed. + +.. _ai_assisted_review: + +AI-Assisted Patch Review +------------------------ + +Contributors may optionally use the ``devtools/analyze-patch.py`` script +to get an AI-assisted review of patches before submitting them to the mailing list. +The script checks patches against the DPDK coding standards and contribution +guidelines documented in ``AGENTS.md``. + +The script supports multiple AI providers (Anthropic Claude, OpenAI ChatGPT, +xAI Grok, Google Gemini). An API key for the chosen provider must be set +in the corresponding environment variable (see ``--list-providers``). 
+ +Basic usage:: + + # Review a single patch (default provider: Anthropic Claude) + devtools/analyze-patch.py my-patch.patch + + # Use a different provider + devtools/analyze-patch.py -p openai my-patch.patch + + # Review for an LTS branch (enables stricter rules) + devtools/analyze-patch.py -r 24.11 my-patch.patch + + # List available providers and their API key variables + devtools/analyze-patch.py --list-providers + +For a patch series in an mbox file, the ``--split-patches`` option reviews +each patch individually:: + + devtools/analyze-patch.py --split-patches series.mbox + + # Review only a range of patches + devtools/analyze-patch.py --split-patches --patch-range 1-5 series.mbox + +When reviewing for a Long Term Stable (LTS) release, use the ``-r`` option +with the target version. Any DPDK release with minor version ``.11`` +(e.g., 23.11, 24.11) is automatically recognized as LTS, +and the script will enforce stricter rules: bug fixes only, no new features or APIs. + +Output can be formatted as plain text (default), Markdown, HTML, or JSON:: + + devtools/analyze-patch.py -f markdown -o review.md my-patch.patch + +The review guidelines in ``AGENTS.md`` cover commit message formatting, +SPDX licensing, C coding style, forbidden API usage, symbol export rules, +and other DPDK-specific requirements. + +.. note:: + + AI-assisted review is a supplement to, not a replacement for, + human review on the mailing list. + Always verify AI suggestions before acting on them. + .. _contrib_check_compilation: Checking Compilation -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
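[Editor's note: the ``.11`` LTS convention described in the patch above can be expressed as a small predicate. This is a sketch of how a tool might detect LTS versions from the ``-r`` argument; `analyze-patch.py`'s actual implementation may differ.]

```python
def is_lts_release(version: str) -> bool:
    """Return True for DPDK LTS versions, i.e. the yearly .11 releases
    (e.g. 23.11, 24.11) and their stable point releases (24.11.1)."""
    parts = version.split(".")
    if len(parts) < 2:
        return False
    return parts[1] == "11"
```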
* [PATCH v9 6/6] MAINTAINERS: add section for AI review tools 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (4 preceding siblings ...) 2026-02-13 21:39 ` [PATCH v9 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger @ 2026-02-13 21:39 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger 6 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-13 21:39 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger, Thomas Monjalon Add maintainer entries for the AI-assisted code review tooling: AGENTS.md, analyze-patch.py, compare-reviews.sh, and review-doc.py. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- MAINTAINERS | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 25fb109ef4..e8362e83ca 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -109,6 +109,14 @@ F: license/ F: .editorconfig F: .mailmap +AI review tools +M: Stephen Hemminger <stephen@networkplumber.org> +M: Aaron Conole <aconole@redhat.com> +F: AGENTS.md +F: devtools/analyze-patch.py +F: devtools/compare-reviews.sh +F: devtools/review-doc.py + Linux kernel uAPI headers M: Maxime Coquelin <maxime.coquelin@redhat.com> F: devtools/linux-uapi.sh -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v10 0/6] AI-assisted code and documentation review 2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger ` (5 preceding siblings ...) 2026-02-13 21:39 ` [PATCH v9 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger @ 2026-02-19 17:48 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger ` (5 more replies) 6 siblings, 6 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This series adds tooling for AI-assisted review of DPDK patches and documentation. The goal is to catch common coding style violations, commit message formatting errors, and documentation issues before patches reach the mailing list, reducing reviewer workload on mechanical checks. The approach uses a structured guidelines document (AGENTS.md) that codifies the project's existing standards from the contributor documentation, coding style guide, and validation scripts into a format that AI models can apply consistently. The review scripts then check patches or documentation against these guidelines using any of several AI providers. The series includes: 1. AGENTS.md - consolidated review guidelines derived from patches.rst, coding_style.rst, check-git-log.sh, checkpatches.sh, and the Coccinelle scripts 2. analyze-patch.py - reviews patches against AGENTS.md using AI providers (Anthropic, OpenAI, xAI, Google), with support for patch series splitting and LTS-specific rules 3. compare-reviews.sh - runs the same patch through multiple providers for comparison, auto-detecting available API keys 4. review-doc.py - reviews documentation files for spelling, grammar, and technical accuracy, with batch processing and multiple output formats 5. Contributing guide updates documenting the new tools 6.
MAINTAINERS entry for the new files AI review is intended as a supplement to human review, not a replacement. The tools are optional and require the contributor to have an API key for their chosen provider. v10 - add more anti-patterns found in reviews. v9 - update AGENTS.md to cover bugs that were found in 26.03 and previous PVS studio scans. Stephen Hemminger (6): doc: add AGENTS.md for AI code review tools devtools: add multi-provider AI patch review script devtools: add compare-reviews.sh for multi-provider analysis devtools: add multi-provider AI documentation review script doc: add AI-assisted patch review to contributing guide MAINTAINERS: add section for AI review tools AGENTS.md | 1882 ++++++++++++++++++++++++ MAINTAINERS | 8 + devtools/analyze-patch.py | 1334 +++++++++++++++++ devtools/compare-reviews.sh | 192 +++ devtools/review-doc.py | 1098 ++++++++++++++ doc/guides/contributing/new_driver.rst | 2 + doc/guides/contributing/patches.rst | 56 + 7 files changed, 4572 insertions(+) create mode 100644 AGENTS.md create mode 100755 devtools/analyze-patch.py create mode 100755 devtools/compare-reviews.sh create mode 100755 devtools/review-doc.py -- 2.51.0 ^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH v10 1/6] doc: add AGENTS.md for AI code review tools 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger @ 2026-02-19 17:48 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger ` (4 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add a reference document that enables AI code review tools to validate DPDK patches against project coding standards and submission requirements. The document consolidates guidelines from the DPDK contributor documentation, coding style guide, and validation scripts (check-git-log.sh, checkpatches.sh) into a single structured file. It covers SPDX license requirements, commit message formatting, C coding style rules, forbidden token restrictions, API tag placement, and severity levels aligned with existing validation tool output. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- AGENTS.md | 1882 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1882 insertions(+) create mode 100644 AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..623de590c4 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1882 @@ +# AGENTS.md - DPDK Code Review Guidelines for AI Tools + +## CRITICAL INSTRUCTION - READ FIRST + +This document has two categories of review rules with different +confidence thresholds: + +### 1. Correctness Bugs -- HIGHEST PRIORITY (report at >=50% confidence) + +**Always report potential correctness bugs.** These are the most +valuable findings. When in doubt, report them with a note about +your confidence level. A possible use-after-free or resource leak +is worth mentioning even if you are not certain. 
+ +Correctness bugs include: +- Use-after-free (accessing memory after `free`/`rte_free`) +- Resource leaks on error paths (memory, file descriptors, locks) +- Double-free or double-close +- NULL pointer dereference +- Buffer overflows or out-of-bounds access +- Uninitialized variable use in a reachable code path +- Race conditions (unsynchronized shared state) +- Missing error checks on functions that can fail +- Error paths that skip cleanup (goto labels, missing free/close) +- Incorrect error propagation (wrong return value, lost errno) +- Logic errors in conditionals (wrong operator, inverted test) +- Integer overflow/truncation in size calculations +- Missing bounds checks on user-supplied sizes or indices +- `mmap()` return checked against `NULL` instead of `MAP_FAILED` +- Statistics accumulation using `=` instead of `+=` +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) +- Unbounded descriptor chain traversal on guest/API-supplied data +- `1 << n` on 64-bit bitmask (must use `1ULL << n` or `RTE_BIT64()`) +- Variable assigned then overwritten before being read (dead store) +- Same variable used as loop counter in nested loops +- `memcpy`/`memcmp`/`memset` with same pointer for source and destination (no-op or undefined) +- `rte_pktmbuf_free_bulk()` called on mbufs that may originate from different mempools (Tx burst, ring dequeue) + +**Do NOT self-censor correctness bugs.** If you identify a code +path where a resource could leak or memory could be used after +free, report it. Do not talk yourself out of it. + +### 2. Style, Process, and Formatting -- suppress false positives + +**NEVER list a style/process item under "Errors" or "Warnings" if +you conclude it is correct.** + +Before outputting any style, formatting, or process error/warning, +verify it is actually wrong. 
If your analysis concludes with +phrases like "there's no issue here", "which is fine", "appears +correct", "is acceptable", or "this is actually correct" -- then +DO NOT INCLUDE IT IN YOUR OUTPUT AT ALL. Delete it. Omit it +entirely. + +This suppression rule applies to: SPDX/copyright format, commit +message formatting, tag ordering, line length, naming conventions, +code style, and process compliance. It does NOT apply to +correctness bugs listed above. + +--- + +This document provides guidelines for AI-powered code review tools +when reviewing contributions to the Data Plane Development Kit +(DPDK). It is derived from the official DPDK contributor guidelines +and validation scripts. + +## Overview + +DPDK follows a development process modeled on the Linux Kernel. All +patches are reviewed publicly on the mailing list before being +merged. AI review tools should verify compliance with the standards +outlined below. + +## Review Philosophy + +**Correctness bugs are the primary goal of AI review.** Style and +formatting checks are secondary. A review that catches a +use-after-free but misses a style nit is far more valuable than +one that catches every style issue but misses the bug. + +**BEFORE OUTPUTTING YOUR REVIEW**: Re-read each item. +- For correctness bugs: keep them. If you have reasonable doubt + that a code path is safe, report it. +- For style/process items: if ANY item contains phrases like "is + fine", "no issue", "appears correct", "is acceptable", + "actually correct" -- DELETE THAT ITEM. Do not include it. + +### Correctness review guidelines +- Trace error paths: for every function that allocates a resource + or acquires a lock, verify that ALL error paths after that point + release it +- Check every `goto error` and early `return`: does it clean up + everything allocated so far? +- Look for use-after-free: after `free(p)`, is `p` accessed again? 
+- Check that error codes are propagated, not silently dropped +- Report at >=50% confidence; note uncertainty if appropriate +- It is better to report a potential bug that turns out to be safe + than to miss a real bug + +### Style and process review guidelines +- Only comment on style/process issues when you have HIGH CONFIDENCE (>80%) that an issue exists +- Be concise: one sentence per comment when possible +- Focus on actionable feedback, not observations +- When reviewing text, only comment on clarity issues if the text is genuinely + confusing or could lead to errors. +- Do NOT comment on copyright years unless outside valid range (2013 to current year) +- Do NOT report an issue then contradict yourself - if something is acceptable, do not mention it at all +- Do NOT include items in Errors/Warnings that you then say are "acceptable" or "correct" +- Do NOT mention things that are correct or "not an issue" - only report actual problems +- Do NOT speculate about contributor circumstances (employment, company policies, etc.) +- Before adding any style item to your review, ask: "Is this actually wrong?" If no, omit it entirely. +- VERIFY before reporting: For subject line length, COUNT the characters first. If <=60, do not mention it. 
+- NEVER write "(Correction: ...)" - if you need to correct yourself, simply omit the item entirely +- Do NOT add vague suggestions like "should be verified" or "should be checked" - either it's wrong or don't mention it +- Do NOT flag something as an Error then say "which is correct" in the same item +- Do NOT say "no issue here" or "this is actually correct" - if there's no issue, do not include it in your review +- Do NOT call the standard DPDK SPDX/copyright format "different style" - it is THE standard +- Do NOT analyze cross-patch dependencies or compilation order - you cannot reliably determine this from patch review +- Do NOT claim a patch "would cause compilation failure" based on symbols used in other patches in the series +- Review each patch individually for its own correctness; assume the patch author ordered them correctly +- When reviewing a patch series, OMIT patches that have no issues. Do not include a patch in your output just to say "no issues found" or to summarize what the patch does. Only include patches where you have actual findings to report. + +## Priority Areas (Review These) + +### Security & Safety +- Unsafe code blocks without justification +- Command injection risks (shell commands, user input) +- Path traversal vulnerabilities +- Credential exposure or hard coded secrets +- Missing input validation on external data +- Improper error handling that could leak sensitive info + +### Correctness Issues +- Logic errors that could cause panics or incorrect behavior +- Buffer overflows +- Race conditions +- Resource leaks (files, connections, memory) +- Off-by-one errors or boundary conditions +- Incorrect error propagation +- **Use-after-free** (any access to memory after it has been freed) +- **Error path resource leaks**: For every allocation or fd open, + trace each error path (`goto`, early `return`, conditional) to + verify the resource is released. 
Common patterns to check: + - `malloc`/`rte_malloc` followed by a failure that does `return -1` + instead of `goto cleanup` + - `open()`/`socket()` fd not closed on a later error + - Lock acquired but not released on an error branch + - Partially initialized structure where early fields are allocated + but later allocation fails without freeing the early ones +- **Double-free / double-close**: resource freed in both a normal + path and an error path, or fd closed but not set to -1 allowing + a second close +- **Missing error checks**: functions that can fail (malloc, open, + ioctl, etc.) whose return value is not checked +- Changes to API without release notes +- Changes to ABI on non-LTS release +- Usage of deprecated APIs when replacements exist +- Overly defensive code that adds unnecessary checks +- Unnecessary comments that just restate what the code already shows (remove them) +- **Process-shared synchronization errors** (pthread mutexes in shared memory without `PTHREAD_PROCESS_SHARED`) +- **`mmap()` checked against NULL instead of `MAP_FAILED`**: `mmap()` returns + `MAP_FAILED` (i.e., `(void *)-1`) on failure, NOT `NULL`. Checking + `== NULL` or `!= NULL` will miss the error and use an invalid pointer. + ```c + /* BAD - mmap never returns NULL on failure */ + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); + if (p == NULL) /* WRONG - will not catch MAP_FAILED */ + return -1; + + /* GOOD */ + p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0); + if (p == MAP_FAILED) + return -1; + ``` +- **Statistics accumulation using `=` instead of `+=`**: When accumulating + statistics (counters, byte totals, packet counts), using `=` overwrites + the running total with only the latest value. This silently produces + wrong results. 
+ ```c + /* BAD - overwrites instead of accumulating */ + stats->rx_packets = nb_rx; + stats->rx_bytes = total_bytes; + + /* GOOD - accumulates over time */ + stats->rx_packets += nb_rx; + stats->rx_bytes += total_bytes; + ``` + Note: `=` is correct for gauge-type values (e.g., queue depth, link + status) and for initial assignment. Only flag when the context is + clearly incremental accumulation (loop bodies, per-burst counters, + callback tallies). +- **Integer multiply without widening cast**: When multiplying integers + to produce a result wider than the operands (sizes, offsets, byte + counts), the multiplication is performed at the operand width and + the upper bits are silently lost before the assignment. This applies + to any narrowing scenario: 16×16 assigned to a 32-bit variable, + 32×32 assigned to a 64-bit variable, etc. + ```c + /* BAD - 32×32 overflows before widening to 64 */ + uint64_t total_size = num_entries * entry_size; /* both are uint32_t */ + size_t offset = ring->idx * ring->desc_size; /* 32×32 → truncated */ + + /* BAD - 16×16 overflows before widening to 32 */ + uint32_t byte_count = pkt_len * nb_segs; /* both are uint16_t */ + + /* GOOD - widen before multiply */ + uint64_t total_size = (uint64_t)num_entries * entry_size; + size_t offset = (size_t)ring->idx * ring->desc_size; + uint32_t byte_count = (uint32_t)pkt_len * nb_segs; + ``` +- **Unbounded descriptor chain traversal**: When walking a chain of + descriptors (virtio, DMA, NIC Rx/Tx rings) where the chain length + or next-index comes from guest memory or an untrusted API caller, + the traversal MUST have a bounds check or loop counter to prevent + infinite loops or out-of-bounds access from malicious/corrupt data. 
+ ```c + /* BAD - guest controls desc[idx].next with no bound */ + while (desc[idx].flags & VRING_DESC_F_NEXT) { + idx = desc[idx].next; /* guest-supplied, unbounded */ + process(desc[idx]); + } + + /* GOOD - cap iterations to descriptor ring size */ + for (i = 0; i < ring_size; i++) { + if (!(desc[idx].flags & VRING_DESC_F_NEXT)) + break; + idx = desc[idx].next; + if (idx >= ring_size) /* bounds check */ + return -EINVAL; + process(desc[idx]); + } + ``` + This applies to any chain/linked-list traversal where indices or + pointers originate from untrusted input (guest VMs, user-space + callers, network packets). +- **Bitmask shift using `1` instead of `1ULL` on 64-bit masks**: The + literal `1` is `int` (32 bits). Shifting it by 32 or more is + undefined behavior; shifting it by less than 32 but assigning to a + `uint64_t` silently zeroes the upper 32 bits. Use `1ULL << n`, + `UINT64_C(1) << n`, or the DPDK `RTE_BIT64(n)` macro. + ```c + /* BAD - 1 is int, UB if n >= 32, wrong if result used as uint64_t */ + uint64_t mask = 1 << bit_pos; + if (features & (1 << VIRTIO_NET_F_MRG_RXBUF)) /* bit 15 OK, bit 32+ UB */ + + /* GOOD */ + uint64_t mask = UINT64_C(1) << bit_pos; + uint64_t mask = 1ULL << bit_pos; + uint64_t mask = RTE_BIT64(bit_pos); /* preferred in DPDK */ + if (features & RTE_BIT64(VIRTIO_NET_F_MRG_RXBUF)) + ``` + Note: `1U << n` is acceptable when the mask is known to be 32-bit + (e.g., `uint32_t` register fields with `n < 32`). Only flag when + the result is stored in, compared against, or returned as a 64-bit + type, or when `n` could be >= 32. +- **Variable overwrite before read (dead store)**: A variable is + assigned a value that is unconditionally overwritten before it is + ever read. This usually indicates a logic error (wrong variable + name, missing `if`, copy-paste mistake) or at minimum is dead code. 
+ ```c + /* BAD - first assignment is never read */ + ret = validate_input(cfg); + ret = apply_config(cfg); /* overwrites without checking first ret */ + if (ret != 0) + return ret; + + /* GOOD - check each return value */ + ret = validate_input(cfg); + if (ret != 0) + return ret; + ret = apply_config(cfg); + if (ret != 0) + return ret; + ``` + Do NOT flag cases where the initial value is intentionally a default + that may or may not be overwritten (e.g., `int ret = 0;` followed + by a conditional assignment). Only flag unconditional overwrites + where the first value can never be observed. +- **Shared loop counter in nested loops**: Using the same variable as + the loop counter in both an outer and inner loop causes the outer + loop to malfunction because the inner loop modifies its counter. + ```c + /* BAD - inner loop clobbers outer loop counter */ + int i; + for (i = 0; i < nb_queues; i++) { + setup_queue(i); + for (i = 0; i < nb_descs; i++) /* BUG: reuses i */ + init_desc(i); + } + + /* GOOD - distinct loop counters */ + for (int i = 0; i < nb_queues; i++) { + setup_queue(i); + for (int j = 0; j < nb_descs; j++) + init_desc(j); + } + ``` +- **`memcpy`/`memcmp`/`memset` self-argument (same pointer as both + operands)**: Passing the same pointer as both source and destination + to `memcpy()` is undefined behavior per C99. Passing the same + pointer to both arguments of `memcmp()` is a no-op that always + returns 0, indicating a logic error (usually a copy-paste mistake + with the wrong variable name). The same applies to `rte_memcpy()` + and `memmove()` with identical arguments. + ```c + /* BAD - memcpy with same src and dst is undefined behavior */ + memcpy(buf, buf, len); + rte_memcpy(dst, dst, len); + + /* BAD - memcmp with same pointer always returns 0 (logic error) */ + if (memcmp(key, key, KEY_LEN) == 0) /* always true, wrong variable? 
*/ + + /* BAD - likely copy-paste: should be comparing two different MACs */ + if (memcmp(&eth->src_addr, &eth->src_addr, RTE_ETHER_ADDR_LEN) == 0) + + /* GOOD - comparing two different things */ + memcpy(dst, src, len); + if (memcmp(&eth->src_addr, &eth->dst_addr, RTE_ETHER_ADDR_LEN) == 0) + ``` + This pattern almost always indicates a copy-paste bug where one of + the arguments should be a different variable. +- **`rte_pktmbuf_free_bulk()` on mixed-pool mbuf arrays**: Tx burst functions + and ring/queue dequeue paths receive mbufs that may originate from different + mempools (applications are free to send mbufs from any pool). + `rte_pktmbuf_free_bulk()` returns ALL mbufs to the pool of the first mbuf + in the array. If mbufs come from different pools, subsequent mbufs are + returned to the wrong pool, corrupting pool accounting and causing + hard-to-debug failures. + ```c + /* BAD - assumes all mbufs are from the same pool */ + /* (in tx_burst completion or ring dequeue error path) */ + rte_pktmbuf_free_bulk(mbufs, nb_mbufs); + + /* GOOD - free individually (each mbuf returned to its own pool) */ + for (i = 0; i < nb_mbufs; i++) + rte_pktmbuf_free(mbufs[i]); + + /* GOOD - batch by pool if performance matters */ + /* group mbufs by pool, then call rte_mempool_put_bulk per group */ + ``` + This applies to any path that frees mbufs submitted by the application: + Tx completion, Tx error cleanup, and ring/queue drain paths. Rx burst + functions that allocate all mbufs from a single pool are not affected. + +### Architecture & Patterns +- Code that violates existing patterns in the code base +- Missing error handling +- Code that is not safe against signals + +### New Library API Design + +When a patch adds a new library under `lib/`, review API design in +addition to correctness and style. + +**API boundary.** A library should be a compiler, not a framework. +The model is `rte_acl`: create a context, feed input, get structured +output, caller decides what to do with it.
No callbacks needed. If +the library requires callers to implement a callback table to +function, the boundary is wrong — the library is asking the caller +to be its backend. + +**Callback structs** (Warning / Error). Any function-pointer struct +in an installed header is an ABI break waiting to happen. Adding or +reordering a member breaks all consumers. +- Prefer a single callback parameter over an ops table. +- \>5 callbacks: **Warning** — likely needs redesign. +- \>20 callbacks: **Error** — this is an app plugin API, not a library. +- All callbacks must have Doxygen (contract, return values, ownership). +- Void-returning callbacks for failable operations swallow errors — + flag as **Error**. +- Callbacks serving app-specific needs (e.g. `verbose_level_get`) + indicate wrong code was extracted into the library. + +**Extensible structures.** Prefer TLV / tagged-array patterns over +enum + union, following `rte_flow_item` and `rte_flow_action` as +the model. Type tag + pointer to type-specific data allows adding +types without ABI breaks. Flag as **Warning**: +- Large enums (100+) consumers must switch on. +- Unions that grow with every new feature. +- Ask: "What changes when a feature is added next release?" If + "add an enum value and union arm" — should be TLV. + +**Installed headers.** If it's in `headers` or `indirect_headers` +in meson.build, it's public API. Don't call it "private." If truly +internal, don't install it. + +**Global state.** Prefer handle-based APIs (`create`/`destroy`) +over singletons. `rte_acl` allows multiple independent classifier +instances; new libraries should do the same. + +**Output ownership.** Prefer caller-allocated or library-allocated- +caller-freed over internal static buffers. If static buffers are +used, document lifetime and ensure Doxygen examples don't show +stale-pointer usage. 
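The handle-based, `rte_acl`-style model described above can be sketched as follows. This is a minimal illustration only; all names (`widget_*`) are hypothetical, not a real DPDK library:

```c
#include <stdlib.h>

/* Hypothetical handle-based library API: multiple independent
 * instances, no global state, no callback table required from
 * the caller. */
struct widget_ctx {
	unsigned int nb_rules;	/* would be opaque in the installed header */
};

struct widget_ctx *
widget_create(void)
{
	return calloc(1, sizeof(struct widget_ctx));
}

int
widget_add_rule(struct widget_ctx *ctx)
{
	if (ctx == NULL)
		return -1;	/* report failure via return code, no rte_panic() */
	ctx->nb_rules++;
	return 0;
}

void
widget_destroy(struct widget_ctx *ctx)
{
	free(ctx);	/* free() accepts NULL; no pre-check needed */
}
```

Because the state lives behind the handle, two contexts are fully independent, matching the `rte_acl` model of multiple classifier instances.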
+ +--- + +## Source License Requirements + +### SPDX License Identifiers + +Every source file must begin with an SPDX license identifier, followed +by the copyright notice, then a blank line before other content. + +- SPDX tag on first line (or second line for `#!` scripts) +- Copyright line immediately follows +- Blank line after copyright before any code/includes +- Core libraries and drivers use `BSD-3-Clause` +- Kernel components use `GPL-2.0` +- Dual-licensed code uses: `(BSD-3-Clause OR GPL-2.0)` + +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 John Smith + */ + +#include <stdio.h> +``` + +For scripts: +```python +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 Jane Doe + +import sys +``` + +**Do not include boilerplate license text** - the SPDX identifier is sufficient. + +**Do NOT flag copyright years** - Copyright years reflect when code was written. Valid years range from 2013 (DPDK's first release) through the current year. Only flag years outside this range (e.g., years before 2013 or future years beyond the current date). + +**Copyright holders can be individuals or organizations** - Both are equally valid. NEVER comment on, question, or speculate about copyright holders. Do not mention employer policies, company resources, or suggest copyright "should" be assigned differently. The copyright holder's choice is not subject to review. + +**The following SPDX/copyright format is correct** - do NOT flag it or comment on it: +```c +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Stephen Hemminger + */ +``` +This is the standard DPDK format. Do not say it is "different" or "unusual" - it is correct. +Do not suggest the copyright year "should be verified" - if it's in the valid range (2013-current year), it's fine. +If SPDX is on line 1 and copyright follows, this is CORRECT - do not include it in Errors. 
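Under the layout rules above, an automated checker only needs to inspect the first two lines of a file. A minimal sketch (function name hypothetical; it handles only the C block-comment style shown above, not `#`-comment scripts):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical validator: SPDX tag on line 1, copyright line
 * immediately following on line 2. */
bool
spdx_header_ok(const char *src)
{
	static const char tag[] = "/* SPDX-License-Identifier: ";
	const char *nl;

	if (strncmp(src, tag, strlen(tag)) != 0)
		return false;
	nl = strchr(src, '\n');
	if (nl == NULL)
		return false;
	/* Copyright line must immediately follow the SPDX line. */
	return strncmp(nl + 1, " * Copyright(c) ", 16) == 0;
}
```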
+ +--- + +## Commit Message Requirements + +### Subject Line (First Line) + +| Rule | Limit | +|------|-------| +| Maximum length | **60 characters** | +| Format | `component: lowercase description` | +| Case | Lowercase except acronyms | +| Mood | Imperative (instructions to codebase) | +| Punctuation | **No trailing period** | + +**Before flagging subject line length**: Actually count the characters. Only flag if >60. Do not flag then correct yourself. + +``` +# Good examples +net/ixgbe: fix offload config option name +config: increase max queues per port +net/mlx5: add support for flow counters +app/testpmd: fix memory leak in flow create + +# Bad examples +Fixed the offload config option. # past tense, has period, no prefix +net/ixgbe: Fix Offload Config # uppercase after colon +ixgbe: fix something # wrong prefix, should be net/ixgbe +lib/ethdev: add new feature # wrong prefix, should be ethdev: +``` + +#### Headline Format Errors (from check-git-log.sh) + +The following are flagged as errors: +- Tab characters in subject +- Leading or trailing spaces +- Trailing period (`.`) +- Punctuation marks: `, ; ! ? 
& |` +- Underscores after the colon (indicates code in subject) +- Missing colon separator +- No space after colon +- Space before colon + +#### Common Prefix Mistakes + +| Wrong | Correct | +|-------|---------| +| `ixgbe:` | `net/ixgbe:` | +| `lib/ethdev:` | `ethdev:` | +| `example:` | `examples/foo:` | +| `apps/` | `app/name:` | +| `app/test:` | `test:` | +| `testpmd:` | `app/testpmd:` | +| `test-pmd:` | `app/testpmd:` | +| `bond:` | `net/bonding:` | + +#### Case-Sensitive Terms (Commit Messages Only) + +These terms must use exact capitalization **in commit messages** (from `devtools/words-case.txt`): +- `Rx`, `Tx` (not `RX`, `TX`, `rx`, `tx`) +- `VF`, `PF` (not `vf`, `pf`) +- `MAC`, `VLAN`, `RSS`, `API` +- `Linux`, `Windows`, `FreeBSD` +- Check `devtools/words-case.txt` for complete list + +**Note**: These rules apply to commit messages only, NOT to code comments or documentation. + +### Commit Body + +| Rule | Limit | +|------|-------| +| Line wrap | **75 characters** | +| Exception | `Fixes:` lines may exceed 75 chars | + +Body guidelines: +- Describe the issue being fixed or feature being added +- Provide enough context for reviewers +- **Do not start the commit message body with "It"** +- **Must end with** `Signed-off-by:` line (real name, not alias) + +### Fixes Tag + +When fixing regressions, use the `Fixes:` tag with a 12-character abbreviated SHA: + +``` +Fixes: abcdef012345 ("original commit subject") +``` + +The hash must reference a commit in the current branch, and the subject must match exactly. + +**Do NOT flag Fixes tags** asking for verification that the commit "exists in the tree" or "cannot verify" - you cannot verify this from a patch review. If the format is correct (12-char SHA, quoted subject), accept it. NEVER say "cannot verify this exists".
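The mechanical parts of the `Fixes:` tag format can be validated without repository access. A minimal sketch (function name hypothetical): 12 lowercase hex digits after the prefix, then a quoted subject:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical checker for: Fixes: <12 lowercase hex> ("subject") */
bool
fixes_tag_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	const char *p;
	size_t len;
	int i;

	if (strncmp(line, prefix, strlen(prefix)) != 0)
		return false;
	p = line + strlen(prefix);
	/* Abbreviated SHA: exactly 12 lowercase hex characters. */
	for (i = 0; i < 12; i++)
		if (!isxdigit((unsigned char)p[i]) ||
		    isupper((unsigned char)p[i]))
			return false;
	p += 12;
	if (strncmp(p, " (\"", 3) != 0)
		return false;
	/* Subject must be non-empty and the line must end with ") */
	len = strlen(p);
	return len > 5 && strcmp(p + len - 2, "\")") == 0;
}
```

Note that a non-hex "SHA" such as `abcdefgh1234` is rejected, since `g` and `h` can never appear in a git hash.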
+ +**Finding maintainers**: Use `devtools/get-maintainer.sh` to identify the current subsystem maintainer from the `MAINTAINERS` file, rather than CC'ing the original author: + +```bash +git send-email --to-cmd ./devtools/get-maintainer.sh --cc dev@dpdk.org 000*.patch +``` + +### Required Tags + +``` +# For Coverity issues (required if "coverity" mentioned in body): +Coverity issue: 12345 + +# For Bugzilla issues (required if "bugzilla" mentioned in body): +Bugzilla ID: 12345 + +# For stable release backport candidates: +Cc: stable@dpdk.org + +# For patch dependencies (in commit notes after ---): +Depends-on: series-NNNNN ("Title of the series") +``` + +### Tag Order + +Tags must appear in this order, with a blank line separating the two groups: + +**Group 1** (optional tags, no blank lines within this group): +- Coverity issue: +- Bugzilla ID: +- Fixes: +- Cc: + +**Blank line required here** (only if Group 1 tags are present) + +**Group 2** (no blank lines within this group): +- Reported-by: +- Suggested-by: +- Signed-off-by: +- Acked-by: +- Reviewed-by: +- Tested-by: + +**Correct examples:** + +Simple patch with no Group 1 tags (most common): +``` +The info_get callback doesn't need to check its args +since already done by ethdev. 
+ +Signed-off-by: John Smith <john@example.com> +``` + +Patch with Fixes and Cc tags: +``` +Fixes: c743e50c475f ("null: new poll mode driver") +Cc: stable@dpdk.org + +Signed-off-by: John Smith <john@example.com> +``` + +Patch with only Fixes tag: +``` +Fixes: abcd1234abcd ("component: original commit") + +Signed-off-by: Jane Doe <jane@example.com> +``` + +**What is correct (do NOT flag):** +- Signed-off-by directly after commit body when there are no Group 1 tags - this is CORRECT +- No blank line between `Fixes:` and `Cc:` - this is CORRECT +- Blank line between `Cc:` (or last tag in Group 1) and `Signed-off-by:` - this is CORRECT + +**What is wrong (DO flag):** +- Missing blank line between Group 1 tags and Group 2 tags (when Group 1 tags exist) + +**Tag format**: `Tag-name: Full Name <email@domain.com>` + +--- + +## C Coding Style + +### Line Length + +| Context | Limit | +|---------|-------| +| Source code | **100 characters** | +| Commit body | **75 characters** | + +### General Formatting + +- **Tab width**: 8 characters (hard tabs for indentation, spaces for alignment) +- **No trailing whitespace** on lines or at end of files +- Files must end with a new line +- Code style should be consistent within each file + + +### Comments + +```c +/* Most single-line comments look like this. */ + +/* + * VERY important single-line comments look like this. + */ + +/* + * Multi-line comments look like this. Make them real sentences. Fill + * them so they look like real paragraphs. + */ +``` + +### Header File Organization + +Include order (each group separated by blank line): +1. System/libc includes +2. DPDK EAL includes +3. DPDK misc library includes +4. 
Application-specific includes + +```c +#include <stdio.h> +#include <stdlib.h> + +#include <rte_eal.h> + +#include <rte_ring.h> +#include <rte_mempool.h> + +#include "application.h" +``` + +### Header Guards + +```c +#ifndef _FILE_H_ +#define _FILE_H_ + +/* Code */ + +#endif /* _FILE_H_ */ +``` + +### Naming Conventions + +- **All external symbols** must have `RTE_` or `rte_` prefix +- **Macros**: ALL_UPPERCASE with `RTE_` prefix +- **Functions**: lowercase with underscores only (no CamelCase) +- **Variables**: lowercase with underscores only +- **Enum values**: ALL_UPPERCASE with `RTE_<ENUM>_` prefix + +**Exception**: Driver base directories (`drivers/*/base/`) may use different +naming conventions when sharing code across platforms or with upstream vendor code. + +#### Symbol Naming for Static Linking + +Drivers and libraries must not expose global variables that could +clash when statically linked with other DPDK components or +applications. Use consistent and unique prefixes for all exported +symbols to avoid namespace collisions. + +**Good practice**: Use a driver-specific or library-specific prefix for all global variables: + +```c +/* Good - virtio driver uses consistent "virtio_" prefix */ +const struct virtio_ops virtio_legacy_ops = { + .read = virtio_legacy_read, + .write = virtio_legacy_write, + .configure = virtio_legacy_configure, +}; + +const struct virtio_ops virtio_modern_ops = { + .read = virtio_modern_read, + .write = virtio_modern_write, + .configure = virtio_modern_configure, +}; + +/* Good - mlx5 driver uses consistent "mlx5_" prefix */ +struct mlx5_flow_driver_ops mlx5_flow_dv_ops; +``` + +**Bad practice**: Generic names that may clash: + +```c +/* Bad - "ops" is too generic, will clash with other drivers */ +const struct virtio_ops ops = { ... }; + +/* Bad - "legacy_ops" could clash with other legacy implementations */ +const struct virtio_ops legacy_ops = { ... 
}; + +/* Bad - "driver_config" is not unique */ +struct driver_config config; +``` + +**Guidelines**: +- Prefix all global variables with the driver or library name (e.g., `virtio_`, `mlx5_`, `ixgbe_`) +- Prefix all global functions similarly unless they use the `rte_` namespace +- Internal static variables do not require prefixes as they have file scope +- Consider using the `RTE_` or `rte_` prefix only for symbols that are part of the public DPDK API + +#### Prohibited Terminology + +Do not use non-inclusive naming, including: +- `master/slave` -> Use: primary/secondary, controller/worker, leader/follower +- `blacklist/whitelist` -> Use: denylist/allowlist, blocklist/passlist +- `cripple` -> Use: impacted, degraded, restricted, immobilized +- `tribe` -> Use: team, squad +- `sanity check` -> Use: coherence check, test, verification + + +### Comparisons and Boolean Logic + +```c +/* Pointers - compare explicitly with NULL */ +if (p == NULL) /* Good */ +if (p != NULL) /* Good */ +if (likely(p != NULL)) /* Good - likely/unlikely don't change this */ +if (unlikely(p == NULL)) /* Good - likely/unlikely don't change this */ +if (!p) /* Bad - don't use ! on pointers */ + +/* Integers - compare explicitly with zero */ +if (a == 0) /* Good */ +if (a != 0) /* Good */ +if (errno != 0) /* Good - this IS explicit */ +if (likely(a != 0)) /* Good - likely/unlikely don't change this */ +if (!a) /* Bad - don't use ! on integers */ +if (a) /* Bad - implicit, should be a != 0 */ + +/* Characters - compare with character constant */ +if (*p == '\0') /* Good */ + +/* Booleans - direct test is acceptable */ +if (flag) /* Good for actual bool types */ +if (!flag) /* Good for actual bool types */ +``` + +**Explicit comparison** means using `==` or `!=` operators (e.g., `x != 0`, `p == NULL`). +**Implicit comparison** means relying on truthiness without an operator (e.g., `if (x)`, `if (!p)`).
+**Note**: `likely()` and `unlikely()` macros do NOT affect whether a comparison is explicit or implicit. + +### Boolean Usage + +- Using `bool` type is allowed +- Prefer `bool` over `int` when a variable or field is only used as a boolean +- For structure fields, consider if the size/alignment impact is acceptable + +### Indentation and Braces + +```c +/* Control statements - no braces for single statements */ +if (val != NULL) + val = realloc(val, newsize); + +/* Braces on same line as else */ +if (test) + stmt; +else if (bar) { + stmt; + stmt; +} else + stmt; + +/* Switch statements - don't indent case */ +switch (ch) { +case 'a': + aflag = 1; + /* FALLTHROUGH */ +case 'b': + bflag = 1; + break; +default: + usage(); +} + +/* Long conditions - double indent continuation */ +if (really_long_variable_name_1 == really_long_variable_name_2 && + really_long_variable_name_3 == really_long_variable_name_4) + stmt; +``` + +### Variable Declarations + +- Prefer declaring variables inside the basic block where they are used +- Variables may be declared either at the start of the block, or at point of first use (C99 style) +- Both declaration styles are acceptable; consistency within a function is preferred +- Initialize variables only when a meaningful value exists at declaration time +- Use C99 designated initializers for structures + +```c +/* Good - declaration at start of block */ +int ret; +ret = some_function(); + +/* Also good - declaration at point of use (C99 style) */ +for (int i = 0; i < count; i++) + process(i); + +/* Good - declaration in inner block where variable is used */ +if (condition) { + int local_val = compute(); + use(local_val); +} + +/* Bad - unnecessary initialization defeats compiler warnings */ +int ret = 0; +ret = some_function(); /* Compiler won't warn if assignment removed */ +``` + +### Function Format + +- Return type on its own line +- Opening brace on its own line +- Place an empty line between declarations and statements + +```c +static 
char * +function(int a1, int b1) +{ + char *p; + + p = do_something(a1, b1); + return p; +} +``` + +--- + +## Unnecessary Code Patterns + +The following patterns add unnecessary code, hide bugs, or reduce performance. Avoid them. + +### Unnecessary Variable Initialization + +Do not initialize variables that will be assigned before use. This defeats the compiler's uninitialized variable warnings, hiding potential bugs. + +```c +/* Bad - initialization defeats -Wuninitialized */ +int ret = 0; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - compiler will warn if any path misses assignment */ +int ret; +if (condition) + ret = func_a(); +else + ret = func_b(); + +/* Good - meaningful initial value */ +int count = 0; +for (i = 0; i < n; i++) + if (test(i)) + count++; +``` + +### Unnecessary Casts of void * + +In C, `void *` converts implicitly to any pointer type. Casting the result of `malloc()`, `calloc()`, `rte_malloc()`, or similar functions is unnecessary and can hide the error of a missing `#include <stdlib.h>`. + +```c +/* Bad - unnecessary cast */ +struct foo *p = (struct foo *)malloc(sizeof(*p)); +struct bar *q = (struct bar *)rte_malloc(NULL, sizeof(*q), 0); + +/* Good - no cast needed in C */ +struct foo *p = malloc(sizeof(*p)); +struct bar *q = rte_malloc(NULL, sizeof(*q), 0); +``` + +Note: Casts are required in C++ but DPDK is a C project. + +### Zero-Length Arrays vs Variable-Length Arrays + +Zero-length arrays (`int arr[0]`) are a GCC extension. Use C99 flexible array members instead. + +```c +/* Bad - GCC extension */ +struct msg { + int len; + char data[0]; +}; + +/* Good - C99 flexible array member */ +struct msg { + int len; + char data[]; +}; +``` + +### Unnecessary NULL Checks Before free() + +Functions like `free()`, `rte_free()`, and similar deallocation functions accept NULL pointers safely. Do not add redundant NULL checks. 
+ +```c +/* Bad - unnecessary check */ +if (ptr != NULL) + free(ptr); + +if (rte_ptr != NULL) + rte_free(rte_ptr); + +/* Good - free handles NULL */ +free(ptr); +rte_free(rte_ptr); +``` + +### memset Before free() (CWE-14) + +Do not call `memset()` to zero memory before freeing it. The compiler may optimize away the `memset()` as a dead store (CWE-14: Compiler Removal of Code to Clear Buffers). For security-sensitive data, use `explicit_bzero()`, `rte_memset_sensitive()`, or `rte_free_sensitive()` which the compiler is not permitted to eliminate. + +```c +/* Bad - compiler may eliminate memset */ +memset(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for non-sensitive data, just free */ +free(ptr); + +/* Good - explicit_bzero cannot be optimized away */ +explicit_bzero(secret_key, sizeof(secret_key)); +free(secret_key); + +/* Good - DPDK wrapper for clearing sensitive data */ +rte_memset_sensitive(secret_key, 0, sizeof(secret_key)); +free(secret_key); + +/* Good - for rte_malloc'd sensitive data, combined clear+free */ +rte_free_sensitive(secret_key); +``` + +### Appropriate Use of rte_malloc() + +`rte_malloc()` allocates from hugepage memory. Use it only when required: + +- Memory that will be accessed by DMA (NIC descriptors, packet buffers) +- Memory shared between primary and secondary DPDK processes +- Memory requiring specific NUMA node placement + +For general allocations, use standard `malloc()` which is faster and does not consume limited hugepage resources. + +```c +/* Bad - rte_malloc for ordinary data structure */ +struct config *cfg = rte_malloc(NULL, sizeof(*cfg), 0); + +/* Good - standard malloc for control structures */ +struct config *cfg = malloc(sizeof(*cfg)); + +/* Good - rte_malloc for DMA-accessible memory */ +struct rte_mbuf *mbufs = rte_malloc(NULL, n * sizeof(*mbufs), RTE_CACHE_LINE_SIZE); +``` + +### Appropriate Use of rte_memcpy() + +`rte_memcpy()` is optimized for bulk data transfer in the fast path. 
For general use, standard `memcpy()` is preferred because: + +- Modern compilers optimize `memcpy()` effectively +- `memcpy()` includes bounds checking with `_FORTIFY_SOURCE` +- `memcpy()` handles small fixed-size copies efficiently + +```c +/* Bad - rte_memcpy in control path */ +rte_memcpy(&config, &default_config, sizeof(config)); + +/* Good - standard memcpy for control path */ +memcpy(&config, &default_config, sizeof(config)); + +/* Good - rte_memcpy for packet data in fast path */ +rte_memcpy(rte_pktmbuf_mtod(m, void *), payload, len); +``` + +### Non-const Function Pointer Arrays + +Arrays of function pointers (ops tables, dispatch tables, callback arrays) +should be declared `const` when their contents are fixed at compile time. +A non-`const` function pointer array can be overwritten by bugs or exploits, +and prevents the compiler from placing the table in read-only memory. + +```c +/* Bad - mutable when it doesn't need to be */ +static rte_rx_burst_t rx_functions[] = { + rx_burst_scalar, + rx_burst_vec_avx2, + rx_burst_vec_avx512, +}; + +/* Good - immutable dispatch table */ +static const rte_rx_burst_t rx_functions[] = { + rx_burst_scalar, + rx_burst_vec_avx2, + rx_burst_vec_avx512, +}; +``` + +**Exceptions** (do NOT flag): +- Arrays modified at runtime for CPU feature detection or capability probing + (e.g., selecting a burst function based on `rte_cpu_get_flag_enabled()`) +- Arrays containing mutable state (e.g., entries that are linked into lists) +- Arrays populated dynamically via registration APIs +- `dev_ops` or similar structures assigned per-device at init time + +Only flag when the array is fully initialized at declaration with constant +values and never modified thereafter. 
+ +--- + +## Forbidden Tokens + +### Functions + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `rte_panic()` | Return error codes | lib/, drivers/ | +| `rte_exit()` | Return error codes | lib/, drivers/ | +| `perror()` | `RTE_LOG()` with `strerror(errno)` | lib/, drivers/ (allowed in examples/, app/test/) | +| `printf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | +| `fprintf()` | `RTE_LOG()` | lib/, drivers/ (allowed in examples/, app/test/) | + +### Atomics and Memory Barriers + +| Forbidden | Preferred | +|-----------|-----------| +| `rte_atomic16/32/64_xxx()` | C11 atomics via `rte_atomic_xxx()` | +| `rte_smp_mb()` | `rte_atomic_thread_fence()` | +| `rte_smp_rmb()` | `rte_atomic_thread_fence()` | +| `rte_smp_wmb()` | `rte_atomic_thread_fence()` | +| `__sync_xxx()` | `rte_atomic_xxx()` | +| `__atomic_xxx()` | `rte_atomic_xxx()` | +| `__ATOMIC_RELAXED` etc. | `rte_memory_order_xxx` | +| `__rte_atomic_thread_fence()` | `rte_atomic_thread_fence()` | + +### Threading + +| Forbidden | Preferred | +|-----------|-----------| +| `pthread_create()` | `rte_thread_create()` | +| `pthread_join()` | `rte_thread_join()` | +| `pthread_detach()` | EAL thread functions | +| `pthread_setaffinity_np()` | `rte_thread_set_affinity()` | +| `rte_thread_set_name()` | `rte_thread_set_prefixed_name()` | +| `rte_thread_create_control()` | `rte_thread_create_internal_control()` | + +### Process-Shared Synchronization + +When placing synchronization primitives in shared memory (memory accessible by multiple processes, such as DPDK primary/secondary processes or `mmap`'d regions), they **must** be initialized with process-shared attributes. Failure to do so causes **undefined behavior** that may appear to work in testing but fail unpredictably in production. 
+ +#### pthread Mutexes in Shared Memory + +**This is an error** - mutex in shared memory without `PTHREAD_PROCESS_SHARED`: + +```c +/* BAD - undefined behavior when used across processes */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +void init_shared(struct shared_data *shm) { + pthread_mutex_init(&shm->lock, NULL); /* ERROR: missing pshared attribute */ +} +``` + +**Correct implementation**: + +```c +/* GOOD - properly initialized for cross-process use */ +struct shared_data { + pthread_mutex_t lock; + int counter; +}; + +int init_shared(struct shared_data *shm) { + pthread_mutexattr_t attr; + int ret; + + ret = pthread_mutexattr_init(&attr); + if (ret != 0) + return -ret; + + ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); + if (ret != 0) { + pthread_mutexattr_destroy(&attr); + return -ret; + } + + ret = pthread_mutex_init(&shm->lock, &attr); + pthread_mutexattr_destroy(&attr); + + return -ret; +} +``` + +#### pthread Condition Variables in Shared Memory + +Condition variables also require the process-shared attribute: + +```c +/* BAD - will not work correctly across processes */ +pthread_cond_init(&shm->cond, NULL); + +/* GOOD */ +pthread_condattr_t cattr; +pthread_condattr_init(&cattr); +pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED); +pthread_cond_init(&shm->cond, &cattr); +pthread_condattr_destroy(&cattr); +``` + +#### pthread Read-Write Locks in Shared Memory + +```c +/* BAD */ +pthread_rwlock_init(&shm->rwlock, NULL); + +/* GOOD */ +pthread_rwlockattr_t rwattr; +pthread_rwlockattr_init(&rwattr); +pthread_rwlockattr_setpshared(&rwattr, PTHREAD_PROCESS_SHARED); +pthread_rwlock_init(&shm->rwlock, &rwattr); +pthread_rwlockattr_destroy(&rwattr); +``` + +#### When to Flag This Issue + +Flag as an **Error** when ALL of the following are true: +1. A `pthread_mutex_t`, `pthread_cond_t`, `pthread_rwlock_t`, or `pthread_barrier_t` is initialized +2. 
The primitive is stored in shared memory (identified by context such as: structure in `rte_malloc`/`rte_memzone`, `mmap`'d memory, memory passed to secondary processes, or structures documented as shared) +3. The initialization uses `NULL` attributes or attributes without `PTHREAD_PROCESS_SHARED` + +**Do NOT flag** when: +- The mutex is in thread-local or process-private heap memory (`malloc`) +- The mutex is a local/static variable not in shared memory +- The code already uses `pthread_mutexattr_setpshared()` with `PTHREAD_PROCESS_SHARED` +- The synchronization uses DPDK primitives (`rte_spinlock_t`, `rte_rwlock_t`) which are designed for shared memory + +#### Preferred Alternatives + +For DPDK code, prefer DPDK's own synchronization primitives which are designed for shared memory: + +| pthread Primitive | DPDK Alternative | +|-------------------|------------------| +| `pthread_mutex_t` | `rte_spinlock_t` (busy-wait) or properly initialized pthread mutex | +| `pthread_rwlock_t` | `rte_rwlock_t` | +| `pthread_spinlock_t` | `rte_spinlock_t` | + +Note: `rte_spinlock_t` and `rte_rwlock_t` work correctly in shared memory without special initialization, but they are spinning locks unsuitable for long wait times. 
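The process-shared requirement can be demonstrated end to end. This is a self-contained POSIX sketch (not DPDK code): a mutex initialized with `PTHREAD_PROCESS_SHARED` in `MAP_SHARED` anonymous memory correctly serializes two processes incrementing one counter.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter {
	pthread_mutex_t lock;
	int counter;
};

/* Parent and forked child each increment the counter iters times
 * under a process-shared mutex. Returns the final count (2 * iters
 * when synchronization is correct), or -1 on setup failure. */
int
shared_mutex_demo(int iters)
{
	pthread_mutexattr_t attr;
	struct shared_counter *shm;
	pid_t pid;
	int i, ret;

	shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (shm == MAP_FAILED)	/* compare with MAP_FAILED, never NULL */
		return -1;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&shm->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	shm->counter = 0;

	pid = fork();
	if (pid < 0) {
		munmap(shm, sizeof(*shm));
		return -1;
	}
	for (i = 0; i < iters; i++) {
		pthread_mutex_lock(&shm->lock);
		shm->counter++;
		pthread_mutex_unlock(&shm->lock);
	}
	if (pid == 0)
		_exit(0);	/* child done */
	waitpid(pid, NULL, 0);
	ret = shm->counter;
	munmap(shm, sizeof(*shm));
	return ret;
}
```

With the attribute omitted (the BAD pattern above), the same program has undefined behavior and may lose increments or deadlock, which is exactly why it often "works in testing" and fails in production.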
+ +### Compiler Built-ins and Attributes + +| Forbidden | Preferred | Notes | +|-----------|-----------|-------| +| `__attribute__` | RTE macros in `rte_common.h` | Except in `lib/eal/include/rte_common.h` | +| `__alignof__` | C11 `alignof` | | +| `__typeof__` | `typeof` | | +| `__builtin_*` | EAL macros | Except in `lib/eal/` and `drivers/*/base/` | +| `__reserved` | Different name | Reserved in Windows headers | +| `#pragma` / `_Pragma` | Avoid | Except in `rte_common.h` | + +### Format Specifiers + +| Forbidden | Preferred | +|-----------|-----------| +| `%lld`, `%llu`, `%llx` | `%PRId64`, `%PRIu64`, `%PRIx64` | + +### Headers and Build + +| Forbidden | Preferred | Context | +|-----------|-----------|---------| +| `#include <linux/pci_regs.h>` | `#include <rte_pci.h>` | | +| `install_headers()` | Meson `headers` variable | meson.build | +| `-DALLOW_EXPERIMENTAL_API` | Not in lib/drivers/app | Build flags | +| `allow_experimental_apis` | Not in lib/drivers/app | Meson | +| `#undef XXX` | `// XXX is not set` | config/rte_config.h | +| Driver headers (`*_driver.h`, `*_pmd.h`) | Public API headers | app/, examples/ | + +### Testing + +| Forbidden | Preferred | +|-----------|-----------| +| `REGISTER_TEST_COMMAND` | `REGISTER_<suite_name>_TEST` | + +### Documentation + +| Forbidden | Preferred | +|-----------|-----------| +| `http://...dpdk.org` | `https://...dpdk.org` | +| `//doc.dpdk.org/guides/...` | `:ref:` or `:doc:` Sphinx references | +| `:: file.svg` | `:: file.*` (wildcard extension) | + +--- + +## Deprecated API Usage + +New patches must not introduce usage of deprecated APIs, macros, or functions. +Deprecated items are marked with `RTE_DEPRECATED` or documented in the +deprecation notices section of the release notes. 
+ +### Rules for New Code + +- Do not call functions marked with `RTE_DEPRECATED` or `__rte_deprecated` +- Do not use macros that have been superseded by newer alternatives +- Do not use data structures or enum values marked as deprecated +- Check `doc/guides/rel_notes/deprecation.rst` for planned deprecations +- When a deprecated API has a replacement, use the replacement + +### Deprecating APIs + +A patch may mark an API as deprecated provided: + +- No remaining usages exist in the current DPDK codebase +- The deprecation is documented in the release notes +- A migration path or replacement API is documented +- The `RTE_DEPRECATED` macro is used to generate compiler warnings + +```c +/* Marking a function as deprecated */ +__rte_deprecated +int +rte_old_function(void); + +/* With a message pointing to the replacement */ +__rte_deprecated_msg("use rte_new_function() instead") +int +rte_old_function(void); +``` + +### Common Deprecated Patterns + +| Deprecated | Replacement | Notes | +|-----------|-------------|-------| +| `rte_atomic*_t` types | C11 atomics | Use `rte_atomic_xxx()` wrappers | +| `rte_smp_*mb()` barriers | `rte_atomic_thread_fence()` | See Atomics section | +| `pthread_*()` in portable code | `rte_thread_*()` | See Threading section | + +When reviewing patches that add new code, flag any usage of deprecated APIs +as requiring change to use the modern replacement. 
+ +--- + +## API Tag Requirements + +### `__rte_experimental` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_experimental +int +rte_new_feature(void); + +/* Wrong - not alone on line */ +__rte_experimental int rte_new_feature(void); + +/* Wrong - in .c file */ +``` + +### `__rte_internal` + +- Must appear **alone on the line** immediately preceding the return type +- Only allowed in **header files** (not `.c` files) + +```c +/* Correct */ +__rte_internal +int +internal_function(void); +``` + +### Alignment Attributes + +`__rte_aligned`, `__rte_cache_aligned`, `__rte_cache_min_aligned` may only be used with `struct` or `union` types: + +```c +/* Correct */ +struct __rte_cache_aligned my_struct { + /* ... */ +}; + +/* Wrong */ +int __rte_cache_aligned my_variable; +``` + +### Packed Attributes + +- `__rte_packed_begin` must follow `struct`, `union`, or alignment attributes +- `__rte_packed_begin` and `__rte_packed_end` must be used in pairs +- Cannot use `__rte_packed_begin` with `enum` + +```c +/* Correct */ +struct __rte_packed_begin my_packed_struct { + /* ... */ +} __rte_packed_end; + +/* Wrong - with enum */ +enum __rte_packed_begin my_enum { + /* ... */ +}; +``` + +--- + +## Code Quality Requirements + +### Compilation + +- Each commit must compile independently (for `git bisect`) +- No forward dependencies within a patchset +- Test with multiple targets, compilers, and options +- Use `devtools/test-meson-builds.sh` + +**Note for AI reviewers**: You cannot verify compilation order or cross-patch dependencies from patch review alone. Do NOT flag patches claiming they "would fail to compile" based on symbols used in other patches in the series. Assume the patch author has ordered them correctly. 
+ +### Testing + +- Add tests to `app/test` unit test framework +- New API functions must be used in the `app/test` directory +- New device APIs require at least one driver implementation + +#### Functional Test Infrastructure + +Standalone functional tests should use the `TEST_ASSERT` macros and `unit_test_suite_runner` infrastructure for consistency and proper integration with the DPDK test framework. + +```c +#include "test.h" + +static int +test_feature_basic(void) +{ + int ret; + + ret = rte_feature_init(); + TEST_ASSERT_SUCCESS(ret, "Failed to initialize feature"); + + ret = rte_feature_operation(); + TEST_ASSERT_EQUAL(ret, 0, "Operation returned unexpected value"); + + TEST_ASSERT_NOT_NULL(rte_feature_get_ptr(), + "Feature pointer should not be NULL"); + + return TEST_SUCCESS; +} + +static struct unit_test_suite feature_testsuite = { + .suite_name = "feature_autotest", + .setup = test_feature_setup, + .teardown = test_feature_teardown, + .unit_test_cases = { + TEST_CASE(test_feature_basic), + TEST_CASE(test_feature_advanced), + TEST_CASES_END() + } +}; + +static int +test_feature(void) +{ + return unit_test_suite_runner(&feature_testsuite); +} + +REGISTER_FAST_TEST(feature_autotest, NOHUGE_OK, ASAN_OK, test_feature); +``` + +The `REGISTER_FAST_TEST` macro parameters are: +- Test name (e.g., `feature_autotest`) +- `NOHUGE_OK` or `HUGEPAGES_REQUIRED` - whether test can run without hugepages +- `ASAN_OK` or `ASAN_FAILS` - whether test is compatible with Address Sanitizer +- Test function name + +Common `TEST_ASSERT` macros: +- `TEST_ASSERT(cond, msg, ...)` - Assert condition is true +- `TEST_ASSERT_SUCCESS(val, msg, ...)` - Assert value equals 0 +- `TEST_ASSERT_FAIL(val, msg, ...)` - Assert value is non-zero +- `TEST_ASSERT_EQUAL(a, b, msg, ...)` - Assert two values are equal +- `TEST_ASSERT_NOT_EQUAL(a, b, msg, ...)` - Assert two values differ +- `TEST_ASSERT_NULL(val, msg, ...)` - Assert value is NULL +- `TEST_ASSERT_NOT_NULL(val, msg, ...)` - Assert value
is not NULL + +### Documentation + +- Add Doxygen comments for public APIs +- Update release notes in `doc/guides/rel_notes/` for important changes +- Code and documentation must be updated atomically in same patch +- Only update the **current release** notes file +- Documentation must match the code +- PMD features must match the features matrix in `doc/guides/nics/features/` +- Documentation must match device operations (see `doc/guides/nics/features.rst` for the mapping between features, `eth_dev_ops`, and related APIs) +- Release notes are NOT required for: + - Test-only changes (unit tests, functional tests) + - Internal APIs and helper functions (not exported to applications) + - Internal implementation changes that don't affect public API + +### RST Documentation Style + +When reviewing `.rst` documentation files, prefer **definition lists** +over simple bullet lists where each item has a term and a description. +Definition lists produce better-structured HTML/PDF output and are +easier to scan. + +**When to suggest a definition list:** +- A bullet list where each item starts with a bold or emphasized term + followed by a dash, colon, or long explanation +- Lists of options, parameters, configuration values, or features + where each entry has a name and a description +- Glossary-style enumerations + +**When a simple list is fine (do NOT flag):** +- Short lists of items without descriptions (e.g., file names, steps) +- Lists where items are single phrases or sentences with no term/definition structure +- Enumerated steps in a procedure + +**RST definition list syntax:** + +```rst +term 1 + Description of term 1. + +term 2 + Description of term 2. + Can span multiple lines. +``` + +**Example — flag this pattern:** + +```rst +* **error** - Fail with error (default) +* **truncate** - Truncate content to fit token limit +* **summary** - Request high-level summary review +``` + +**Suggest rewriting as:** + +```rst +error + Fail with error (default). 
+ +truncate + Truncate content to fit token limit. + +summary + Request high-level summary review. +``` + +This is a **Warning**-level suggestion, not an Error. Do not flag it +when the existing list structure is appropriate (see "when a simple +list is fine" above). + +### API and Driver Changes + +- New APIs must be marked as `__rte_experimental` +- New APIs must have hooks in `app/testpmd` and tests in the functional test suite +- Changes to existing APIs require release notes +- New drivers or subsystems must have release notes +- Internal APIs (used only within DPDK, not exported to applications) do NOT require release notes + +### ABI Compatibility and Symbol Exports + +**IMPORTANT**: DPDK uses automatic symbol map generation. Do **NOT** recommend +manually editing `version.map` files - they are auto-generated from source code +annotations. + +#### Symbol Export Macros + +New public functions must be annotated with export macros (defined in +`rte_export.h`). Place the macro on the line immediately before the function +definition in the `.c` file: + +```c +/* For stable ABI symbols */ +RTE_EXPORT_SYMBOL(rte_foo_create) +int +rte_foo_create(struct rte_foo_config *config) +{ + /* ... */ +} + +/* For experimental symbols (include version when first added) */ +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_foo_new_feature, 25.03) +__rte_experimental +int +rte_foo_new_feature(void) +{ + /* ... */ +} + +/* For internal symbols (shared between DPDK components only) */ +RTE_EXPORT_INTERNAL_SYMBOL(rte_foo_internal_helper) +int +rte_foo_internal_helper(void) +{ + /* ... 
*/ +} +``` + +#### Symbol Export Rules + +- `RTE_EXPORT_SYMBOL` - Use for stable ABI functions +- `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, ver)` - Use for new experimental APIs + (version is the DPDK release, e.g., `25.03`) +- `RTE_EXPORT_INTERNAL_SYMBOL` - Use for functions shared between DPDK libs/drivers + but not part of public API +- Export macros go in `.c` files, not headers +- The build system generates linker version maps automatically + +#### What NOT to Review + +- Do **NOT** flag missing `version.map` updates - maps are auto-generated +- Do **NOT** suggest adding symbols to `lib/*/version.map` files + +#### ABI Versioning for Changed Functions + +When changing the signature of an existing stable function, use versioning macros +from `rte_function_versioning.h`: + +- `RTE_VERSION_SYMBOL` - Create versioned symbol for backward compatibility +- `RTE_DEFAULT_SYMBOL` - Mark the new default version + +Follow ABI policy and versioning guidelines in the contributor documentation. +Enable ABI checks with `DPDK_ABI_REF_VERSION` environment variable. + +--- + +## LTS (Long Term Stable) Release Review + +LTS releases are DPDK versions ending in `.11` (e.g., 23.11, 22.11, +21.11, 20.11, 19.11). 
When reviewing patches targeting an LTS branch, +apply stricter criteria: + +### LTS-Specific Rules + +- **Only bug fixes allowed** -- no new features +- **No new APIs** (experimental or stable) +- **ABI must remain unchanged** -- no symbol additions, removals, + or signature changes +- Backported fixes should reference the original commit with a + `Fixes:` tag +- Copyright years should reflect when the code was originally + written +- Be conservative: reject changes that are not clearly bug fixes + +### What to Flag on LTS Branches + +**Error:** +- New feature code (new functions, new driver capabilities) +- New experimental or stable API additions +- ABI changes (new or removed symbols, changed function signatures) +- Changes that add new configuration options or parameters + +**Warning:** +- Large refactoring that goes beyond what is needed for a fix +- Missing `Fixes:` tag on a backported bug fix +- Missing `Cc: stable@dpdk.org` + +### When LTS Rules Apply + +LTS rules apply when the reviewer is told the target release is an +LTS version (via the `--release` option or equivalent). If no +release is specified, assume the patch targets the main development +branch where new features and APIs are allowed. 
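A backport commit message satisfying these rules might look like the following (the commit hash, subject, and author are hypothetical, shown only to illustrate tag placement):

```
net/foo: fix crash on queue release

Releasing a queue that was never started dereferenced a NULL
descriptor ring. Check the queue state before touching the ring.

Fixes: 0123456789ab ("net/foo: add queue release support")
Cc: stable@dpdk.org

Signed-off-by: Jane Developer <jane@example.com>
```

Note the ordering: `Fixes:` and `Cc:` are adjacent with no blank line between them, and a blank line separates them from `Signed-off-by:`.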
+ +--- + +## Patch Validation Checklist + +### Commit Message + +- [ ] Subject line <=60 characters +- [ ] Subject is lowercase (except acronyms from words-case.txt) +- [ ] Correct component prefix (e.g., `net/ixgbe:` not `ixgbe:`) +- [ ] No `lib/` prefix for libraries +- [ ] Imperative mood, no trailing period +- [ ] No tabs, leading/trailing spaces, or punctuation marks +- [ ] Body wrapped at 75 characters +- [ ] Body does not start with "It" +- [ ] `Signed-off-by:` present with real name and valid email +- [ ] `Fixes:` tag present for bug fixes with 12-char SHA and exact subject +- [ ] `Coverity issue:` tag present if Coverity mentioned +- [ ] `Bugzilla ID:` tag present if Bugzilla mentioned +- [ ] `Cc: stable@dpdk.org` for stable backport candidates +- [ ] Tags in correct order with blank line separator + +### License + +- [ ] SPDX identifier on first line (or second for scripts) +- [ ] Copyright line follows SPDX +- [ ] Blank line after copyright before code +- [ ] Appropriate license for file type + +### Code Style + +- [ ] Lines <=100 characters +- [ ] Hard tabs for indentation, spaces for alignment +- [ ] No trailing whitespace +- [ ] Proper include order +- [ ] Header guards present +- [ ] `rte_`/`RTE_` prefix on external symbols +- [ ] Driver/library global variables use unique prefixes (e.g., `virtio_`, `mlx5_`) +- [ ] No prohibited terminology +- [ ] Proper brace style +- [ ] Function return type on own line +- [ ] Explicit comparisons: `== NULL`, `== 0`, `!= NULL`, `!= 0` +- [ ] No forbidden tokens (see table above) +- [ ] No unnecessary code patterns (see section above) +- [ ] No usage of deprecated APIs, macros, or functions +- [ ] Process-shared primitives in shared memory use `PTHREAD_PROCESS_SHARED` +- [ ] `mmap()` return checked against `MAP_FAILED`, not `NULL` +- [ ] Statistics use `+=` not `=` for accumulation +- [ ] Integer multiplies widened before operation when result is 64-bit +- [ ] Descriptor chain traversals bounded by ring size or loop 
counter +- [ ] 64-bit bitmasks use `1ULL <<` or `RTE_BIT64()`, not `1 <<` +- [ ] No unconditional variable overwrites before read +- [ ] Nested loops use distinct counter variables +- [ ] No `memcpy`/`memcmp` with identical source and destination pointers +- [ ] `rte_pktmbuf_free_bulk()` not used on mixed-pool mbuf arrays (Tx paths, ring dequeue, error paths) +- [ ] Static function pointer arrays declared `const` when contents are compile-time fixed +- [ ] Sensitive data cleared with `explicit_bzero()`/`rte_free_sensitive()`, not `memset()` + +### API Tags + +- [ ] `__rte_experimental` alone on line, only in headers +- [ ] `__rte_internal` alone on line, only in headers +- [ ] Alignment attributes only on struct/union +- [ ] Packed attributes properly paired +- [ ] New public functions have `RTE_EXPORT_*` macro in `.c` file +- [ ] Experimental functions use `RTE_EXPORT_EXPERIMENTAL_SYMBOL(name, version)` + +### Structure + +- [ ] Each commit compiles independently +- [ ] Code and docs updated together +- [ ] Documentation matches code behavior +- [ ] RST docs use definition lists for term/description patterns +- [ ] PMD features match `doc/guides/nics/features/` matrix +- [ ] Device operations match documentation (per `features.rst` mappings) +- [ ] Tests added/updated as needed +- [ ] Functional tests use TEST_ASSERT macros and unit_test_suite_runner +- [ ] New APIs marked as `__rte_experimental` +- [ ] New APIs have testpmd hooks and functional tests +- [ ] Current release notes updated for significant changes +- [ ] Release notes updated for API changes +- [ ] Release notes updated for new drivers or subsystems + +--- + +## Meson Build Files + +### Style Requirements + +- 4-space indentation (no tabs) +- Line continuations double-indented +- Lists alphabetically ordered +- Short lists (<=3 items): single line, no trailing comma +- Long lists: one item per line, trailing comma on last item +- No strict line length limit for meson files; lines under 100 characters 
are acceptable + +```python +# Short list +sources = files('file1.c', 'file2.c') + +# Long list +headers = files( + 'header1.h', + 'header2.h', + 'header3.h', +) +``` + +--- + +## Python Code + +- Must comply with formatting standards +- Use **`black`** for code formatting validation +- Line length acceptable up to 100 characters + +--- + +## Validation Tools + +Run these before submitting: + +```bash +# Check commit messages +devtools/check-git-log.sh -n1 + +# Check patch format and forbidden tokens +devtools/checkpatches.sh -n1 + +# Check maintainers coverage +devtools/check-maintainers.sh + +# Build validation +devtools/test-meson-builds.sh + +# Find maintainers for your patch +devtools/get-maintainer.sh <patch-file> +``` + +--- + +## Severity Levels for AI Review + +**Error** (must fix): + +*Correctness bugs (highest value findings):* +- Use-after-free +- Resource leaks on error paths (memory, file descriptors, locks) +- Double-free or double-close +- NULL pointer dereference on reachable code path +- Buffer overflow or out-of-bounds access +- Missing error check on a function that can fail, leading to undefined behavior +- Race condition on shared mutable state without synchronization +- Error path that skips necessary cleanup +- `mmap()` return value checked against NULL instead of `MAP_FAILED` +- Statistics accumulation using `=` instead of `+=` (overwrite vs increment) +- Integer multiply without widening cast losing upper bits (16×16, 32×32, etc.) 
+- Unbounded descriptor chain traversal on guest/API-supplied indices +- `1 << n` used for 64-bit bitmask (undefined behavior if n >= 32) +- Variable assigned then unconditionally overwritten before read +- Same variable used as counter in nested loops +- `memcpy`/`memcmp` with same pointer as both arguments (UB or no-op logic error) +- `rte_pktmbuf_free_bulk()` on mbuf array where mbufs may come from different pools (Tx burst, ring dequeue) + +*Process and format errors:* +- Missing or malformed SPDX license +- Missing Signed-off-by +- Subject line over 60 characters +- Body lines over 75 characters +- Wrong tag order or format +- Missing required tags (Fixes, Coverity issue, Bugzilla ID) +- Forbidden tokens in code +- `__rte_experimental`/`__rte_internal` in .c files or not alone on line +- Compilation failures +- ABI breaks without proper versioning +- pthread mutex/cond/rwlock in shared memory without `PTHREAD_PROCESS_SHARED` + +*API design errors (new libraries only):* +- Ops/callback struct with 20+ function pointers in an installed header +- Callback struct members with no Doxygen documentation +- Void-returning callbacks for failable operations (errors silently swallowed) + +**Warning** (should fix): +- Subject line style issues (case, punctuation) +- Wrong component prefix +- Missing Cc: stable@dpdk.org for fixes +- Documentation gaps +- Documentation does not match code behavior +- PMD features missing from `doc/guides/nics/features/` matrix +- Device operations not documented per `features.rst` mappings +- Missing tests +- Functional tests not using TEST_ASSERT macros or unit_test_suite_runner +- New API not marked as `__rte_experimental` +- New API without testpmd hooks or functional tests +- New public function missing `RTE_EXPORT_*` macro +- API changes without release notes +- New drivers or subsystems without release notes +- Implicit comparisons (`!ptr` instead of `ptr == NULL`) +- Unnecessary variable initialization +- Unnecessary casts of `void 
*` +- Unnecessary NULL checks before free +- Inappropriate use of `rte_malloc()` or `rte_memcpy()` +- Use of `perror()`, `printf()`, `fprintf()` in libraries or drivers (allowed in examples and test code) +- Driver/library global variables without unique prefixes (static linking clash risk) +- Usage of deprecated APIs, macros, or functions in new code +- RST documentation using bullet lists where definition lists would be more appropriate +- Ops/callback struct with >5 function pointers in an installed header (ABI risk) +- New API using fixed enum+union where TLV pattern would be more extensible +- Installed header labeled "private" or "internal" in meson.build +- New library using global singleton instead of handle-based API +- Static function pointer array not declared `const` when contents are compile-time constant + +**Do NOT flag** (common false positives): +- Missing `version.map` updates (maps are auto-generated from `RTE_EXPORT_*` macros) +- Suggesting manual edits to any `version.map` file +- Copyright years within valid range (2013 to current year) +- Copyright held by individuals (never speculate about employers, company policies, or who "should" hold copyright) +- SPDX/copyright format that matches the standard DPDK format (do not call it "different" or "unusual") +- Meson file lines under 100 characters +- Case-sensitive term violations in code comments (words-case.txt applies to commit messages only) +- Comparisons using `== 0`, `!= 0`, `== NULL`, `!= NULL` as "implicit" (these ARE explicit) +- Comparisons wrapped in `likely()` or `unlikely()` macros - these are still explicit if using == or != +- Anything you determine is correct (do not mention non-issues or say "No issue here") +- `REGISTER_FAST_TEST` using `NOHUGE_OK`/`ASAN_OK` macros (this is the correct current format) +- Tag ordering: Signed-off-by directly after commit body (when no Fixes/Cc tags) is CORRECT +- Tag ordering: no blank line between Fixes/Cc, blank line before Signed-off-by (when 
Fixes/Cc present) is CORRECT +- Missing release notes for test-only changes (unit tests do not require release notes) +- Missing release notes for internal APIs or helper functions (only public APIs need release notes) +- Subject lines that are within the 60 character limit (count first, do not guess) +- Any item you later correct with "(Correction: ...)" or "actually acceptable" - just omit it +- Vague concerns ("should be verified", "should be checked") - if you're not sure it's wrong, don't flag it +- Items where you say "which is correct" or "this is correct" - if it's correct, don't mention it at all +- Fixes tags with correct format (12-char SHA, quoted subject) - NEVER ask for verification that commit exists +- Items where you conclude "no issue here" or "this is actually correct" - omit these entirely +- Clean patches in a series - do not include a patch just to say "no issues" or describe what it does +- Cross-patch compilation dependencies - you cannot determine patch ordering correctness from review +- Claims that a symbol "was removed in patch N" causing issues in patch M - assume author ordered correctly +- Any speculation about whether patches will compile when applied in sequence +- Mutexes/locks in process-private memory (standard `malloc`, stack, static non-shared) - these don't need `PTHREAD_PROCESS_SHARED` +- Use of `rte_spinlock_t` or `rte_rwlock_t` in shared memory (these work correctly without special init) + +**Info** (consider): +- Minor style preferences +- Optimization suggestions +- Alternative approaches + +--- + +# Response Format + +When you identify an issue: +1. **State the problem** (1 sentence) +2. **Why it matters** (1 sentence, only if not obvious) +3. **Suggested fix** (code snippet or specific action) + +Example: +This could panic if the string is NULL. 
+ +--- + +## FINAL CHECK BEFORE SUBMITTING REVIEW + +Before outputting your review, do two separate passes: + +### Pass 1: Verify correctness bugs are included + +Ask: "Did I trace every error path for resource leaks? Did I check +for use-after-free? Did I verify error codes are propagated?" + +If you identified a potential correctness bug but talked yourself +out of it, **add it back**. It is better to report a possible bug +than to miss a real one. + +### Pass 2: Remove style/process false positives + +For EACH style/process item, ask: "Did I conclude this is actually +fine/correct/acceptable/no issue?" + +If YES, DELETE THAT ITEM. It should not be in your output. + +An item that says "X is wrong... actually this is correct" is a +FALSE POSITIVE and must be removed. This applies to style, format, +and process items only. + +**If your Errors section would be empty after this check, that's +fine -- it means the patches are good.** -- 2.51.0
* [PATCH v10 2/6] devtools: add multi-provider AI patch review script 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger @ 2026-02-19 17:48 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger ` (3 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger This is an AI generated script to review DPDK patches against the AGENTS.md coding guidelines using AI language models. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini The script reads a patch file and the AGENTS.md guidelines, then submits them to the selected AI provider for review. Results are organized by severity level (Error, Warning, Info) as defined in the guidelines. Features: - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Verbose mode shows token usage statistics - Uses temporary files for API requests to handle large patches - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/analyze-patch.py 0001-net-ixgbe-fix-something.patch ./devtools/analyze-patch.py -p xai my-patch.patch ./devtools/analyze-patch.py -l # list providers Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/analyze-patch.py | 1334 +++++++++++++++++++++++++++++++++++++ 1 file changed, 1334 insertions(+) create mode 100755 devtools/analyze-patch.py diff --git a/devtools/analyze-patch.py b/devtools/analyze-patch.py new file mode 100755 index 0000000000..c77908fb3c --- /dev/null +++ b/devtools/analyze-patch.py @@ -0,0 +1,1334 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Analyze DPDK patches using AI providers. + +Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +from datetime import date +from email.message import EmailMessage +from pathlib import Path +from typing import Iterator +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Large file handling modes +LARGE_FILE_MODES = ["error", "truncate", "chunk", "commits-only", "summary"] + +# Approximate tokens per character (conservative estimate for code) +CHARS_PER_TOKEN = 3.5 + +# Default token limits by provider (leaving room for system prompt and response) +PROVIDER_INPUT_LIMITS = { + "anthropic": 180000, # 200K context, reserve for system/response + "openai": 115000, # 128K context for gpt-4o + "xai": 115000, # Assume similar to OpenAI + "google": 900000, # Gemini has 1M context +} + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": 
"https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# LTS releases: any DPDK release with minor version .11 +# (e.g., 19.11, 20.11, 21.11, 22.11, 23.11, 24.11, 25.11, ...) + +SYSTEM_PROMPT_BASE = """\ +You are an expert DPDK code reviewer. Analyze patches for compliance with \ +DPDK coding standards and contribution guidelines. Provide clear, actionable \ +feedback organized by severity (Error, Warning, Info) as defined in the \ +guidelines.""" + +LTS_RULES = """ +LTS (Long Term Stable) branch rules apply: +- Only bug fixes allowed, no new features +- No new APIs (experimental or stable) +- ABI must remain unchanged +- Backported fixes should reference the original commit with Fixes: tag +- Copyright years should reflect when the code was originally written +- Be conservative: reject changes that aren't clearly bug fixes""" + +FORMAT_INSTRUCTIONS = { + "text": """Provide your review in plain text format.""", + "markdown": """Provide your review in Markdown format with: +- Headers (##) for each severity level (Errors, Warnings, Info) +- Bullet points for individual issues +- Code blocks (```) for code references +- Bold (**) for emphasis on key points""", + "html": """Provide your review in HTML format with: +- <h2> tags for each severity level (Errors, Warnings, Info) +- <ul>/<li> for individual issues +- <pre><code> for code references +- <strong> for emphasis on key points +- Use appropriate semantic HTML tags +- Do NOT include <html>, <head>, or <body> tags - just the content""", + "json": """Provide your review in JSON format with this structure: +{ + "summary": "Brief one-line summary of the review", + "errors": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "warnings": [ + {"issue": 
"description", "location": "file:line", "suggestion": "fix"} + ], + "info": [ + {"issue": "description", "location": "file:line", "suggestion": "fix"} + ], + "passed_checks": ["list of checks that passed"], + "overall_status": "PASS|WARN|FAIL" +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """Please review the following DPDK patch file '{patch_name}' \ +against the AGENTS.md guidelines. Check for: + +1. Commit message format (subject line, body, tags) +2. License/copyright compliance +3. C coding style issues +4. API and documentation requirements +5. Any other guideline violations + +{format_instruction} + +--- PATCH CONTENT --- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def is_lts_release(release): + """Check if a release is an LTS release. + + Per DPDK project guidelines, any release with minor version .11 + is an LTS release (e.g., 19.11, 21.11, 23.11, 24.11, 25.11). 
+ """ + if not release: + return False + # Check for explicit -lts suffix + if "-lts" in release.lower(): + return True + # Extract base version (e.g., "23.11" from "23.11.1" or "23.11-rc1") + version = release.split("-")[0] + parts = version.split(".") + if len(parts) >= 2: + try: + minor = int(parts[1]) + return minor == 11 + except ValueError: + pass + return False + + +def estimate_tokens(text): + """Estimate token count from text length.""" + return int(len(text) / CHARS_PER_TOKEN) + + +def split_mbox_patches(content): + """Split an mbox file into individual patches.""" + patches = [] + current_patch = [] + in_patch = False + + for line in content.split("\n"): + # Detect start of new message in mbox format + if line.startswith("From ") and ( + " Mon " in line + or " Tue " in line + or " Wed " in line + or " Thu " in line + or " Fri " in line + or " Sat " in line + or " Sun " in line + ): + if current_patch: + patches.append("\n".join(current_patch)) + current_patch = [line] + in_patch = True + elif in_patch: + current_patch.append(line) + + # Don't forget the last patch + if current_patch: + patches.append("\n".join(current_patch)) + + return patches if patches else [content] + + +def extract_commit_messages(content): + """Extract only commit messages from patch content.""" + patches = split_mbox_patches(content) + messages = [] + + for patch in patches: + lines = patch.split("\n") + msg_lines = [] + in_headers = True + in_body = False + found_subject = False + + for line in lines: + # Collect headers we care about + if in_headers: + if line.startswith("Subject:"): + msg_lines.append(line) + found_subject = True + elif line.startswith(("From:", "Date:")): + msg_lines.append(line) + elif line.startswith((" ", "\t")) and found_subject: + # Subject continuation + msg_lines.append(line) + elif line == "": + if found_subject: + in_headers = False + in_body = True + msg_lines.append("") + elif in_body: + # Stop at the diff + if line.startswith("---") and not 
line.startswith("----"): + break + if line.startswith("diff --git"): + break + msg_lines.append(line) + + if msg_lines: + messages.append("\n".join(msg_lines)) + + return "\n\n---\n\n".join(messages) + + +def truncate_content(content, max_tokens, provider): + """Truncate content to fit within token limit.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN) + + if len(content) <= max_chars: + return content, False + + # Try to truncate at a reasonable boundary + truncated = content[:max_chars] + + # Find last complete diff hunk or patch boundary + last_diff = truncated.rfind("\ndiff --git") + last_patch = truncated.rfind("\nFrom ") + + if last_diff > max_chars * 0.5: + truncated = truncated[:last_diff] + elif last_patch > max_chars * 0.5: + truncated = truncated[:last_patch] + + truncated += "\n\n[... Content truncated due to size limits ...]\n" + return truncated, True + + +def chunk_content(content, max_tokens, provider) -> Iterator[tuple[str, int, int]]: + """Split content into chunks that fit within token limit. + + Yields tuples of (chunk_content, chunk_number, total_chunks). 
+ """ + patches = split_mbox_patches(content) + + if len(patches) == 1: + # Single large patch - split by diff sections + yield from chunk_single_patch(content, max_tokens) + return + + # Multiple patches - group them to fit within limits + chunks = [] + current_chunk = [] + current_size = 0 + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) # 90% to leave margin + + for patch in patches: + patch_size = len(patch) + if current_size + patch_size > max_chars and current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + + if patch_size > max_chars: + # Single patch too large, truncate it + if current_chunk: + chunks.append("\n".join(current_chunk)) + current_chunk = [] + current_size = 0 + truncated, _ = truncate_content(patch, max_tokens * 0.9, provider) + chunks.append(truncated) + else: + current_chunk.append(patch) + current_size += patch_size + + if current_chunk: + chunks.append("\n".join(current_chunk)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def chunk_single_patch(content, max_tokens) -> Iterator[tuple[str, int, int]]: + """Split a single large patch by diff sections.""" + max_chars = int(max_tokens * CHARS_PER_TOKEN * 0.9) + + # Extract header (everything before first diff) + first_diff = content.find("\ndiff --git") + if first_diff == -1: + # No diff sections, just truncate + truncated, _ = truncate_content(content, max_tokens * 0.9, "anthropic") + yield truncated, 1, 1 + return + + header = content[: first_diff + 1] + diff_content = content[first_diff + 1 :] + + # Split by diff sections + diffs = [] + current_diff = [] + for line in diff_content.split("\n"): + if line.startswith("diff --git") and current_diff: + diffs.append("\n".join(current_diff)) + current_diff = [] + current_diff.append(line) + if current_diff: + diffs.append("\n".join(current_diff)) + + # Group diffs into chunks + chunks = [] + current_chunk_diffs = [] + current_size = len(header) + + for 
diff in diffs: + diff_size = len(diff) + if current_size + diff_size > max_chars and current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + current_size = len(header) + + if diff_size + len(header) > max_chars: + # Single diff too large + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + current_chunk_diffs = [] + truncated_diff = diff[: max_chars - len(header) - 100] + truncated_diff += "\n[... diff truncated ...]\n" + chunks.append(header + truncated_diff) + current_size = len(header) + else: + current_chunk_diffs.append(diff) + current_size += diff_size + + if current_chunk_diffs: + chunks.append(header + "\n".join(current_chunk_diffs)) + + total = len(chunks) + for i, chunk in enumerate(chunks, 1): + yield chunk, i, total + + +def get_summary_prompt(): + """Get prompt modifications for summary mode.""" + return """ +NOTE: This is a LARGE patch series. Provide a HIGH-LEVEL summary review only: +- Focus on overall architecture and design concerns +- Check commit message formatting across the series +- Identify any obvious policy violations +- Do NOT attempt detailed line-by-line code review +- Summarize the scope and purpose of the changes +""" + + +def format_combined_reviews(reviews, output_format, patch_name): + """Combine multiple chunk/patch reviews into a single output.""" + if output_format == "json": + combined = { + "patch_file": patch_name, + "sections": [ + {"label": label, "review": review} for label, review in reviews + ], + } + return json.dumps(combined, indent=2) + elif output_format == "html": + sections = [] + for label, review in reviews: + sections.append(f"<h2>{label}</h2>\n{review}") + return "\n<hr>\n".join(sections) + elif output_format == "markdown": + sections = [] + for label, review in reviews: + sections.append(f"## {label}\n\n{review}") + return "\n\n---\n\n".join(sections) + else: # text + sections = [] + for label, review in reviews: + 
sections.append(f"=== {label} ===\n\n{review}") + return ("\n\n" + "=" * 60 + "\n\n").join(sections) + + +def build_system_prompt(review_date, release): + """Build system prompt with date and release context.""" + prompt = SYSTEM_PROMPT_BASE + prompt += f"\n\nCurrent date: {review_date}." + + if release: + prompt += f"\nTarget DPDK release: {release}." + if is_lts_release(release): + prompt += LTS_RULES + else: + prompt += "\nThis is a main branch or standard release." + prompt += "\nNew features and experimental APIs are allowed." + + return prompt + + +def build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": "text", "text": system_prompt}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + patch_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": system_prompt}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + patch_content, + }, + ], + } + + +def build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", +): + 
"""Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + user_prompt = USER_PROMPT.format( + patch_name=patch_name, format_instruction=format_instruction + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": system_prompt}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + patch_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format="text", + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + output_format, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = 
e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: {usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: {usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + 
error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def get_last_message_id(patch_content): + """Extract Message-ID from the last patch in an mbox.""" + msg_ids = re.findall( + r"^Message-I[Dd]:\s*(.+)$", patch_content, re.MULTILINE | re.IGNORECASE + ) + if msg_ids: + msg_id = msg_ids[-1].strip() + # Normalize: remove < > and add them back + msg_id = msg_id.strip("<>") + return f"<{msg_id}>" + return None + + +def get_last_subject(patch_content): + """Extract subject from the last patch in an mbox.""" + # Find all Subject lines with potential continuations + subjects = [] + lines = patch_content.split("\n") + i = 0 + while i < len(lines): + if lines[i].lower().startswith("subject:"): + subject = lines[i][8:].strip() + i += 1 + # Unfold continuation lines, joining with a single space + while i < len(lines) and lines[i].startswith((" ", "\t")): + subject += " " + lines[i].strip() + i += 1 + subjects.append(subject) + else: + i += 1 + return subjects[-1] if subjects else None + + +def send_email( + to_addrs, cc_addrs, from_addr, subject, in_reply_to, body, dry_run=False +): + """Send review email using git send-email, sendmail, or msmtp.""" + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + email_text = msg.as_string() + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(email_text, file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Write to temp file for git send-email + with tempfile.NamedTemporaryFile(mode="w", suffix=".eml", delete=False) as f: + f.write(email_text) + temp_file = f.name + + try: + # Try git send-email first + if 
get_git_config("sendemail.smtpserver"): + # Build command with all arguments + flat_cmd = ["git", "send-email", "--confirm=never", "--quiet"] + for addr in to_addrs: + flat_cmd.extend(["--to", addr]) + for addr in cc_addrs: + flat_cmd.extend(["--cc", addr]) + if from_addr: + flat_cmd.extend(["--from", from_addr]) + if in_reply_to: + flat_cmd.extend(["--in-reply-to", in_reply_to]) + flat_cmd.append(temp_file) + + try: + subprocess.run(flat_cmd, check=True, capture_output=True) + print("Email sent via git send-email", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try sendmail + try: + subprocess.run( + ["sendmail", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via sendmail", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + # Try msmtp + try: + subprocess.run( + ["msmtp", "-t"], + input=email_text, + text=True, + capture_output=True, + check=True, + ) + print("Email sent via msmtp", file=sys.stderr) + return True + except (subprocess.CalledProcessError, FileNotFoundError): + pass + + error("Could not send email. 
Configure git send-email, sendmail, or msmtp.") + + finally: + os.unlink(temp_file) + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} {config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Analyze DPDK patches using AI providers", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s patch.patch # Review with default settings + %(prog)s -p openai my-patch.patch # Use OpenAI ChatGPT + %(prog)s -f markdown patch.patch # Output as Markdown + %(prog)s -f json -o review.json patch.patch # Save JSON to file + %(prog)s -f html -o review.html patch.patch # Save HTML to file + %(prog)s -r 24.11 patch.patch # Review for specific release + %(prog)s -r 24.11-lts patch.patch # Review for LTS branch + %(prog)s --send-email --to dev@dpdk.org series.mbox + %(prog)s --send-email --to dev@dpdk.org --dry-run series.mbox + +Large File Handling: + %(prog)s --split-patches series.mbox # Review each patch separately + %(prog)s --split-patches --patch-range 1-5 series.mbox # Review patches 1-5 + %(prog)s --large-file=truncate patch.mbox # Truncate to fit limit + %(prog)s --large-file=commits-only series.mbox # Review commit messages only + %(prog)s --large-file=summary series.mbox # High-level summary only + %(prog)s --large-file=chunk series.mbox # Split and review in chunks + +Large File Modes: + error - Fail with error (default) + truncate - Truncate content to fit token limit + chunk - Split into chunks and review each + commits-only - Extract and review only commit messages + summary - Request high-level summary review + +LTS Releases: + Use -r/--release with LTS version (e.g., 24.11-lts, 23.11) to enable + 
stricter review rules: bug fixes only, no new features or APIs. + Any DPDK release with minor version .11 is an LTS release. + """, + ) + + parser.add_argument("patch_file", nargs="?", help="Patch file to analyze") + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=4096, + help="Max tokens for response (default: 4096)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-o", + "--output", + metavar="FILE", + help="Write output to file instead of stdout", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Date and release options + parser.add_argument( + "-D", + "--date", + metavar="YYYY-MM-DD", + help="Review date context (default: today)", + ) + parser.add_argument( + "-r", + "--release", + metavar="VERSION", + help="Target DPDK release (e.g., 24.11, 23.11-lts)", + ) + + # Large file handling options + large_group = parser.add_argument_group("Large File Handling") + large_group.add_argument( + "--large-file", + choices=LARGE_FILE_MODES, + default="error", + metavar="MODE", + help="How to handle large files: error (default), truncate, " + "chunk, commits-only, summary", + ) + large_group.add_argument( + "--max-tokens", + type=int, + metavar="N", + help="Max input tokens (default: provider-specific)", + ) + large_group.add_argument( + 
"--split-patches", + action="store_true", + help="Split mbox into individual patches and review each separately", + ) + large_group.add_argument( + "--patch-range", + metavar="N-M", + help="Review only patches N through M (1-indexed, use with --split-patches)", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Check patch file is provided + if not args.patch_file: + parser.error("patch_file is required") + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + patch_path = Path(args.patch_file) + if not patch_path.exists(): + error(f"Patch file not found: {args.patch_file}") + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = 
get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Determine review date + review_date = args.date or date.today().isoformat() + + # Build system prompt with date and release context + system_prompt = build_system_prompt(review_date, args.release) + + # Read files + agents_content = agents_path.read_text() + patch_content = patch_path.read_text() + patch_name = patch_path.name + + # Determine max tokens for this provider + max_input_tokens = args.max_tokens or PROVIDER_INPUT_LIMITS.get( + args.provider, 100000 + ) + + # Estimate token count + estimated_tokens = estimate_tokens(patch_content + agents_content) + + # Parse patch range if specified + patch_start, patch_end = None, None + if args.patch_range: + try: + if "-" in args.patch_range: + start, end = args.patch_range.split("-", 1) + patch_start = int(start) + patch_end = int(end) + else: + patch_start = patch_end = int(args.patch_range) + except ValueError: + error(f"Invalid --patch-range format: {args.patch_range}") + + # Handle --split-patches mode + if args.split_patches: + patches = split_mbox_patches(patch_content) + total_patches = len(patches) + + if total_patches == 1: + print( + f"Note: Only 1 patch found in mbox, --split-patches has no effect", + file=sys.stderr, + ) + else: + print( + f"Found {total_patches} patches in mbox", + file=sys.stderr, + ) + + # Apply patch range filter + if patch_start is not None: + if patch_start < 1 or patch_start > total_patches: + error( + f"Patch range start {patch_start} out of range (1-{total_patches})" + ) + if patch_end < patch_start or patch_end > total_patches: + error( + f"Patch range end {patch_end} out of range ({patch_start}-{total_patches})" + ) + patches = patches[patch_start - 1 : patch_end] + print( + f"Reviewing patches {patch_start}-{patch_end} ({len(patches)} patches)", 
+ file=sys.stderr, + ) + + # Review each patch separately + all_reviews = [] + for i, patch in enumerate(patches, patch_start or 1): + patch_label = f"Patch {i}/{total_patches}" + print(f"\nReviewing {patch_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch, + f"{patch_name} ({patch_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((patch_label, review_text)) + + # Combine reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal API call + estimated_tokens = 0 # Bypass size check since we've already processed + + # Check if content is too large + is_large = estimated_tokens > max_input_tokens + + if is_large: + print( + f"Warning: Estimated {estimated_tokens:,} tokens exceeds limit of " + f"{max_input_tokens:,}", + file=sys.stderr, + ) + + if args.large_file == "error": + error( + f"Patch file too large ({estimated_tokens:,} tokens). " + f"Use --large-file=truncate|chunk|commits-only|summary to handle, " + f"or --split-patches to review patches individually." 
+ ) + elif args.large_file == "truncate": + print("Truncating content to fit token limit...", file=sys.stderr) + patch_content, was_truncated = truncate_content( + patch_content, max_input_tokens, args.provider + ) + if was_truncated: + print("Content was truncated.", file=sys.stderr) + elif args.large_file == "commits-only": + print("Extracting commit messages only...", file=sys.stderr) + patch_content = extract_commit_messages(patch_content) + new_estimate = estimate_tokens(patch_content + agents_content) + print( + f"Reduced to ~{new_estimate:,} tokens (commit messages only)", + file=sys.stderr, + ) + if new_estimate > max_input_tokens: + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "summary": + print("Using summary mode for large patch...", file=sys.stderr) + system_prompt += get_summary_prompt() + patch_content, _ = truncate_content( + patch_content, max_input_tokens, args.provider + ) + elif args.large_file == "chunk": + print("Processing in chunks...", file=sys.stderr) + all_reviews = [] + for chunk, chunk_num, total_chunks in chunk_content( + patch_content, max_input_tokens, args.provider + ): + chunk_label = f"Chunk {chunk_num}/{total_chunks}" + print(f"Reviewing {chunk_label}...", file=sys.stderr) + + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + chunk, + f"{patch_name} ({chunk_label})", + args.output_format, + args.verbose, + ) + all_reviews.append((chunk_label, review_text)) + + # Combine chunk reviews + review_text = format_combined_reviews( + all_reviews, args.output_format, patch_name + ) + + # Skip the normal single API call below + estimated_tokens = 0 + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) + print(f"Review date: {review_date}", file=sys.stderr) + if args.release: + lts_status = " (LTS)" if 
is_lts_release(args.release) else "" + print(f"Target release: {args.release}{lts_status}", file=sys.stderr) + print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Patch file: {args.patch_file}", file=sys.stderr) + print(f"Estimated tokens: {estimated_tokens:,}", file=sys.stderr) + print(f"Max input tokens: {max_input_tokens:,}", file=sys.stderr) + if args.large_file != "error": + print(f"Large file mode: {args.large_file}", file=sys.stderr) + if args.split_patches: + print("Split patches: yes", file=sys.stderr) + if args.output: + print(f"Output file: {args.output}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API (unless already processed via chunks/split) + if estimated_tokens > 0: # Not already processed + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + system_prompt, + agents_content, + patch_content, + patch_name, + args.output_format, + args.verbose, + ) + + if not review_text: + error(f"No response received from {args.provider}") + + # Format output based on requested format + provider_name = config["name"] + + if args.output_format == "json": + # For JSON, try to parse and add metadata + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + # If AI didn't return valid JSON, wrap the text + review_data = {"raw_review": review_text} + + output_data = { + "metadata": { + "patch_file": patch_name, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "review_date": review_date, + "target_release": args.release, + "is_lts": is_lts_release(args.release) if args.release else False, + }, + "review": review_data, + } + output_text = 
json.dumps(output_data, indent=2) + elif args.output_format == "html": + # Wrap HTML content with header + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"<br>Target release: {args.release}{lts_badge}" + output_text = f"""<!-- AI-generated review of {patch_name} --> +<!-- Reviewed using {provider_name} ({model}) on {review_date} --> +<div class="patch-review"> +<h1>Patch Review: {patch_name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model}) on {review_date}{release_info}</p> +{review_text} +</div> +""" + elif args.output_format == "markdown": + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"\n*Target release: {args.release}{lts_badge}*\n" + output_text = f"""# Patch Review: {patch_name} + +*Reviewed by {provider_name} ({model}) on {review_date}* +{release_info} +{review_text} +""" + else: # text + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + output_text = f"=== Patch Review: {patch_name} (via {provider_name}) ===\n" + output_text += f"Review date: {review_date}\n" + output_text += release_info + output_text += "\n" + review_text + + # Write output + if args.output: + Path(args.output).write_text(output_text) + print(f"Review written to: {args.output}", file=sys.stderr) + else: + print(output_text) + + # Send email if requested + if args.send_email: + # Email always uses plain text - warn if different format requested + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + in_reply_to = get_last_message_id(patch_content) + orig_subject = get_last_subject(patch_content) + + if orig_subject: + # Remove [PATCH n/m] prefix + review_subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", 
orig_subject) + review_subject = f"[REVIEW] {review_subject}" + else: + review_subject = f"[REVIEW] {patch_name}" + + # Build email body - always use plain text version + release_info = "" + if args.release: + lts_badge = " (LTS)" if is_lts_release(args.release) else "" + release_info = f"Target release: {args.release}{lts_badge}\n" + + email_body = f"""AI-generated review of {patch_name} +Reviewed using {provider_name} ({model}) on {review_date} +{release_info} +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print(f"In-Reply-To: {in_reply_to}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + in_reply_to, + email_body, + args.dry_run, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH v10 3/6] devtools: add compare-reviews.sh for multi-provider analysis 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger @ 2026-02-19 17:48 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger ` (2 subsequent siblings) 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add script to run patch reviews across multiple AI providers for comparison purposes. The script automatically detects which providers have API keys configured and runs analyze-patch.py for each one. This allows users to compare review quality and feedback across different AI models. Features: - Auto-detects available providers based on environment variables - Optional provider selection via -p/--providers option - Saves individual reviews to separate files with -o/--output - Verbose mode passes through to underlying analyze-patch.py Usage: ./devtools/compare-reviews.sh my-patch.patch ./devtools/compare-reviews.sh -p anthropic,xai my-patch.patch ./devtools/compare-reviews.sh -o ./reviews my-patch.patch Output files are named <patch>-<provider>.txt when using the output directory option. 
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/compare-reviews.sh | 192 ++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100755 devtools/compare-reviews.sh diff --git a/devtools/compare-reviews.sh b/devtools/compare-reviews.sh new file mode 100755 index 0000000000..a63eeffb71 --- /dev/null +++ b/devtools/compare-reviews.sh @@ -0,0 +1,192 @@ +#!/bin/bash +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +# Compare DPDK patch reviews across multiple AI providers +# Runs analyze-patch.py with each available provider + +set -e + +SCRIPT_DIR="$(dirname "$(readlink -f "$0")")" +ANALYZE_SCRIPT="${SCRIPT_DIR}/analyze-patch.py" +AGENTS_FILE="AGENTS.md" +OUTPUT_DIR="" +PROVIDERS="" +FORMAT="text" + +usage() { + cat <<EOF +Usage: $(basename "$0") [OPTIONS] <patch-file> + +Compare DPDK patch reviews across multiple AI providers. + +Options: + -a, --agents FILE Path to AGENTS.md file (default: AGENTS.md) + -o, --output DIR Save individual reviews to directory + -p, --providers LIST Comma-separated list of providers to use + (default: all providers with API keys set) + -f, --format FORMAT Output format: text, markdown, html, json + (default: text) + -v, --verbose Show verbose output from each provider + -h, --help Show this help message + +Environment Variables: + Set API keys for providers you want to use: + ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY + +Examples: + $(basename "$0") my-patch.patch + $(basename "$0") -p anthropic,openai my-patch.patch + $(basename "$0") -o ./reviews -f markdown my-patch.patch +EOF + exit "${1:-0}" +} + +error() { + echo "Error: $1" >&2 + exit 1 +} + +# Check which providers have API keys configured +get_available_providers() { + local available="" + + [[ -n "$ANTHROPIC_API_KEY" ]] && available="${available}anthropic," + [[ -n "$OPENAI_API_KEY" ]] && available="${available}openai," + [[ -n "$XAI_API_KEY" ]] && 
available="${available}xai," + [[ -n "$GOOGLE_API_KEY" ]] && available="${available}google," + + # Remove trailing comma + echo "${available%,}" +} + +# Get file extension for format +get_extension() { + case "$1" in + text) echo "txt" ;; + markdown) echo "md" ;; + html) echo "html" ;; + json) echo "json" ;; + *) echo "txt" ;; + esac +} + +# Parse command line options +VERBOSE="" + +while [[ $# -gt 0 ]]; do + case "$1" in + -a|--agents) + AGENTS_FILE="$2" + shift 2 + ;; + -o|--output) + OUTPUT_DIR="$2" + shift 2 + ;; + -p|--providers) + PROVIDERS="$2" + shift 2 + ;; + -f|--format) + FORMAT="$2" + shift 2 + ;; + -v|--verbose) + VERBOSE="-v" + shift + ;; + -h|--help) + usage 0 + ;; + -*) + error "Unknown option: $1" + ;; + *) + break + ;; + esac +done + +# Check for required arguments +if [[ $# -lt 1 ]]; then + echo "Error: No patch file specified" >&2 + usage 1 +fi + +PATCH_FILE="$1" + +if [[ ! -f "$PATCH_FILE" ]]; then + error "Patch file not found: $PATCH_FILE" +fi + +if [[ ! -f "$ANALYZE_SCRIPT" ]]; then + error "analyze-patch.py not found: $ANALYZE_SCRIPT" +fi + +# Validate format +case "$FORMAT" in + text|markdown|html|json) ;; + *) error "Invalid format: $FORMAT (must be text, markdown, html, or json)" ;; +esac + +# Get providers to use +if [[ -z "$PROVIDERS" ]]; then + PROVIDERS=$(get_available_providers) +fi + +if [[ -z "$PROVIDERS" ]]; then + error "No API keys configured. 
Set at least one of: "\ +"ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, GOOGLE_API_KEY" +fi + +# Create output directory if specified +if [[ -n "$OUTPUT_DIR" ]]; then + mkdir -p "$OUTPUT_DIR" +fi + +PATCH_BASENAME=$(basename "$PATCH_FILE") +PATCH_STEM="${PATCH_BASENAME%.*}" +EXT=$(get_extension "$FORMAT") + +echo "Reviewing patch: $PATCH_BASENAME" +echo "Providers: $PROVIDERS" +echo "Format: $FORMAT" +echo "========================================" +echo "" + +# Run review for each provider +IFS=',' read -ra PROVIDER_LIST <<< "$PROVIDERS" +for provider in "${PROVIDER_LIST[@]}"; do + echo ">>> Running review with: $provider" + echo "" + + if [[ -n "$OUTPUT_DIR" ]]; then + OUTPUT_FILE="${OUTPUT_DIR}/${PATCH_STEM}-${provider}.${EXT}" + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" | tee "$OUTPUT_FILE" + echo "" + echo "Saved to: $OUTPUT_FILE" + else + python3 "$ANALYZE_SCRIPT" \ + -p "$provider" \ + -a "$AGENTS_FILE" \ + -f "$FORMAT" \ + $VERBOSE \ + "$PATCH_FILE" + fi + + echo "" + echo "========================================" + echo "" +done + +echo "Review comparison complete." + +if [[ -n "$OUTPUT_DIR" ]]; then + echo "All reviews saved to: $OUTPUT_DIR" +fi -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
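The provider auto-detection in compare-reviews.sh simply checks which API key environment variables are non-empty. The same logic, sketched in Python for clarity (variable names are the ones the script documents; the function name is illustrative):

```python
import os

# Providers and the environment variables that enable them, matching
# the checks in get_available_providers() in compare-reviews.sh.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

def get_available_providers(env=os.environ):
    """Return a comma-separated list of providers whose keys are set."""
    available = [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
    return ",".join(available)
```

With only ANTHROPIC_API_KEY and XAI_API_KEY exported, this yields "anthropic,xai", the same string the shell helper would pass along to the per-provider review loop.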
* [PATCH v10 4/6] devtools: add multi-provider AI documentation review script 2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger ` (2 preceding siblings ...) 2026-02-19 17:48 ` [PATCH v10 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger @ 2026-02-19 17:48 ` Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger 2026-02-19 17:48 ` [PATCH v10 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger 5 siblings, 0 replies; 63+ messages in thread From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw) To: dev; +Cc: Stephen Hemminger Add review-doc.py script that reviews DPDK documentation files for spelling, grammar, technical correctness, and clarity using AI language models. Supports batch processing of multiple files. Supported AI providers: - Anthropic Claude (default) - OpenAI ChatGPT - xAI Grok - Google Gemini Output formats (-f/--format): - text: plain text with extractable diff/msg markers (default) - markdown: formatted review document - html: complete HTML document with styling - json: structured data with metadata For each input file, the script produces: - <basename>.{txt,md,html,json}: review in selected format - <basename>.diff: unified diff (text/json, or with -d flag) - <basename>.msg: commit message (text/json, or with -d flag) The commit message prefix is automatically determined from the file path (e.g., doc/guides/prog_guide: for programmer's guide). 
Features: - Multiple file processing with glob support - Provider selection via -p/--provider option - Custom model selection via -m/--model option - Configurable output directory via -o/--output-dir option - Output format selection via -f/--format option - Force diff/msg generation via -d/--diff option - Quiet mode (-q) suppresses stdout output - Verbose mode (-v) shows token usage and API details - Email integration using git sendemail configuration - Prompt caching support for Anthropic to reduce costs Usage: ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst ./devtools/review-doc.py doc/guides/nics/*.rst ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst Requires the appropriate API key environment variable to be set for the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY, or GOOGLE_API_KEY). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> --- devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1098 insertions(+) create mode 100755 devtools/review-doc.py diff --git a/devtools/review-doc.py b/devtools/review-doc.py new file mode 100755 index 0000000000..1366aa0f85 --- /dev/null +++ b/devtools/review-doc.py @@ -0,0 +1,1098 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2026 Stephen Hemminger + +""" +Review DPDK documentation files using AI providers. + +Produces a diff file and commit message compliant with DPDK standards. +Accepts multiple documentation files and generates output for each. 
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini +""" + +import argparse +import getpass +import json +import os +import re +import smtplib +import ssl +import subprocess +import sys +from email.message import EmailMessage +from pathlib import Path +from urllib.request import Request, urlopen +from urllib.error import URLError, HTTPError + +# Output formats +OUTPUT_FORMATS = ["text", "markdown", "html", "json"] + +# Map output format to file extension +FORMAT_EXTENSIONS = { + "text": ".txt", + "markdown": ".md", + "html": ".html", + "json": ".json", +} + +# Additional markers for extracting diff/msg (used with --diff flag) +DIFF_MARKERS_INSTRUCTION = """ + +ADDITIONALLY, at the end of your response, include these exact markers for automated extraction: +---COMMIT_MESSAGE_START--- +(same commit message as above) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(same unified diff as above) +---UNIFIED_DIFF_END--- +""" + +# Provider configurations +PROVIDERS = { + "anthropic": { + "name": "Claude", + "endpoint": "https://api.anthropic.com/v1/messages", + "default_model": "claude-sonnet-4-5-20250929", + "env_var": "ANTHROPIC_API_KEY", + }, + "openai": { + "name": "ChatGPT", + "endpoint": "https://api.openai.com/v1/chat/completions", + "default_model": "gpt-4o", + "env_var": "OPENAI_API_KEY", + }, + "xai": { + "name": "Grok", + "endpoint": "https://api.x.ai/v1/chat/completions", + "default_model": "grok-3", + "env_var": "XAI_API_KEY", + }, + "google": { + "name": "Gemini", + "endpoint": "https://generativelanguage.googleapis.com/v1beta/models", + "default_model": "gemini-2.0-flash", + "env_var": "GOOGLE_API_KEY", + }, +} + +# Commit prefix mappings based on file path +COMMIT_PREFIX_MAP = [ + ("doc/guides/prog_guide/", "doc/guides/prog_guide:"), + ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"), + ("doc/guides/nics/", "doc/guides/nics:"), + ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"), + ("doc/guides/compressdevs/", 
"doc/guides/compressdevs:"), + ("doc/guides/eventdevs/", "doc/guides/eventdevs:"), + ("doc/guides/rawdevs/", "doc/guides/rawdevs:"), + ("doc/guides/bbdevs/", "doc/guides/bbdevs:"), + ("doc/guides/gpus/", "doc/guides/gpus:"), + ("doc/guides/dmadevs/", "doc/guides/dmadevs:"), + ("doc/guides/regexdevs/", "doc/guides/regexdevs:"), + ("doc/guides/mldevs/", "doc/guides/mldevs:"), + ("doc/guides/rel_notes/", "doc/guides/rel_notes:"), + ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"), + ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"), + ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"), + ("doc/guides/tools/", "doc/guides/tools:"), + ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"), + ("doc/guides/howto/", "doc/guides/howto:"), + ("doc/guides/contributing/", "doc/guides/contributing:"), + ("doc/guides/platform/", "doc/guides/platform:"), + ("doc/guides/", "doc:"), + ("doc/api/", "doc/api:"), + ("doc/", "doc:"), +] + +SYSTEM_PROMPT = """\ +You are an expert technical documentation reviewer for DPDK. +Your task is to review documentation files and suggest improvements for: +- Spelling errors +- Grammar issues +- Technical correctness +- Clarity and readability +- Consistency with DPDK terminology + +IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh): +- Subject line MUST be ≤60 characters +- Format: "prefix: lowercase description" +- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API) +- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo") +- NO trailing period on subject line +- NO punctuation marks: , ; ! ? 
& | +- NO underscores in subject after colon +- Body lines wrapped at 75 characters +- Body must NOT start with "It" +- Do NOT include Signed-off-by (user adds via git commit --sign) +- Only use "Fixes:" tag for actual errors in documentation, not style improvements + +Case-sensitive terms (must use exact case): +- Rx, Tx (not RX, TX, rx, tx) +- VF, PF (not vf, pf) +- MAC, VLAN, RSS, API +- Linux, Windows, FreeBSD + +For style/clarity improvements, do NOT use Fixes tag. +For actual errors (wrong information, broken examples), include Fixes tag \ +if you can identify the commit.""" + +FORMAT_INSTRUCTIONS = { + "text": """ +OUTPUT FORMAT: +You must output exactly two sections: + +1. COMMIT_MESSAGE section containing the complete commit message +2. UNIFIED_DIFF section containing the unified diff + +Use these exact markers: +---COMMIT_MESSAGE_START--- +(commit message here) +---COMMIT_MESSAGE_END--- + +---UNIFIED_DIFF_START--- +(unified diff here) +---UNIFIED_DIFF_END--- + +The diff should be in unified format that can be applied with "git apply". 
+If no changes are needed, output empty sections with a note.""", + "markdown": """ +OUTPUT FORMAT: +Provide your review in Markdown format with: + +## Summary +Brief description of changes + +## Commit Message +``` +(complete commit message here, ready to use) +``` + +## Changes +For each change: +### Issue N: Brief title +- **Location**: file path and line +- **Problem**: description +- **Fix**: suggested correction + +## Unified Diff +```diff +(unified diff here) +```""", + "html": """ +OUTPUT FORMAT: +Provide your review in HTML format with: +- <h2> for sections (Summary, Commit Message, Changes, Diff) +- <pre><code> for commit message and diff +- <ul>/<li> for individual issues +- Do NOT include <html>, <head>, or <body> tags - just the content + +Include sections for: Summary, Commit Message, Changes, Unified Diff""", + "json": """ +OUTPUT FORMAT: +Provide your review as JSON with this structure: +{ + "summary": "Brief description of changes", + "commit_message": "Complete commit message ready to use", + "changes": [ + { + "type": "spelling|grammar|technical|clarity|style", + "location": "line number or section", + "original": "original text", + "suggested": "corrected text", + "reason": "why this change" + } + ], + "diff": "unified diff as a string", + "stats": { + "total_issues": 0, + "spelling": 0, + "grammar": 0, + "technical": 0, + "clarity": 0 + } +} +Output ONLY valid JSON, no markdown code fences or other text.""", +} + +USER_PROMPT = """\ +Review the following DPDK documentation file and provide improvements. 
+ +File path: {doc_file} +Commit message prefix to use: {commit_prefix} + +{format_instruction} + +---DOCUMENT CONTENT--- +""" + + +def error(msg): + """Print error message and exit.""" + print(f"Error: {msg}", file=sys.stderr) + sys.exit(1) + + +def get_git_config(key): + """Get a value from git config.""" + try: + result = subprocess.run( + ["git", "config", "--get", key], + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + except (subprocess.CalledProcessError, FileNotFoundError): + return None + + +def get_smtp_config(): + """Get SMTP configuration from git config sendemail settings.""" + config = { + "server": get_git_config("sendemail.smtpserver"), + "port": get_git_config("sendemail.smtpserverport"), + "user": get_git_config("sendemail.smtpuser"), + "encryption": get_git_config("sendemail.smtpencryption"), + "password": get_git_config("sendemail.smtppass"), + } + + # Set defaults + if not config["port"]: + if config["encryption"] == "ssl": + config["port"] = "465" + else: + config["port"] = "587" + + # Convert port to int + if config["port"]: + config["port"] = int(config["port"]) + + return config + + +def get_commit_prefix(filepath): + """Determine commit message prefix from file path.""" + for prefix_path, prefix in COMMIT_PREFIX_MAP: + if filepath.startswith(prefix_path): + return prefix + return "doc:" + + +def build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Anthropic API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "system": [ + {"type": 
"text", "text": SYSTEM_PROMPT}, + { + "type": "text", + "text": agents_content, + "cache_control": {"type": "ephemeral"}, + }, + ], + "messages": [ + { + "role": "user", + "content": user_prompt + doc_content, + } + ], + } + + +def build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for OpenAI-compatible APIs.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "model": model, + "max_tokens": max_tokens, + "messages": [ + {"role": "system", "content": SYSTEM_PROMPT}, + {"role": "system", "content": agents_content}, + { + "role": "user", + "content": user_prompt + doc_content, + }, + ], + } + + +def build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + include_diff_markers=False, +): + """Build request payload for Google Gemini API.""" + format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "") + if include_diff_markers and output_format not in ("text", "json"): + format_instruction += DIFF_MARKERS_INSTRUCTION + user_prompt = USER_PROMPT.format( + doc_file=doc_file, + commit_prefix=commit_prefix, + format_instruction=format_instruction, + ) + return { + "contents": [ + {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]}, + {"role": "user", "parts": [{"text": agents_content}]}, + { + "role": "user", + "parts": [{"text": user_prompt + doc_content}], + }, + ], + "generationConfig": {"maxOutputTokens": max_tokens}, + } + + +def call_api( + provider, + api_key, + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format="text", + 
include_diff_markers=False, + verbose=False, +): + """Make API request to the specified provider.""" + config = PROVIDERS[provider] + + # Build request based on provider + if provider == "anthropic": + request_data = build_anthropic_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "x-api-key": api_key, + "anthropic-version": "2023-06-01", + } + url = config["endpoint"] + elif provider == "google": + request_data = build_google_request( + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = {"Content-Type": "application/json"} + url = f"{config['endpoint']}/{model}:generateContent?key={api_key}" + else: # openai, xai + request_data = build_openai_request( + model, + max_tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + output_format, + include_diff_markers, + ) + headers = { + "Content-Type": "application/json", + "Authorization": f"Bearer {api_key}", + } + url = config["endpoint"] + + # Make request + request_body = json.dumps(request_data).encode("utf-8") + req = Request(url, data=request_body, headers=headers, method="POST") + + try: + with urlopen(req) as response: + result = json.loads(response.read().decode("utf-8")) + except HTTPError as e: + error_body = e.read().decode("utf-8") + try: + error_data = json.loads(error_body) + error(f"API error: {error_data.get('error', error_body)}") + except json.JSONDecodeError: + error(f"API error ({e.code}): {error_body}") + except URLError as e: + error(f"Connection error: {e.reason}") + + # Show verbose info + if verbose: + print("=== Token Usage ===", file=sys.stderr) + if provider == "anthropic": + usage = result.get("usage", {}) + print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr) + print( + f"Cache creation: " 
f"{usage.get('cache_creation_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Cache read: {usage.get('cache_read_input_tokens', 0)}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr + ) + elif provider == "google": + usage = result.get("usageMetadata", {}) + print( + f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}", + file=sys.stderr, + ) + print( + f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}", + file=sys.stderr, + ) + else: # openai, xai + usage = result.get("usage", {}) + print( + f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr + ) + print( + f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}", + file=sys.stderr, + ) + print("===================", file=sys.stderr) + + # Extract response text + if provider == "anthropic": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + content = result.get("content", []) + return "".join( + block.get("text", "") for block in content if block.get("type") == "text" + ) + elif provider == "google": + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + candidates = result.get("candidates", []) + if not candidates: + error("No response from Gemini") + parts = candidates[0].get("content", {}).get("parts", []) + return "".join(part.get("text", "") for part in parts) + else: # openai, xai + if "error" in result: + error(f"API error: {result['error'].get('message', result)}") + choices = result.get("choices", []) + if not choices: + error("No response from API") + return choices[0].get("message", {}).get("content", "") + + +def parse_review_text(review_text): + """Extract commit message and diff from text format response.""" + commit_msg = "" + diff = "" + + # Extract commit message + msg_match = re.search( + r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---", + review_text, + re.DOTALL, + ) + if msg_match: + commit_msg = 
msg_match.group(1).strip() + + # Extract unified diff + diff_match = re.search( + r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---", + review_text, + re.DOTALL, + ) + if diff_match: + diff = diff_match.group(1).strip() + # Clean up any markdown code fence if present + diff = re.sub(r"^```diff\s*\n?", "", diff) + diff = re.sub(r"\n?```\s*$", "", diff) + + return commit_msg, diff + + +def strip_diff_markers(text): + """Remove the diff/msg extraction markers from text.""" + # Remove commit message markers and content + text = re.sub( + r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*", + "", + text, + flags=re.DOTALL, + ) + # Remove unified diff markers and content + text = re.sub( + r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*", + "", + text, + flags=re.DOTALL, + ) + return text.strip() + + +def send_email( + to_addrs, + cc_addrs, + from_addr, + subject, + in_reply_to, + body, + dry_run=False, + verbose=False, +): + """Send review email via SMTP using git sendemail config.""" + # Build email message + msg = EmailMessage() + msg["From"] = from_addr + msg["To"] = ", ".join(to_addrs) + if cc_addrs: + msg["Cc"] = ", ".join(cc_addrs) + msg["Subject"] = subject + if in_reply_to: + msg["In-Reply-To"] = in_reply_to + msg["References"] = in_reply_to + msg.set_content(body) + + if dry_run: + print("=== Email Preview (dry-run) ===", file=sys.stderr) + print(msg.as_string(), file=sys.stderr) + print("=== End Preview ===", file=sys.stderr) + return True + + # Get SMTP configuration from git config + smtp_config = get_smtp_config() + + if not smtp_config["server"]: + error("No SMTP server configured. 
Set git config sendemail.smtpserver") + + server = smtp_config["server"] + port = smtp_config["port"] + user = smtp_config["user"] + encryption = smtp_config["encryption"] + + # Get password from environment or git config, or prompt + password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"] + if user and not password: + password = getpass.getpass(f"SMTP password for {user}@{server}: ") + + if verbose: + print(f"SMTP server: {server}:{port}", file=sys.stderr) + print(f"SMTP user: {user or '(none)'}", file=sys.stderr) + print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr) + + # Collect all recipients + all_recipients = list(to_addrs) + if cc_addrs: + all_recipients.extend(cc_addrs) + + try: + if encryption == "ssl": + # SSL/TLS connection from the start (port 465) + context = ssl.create_default_context() + with smtplib.SMTP_SSL(server, port, context=context) as smtp: + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + else: + # STARTTLS (port 587) or plain (port 25) + with smtplib.SMTP(server, port) as smtp: + smtp.ehlo() + if encryption == "tls" or port == 587: + context = ssl.create_default_context() + smtp.starttls(context=context) + smtp.ehlo() + if user and password: + smtp.login(user, password) + smtp.send_message(msg, from_addr, all_recipients) + + print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr) + return True + + except smtplib.SMTPAuthenticationError as e: + error(f"SMTP authentication failed: {e}") + except smtplib.SMTPException as e: + error(f"SMTP error: {e}") + except OSError as e: + error(f"Connection error to {server}:{port}: {e}") + + +def list_providers(): + """Print available providers and exit.""" + print("Available AI Providers:\n") + print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}") + print(f"{'--------':<12} {'-------------':<30} {'----------------'}") + for name, config in PROVIDERS.items(): + print(f"{name:<12} 
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+            print(f"Output format: {args.output_format}", file=sys.stderr)
+            print(f"AGENTS file: {args.agents}", file=sys.stderr)
+            print(f"Doc file: {doc_file}", file=sys.stderr)
+            print(f"Commit prefix: {commit_prefix}", file=sys.stderr)
+            print(f"Output dir: {args.output_dir}", file=sys.stderr)
+            if args.send_email:
+                print("Send email: yes", file=sys.stderr)
+                print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr)
+                if args.cc_addrs:
+                    print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr)
+                print(f"From: {from_addr}", file=sys.stderr)
+            print("===============", file=sys.stderr)
+
+        # Call API
+        review_text = call_api(
+            args.provider,
+            api_key,
+            model,
+            args.tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            args.output_format,
+            args.diff,
+            args.verbose,
+        )
+
+        if not review_text:
+            print(
+                f"Warning: No response received for {doc_file}",
+                file=sys.stderr,
+            )
+            continue
+
+        # Determine review output file
+        format_ext = FORMAT_EXTENSIONS[args.output_format]
+        review_file = output_dir / f"{doc_basename}{format_ext}"
+
+        # Determine if we should write diff/msg files
+        write_diff_msg = args.diff or args.output_format in ("text", "json")
+
+        # Extract commit message and diff first (before stripping markers)
+        commit_msg, diff = "", ""
+        if write_diff_msg:
+            if args.output_format == "json":
+                # Will extract from JSON below
+                pass
+            else:
+                # Parse from text format markers
+                commit_msg, diff = parse_review_text(review_text)
+
+        # For non-text formats with --diff, strip the markers from display output
+        display_text = review_text
+        if args.diff and args.output_format in ("markdown", "html"):
+            display_text = strip_diff_markers(review_text)
+
+        # Build formatted output text
+        if args.output_format == "text":
+            output_text = review_text
+        elif args.output_format == "json":
+            # Try to parse JSON response
+            try:
+                review_data = json.loads(review_text)
+            except json.JSONDecodeError:
+                print("Warning: Response is not valid JSON", file=sys.stderr)
+                review_data = {"raw_response": review_text}
+
+            # Extract diff/msg from JSON if present
+            if write_diff_msg:
+                if isinstance(review_data, dict) and "raw_response" not in review_data:
+                    commit_msg = review_data.get("commit_message", "")
+                    diff = review_data.get("diff", "")
+
+            # Add metadata
+            output_data = {
+                "metadata": {
+                    "doc_file": doc_file,
+                    "provider": args.provider,
+                    "provider_name": provider_name,
+                    "model": model,
+                    "commit_prefix": commit_prefix,
+                },
+                "review": review_data,
+            }
+            output_text = json.dumps(output_data, indent=2)
+        elif args.output_format == "markdown":
+            output_text = f"""# Documentation Review: {doc_path.name}
+
+*Reviewed by {provider_name} ({model})*
+
+{display_text}
+"""
+        elif args.output_format == "html":
+            output_text = f"""<!DOCTYPE html>
+<html>
+<head>
+<meta charset="utf-8">
+<title>Review: {doc_path.name}</title>
+<style>
+body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }}
+h1 {{ color: #333; }}
+.review-meta {{ color: #666; font-style: italic; }}
+pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }}
+</style>
+</head>
+<body>
+<h1>Documentation Review: {doc_path.name}</h1>
+<p class="review-meta">Reviewed by {provider_name} ({model})</p>
+<div class="review-content">
+{display_text}
+</div>
+</body>
+</html>
+"""
+
+        # Write formatted review to file
+        review_file.write_text(output_text)
+        print(f"Review written to: {review_file}", file=sys.stderr)
+
+        # Write diff/msg files
+        if write_diff_msg:
+            if commit_msg:
+                msg_file.write_text(commit_msg + "\n")
+                print(f"Commit message written to: {msg_file}", file=sys.stderr)
+            else:
+                msg_file.write_text("# No commit message generated\n")
+                print("Warning: Could not extract commit message", file=sys.stderr)
+
+            if diff:
+                diff_file.write_text(diff + "\n")
+                print(f"Diff written to: {diff_file}", file=sys.stderr)
+            else:
+                diff_file.write_text("# No changes suggested\n")
+                print("Warning: Could not extract diff",
+                      file=sys.stderr)
+
+        # Print to stdout unless quiet (or multiple files without verbose)
+        show_stdout = not args.quiet and (num_files == 1 or args.verbose)
+        if show_stdout:
+            print(
+                f"\n=== Documentation Review: {doc_path.name} "
+                f"(via {provider_name}) ==="
+            )
+            print(output_text)
+
+            # Print usage instructions for text format
+            if args.output_format == "text":
+                print("\n=== Output Files ===")
+                print(f"Commit message: {msg_file}")
+                print(f"Diff file: {diff_file}")
+                print("\nTo apply changes:")
+                print(f"  git apply {diff_file}")
+                print(f"  git commit -sF {msg_file}")
+
+        # Send email if requested
+        if args.send_email:
+            if args.output_format != "text":
+                print(
+                    f"Note: Email will be sent as plain text regardless of "
+                    f"--format={args.output_format}",
+                    file=sys.stderr,
+                )
+
+            review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}"
+
+            # Build email body
+            email_body = f"""AI-generated documentation review of {doc_file}
+Reviewed using {provider_name} ({model})
+
+This is an automated review. Please verify all suggestions.
+
+---
+
+{review_text}
+"""
+
+            if args.verbose:
+                print("", file=sys.stderr)
+                print("=== Email Details ===", file=sys.stderr)
+                print(f"Subject: {review_subject}", file=sys.stderr)
+                print("=====================", file=sys.stderr)
+
+            send_email(
+                args.to_addrs,
+                args.cc_addrs,
+                from_addr,
+                review_subject,
+                None,
+                email_body,
+                args.dry_run,
+                args.verbose,
+            )
+
+            if not args.dry_run:
+                print("", file=sys.stderr)
+                print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr)
+
+    # Print summary for multiple files
+    if num_files > 1:
+        print(f"\n{'=' * 60}", file=sys.stderr)
+        print(f"Processed {num_files} files", file=sys.stderr)
+        print(f"Output directory: {output_dir}", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
* [PATCH v10 5/6] doc: add AI-assisted patch review to contributing guide
  2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger
  ` (3 preceding siblings ...)
  2026-02-19 17:48 ` [PATCH v10 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger
@ 2026-02-19 17:48 ` Stephen Hemminger
  2026-02-19 17:48 ` [PATCH v10 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger
  5 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add a new section to the contributing guide describing the
analyze-patch.py script which uses AI providers to review patches
against DPDK coding standards before submission to the mailing list.

The new section covers basic usage, provider selection, patch series
handling, LTS release review, and output format options. A note
clarifies that AI review supplements but does not replace human review.

Also add a reference to the script in the new driver guide's test
tools checklist.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/contributing/new_driver.rst |  2 +
 doc/guides/contributing/patches.rst    | 56 ++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/doc/guides/contributing/new_driver.rst b/doc/guides/contributing/new_driver.rst
index 555e875329..6c0d356cfd 100644
--- a/doc/guides/contributing/new_driver.rst
+++ b/doc/guides/contributing/new_driver.rst
@@ -210,3 +210,5 @@ Be sure to run the following test tools per patch in a patch series:
 * `check-doc-vs-code.sh`
 * `check-spdx-tag.sh`
 * Build documentation and validate how output looks
+* Optionally run ``analyze-patch.py`` for AI-assisted review
+  (see :ref:`ai_assisted_review` in the Contributing Guide)
diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 5f554d47e6..74fc714d16 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -503,6 +503,62 @@ Additionally, when contributing to the DTS tool, patches should also be checked
 the ``dts-check-format.sh`` script in the ``devtools`` directory of the DPDK
 repo. To run the script, extra :ref:`Python dependencies <dts_deps>` are needed.

+
+.. _ai_assisted_review:
+
+AI-Assisted Patch Review
+------------------------
+
+Contributors may optionally use the ``devtools/analyze-patch.py`` script
+to get an AI-assisted review of patches before submitting them to the mailing list.
+The script checks patches against the DPDK coding standards and contribution
+guidelines documented in ``AGENTS.md``.
+
+The script supports multiple AI providers (Anthropic Claude, OpenAI ChatGPT,
+xAI Grok, Google Gemini). An API key for the chosen provider must be set
+in the corresponding environment variable (see ``--list-providers``).
+
+Basic usage::
+
+    # Review a single patch (default provider: Anthropic Claude)
+    devtools/analyze-patch.py my-patch.patch
+
+    # Use a different provider
+    devtools/analyze-patch.py -p openai my-patch.patch
+
+    # Review for an LTS branch (enables stricter rules)
+    devtools/analyze-patch.py -r 24.11 my-patch.patch
+
+    # List available providers and their API key variables
+    devtools/analyze-patch.py --list-providers
+
+For a patch series in an mbox file, the ``--split-patches`` option reviews
+each patch individually::
+
+    devtools/analyze-patch.py --split-patches series.mbox
+
+    # Review only a range of patches
+    devtools/analyze-patch.py --split-patches --patch-range 1-5 series.mbox
+
+When reviewing for a Long Term Stable (LTS) release, use the ``-r`` option
+with the target version. Any DPDK release with minor version ``.11``
+(e.g., 23.11, 24.11) is automatically recognized as LTS,
+and the script will enforce stricter rules: bug fixes only, no new features or APIs.
+
+Output can be formatted as plain text (default), Markdown, HTML, or JSON::
+
+    devtools/analyze-patch.py -f markdown -o review.md my-patch.patch
+
+The review guidelines in ``AGENTS.md`` cover commit message formatting,
+SPDX licensing, C coding style, forbidden API usage, symbol export rules,
+and other DPDK-specific requirements.
+
+.. note::
+
+   AI-assisted review is a supplement to, not a replacement for,
+   human review on the mailing list.
+   Always verify AI suggestions before acting on them.
+
 .. _contrib_check_compilation:

 Checking Compilation
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
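The ``.11`` minor-version convention described in the section above can be sketched as a small check. This is a hypothetical helper written for illustration, not the actual code in analyze-patch.py:

```python
def is_lts_release(version: str) -> bool:
    """Return True if a DPDK version string names an LTS release.

    DPDK LTS releases are the yearly .11 releases (e.g. 23.11, 24.11),
    so only the minor component needs checking.
    """
    parts = version.split(".")
    return len(parts) >= 2 and parts[1] == "11"
```

With a check like this, ``-r 24.11`` would turn on the stricter bug-fix-only rules, while ``-r 24.03`` would not.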
* [PATCH v10 6/6] MAINTAINERS: add section for AI review tools
  2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger
  ` (4 preceding siblings ...)
  2026-02-19 17:48 ` [PATCH v10 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger
@ 2026-02-19 17:48 ` Stephen Hemminger
  5 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-02-19 17:48 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add maintainer entries for the AI-assisted code review tooling:
AGENTS.md, analyze-patch.py, compare-reviews.sh, and review-doc.py.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1b2f1ed2ba..48ceff9010 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -109,6 +109,14 @@ F: license/
 F: .editorconfig
 F: .mailmap

+AI review tools
+M: Stephen Hemminger <stephen@networkplumber.org>
+M: Aaron Conole <aconole@redhat.com>
+F: AGENTS.md
+F: devtools/analyze-patch.py
+F: devtools/compare-reviews.sh
+F: devtools/review-doc.py
+
 Linux kernel uAPI headers
 M: Maxime Coquelin <maxime.coquelin@redhat.com>
 F: devtools/linux-uapi.sh
--
2.51.0

^ permalink raw reply related	[flat|nested] 63+ messages in thread
* [PATCH v7 0/4] devtools: add AI-assisted code review tools
@ 2026-01-26 18:40 Stephen Hemminger
2026-02-09 19:48 ` [PATCH v8 0/6] add AGENTS.md and scripts for AI code review Stephen Hemminger
0 siblings, 1 reply; 63+ messages in thread
From: Stephen Hemminger @ 2026-01-26 18:40 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
This series adds infrastructure for AI-assisted code review of DPDK
patches and documentation. It provides guidelines for AI tools and
scripts to interface with multiple AI providers.
The AGENTS.md file consolidates DPDK coding standards, commit message
requirements, and common review feedback into a format optimized for
AI code review tools. The accompanying scripts enable automated patch
analysis using Claude, ChatGPT, Grok, or Gemini.
Patches:
1. AGENTS.md - Guidelines document for AI review tools
2. analyze-patch.py - Multi-provider patch review script
3. compare-reviews.sh - Compare reviews across providers
4. review-doc.py - Documentation review with diff output
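The scripts above all work against the same set of providers, each keyed off its own environment variable. A simplified sketch of how key discovery can work (the variable names come from this series; the function itself is illustrative, not the scripts' exact code):

```python
import os

# Mirrors the provider table used by the scripts in this series.
PROVIDERS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "xai": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def available_providers() -> list[str]:
    """Return providers whose API key environment variable is set."""
    return [name for name, var in PROVIDERS.items() if os.environ.get(var)]
```

compare-reviews.sh can then run the same patch through every available provider and put the outputs side by side.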
Changes in v7:
- Add "Review Philosophy" section to AGENTS.md with guidance on
confidence levels and concise feedback
- Add "Priority Areas" section covering security, correctness,
and architecture concerns that reviewers should focus on
- Minor code cleanups in analyze-patch.py
Changes in v6:
- Expanded AGENTS.md with "Unnecessary Code Patterns" section
- Added guidance on rte_malloc() and rte_memcpy() appropriate use
- Added symbol naming guidelines for static linking
- Improved email sending in review-doc.py with native SMTP support
Changes in v5:
- Added review-doc.py for documentation review
- Added email sending capability to scripts
- Expanded forbidden tokens documentation
Changes in v4:
- Added compare-reviews.sh script
- Multiple output formats (text, markdown, html, json)
- Improved error handling
Changes in v3:
- Added support for OpenAI, xAI Grok, and Google Gemini
- Added prompt caching for Anthropic to reduce costs
- Restructured AGENTS.md for better machine parsing
Changes in v2:
- Split into separate patches
- Added verbose mode with token usage statistics
Stephen Hemminger (4):
doc: add AGENTS.md for AI-powered code review tools
devtools: add multi-provider AI patch review script
devtools: add compare-reviews.sh for multi-provider analysis
devtools: add multi-provider AI documentation review script
AGENTS.md | 1000 +++++++++++++++++++++++++++++++++++
devtools/analyze-patch.py | 731 +++++++++++++++++++++++++
devtools/compare-reviews.sh | 192 +++++++
devtools/review-doc.py | 974 ++++++++++++++++++++++++++++++++++
4 files changed, 2897 insertions(+)
create mode 100644 AGENTS.md
create mode 100755 devtools/analyze-patch.py
create mode 100644 devtools/compare-reviews.sh
create mode 100755 devtools/review-doc.py
--
2.51.0
^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH v8 0/6] add AGENTS.md and scripts for AI code review
  2026-01-26 18:40 [PATCH v7 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
@ 2026-02-09 19:48 ` Stephen Hemminger
  2026-02-09 19:48   ` [PATCH v8 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger
  0 siblings, 1 reply; 63+ messages in thread
From: Stephen Hemminger @ 2026-02-09 19:48 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger

Add a structured reference document that enables AI code review tools
to validate DPDK contributions against project standards. This document
consolidates requirements from multiple sources into a machine-readable
format optimized for automated validation workflows.

The AGENTS.md file synthesizes guidelines from:
- DPDK Contributing Code documentation (patches.rst)
- DPDK Coding Style guidelines (coding_style.rst)
- DPDK validation scripts (check-git-log.sh, checkpatches.sh)
- Linux kernel patch submission process
- SPDX License Identifier specification
- DPDK Coccinelle scripts (cocci)
- common items spotted on mailing list review

Key sections include:
- SPDX license and copyright header requirements
- Commit message format with precise limits (60 char subject,
  75 char body) and tag ordering rules
- C coding style including explicit comparison requirements
- Forbidden tokens table derived from checkpatches.sh
- API tag placement rules for experimental and internal APIs
- Patch validation checklists with severity levels

The forbidden tokens section documents restrictions on deprecated
atomics, logging functions, threading APIs, and compiler built-ins
that are checked by the existing checkpatches.sh infrastructure.

Severity levels (error/warning/info) align with the exit codes and
messaging from check-git-log.sh and checkpatches.sh to help automated
tools prioritize feedback appropriately.

References:
- https://doc.dpdk.org/guides/contributing/patches.html
- https://doc.dpdk.org/guides/contributing/coding_style.html
- devtools/check-git-log.sh
- devtools/checkpatches.sh
- devtools/cocci/

v8 - revisions to AGENTS.md to detect more bugs.
     previous prompt was screening out leaks where AI wasn't
     sure enough to report it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Stephen Hemminger (6):
  doc: add AGENTS.md for AI code review tools
  devtools: add multi-provider AI patch review script
  devtools: add compare-reviews.sh for multi-provider analysis
  devtools: add multi-provider AI documentation review script
  doc: add AI-assisted patch review to contributing guide
  MAINTAINERS: add section for AI review tools

 AGENTS.md                              | 1514 ++++++++++++++++++++++++
 MAINTAINERS                            |    8 +
 devtools/analyze-patch.py              | 1334 +++++++++++++++++++++
 devtools/compare-reviews.sh            |  192 +++
 devtools/review-doc.py                 | 1098 +++++++++++++++++
 doc/guides/contributing/new_driver.rst |    2 +
 doc/guides/contributing/patches.rst    |   56 +
 7 files changed, 4204 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100755 devtools/analyze-patch.py
 create mode 100755 devtools/compare-reviews.sh
 create mode 100755 devtools/review-doc.py

--
2.51.0

^ permalink raw reply	[flat|nested] 63+ messages in thread
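The subject-line limits this cover letter describes (60-character subject, "prefix: lowercase description", no trailing period) can be illustrated with a small sketch. The check_subject helper below is hypothetical and much simpler than the authoritative checks in devtools/check-git-log.sh:

```python
import re

# Acronyms allowed to start the description in uppercase (per AGENTS.md).
ACRONYMS = ("Rx", "Tx", "VF", "PF", "MAC", "API")


def check_subject(subject: str) -> list[str]:
    """Flag commit subject problems per the limits described above."""
    issues = []
    if len(subject) > 60:
        issues.append("subject longer than 60 characters")
    if subject.endswith("."):
        issues.append("trailing period on subject")
    m = re.match(r"^[\w/,.-]+: (\S+)", subject)
    if not m:
        issues.append("missing 'prefix: description' format")
    elif m.group(1)[0].isupper() and m.group(1) not in ACRONYMS:
        issues.append("description should start lowercase")
    return issues
```

A subject such as "doc: add AGENTS.md for AI code review tools" passes all three checks, while "doc: Fixed typo." would be flagged twice.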
* [PATCH v8 4/6] devtools: add multi-provider AI documentation review script
  2026-02-09 19:48 ` [PATCH v8 0/6] add AGENTS.md and scripts for AI code review Stephen Hemminger
@ 2026-02-09 19:48 ` Stephen Hemminger
  0 siblings, 0 replies; 63+ messages in thread
From: Stephen Hemminger @ 2026-02-09 19:48 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Aaron Conole

Add review-doc.py script that reviews DPDK documentation files for
spelling, grammar, technical correctness, and clarity using AI
language models. Supports batch processing of multiple files.

Supported AI providers:
- Anthropic Claude (default)
- OpenAI ChatGPT
- xAI Grok
- Google Gemini

Output formats (-f/--format):
- text: plain text with extractable diff/msg markers (default)
- markdown: formatted review document
- html: complete HTML document with styling
- json: structured data with metadata

For each input file, the script produces:
- <basename>.{txt,md,html,json}: review in selected format
- <basename>.diff: unified diff (text/json, or with -d flag)
- <basename>.msg: commit message (text/json, or with -d flag)

The commit message prefix is automatically determined from the file
path (e.g., doc/guides/prog_guide: for programmer's guide).

Features:
- Multiple file processing with glob support
- Provider selection via -p/--provider option
- Custom model selection via -m/--model option
- Configurable output directory via -o/--output-dir option
- Output format selection via -f/--format option
- Force diff/msg generation via -d/--diff option
- Quiet mode (-q) suppresses stdout output
- Verbose mode (-v) shows token usage and API details
- Email integration using git sendemail configuration
- Prompt caching support for Anthropic to reduce costs

Usage:
  ./devtools/review-doc.py doc/guides/prog_guide/mempool_lib.rst
  ./devtools/review-doc.py doc/guides/nics/*.rst
  ./devtools/review-doc.py -f html -d -o /tmp doc/guides/nics/*.rst
  ./devtools/review-doc.py --send-email --to dev@dpdk.org file.rst

Requires the appropriate API key environment variable to be set for
the chosen provider (ANTHROPIC_API_KEY, OPENAI_API_KEY, XAI_API_KEY,
or GOOGLE_API_KEY).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/review-doc.py | 1098 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1098 insertions(+)
 create mode 100755 devtools/review-doc.py

diff --git a/devtools/review-doc.py b/devtools/review-doc.py
new file mode 100755
index 0000000000..1366aa0f85
--- /dev/null
+++ b/devtools/review-doc.py
@@ -0,0 +1,1098 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2026 Stephen Hemminger
+
+"""
+Review DPDK documentation files using AI providers.
+
+Produces a diff file and commit message compliant with DPDK standards.
+Accepts multiple documentation files and generates output for each.
+
+Supported providers: Anthropic Claude, OpenAI ChatGPT, xAI Grok, Google Gemini
+"""
+
+import argparse
+import getpass
+import json
+import os
+import re
+import smtplib
+import ssl
+import subprocess
+import sys
+from email.message import EmailMessage
+from pathlib import Path
+from urllib.request import Request, urlopen
+from urllib.error import URLError, HTTPError
+
+# Output formats
+OUTPUT_FORMATS = ["text", "markdown", "html", "json"]
+
+# Map output format to file extension
+FORMAT_EXTENSIONS = {
+    "text": ".txt",
+    "markdown": ".md",
+    "html": ".html",
+    "json": ".json",
+}
+
+# Additional markers for extracting diff/msg (used with --diff flag)
+DIFF_MARKERS_INSTRUCTION = """
+
+ADDITIONALLY, at the end of your response, include these exact markers for automated extraction:
+---COMMIT_MESSAGE_START---
+(same commit message as above)
+---COMMIT_MESSAGE_END---
+
+---UNIFIED_DIFF_START---
+(same unified diff as above)
+---UNIFIED_DIFF_END---
+"""
+
+# Provider configurations
+PROVIDERS = {
+    "anthropic": {
+        "name": "Claude",
+        "endpoint": "https://api.anthropic.com/v1/messages",
+        "default_model": "claude-sonnet-4-5-20250929",
+        "env_var": "ANTHROPIC_API_KEY",
+    },
+    "openai": {
+        "name": "ChatGPT",
+        "endpoint": "https://api.openai.com/v1/chat/completions",
+        "default_model": "gpt-4o",
+        "env_var": "OPENAI_API_KEY",
+    },
+    "xai": {
+        "name": "Grok",
+        "endpoint": "https://api.x.ai/v1/chat/completions",
+        "default_model": "grok-3",
+        "env_var": "XAI_API_KEY",
+    },
+    "google": {
+        "name": "Gemini",
+        "endpoint": "https://generativelanguage.googleapis.com/v1beta/models",
+        "default_model": "gemini-2.0-flash",
+        "env_var": "GOOGLE_API_KEY",
+    },
+}
+
+# Commit prefix mappings based on file path
+COMMIT_PREFIX_MAP = [
+    ("doc/guides/prog_guide/", "doc/guides/prog_guide:"),
+    ("doc/guides/sample_app_ug/", "doc/guides/sample_app:"),
+    ("doc/guides/nics/", "doc/guides/nics:"),
+    ("doc/guides/cryptodevs/", "doc/guides/cryptodevs:"),
+    ("doc/guides/compressdevs/", "doc/guides/compressdevs:"),
+    ("doc/guides/eventdevs/", "doc/guides/eventdevs:"),
+    ("doc/guides/rawdevs/", "doc/guides/rawdevs:"),
+    ("doc/guides/bbdevs/", "doc/guides/bbdevs:"),
+    ("doc/guides/gpus/", "doc/guides/gpus:"),
+    ("doc/guides/dmadevs/", "doc/guides/dmadevs:"),
+    ("doc/guides/regexdevs/", "doc/guides/regexdevs:"),
+    ("doc/guides/mldevs/", "doc/guides/mldevs:"),
+    ("doc/guides/rel_notes/", "doc/guides/rel_notes:"),
+    ("doc/guides/linux_gsg/", "doc/guides/linux_gsg:"),
+    ("doc/guides/freebsd_gsg/", "doc/guides/freebsd_gsg:"),
+    ("doc/guides/windows_gsg/", "doc/guides/windows_gsg:"),
+    ("doc/guides/tools/", "doc/guides/tools:"),
+    ("doc/guides/testpmd_app_ug/", "doc/guides/testpmd:"),
+    ("doc/guides/howto/", "doc/guides/howto:"),
+    ("doc/guides/contributing/", "doc/guides/contributing:"),
+    ("doc/guides/platform/", "doc/guides/platform:"),
+    ("doc/guides/", "doc:"),
+    ("doc/api/", "doc/api:"),
+    ("doc/", "doc:"),
+]
+
+SYSTEM_PROMPT = """\
+You are an expert technical documentation reviewer for DPDK.
+Your task is to review documentation files and suggest improvements for:
+- Spelling errors
+- Grammar issues
+- Technical correctness
+- Clarity and readability
+- Consistency with DPDK terminology
+
+IMPORTANT COMMIT MESSAGE RULES (from check-git-log.sh):
+- Subject line MUST be ≤60 characters
+- Format: "prefix: lowercase description"
+- First word after colon must be lowercase (except acronyms like Rx, Tx, VF, MAC, API)
+- Use imperative mood (e.g., "fix typo" not "fixed typo" or "fixes typo")
+- NO trailing period on subject line
+- NO punctuation marks: , ; ! ? & |
+- NO underscores in subject after colon
+- Body lines wrapped at 75 characters
+- Body must NOT start with "It"
+- Do NOT include Signed-off-by (user adds via git commit --sign)
+- Only use "Fixes:" tag for actual errors in documentation, not style improvements
+
+Case-sensitive terms (must use exact case):
+- Rx, Tx (not RX, TX, rx, tx)
+- VF, PF (not vf, pf)
+- MAC, VLAN, RSS, API
+- Linux, Windows, FreeBSD
+
+For style/clarity improvements, do NOT use Fixes tag.
+For actual errors (wrong information, broken examples), include Fixes tag \
+if you can identify the commit."""
+
+FORMAT_INSTRUCTIONS = {
+    "text": """
+OUTPUT FORMAT:
+You must output exactly two sections:
+
+1. COMMIT_MESSAGE section containing the complete commit message
+2. UNIFIED_DIFF section containing the unified diff
+
+Use these exact markers:
+---COMMIT_MESSAGE_START---
+(commit message here)
+---COMMIT_MESSAGE_END---
+
+---UNIFIED_DIFF_START---
+(unified diff here)
+---UNIFIED_DIFF_END---
+
+The diff should be in unified format that can be applied with "git apply".
+If no changes are needed, output empty sections with a note.""",
+    "markdown": """
+OUTPUT FORMAT:
+Provide your review in Markdown format with:
+
+## Summary
+Brief description of changes
+
+## Commit Message
+```
+(complete commit message here, ready to use)
+```
+
+## Changes
+For each change:
+### Issue N: Brief title
+- **Location**: file path and line
+- **Problem**: description
+- **Fix**: suggested correction
+
+## Unified Diff
+```diff
+(unified diff here)
+```""",
+    "html": """
+OUTPUT FORMAT:
+Provide your review in HTML format with:
+- <h2> for sections (Summary, Commit Message, Changes, Diff)
+- <pre><code> for commit message and diff
+- <ul>/<li> for individual issues
+- Do NOT include <html>, <head>, or <body> tags - just the content
+
+Include sections for: Summary, Commit Message, Changes, Unified Diff""",
+    "json": """
+OUTPUT FORMAT:
+Provide your review as JSON with this structure:
+{
+  "summary": "Brief description of changes",
+  "commit_message": "Complete commit message ready to use",
+  "changes": [
+    {
+      "type": "spelling|grammar|technical|clarity|style",
+      "location": "line number or section",
+      "original": "original text",
+      "suggested": "corrected text",
+      "reason": "why this change"
+    }
+  ],
+  "diff": "unified diff as a string",
+  "stats": {
+    "total_issues": 0,
+    "spelling": 0,
+    "grammar": 0,
+    "technical": 0,
+    "clarity": 0
+  }
+}
+Output ONLY valid JSON, no markdown code fences or other text.""",
+}
+
+USER_PROMPT = """\
+Review the following DPDK documentation file and provide improvements.
+
+File path: {doc_file}
+Commit message prefix to use: {commit_prefix}
+
+{format_instruction}
+
+---DOCUMENT CONTENT---
+"""
+
+
+def error(msg):
+    """Print error message and exit."""
+    print(f"Error: {msg}", file=sys.stderr)
+    sys.exit(1)
+
+
+def get_git_config(key):
+    """Get a value from git config."""
+    try:
+        result = subprocess.run(
+            ["git", "config", "--get", key],
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        return result.stdout.strip()
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        return None
+
+
+def get_smtp_config():
+    """Get SMTP configuration from git config sendemail settings."""
+    config = {
+        "server": get_git_config("sendemail.smtpserver"),
+        "port": get_git_config("sendemail.smtpserverport"),
+        "user": get_git_config("sendemail.smtpuser"),
+        "encryption": get_git_config("sendemail.smtpencryption"),
+        "password": get_git_config("sendemail.smtppass"),
+    }
+
+    # Set defaults
+    if not config["port"]:
+        if config["encryption"] == "ssl":
+            config["port"] = "465"
+        else:
+            config["port"] = "587"
+
+    # Convert port to int
+    if config["port"]:
+        config["port"] = int(config["port"])
+
+    return config
+
+
+def get_commit_prefix(filepath):
+    """Determine commit message prefix from file path."""
+    for prefix_path, prefix in COMMIT_PREFIX_MAP:
+        if filepath.startswith(prefix_path):
+            return prefix
+    return "doc:"
+
+
+def build_anthropic_request(
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for Anthropic API."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "system": [
+            {"type": "text", "text": SYSTEM_PROMPT},
+            {
+                "type": "text",
+                "text": agents_content,
+                "cache_control": {"type": "ephemeral"},
+            },
+        ],
+        "messages": [
+            {
+                "role": "user",
+                "content": user_prompt + doc_content,
+            }
+        ],
+    }
+
+
+def build_openai_request(
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for OpenAI-compatible APIs."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "model": model,
+        "max_tokens": max_tokens,
+        "messages": [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "system", "content": agents_content},
+            {
+                "role": "user",
+                "content": user_prompt + doc_content,
+            },
+        ],
+    }
+
+
+def build_google_request(
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+):
+    """Build request payload for Google Gemini API."""
+    format_instruction = FORMAT_INSTRUCTIONS.get(output_format, "")
+    if include_diff_markers and output_format not in ("text", "json"):
+        format_instruction += DIFF_MARKERS_INSTRUCTION
+    user_prompt = USER_PROMPT.format(
+        doc_file=doc_file,
+        commit_prefix=commit_prefix,
+        format_instruction=format_instruction,
+    )
+    return {
+        "contents": [
+            {"role": "user", "parts": [{"text": SYSTEM_PROMPT}]},
+            {"role": "user", "parts": [{"text": agents_content}]},
+            {
+                "role": "user",
+                "parts": [{"text": user_prompt + doc_content}],
+            },
+        ],
+        "generationConfig": {"maxOutputTokens": max_tokens},
+    }
+
+
+def call_api(
+    provider,
+    api_key,
+    model,
+    max_tokens,
+    agents_content,
+    doc_content,
+    doc_file,
+    commit_prefix,
+    output_format="text",
+    include_diff_markers=False,
+    verbose=False,
+):
+    """Make API request to the specified provider."""
+    config = PROVIDERS[provider]
+
+    # Build request based on provider
+    if provider == "anthropic":
+        request_data = build_anthropic_request(
+            model,
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {
+            "Content-Type": "application/json",
+            "x-api-key": api_key,
+            "anthropic-version": "2023-06-01",
+        }
+        url = config["endpoint"]
+    elif provider == "google":
+        request_data = build_google_request(
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {"Content-Type": "application/json"}
+        url = f"{config['endpoint']}/{model}:generateContent?key={api_key}"
+    else:  # openai, xai
+        request_data = build_openai_request(
+            model,
+            max_tokens,
+            agents_content,
+            doc_content,
+            doc_file,
+            commit_prefix,
+            output_format,
+            include_diff_markers,
+        )
+        headers = {
+            "Content-Type": "application/json",
+            "Authorization": f"Bearer {api_key}",
+        }
+        url = config["endpoint"]
+
+    # Make request
+    request_body = json.dumps(request_data).encode("utf-8")
+    req = Request(url, data=request_body, headers=headers, method="POST")
+
+    try:
+        with urlopen(req) as response:
+            result = json.loads(response.read().decode("utf-8"))
+    except HTTPError as e:
+        error_body = e.read().decode("utf-8")
+        try:
+            error_data = json.loads(error_body)
+            error(f"API error: {error_data.get('error', error_body)}")
+        except json.JSONDecodeError:
+            error(f"API error ({e.code}): {error_body}")
+    except URLError as e:
+        error(f"Connection error: {e.reason}")
+
+    # Show verbose info
+    if verbose:
+        print("=== Token Usage ===", file=sys.stderr)
+        if provider == "anthropic":
+            usage = result.get("usage", {})
+            print(f"Input tokens: {usage.get('input_tokens', 'N/A')}", file=sys.stderr)
+            print(
+                f"Cache creation: "
+                f"{usage.get('cache_creation_input_tokens', 0)}",
+                file=sys.stderr,
+            )
+            print(
+                f"Cache read: {usage.get('cache_read_input_tokens', 0)}",
+                file=sys.stderr,
+            )
+            print(
+                f"Output tokens: {usage.get('output_tokens', 'N/A')}", file=sys.stderr
+            )
+        elif provider == "google":
+            usage = result.get("usageMetadata", {})
+            print(
+                f"Prompt tokens: {usage.get('promptTokenCount', 'N/A')}",
+                file=sys.stderr,
+            )
+            print(
+                f"Output tokens: {usage.get('candidatesTokenCount', 'N/A')}",
+                file=sys.stderr,
+            )
+        else:  # openai, xai
+            usage = result.get("usage", {})
+            print(
+                f"Prompt tokens: {usage.get('prompt_tokens', 'N/A')}", file=sys.stderr
+            )
+            print(
+                f"Completion tokens: " f"{usage.get('completion_tokens', 'N/A')}",
+                file=sys.stderr,
+            )
+        print("===================", file=sys.stderr)
+
+    # Extract response text
+    if provider == "anthropic":
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        content = result.get("content", [])
+        return "".join(
+            block.get("text", "") for block in content if block.get("type") == "text"
+        )
+    elif provider == "google":
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        candidates = result.get("candidates", [])
+        if not candidates:
+            error("No response from Gemini")
+        parts = candidates[0].get("content", {}).get("parts", [])
+        return "".join(part.get("text", "") for part in parts)
+    else:  # openai, xai
+        if "error" in result:
+            error(f"API error: {result['error'].get('message', result)}")
+        choices = result.get("choices", [])
+        if not choices:
+            error("No response from API")
+        return choices[0].get("message", {}).get("content", "")
+
+
+def parse_review_text(review_text):
+    """Extract commit message and diff from text format response."""
+    commit_msg = ""
+    diff = ""
+
+    # Extract commit message
+    msg_match = re.search(
+        r"---COMMIT_MESSAGE_START---\s*\n(.*?)\n---COMMIT_MESSAGE_END---",
+        review_text,
+        re.DOTALL,
+    )
+    if msg_match:
+        commit_msg = msg_match.group(1).strip()
+
+    # Extract unified diff
+    diff_match = re.search(
+        r"---UNIFIED_DIFF_START---\s*\n(.*?)\n---UNIFIED_DIFF_END---",
+        review_text,
+        re.DOTALL,
+    )
+    if diff_match:
+        diff = diff_match.group(1).strip()
+        # Clean up any markdown code fence if present
+        diff = re.sub(r"^```diff\s*\n?", "", diff)
+        diff = re.sub(r"\n?```\s*$", "", diff)
+
+    return commit_msg, diff
+
+
+def strip_diff_markers(text):
+    """Remove the diff/msg extraction markers from text."""
+    # Remove commit message markers and content
+    text = re.sub(
+        r"\n*---COMMIT_MESSAGE_START---\s*\n.*?\n---COMMIT_MESSAGE_END---\s*",
+        "",
+        text,
+        flags=re.DOTALL,
+    )
+    # Remove unified diff markers and content
+    text = re.sub(
+        r"\n*---UNIFIED_DIFF_START---\s*\n.*?\n---UNIFIED_DIFF_END---\s*",
+        "",
+        text,
+        flags=re.DOTALL,
+    )
+    return text.strip()
+
+
+def send_email(
+    to_addrs,
+    cc_addrs,
+    from_addr,
+    subject,
+    in_reply_to,
+    body,
+    dry_run=False,
+    verbose=False,
+):
+    """Send review email via SMTP using git sendemail config."""
+    # Build email message
+    msg = EmailMessage()
+    msg["From"] = from_addr
+    msg["To"] = ", ".join(to_addrs)
+    if cc_addrs:
+        msg["Cc"] = ", ".join(cc_addrs)
+    msg["Subject"] = subject
+    if in_reply_to:
+        msg["In-Reply-To"] = in_reply_to
+        msg["References"] = in_reply_to
+    msg.set_content(body)
+
+    if dry_run:
+        print("=== Email Preview (dry-run) ===", file=sys.stderr)
+        print(msg.as_string(), file=sys.stderr)
+        print("=== End Preview ===", file=sys.stderr)
+        return True
+
+    # Get SMTP configuration from git config
+    smtp_config = get_smtp_config()
+
+    if not smtp_config["server"]:
+        error("No SMTP server configured. Set git config sendemail.smtpserver")
+
+    server = smtp_config["server"]
+    port = smtp_config["port"]
+    user = smtp_config["user"]
+    encryption = smtp_config["encryption"]
+
+    # Get password from environment or git config, or prompt
+    password = os.environ.get("SMTP_PASSWORD") or smtp_config["password"]
+    if user and not password:
+        password = getpass.getpass(f"SMTP password for {user}@{server}: ")
+
+    if verbose:
+        print(f"SMTP server: {server}:{port}", file=sys.stderr)
+        print(f"SMTP user: {user or '(none)'}", file=sys.stderr)
+        print(f"Encryption: {encryption or 'starttls'}", file=sys.stderr)
+
+    # Collect all recipients
+    all_recipients = list(to_addrs)
+    if cc_addrs:
+        all_recipients.extend(cc_addrs)
+
+    try:
+        if encryption == "ssl":
+            # SSL/TLS connection from the start (port 465)
+            context = ssl.create_default_context()
+            with smtplib.SMTP_SSL(server, port, context=context) as smtp:
+                if user and password:
+                    smtp.login(user, password)
+                smtp.send_message(msg, from_addr, all_recipients)
+        else:
+            # STARTTLS (port 587) or plain (port 25)
+            with smtplib.SMTP(server, port) as smtp:
+                smtp.ehlo()
+                if encryption == "tls" or port == 587:
+                    context = ssl.create_default_context()
+                    smtp.starttls(context=context)
+                    smtp.ehlo()
+                if user and password:
+                    smtp.login(user, password)
+                smtp.send_message(msg, from_addr, all_recipients)
+
+        print(f"Email sent via SMTP ({server}:{port})", file=sys.stderr)
+        return True
+
+    except smtplib.SMTPAuthenticationError as e:
+        error(f"SMTP authentication failed: {e}")
+    except smtplib.SMTPException as e:
+        error(f"SMTP error: {e}")
+    except OSError as e:
+        error(f"Connection error to {server}:{port}: {e}")
+
+
+def list_providers():
+    """Print available providers and exit."""
+    print("Available AI Providers:\n")
+    print(f"{'Provider':<12} {'Default Model':<30} {'API Key Variable'}")
+    print(f"{'--------':<12} {'-------------':<30} {'----------------'}")
+    for name, config in PROVIDERS.items():
+        print(f"{name:<12}
{config['default_model']:<30} {config['env_var']}") + sys.exit(0) + + +def main(): + parser = argparse.ArgumentParser( + description="Review DPDK documentation files using AI providers. " + "Accepts multiple files and generates output for each.", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=""" +Examples: + %(prog)s doc/guides/prog_guide/mempool_lib.rst + %(prog)s doc/guides/nics/*.rst # Review all NIC docs + %(prog)s -p openai -o /tmp doc/guides/nics/ixgbe.rst doc/guides/nics/i40e.rst + %(prog)s -f html -d -o /tmp/reviews doc/guides/nics/*.rst # HTML + diff files + %(prog)s -f json -o /tmp doc/guides/howto/flow_bifurcation.rst + %(prog)s --send-email --to dev@dpdk.org doc/guides/nics/ixgbe.rst + +Output files (in output-dir): + <basename>.txt|.md|.html|.json Review in selected format + <basename>.diff Unified diff (text/json, or with --diff) + <basename>.msg Commit message (text/json, or with --diff) + +After review: + git apply <basename>.diff + git commit -sF <basename>.msg + +SMTP Configuration (from git config): + sendemail.smtpserver SMTP server hostname + sendemail.smtpserverport SMTP port (default: 587 for TLS, 465 for SSL) + sendemail.smtpuser SMTP username + sendemail.smtpencryption 'tls' for STARTTLS, 'ssl' for SSL/TLS + sendemail.smtppass SMTP password (or set SMTP_PASSWORD env var) + +Example git config: + git config --global sendemail.smtpserver smtp.gmail.com + git config --global sendemail.smtpserverport 587 + git config --global sendemail.smtpuser yourname@gmail.com + git config --global sendemail.smtpencryption tls + """, + ) + + parser.add_argument( + "doc_files", + nargs="+", + metavar="doc_file", + help="Documentation file(s) to review", + ) + parser.add_argument( + "-p", + "--provider", + choices=PROVIDERS.keys(), + default="anthropic", + help="AI provider (default: anthropic)", + ) + parser.add_argument( + "-a", + "--agents", + default="AGENTS.md", + help="Path to AGENTS.md file (default: AGENTS.md)", + ) + 
parser.add_argument( + "-m", + "--model", + help="Model to use (default: provider-specific)", + ) + parser.add_argument( + "-t", + "--tokens", + type=int, + default=8192, + help="Max tokens for response (default: 8192)", + ) + parser.add_argument( + "-o", + "--output-dir", + default=".", + help="Output directory for all output files (default: .)", + ) + parser.add_argument( + "-v", + "--verbose", + action="store_true", + help="Show API request details", + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress review output to stdout (only write files)", + ) + parser.add_argument( + "-f", + "--format", + choices=OUTPUT_FORMATS, + default="text", + dest="output_format", + help="Output format: text, markdown, html, json (default: text)", + ) + parser.add_argument( + "-d", + "--diff", + action="store_true", + help="Always produce .diff and .msg files (automatic for text/json)", + ) + parser.add_argument( + "-l", + "--list-providers", + action="store_true", + help="List available providers and exit", + ) + + # Email options + email_group = parser.add_argument_group("Email Options") + email_group.add_argument( + "--send-email", + action="store_true", + help="Send review via email", + ) + email_group.add_argument( + "--to", + action="append", + dest="to_addrs", + default=[], + metavar="ADDRESS", + help="Email recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--cc", + action="append", + dest="cc_addrs", + default=[], + metavar="ADDRESS", + help="CC recipient (can be specified multiple times)", + ) + email_group.add_argument( + "--from", + dest="from_addr", + metavar="ADDRESS", + help="From address (default: from git config)", + ) + email_group.add_argument( + "--dry-run", + action="store_true", + help="Show email without sending", + ) + + args = parser.parse_args() + + if args.list_providers: + list_providers() + + # Get provider config + config = PROVIDERS[args.provider] + model = args.model or 
config["default_model"] + + # Get API key + api_key = os.environ.get(config["env_var"]) + if not api_key: + error(f"{config['env_var']} environment variable not set") + + # Validate files + agents_path = Path(args.agents) + if not agents_path.exists(): + error(f"AGENTS.md not found: {args.agents}") + + # Validate all doc files exist before processing + doc_paths = [] + for doc_file in args.doc_files: + doc_path = Path(doc_file) + if not doc_path.exists(): + error(f"Documentation file not found: {doc_file}") + doc_paths.append((doc_file, doc_path)) + + # Validate email options + if args.send_email and not args.to_addrs: + error("--send-email requires at least one --to address") + + # Get from address for email + from_addr = args.from_addr + if args.send_email and not from_addr: + git_name = get_git_config("user.name") + git_email = get_git_config("user.email") + if git_email: + from_addr = f"{git_name} <{git_email}>" if git_name else git_email + else: + error("No --from specified and git user.email not configured") + + # Read AGENTS.md once + agents_content = agents_path.read_text() + output_dir = Path(args.output_dir) + output_dir.mkdir(parents=True, exist_ok=True) + provider_name = config["name"] + + # Process each file + num_files = len(doc_paths) + for file_idx, (doc_file, doc_path) in enumerate(doc_paths, 1): + if num_files > 1: + print( + f"\n{'=' * 60}", + file=sys.stderr, + ) + print( + f"Processing file {file_idx}/{num_files}: {doc_file}", + file=sys.stderr, + ) + print( + f"{'=' * 60}", + file=sys.stderr, + ) + + # Determine output filenames + doc_basename = doc_path.stem + diff_file = output_dir / f"{doc_basename}.diff" + msg_file = output_dir / f"{doc_basename}.msg" + + # Get commit prefix + commit_prefix = get_commit_prefix(doc_file) + + # Read doc content + doc_content = doc_path.read_text() + + if args.verbose: + print("=== Request ===", file=sys.stderr) + print(f"Provider: {args.provider}", file=sys.stderr) + print(f"Model: {model}", file=sys.stderr) 
+ print(f"Output format: {args.output_format}", file=sys.stderr) + print(f"AGENTS file: {args.agents}", file=sys.stderr) + print(f"Doc file: {doc_file}", file=sys.stderr) + print(f"Commit prefix: {commit_prefix}", file=sys.stderr) + print(f"Output dir: {args.output_dir}", file=sys.stderr) + if args.send_email: + print("Send email: yes", file=sys.stderr) + print(f"To: {', '.join(args.to_addrs)}", file=sys.stderr) + if args.cc_addrs: + print(f"Cc: {', '.join(args.cc_addrs)}", file=sys.stderr) + print(f"From: {from_addr}", file=sys.stderr) + print("===============", file=sys.stderr) + + # Call API + review_text = call_api( + args.provider, + api_key, + model, + args.tokens, + agents_content, + doc_content, + doc_file, + commit_prefix, + args.output_format, + args.diff, + args.verbose, + ) + + if not review_text: + print( + f"Warning: No response received for {doc_file}", + file=sys.stderr, + ) + continue + + # Determine review output file + format_ext = FORMAT_EXTENSIONS[args.output_format] + review_file = output_dir / f"{doc_basename}{format_ext}" + + # Determine if we should write diff/msg files + write_diff_msg = args.diff or args.output_format in ("text", "json") + + # Extract commit message and diff first (before stripping markers) + commit_msg, diff = "", "" + if write_diff_msg: + if args.output_format == "json": + # Will extract from JSON below + pass + else: + # Parse from text format markers + commit_msg, diff = parse_review_text(review_text) + + # For non-text formats with --diff, strip the markers from display output + display_text = review_text + if args.diff and args.output_format in ("markdown", "html"): + display_text = strip_diff_markers(review_text) + + # Build formatted output text + if args.output_format == "text": + output_text = review_text + elif args.output_format == "json": + # Try to parse JSON response + try: + review_data = json.loads(review_text) + except json.JSONDecodeError: + print("Warning: Response is not valid JSON", file=sys.stderr) 
+ review_data = {"raw_response": review_text} + + # Extract diff/msg from JSON if present + if write_diff_msg: + if isinstance(review_data, dict) and "raw_response" not in review_data: + commit_msg = review_data.get("commit_message", "") + diff = review_data.get("diff", "") + + # Add metadata + output_data = { + "metadata": { + "doc_file": doc_file, + "provider": args.provider, + "provider_name": provider_name, + "model": model, + "commit_prefix": commit_prefix, + }, + "review": review_data, + } + output_text = json.dumps(output_data, indent=2) + elif args.output_format == "markdown": + output_text = f"""# Documentation Review: {doc_path.name} + +*Reviewed by {provider_name} ({model})* + +{display_text} +""" + elif args.output_format == "html": + output_text = f"""<!DOCTYPE html> +<html> +<head> +<meta charset="utf-8"> +<title>Review: {doc_path.name}</title> +<style> +body {{ font-family: system-ui, sans-serif; max-width: 900px; margin: 2em auto; padding: 0 1em; }} +h1 {{ color: #333; }} +.review-meta {{ color: #666; font-style: italic; }} +pre {{ background: #f5f5f5; padding: 1em; overflow-x: auto; }} +</style> +</head> +<body> +<h1>Documentation Review: {doc_path.name}</h1> +<p class="review-meta">Reviewed by {provider_name} ({model})</p> +<div class="review-content"> +{display_text} +</div> +</body> +</html> +""" + + # Write formatted review to file + review_file.write_text(output_text) + print(f"Review written to: {review_file}", file=sys.stderr) + + # Write diff/msg files + if write_diff_msg: + if commit_msg: + msg_file.write_text(commit_msg + "\n") + print(f"Commit message written to: {msg_file}", file=sys.stderr) + else: + msg_file.write_text("# No commit message generated\n") + print("Warning: Could not extract commit message", file=sys.stderr) + + if diff: + diff_file.write_text(diff + "\n") + print(f"Diff written to: {diff_file}", file=sys.stderr) + else: + diff_file.write_text("# No changes suggested\n") + print("Warning: Could not extract diff", 
file=sys.stderr) + + # Print to stdout unless quiet (or multiple files without verbose) + show_stdout = not args.quiet and (num_files == 1 or args.verbose) + if show_stdout: + print( + f"\n=== Documentation Review: {doc_path.name} " + f"(via {provider_name}) ===" + ) + print(output_text) + + # Print usage instructions for text format + if args.output_format == "text": + print("\n=== Output Files ===") + print(f"Commit message: {msg_file}") + print(f"Diff file: {diff_file}") + print("\nTo apply changes:") + print(f" git apply {diff_file}") + print(f" git commit -sF {msg_file}") + + # Send email if requested + if args.send_email: + if args.output_format != "text": + print( + f"Note: Email will be sent as plain text regardless of " + f"--format={args.output_format}", + file=sys.stderr, + ) + + review_subject = f"[REVIEW] {commit_prefix} {doc_path.name}" + + # Build email body + email_body = f"""AI-generated documentation review of {doc_file} +Reviewed using {provider_name} ({model}) + +This is an automated review. Please verify all suggestions. + +--- + +{review_text} +""" + + if args.verbose: + print("", file=sys.stderr) + print("=== Email Details ===", file=sys.stderr) + print(f"Subject: {review_subject}", file=sys.stderr) + print("=====================", file=sys.stderr) + + send_email( + args.to_addrs, + args.cc_addrs, + from_addr, + review_subject, + None, + email_body, + args.dry_run, + args.verbose, + ) + + if not args.dry_run: + print("", file=sys.stderr) + print(f"Review sent to: {', '.join(args.to_addrs)}", file=sys.stderr) + + # Print summary for multiple files + if num_files > 1: + print(f"\n{'=' * 60}", file=sys.stderr) + print(f"Processed {num_files} files", file=sys.stderr) + print(f"Output directory: {output_dir}", file=sys.stderr) + + +if __name__ == "__main__": + main() -- 2.51.0 ^ permalink raw reply related [flat|nested] 63+ messages in thread
end of thread, other threads:[~2026-02-19 18:04 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-09  1:41 [RFC] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger
2026-01-09  9:46 ` Bruce Richardson
2026-01-10 17:34   ` Stephen Hemminger
2026-01-09  9:58 ` Marat Khalili
2026-01-10 17:28   ` Stephen Hemminger
2026-01-10 18:00 ` [RFC v2] " Stephen Hemminger
2026-01-13 15:28 ` [PATCH v3] " Stephen Hemminger
2026-01-16 17:46 ` [PATCH v4] " Stephen Hemminger
2026-01-20 14:21   ` Bruce Richardson
2026-01-20 17:35     ` Bruce Richardson
2026-01-23  0:42 ` [PATCH v5 0/4] devtools: add " Stephen Hemminger
2026-01-23  0:42   ` [PATCH v5 1/4] doc: add AGENTS.md for " Stephen Hemminger
2026-01-23  0:42   ` [PATCH v5 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-01-23  0:42   ` [PATCH v5 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger
2026-01-23  0:42   ` [PATCH v5 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-01-23 10:55   ` [PATCH v5 0/4] devtools: add AI-powered code review tools Marat Khalili
2026-01-24 17:15     ` Stephen Hemminger
2026-01-24 19:00 ` [PATCH v6 " Stephen Hemminger
2026-01-24 19:00   ` [PATCH v6 1/4] doc: add AGENTS.md for " Stephen Hemminger
2026-01-24 19:00   ` [PATCH v6 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-01-24 19:00   ` [PATCH v6 3/4] devtools: add script to compare AI reviews across providers Stephen Hemminger
2026-01-24 19:00   ` [PATCH v6 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-01-28 18:21 ` [PATCH v8 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
2026-01-28 18:21   ` [PATCH v8 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger
2026-01-28 18:21   ` [PATCH v8 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-01-28 18:21   ` [PATCH v8 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-01-28 18:21   ` [PATCH v8 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-01-29 20:38 ` [PATCH v9 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
2026-01-29 20:38   ` [PATCH v9 1/4] doc: add AGENTS.md for AI-powered " Stephen Hemminger
2026-01-29 20:38   ` [PATCH v9 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-01-29 20:38   ` [PATCH v9 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-01-29 20:38   ` [PATCH v9 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-01-30 17:40 ` [PATCH v10 0/4] devtools: add AI-assisted code review infrastructure Stephen Hemminger
2026-01-30 17:41   ` [PATCH v10 1/4] doc: add AGENTS.md for AI-powered code review tools Stephen Hemminger
2026-01-30 17:41   ` [PATCH v10 2/4] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-01-30 17:41   ` [PATCH v10 3/4] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-01-30 17:41   ` [PATCH v10 4/4] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-02-05 17:51 ` [PATCH v8 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger
2026-02-05 17:51   ` [PATCH v8 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger
2026-02-13 21:39 ` [PATCH v9 0/6] devtools: AI-assisted code and documentation review Stephen Hemminger
2026-02-13 21:39   ` [PATCH v9 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger
2026-02-18 14:59     ` Aaron Conole
2026-02-19 17:47       ` Stephen Hemminger
2026-02-19 18:04       ` Stephen Hemminger
2026-02-13 21:39   ` [PATCH v9 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-02-13 21:39   ` [PATCH v9 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-02-13 21:39   ` [PATCH v9 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-02-17 18:20     ` Andrew Bailey
2026-02-13 21:39   ` [PATCH v9 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger
2026-02-13 21:39   ` [PATCH v9 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger
2026-02-19 17:48 ` [PATCH v10 0/6] AI-assisted code and documentation review Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 1/6] doc: add AGENTS.md for AI code review tools Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 2/6] devtools: add multi-provider AI patch review script Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 3/6] devtools: add compare-reviews.sh for multi-provider analysis Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 5/6] doc: add AI-assisted patch review to contributing guide Stephen Hemminger
2026-02-19 17:48   ` [PATCH v10 6/6] MAINTAINERS: add section for AI review tools Stephen Hemminger
-- strict thread matches above, loose matches on Subject: below --
2026-01-26 18:40 [PATCH v7 0/4] devtools: add AI-assisted code review tools Stephen Hemminger
2026-02-09 19:48 ` [PATCH v8 0/6] add AGENTS.md and scripts for AI code review Stephen Hemminger
2026-02-09 19:48   ` [PATCH v8 4/6] devtools: add multi-provider AI documentation review script Stephen Hemminger