From: Chris Li <sparse@chrisli.org>
To: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: linux-sparse@vger.kernel.org,
Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Subject: Re: [PATCH RESEND2 1/4] parse: initial parsing of __attribute__((format))
Date: Mon, 1 Dec 2025 23:23:04 +0400 [thread overview]
Message-ID: <CACePvbVvsAPURh+jfb2Vh8cPsOzuR2HmzuD9j5Gf6GJyD2orng@mail.gmail.com> (raw)
In-Reply-To: <20251020153918.812235-2-ben.dooks@codethink.co.uk>
Hi Ben,
Thanks for the patch and sorry for the late reply.
Your format attribute series work applies to the sparse-dev tree fine
and "make check" runs fine as well. Thank you so much.
I have some trivial coding style of feedback for you, see the comments
below. Mostly just nitpicks, does not impact the coding behavior. Let
me know if you want to update a new series or I can be lazy and just
apply your current series.
Chris
On Mon, Oct 20, 2025 at 7:39 PM Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>
> Add code to parse the __attribute__((format)) used to indicate that
> a variadic function takes a printf-style format string and where
> those are. Save the data in ctype ready for checking when such an
> function is encoutered.
>
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
> parse.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> symbol.h | 9 ++++++-
> 2 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/parse.c b/parse.c
> index 3f67451e..af4e5b50 100644
> --- a/parse.c
> +++ b/parse.c
> @@ -86,7 +86,7 @@ static attr_t
> attribute_cleanup,
> attribute_designated_init,
> attribute_transparent_union, ignore_attribute,
> - attribute_mode, attribute_force;
> + attribute_mode, attribute_force, attribute_format;
>
> typedef struct symbol *to_mode_t(struct symbol *);
>
> @@ -121,6 +121,12 @@ static void asm_modifier(struct token *token, unsigned long *mods, unsigned long
> *mods |= mod;
> }
>
> +/* the types of formatting from __attribute__((format)) */
> +enum {
> + FMT_PRINTF = 0,
> + FMT_SCANF,
> +};
> +
> static struct symbol_op typedef_op = {
> .type = KW_MODIFIER,
> .declarator = storage_specifier,
> @@ -382,6 +388,10 @@ static struct symbol_op attr_force_op = {
> .attribute = attribute_force,
> };
>
> +static struct symbol_op attr_format_op = {
> + .attribute = attribute_format,
> +};
> +
> static struct symbol_op address_space_op = {
> .attribute = attribute_address_space,
> };
> @@ -441,6 +451,16 @@ static struct symbol_op mode_word_op = {
> .to_mode = to_word_mode
> };
>
> +static struct symbol_op attr_printf_op = {
> + .type = KW_FORMAT,
> + .class = FMT_PRINTF,
> +};
> +
> +static struct symbol_op attr_scanf_op = {
> + .type = KW_FORMAT,
> + .class = FMT_SCANF,
> +};
> +
> /*
> * Define the keyword and their effects.
> * The entries in the 'typedef' and put in NS_TYPEDEF and
> @@ -557,6 +577,9 @@ static struct init_keyword {
> D("pure", &attr_fun_op, .mods = MOD_PURE),
> A("const", &attr_fun_op, .mods = MOD_PURE),
> D("gnu_inline", &attr_fun_op, .mods = MOD_GNU_INLINE),
> + D("format", &attr_format_op),
> + D("printf", &attr_printf_op),
> + D("scanf", &attr_scanf_op),
>
> /* Modes */
> D("mode", &mode_op),
> @@ -1217,6 +1240,60 @@ static struct token *attribute_address_space(struct token *token, struct symbol
> return token;
> }
>
> +static int invalid_format_args(long long start, long long at)
> +{
> + return start < 0 || at < 0 || start > USHRT_MAX || at > USHRT_MAX ||
> + (start == at && start > 0) ||
> + (start == 0 && at == 0);
> +}
> +
> +static struct token *attribute_format(struct token *token, struct symbol *attr, struct decl_state *ctx)
> +{
> + struct expression *args[3];
I notice that you never use the args as an array, e.g. pass "args" to
any function. You always use args[n] as scalar.
In that case, it is better to make each args as individual variables
with proper names. When I read your patch, I need to lookup gcc
document for the format attribute to understand each args. I found
this:
format (archetype, string-index, first-to-check)
I assume that matches your three args. Then just name the expression
"archetype", "stridx", "first2check" something like that. You get the
idea. I just make up the variable name on the spot, you can probably
find a better variable name than I do.
> + struct symbol *fmt_sym = NULL;
> +
> + /* expecting format ( type, start, va_args at) */
> +
> + token = expect(token, '(', "after format attribute");
> + if (token_type(token) == TOKEN_IDENT)
> + fmt_sym = lookup_keyword(token->ident, NS_KEYWORD);
> + if (fmt_sym)
> + if (!fmt_sym->op || fmt_sym->op->type != KW_FORMAT)
This two if statement can be combined into one if statement with
compound test expression:
if (fmt_sym && (!fmt_sym->op || fmt_sym->op->type != KW_FORMAT))
> + fmt_sym = NULL;
> +
> + token = conditional_expression(token, &args[0]);
See above, give args[0] a proper name that would be more readable.
> + token = expect(token, ',', "format attribute type");
> + token = conditional_expression(token, &args[1]);
Same.
> + token = expect(token, ',', "format attribute type position");
> + token = conditional_expression(token, &args[2]);
Same.
> + token = expect(token, ')', "format attribute arg position");
> +
> + if (!fmt_sym || !args[0] || !args[1] || !args[2]) {
> + warning(token->pos, "incorrect format attribute");
In such cases, the kernel source code often bails out early to make
the rest of code flater.
e.g.:
"goto done" here or "return toke";
> + } else if (fmt_sym->op->class != FMT_PRINTF) {
If you follow the above suggestion. "} else if () {" becomes "if ()".
That looks cleaner
> + /* skip anything that isn't printf for the moment */
> + warning(token->pos, "only printf format attribute supported");
You can also bail out or return early here to save the following "else"
> + } else {
Use the above suggestion to remove one level of indentation.
> + long long start, at;
> +
> + start = get_expression_value(args[2]);
args[2] can use the nice variable name if you follow the above suggestion.
> + at = get_expression_value(args[1]);
> +
> + if (invalid_format_args(start, at)) {
> + warning(token->pos, "bad format positions");
Same here, bail out early can make the function flatter.
> + } else if (start == 0) {
> + /* nothing to do here, is va_list function */
Same here.
> + } else if (start < at) {
> + warning(token->pos, "format cannot be after va_args");
Same.
> + } else {
This else can be removed if bail out early above.
> + ctx->ctype.format.index = at;
> + ctx->ctype.format.first = start;
> + }
> + }
> +
> + return token;
> +}
> +
> static struct symbol *to_QI_mode(struct symbol *ctype)
> {
> if (ctype->ctype.base_type != &int_type)
> @@ -3007,6 +3084,8 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis
>
> if (!(decl->ctype.modifiers & MOD_STATIC))
> decl->ctype.modifiers |= MOD_EXTERN;
> +
> + base_type->ctype.format = decl->ctype.format;
> } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) {
> sparse_error(token->pos, "void declaration");
> }
> diff --git a/symbol.h b/symbol.h
> index 88130c15..0ea46da8 100644
> --- a/symbol.h
> +++ b/symbol.h
> @@ -82,8 +82,9 @@ enum keyword {
> KW_ASM = 1 << 5,
> KW_MODE = 1 << 6,
> KW_STATIC = 1 << 7,
> - // KW UNUSED = 1 << 8,
> + // KW_UNUSED = 1 << 8.
> KW_EXACT = 1 << 9,
Nobody is using KW_UNUSED. Just delete it and let KW_EXACT use 1 << 8.
> + KW_FORMAT = 1 << 10,
Move up this to 1 << 9.
> };
>
> struct context {
> @@ -95,12 +96,18 @@ extern struct context *alloc_context(void);
>
> DECLARE_PTR_LIST(context_list, struct context);
>
> +struct attr_format {
> + unsigned short index; /* index in argument list for format string */
> + unsigned short first; /* where first variadic argument is */
> +};
> +
> struct ctype {
> struct symbol *base_type;
> unsigned long modifiers;
> unsigned long alignment;
> struct context_list *contexts;
> struct ident *as;
> + struct attr_format format;
The struct attr_format is very small, adding it here should be fine.
Keep in mind that the struct ctype is a very common data structure in
sparse. Most of the ctype symbols will not have the attr_format
declared. We might want to move the non common ctype into ctype
extensions struct and only store a pointer to the extension struct if
not NULL.
Using the extension attribute I need to check NULL first. It will save
memory space for almost all of the common ctype.
Because the format member is very small, two shorts, a pointer would
be the same size. We can just add it as it is.
Chris
next prev parent reply other threads:[~2025-12-01 19:23 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-20 15:39 Add printf/scanf -Wformat checking Ben Dooks
2025-10-20 15:39 ` [PATCH RESEND2 1/4] parse: initial parsing of __attribute__((format)) Ben Dooks
2025-12-01 19:23 ` Chris Li [this message]
2025-10-20 15:39 ` [PATCH RESEND2 2/4] add -Wformat Ben Dooks
2025-10-20 15:39 ` [PATCH RESEND2 3/4] evaluate: check variadic argument types against formatting info Ben Dooks
2025-12-01 19:52 ` Chris Li
2025-10-20 15:39 ` [PATCH RESEND2 4/4] tests: add varargs printf format tests Ben Dooks
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACePvbVvsAPURh+jfb2Vh8cPsOzuR2HmzuD9j5Gf6GJyD2orng@mail.gmail.com \
--to=sparse@chrisli.org \
--cc=ben.dooks@codethink.co.uk \
--cc=linux-sparse@vger.kernel.org \
--cc=luc.vanoostenryck@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).