linux-sparse.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Li <sparse@chrisli.org>
To: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: linux-sparse@vger.kernel.org,
	 Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Subject: Re: [PATCH RESEND2 1/4] parse: initial parsing of __attribute__((format))
Date: Mon, 1 Dec 2025 23:23:04 +0400	[thread overview]
Message-ID: <CACePvbVvsAPURh+jfb2Vh8cPsOzuR2HmzuD9j5Gf6GJyD2orng@mail.gmail.com> (raw)
In-Reply-To: <20251020153918.812235-2-ben.dooks@codethink.co.uk>

Hi Ben,

Thanks for the patch and sorry for the late reply.

Your format attribute series work applies to the sparse-dev tree fine
and "make check" runs fine as well. Thank you so much.

I have some trivial coding style of feedback for you, see the comments
below. Mostly just nitpicks, does not impact the coding behavior. Let
me know if you want to update a new series or I can be lazy and just
apply your current series.

Chris

On Mon, Oct 20, 2025 at 7:39 PM Ben Dooks <ben.dooks@codethink.co.uk> wrote:
>
> Add code to parse the __attribute__((format)) used to indicate that
> a variadic function takes a printf-style format string and where
> those are. Save the data in ctype ready for checking when such an
> function is encoutered.
>
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
>  parse.c  | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  symbol.h |  9 ++++++-
>  2 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/parse.c b/parse.c
> index 3f67451e..af4e5b50 100644
> --- a/parse.c
> +++ b/parse.c
> @@ -86,7 +86,7 @@ static attr_t
>         attribute_cleanup,
>         attribute_designated_init,
>         attribute_transparent_union, ignore_attribute,
> -       attribute_mode, attribute_force;
> +       attribute_mode, attribute_force, attribute_format;
>
>  typedef struct symbol *to_mode_t(struct symbol *);
>
> @@ -121,6 +121,12 @@ static void asm_modifier(struct token *token, unsigned long *mods, unsigned long
>         *mods |= mod;
>  }
>
> +/* the types of formatting from __attribute__((format)) */
> +enum {
> +       FMT_PRINTF = 0,
> +       FMT_SCANF,
> +};
> +
>  static struct symbol_op typedef_op = {
>         .type = KW_MODIFIER,
>         .declarator = storage_specifier,
> @@ -382,6 +388,10 @@ static struct symbol_op attr_force_op = {
>         .attribute = attribute_force,
>  };
>
> +static struct symbol_op attr_format_op = {
> +       .attribute = attribute_format,
> +};
> +
>  static struct symbol_op address_space_op = {
>         .attribute = attribute_address_space,
>  };
> @@ -441,6 +451,16 @@ static struct symbol_op mode_word_op = {
>         .to_mode = to_word_mode
>  };
>
> +static struct symbol_op attr_printf_op = {
> +       .type   = KW_FORMAT,
> +       .class  = FMT_PRINTF,
> +};
> +
> +static struct symbol_op attr_scanf_op = {
> +       .type   = KW_FORMAT,
> +       .class  = FMT_SCANF,
> +};
> +
>  /*
>   * Define the keyword and their effects.
>   * The entries in the 'typedef' and put in NS_TYPEDEF and
> @@ -557,6 +577,9 @@ static struct init_keyword {
>         D("pure",               &attr_fun_op,           .mods = MOD_PURE),
>         A("const",              &attr_fun_op,           .mods = MOD_PURE),
>         D("gnu_inline",         &attr_fun_op,           .mods = MOD_GNU_INLINE),
> +       D("format",             &attr_format_op),
> +       D("printf",             &attr_printf_op),
> +       D("scanf",              &attr_scanf_op),
>
>         /* Modes */
>         D("mode",               &mode_op),
> @@ -1217,6 +1240,60 @@ static struct token *attribute_address_space(struct token *token, struct symbol
>         return token;
>  }
>
> +static int invalid_format_args(long long start, long long at)
> +{
> +       return start < 0 || at < 0 || start > USHRT_MAX || at > USHRT_MAX ||
> +               (start == at && start > 0) ||
> +               (start == 0 && at == 0);
> +}
> +
> +static struct token *attribute_format(struct token *token, struct symbol *attr, struct decl_state *ctx)
> +{
> +       struct expression *args[3];

I notice that you never use the args as an array, e.g. pass "args" to
any function. You always use args[n] as scalar.
In that case, it is better to make each args as individual variables
with proper names. When I read your patch, I need to lookup gcc
document for the format attribute to understand each args. I found
this:

format (archetype, string-index, first-to-check)

I assume that matches your three args. Then just name the expression
"archetype", "stridx", "first2check" something like that. You get the
idea. I just make up the variable name on the spot, you can probably
find a better variable name than I do.

> +       struct symbol *fmt_sym = NULL;
> +
> +       /* expecting format ( type, start, va_args at) */
> +
> +       token = expect(token, '(', "after format attribute");
> +       if (token_type(token) == TOKEN_IDENT)
> +               fmt_sym = lookup_keyword(token->ident, NS_KEYWORD);
> +       if (fmt_sym)
> +               if (!fmt_sym->op || fmt_sym->op->type != KW_FORMAT)

This two if statement can be combined into one if statement with
compound test expression:
if (fmt_sym && (!fmt_sym->op || fmt_sym->op->type != KW_FORMAT))

> +                       fmt_sym = NULL;
> +
> +       token = conditional_expression(token, &args[0]);

See above, give args[0] a proper name that would be more readable.

> +       token = expect(token, ',', "format attribute type");
> +       token = conditional_expression(token, &args[1]);

Same.

> +       token = expect(token, ',', "format attribute type position");
> +       token = conditional_expression(token, &args[2]);

Same.

> +       token = expect(token, ')', "format attribute arg position");
> +
> +       if (!fmt_sym || !args[0] || !args[1] || !args[2]) {
> +               warning(token->pos, "incorrect format attribute");

In such cases, the kernel source code often bails out early to make
the rest of code flater.
e.g.:
                "goto done" here or "return toke";

> +       } else if (fmt_sym->op->class != FMT_PRINTF) {

If you follow the above suggestion. "} else if () {" becomes "if ()".
That looks cleaner

> +               /* skip anything that isn't printf for the moment */
> +               warning(token->pos, "only printf format attribute supported");

You can also bail out or return early here to save the following "else"

> +       } else {

Use the above suggestion to remove one level of indentation.

> +               long long start, at;
> +
> +               start = get_expression_value(args[2]);

args[2] can use the nice variable name if you follow the above suggestion.

> +               at = get_expression_value(args[1]);
> +
> +               if (invalid_format_args(start, at)) {
> +                       warning(token->pos, "bad format positions");

Same here, bail out early can make the function flatter.

> +               } else if (start == 0) {
> +                       /* nothing to do here, is va_list function */
Same here.

> +               } else if (start < at) {
> +                       warning(token->pos, "format cannot be after va_args");
Same.

> +               } else {

This else can be removed if bail out early above.

> +                       ctx->ctype.format.index = at;
> +                       ctx->ctype.format.first = start;
> +               }
> +       }
> +
> +       return token;
> +}
> +
>  static struct symbol *to_QI_mode(struct symbol *ctype)
>  {
>         if (ctype->ctype.base_type != &int_type)
> @@ -3007,6 +3084,8 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis
>
>                 if (!(decl->ctype.modifiers & MOD_STATIC))
>                         decl->ctype.modifiers |= MOD_EXTERN;
> +
> +               base_type->ctype.format = decl->ctype.format;
>         } else if (base_type == &void_ctype && !(decl->ctype.modifiers & MOD_EXTERN)) {
>                 sparse_error(token->pos, "void declaration");
>         }
> diff --git a/symbol.h b/symbol.h
> index 88130c15..0ea46da8 100644
> --- a/symbol.h
> +++ b/symbol.h
> @@ -82,8 +82,9 @@ enum keyword {
>         KW_ASM          = 1 << 5,
>         KW_MODE         = 1 << 6,
>         KW_STATIC       = 1 << 7,
> -     // KW UNUSED      = 1 << 8,
> +       // KW_UNUSED    = 1 << 8.
>         KW_EXACT        = 1 << 9,

Nobody is using KW_UNUSED. Just delete it and let KW_EXACT use 1 << 8.

> +       KW_FORMAT       = 1 << 10,
Move up this to 1 << 9.


>  };
>
>  struct context {
> @@ -95,12 +96,18 @@ extern struct context *alloc_context(void);
>
>  DECLARE_PTR_LIST(context_list, struct context);
>
> +struct attr_format {
> +       unsigned short index;   /* index in argument list for format string */
> +       unsigned short first;   /* where first variadic argument is */
> +};
> +
>  struct ctype {
>         struct symbol *base_type;
>         unsigned long modifiers;
>         unsigned long alignment;
>         struct context_list *contexts;
>         struct ident *as;
> +       struct attr_format format;

The struct attr_format is very small, adding it here should be fine.
Keep in mind that the struct ctype is a very common data structure in
sparse. Most of the ctype symbols will not have the attr_format
declared. We might want to move the non common ctype into ctype
extensions struct and only store a pointer to the extension struct if
not NULL.

Using the extension attribute I need to check NULL first. It will save
memory space for almost all of the common  ctype.
Because the format member is very small, two shorts, a pointer would
be the same size. We can just add it as it is.


Chris

  reply	other threads:[~2025-12-01 19:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-20 15:39 Add printf/scanf -Wformat checking Ben Dooks
2025-10-20 15:39 ` [PATCH RESEND2 1/4] parse: initial parsing of __attribute__((format)) Ben Dooks
2025-12-01 19:23   ` Chris Li [this message]
2025-10-20 15:39 ` [PATCH RESEND2 2/4] add -Wformat Ben Dooks
2025-10-20 15:39 ` [PATCH RESEND2 3/4] evaluate: check variadic argument types against formatting info Ben Dooks
2025-12-01 19:52   ` Chris Li
2025-10-20 15:39 ` [PATCH RESEND2 4/4] tests: add varargs printf format tests Ben Dooks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACePvbVvsAPURh+jfb2Vh8cPsOzuR2HmzuD9j5Gf6GJyD2orng@mail.gmail.com \
    --to=sparse@chrisli.org \
    --cc=ben.dooks@codethink.co.uk \
    --cc=linux-sparse@vger.kernel.org \
    --cc=luc.vanoostenryck@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).