Linux kbuild/kconfig development
 help / color / mirror / Atom feed
From: "Pat Somaru" <patso@likewhatevs.io>
To: "Justin Stitt" <justinstitt@google.com>
Cc: "Nathan Chancellor" <nathan@kernel.org>,
	<linux-kbuild@vger.kernel.org>, "Nicolas Schier" <nsc@kernel.org>,
	"Bill Wendling" <morbo@google.com>,
	"Nick Desaulniers" <nick.desaulniers+lkml@gmail.com>,
	"clang-built-linux" <llvm@lists.linux.dev>
Subject: Re: [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands
Date: Tue, 07 Oct 2025 20:09:57 -0400	[thread overview]
Message-ID: <DDCI68JLD5RP.1WAN6YKP7WWNU@likewhatevs.io> (raw)
In-Reply-To: <CAFhGd8quQ3i9wEGOJNxy5s_MuD=WdxwqL15qB1i=KY8fW1Cf_Q@mail.gmail.com>

Hi folks,

Thanks for the feedback so far!

>> I realize this script is not ran often but maybe there is some heurstic
>> or speedup that can be made to the regex.

I'll send a revision with this in a bit, thank you for the idea and
noticing this! On my machine, v1 of this patch runs in ~900ms, v2 runs
in 500ms and no patch runs in 400ms. I also confirmed v2 outputs an
identical json to that v1 does.

>>> However, my initial gut reaction is that I do not like this additional
>>> complexity. 

I don't see other ways to get LSP working, but admittedly, this is my first 
foray into kernel stuff. I kinda see a way to enable making LSP work via 
ifdef's, but I doubt folks would be OK with that at all. I can try different 
ideas if folks have any.

>>> I do see a number of files that include .c files so it may be worth 
>>> supporting this still but it feels like this heuristic could be
>>> fragile, especially since aside from clangd, we have no real way to
>>> validate that these files actually build correctly in the way that
>>> compile_commands.json describes.

I think the following helps with this, I ran cppcheck (i.e., not clangd)
against my local checkout with both pre-patch and post-patch 
compile_commands.json to get some signal on this:

====
$ cppcheck --project=../old_compile_commands.json 2>&1 | tail
...
Checking virt/lib/irqbypass.c ...
2668/2668 files checked 100% done
$ cppcheck --project=../old_compile_commands.json 2>&1 | grep -E 'error:|warning:' | wc -l
5
$ cppcheck --project=../new_compile_commands.json 2>&1 | tail
...
Checking virt/lib/irqbypass.c ...
2821/2821 files checked 100% done
$ cppcheck --project=../new_compile_commands.json 2>&1 | grep -E 'error:|warning:' | wc -l
7
====

Additionally, I wrapped clangd in a script and collected some data on how
it performs with the various compile_commands.json to help illustrate the 
properties of this patch:

====
script output with newly included files and old compile_commands.json (i.e. what folks encounter working on these files now):
files: 2820
clangd errors: 58452
clangd fatal_too_many_errors: 195
clangd timeouts: 36

script output with new compile_commands.json:
files: 2820
clangd errors: 58763
clangd fatal_too_many_errors: 164
clangd timeouts: 39

script output with old compile_commands.json:
files: 2667
clangd errors: 55995
clangd fatal_too_many_errors: 118
clangd timeouts: 36
====

Thank you for the feedback so far and for maintaining this wonderful
script. LSP makes kernel stuff like 10x more accessible!


On Tue Oct 7, 2025 at 3:59 PM EDT, Justin Stitt wrote:
> I seem to have dropped some folks from the CC list somehow (probably
> wrong mutt keybind).
>
> I've self-quoted myself below to preserve context.
>
> On Tue, Oct 7, 2025 at 12:55 PM Justin Stitt <justinstitt@google.com> wrote:
>>
>> Hi,
>>
>> On Mon, Oct 06, 2025 at 10:45:27PM -0400, Pat Somaru wrote:
>> > The gen_compile_commands.py script currently only creates entries for the
>> > primary source files found in .cmd files, but some kernel source files
>> > text-include others (i.e. kernel/sched/build_policy.c).
>> >
>> > This prevents tools like clangd from working properly on text-included c
>> > files, such as kernel/sched/ext.c because the generated compile_commands.json
>> > does not have entries for them.
>> >
>> > Extend process_line() to detect when a source file includes .c files, and
>> > generate additional compile_commands.json entries for them. For included c
>> > files, use the same compile flags as their parent and add their parents headers.
>> >
>> > This enables lsp tools like clangd to work properly on files like
>> > kernel/sched/ext.c
>> >
>> > Signed-off-by: Pat Somaru <patso@likewhatevs.io>
>>
>> FWIW, I tested this out and my clangd was much happier dealing with
>> kernel/sched/ext.c. Nathan's points are still valid, I'm just giving
>> user feedback.
>>
>> After benchmarking the script itself, I saw some performance hits:
>>
>> pre-patch, 5x run average: 0.590 seconds
>> post-patch, 5x run average: 2.164 seconds
>>
>> With this simple invocation:
>>
>>   ./scripts/clang-tools/gen_compile_commands.py -d build-master -o $TMP/compile_commands.json
>>
>> I realize this script is not ran often but maybe there is some heurstic
>> or speedup that can be made to the regex.
>>
>> Tested-by: Justin Stitt <justinstitt@google.com>
>>
>> > ---
>> >  scripts/clang-tools/gen_compile_commands.py | 126 +++++++++++++++++++-
>> >  1 file changed, 121 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
>> > index 96e6e46ad1a70..02791efdc06d0 100755
>> > --- a/scripts/clang-tools/gen_compile_commands.py
>> > +++ b/scripts/clang-tools/gen_compile_commands.py
>> > @@ -149,8 +149,87 @@ def cmdfiles_for_modorder(modorder):
>> >                      yield to_cmdfile(mod_line.rstrip())
>> >
>> >
>> > +def extract_includes_from_file(source_file, root_directory):
>> > +    """Extract #include statements from a C file.
>> > +
>> > +    Args:
>> > +        source_file: Path to the source .c file to analyze
>> > +        root_directory: Root directory for resolving relative paths
>> > +
>> > +    Returns:
>> > +        List of header files that should be included (without quotes/brackets)
>> > +    """
>> > +    includes = []
>> > +    if not os.path.exists(source_file):
>> > +        return includes
>> > +
>> > +    try:
>> > +        with open(source_file, 'r') as f:
>> > +            for line in f:
>> > +                line = line.strip()
>> > +                # Look for #include statements.
>> > +                # Match both #include "header.h" and #include <header.h>.
>> > +                match = re.match(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]', line)
>> > +                if match:
>> > +                    header = match.group(1)
>> > +                    # Skip including other .c files to avoid circular includes.
>> > +                    if not header.endswith('.c'):
>> > +                        # For relative includes (quoted), resolve path relative to source file.
>> > +                        if '"' in line:
>> > +                            src_dir = os.path.dirname(source_file)
>> > +                            header_path = os.path.join(src_dir, header)
>> > +                            if os.path.exists(header_path):
>> > +                                rel_header = os.path.relpath(header_path, root_directory)
>> > +                                includes.append(rel_header)
>> > +                            else:
>> > +                                includes.append(header)
>> > +                        else:
>> > +                            # System include like <linux/sched.h>.
>> > +                            includes.append(header)
>> > +    except IOError:
>> > +        pass
>> > +
>> > +    return includes
>> > +
>> > +
>> > +def find_included_c_files(source_file, root_directory):
>> > +    """Find .c files that are included by the given source file.
>> > +
>> > +    Args:
>> > +        source_file: Path to the source .c file
>> > +        root_directory: Root directory for resolving relative paths
>> > +
>> > +    Yields:
>> > +        Full paths to included .c files
>> > +    """
>> > +    if not os.path.exists(source_file):
>> > +        return
>> > +
>> > +    try:
>> > +        with open(source_file, 'r') as f:
>> > +            for line in f:
>> > +                line = line.strip()
>> > +                # Look for #include "*.c" patterns.
>> > +                match = re.match(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$', line)
>> > +                if match:
>> > +                    included_file = match.group(1)
>> > +                    # Handle relative paths.
>> > +                    if not os.path.isabs(included_file):
>> > +                        src_dir = os.path.dirname(source_file)
>> > +                        included_file = os.path.join(src_dir, included_file)
>> > +
>> > +                    # Normalize the path.
>> > +                    included_file = os.path.normpath(included_file)
>> > +
>> > +                    # Check if the file exists.
>> > +                    if os.path.exists(included_file):
>> > +                        yield included_file
>> > +    except IOError:
>> > +        pass
>> > +
>> > +
>> >  def process_line(root_directory, command_prefix, file_path):
>> > -    """Extracts information from a .cmd line and creates an entry from it.
>> > +    """Extracts information from a .cmd line and creates entries from it.
>> >
>> >      Args:
>> >          root_directory: The directory that was searched for .cmd files. Usually
>> > @@ -160,7 +239,8 @@ def process_line(root_directory, command_prefix, file_path):
>> >              Usually relative to root_directory, but sometimes absolute.
>> >
>> >      Returns:
>> > -        An entry to append to compile_commands.
>> > +        A list of entries to append to compile_commands (may include multiple
>> > +        entries if the source file includes other .c files).
>> >
>> >      Raises:
>> >          ValueError: Could not find the extracted file based on file_path and
>> > @@ -176,11 +256,47 @@ def process_line(root_directory, command_prefix, file_path):
>> >      abs_path = os.path.realpath(os.path.join(root_directory, file_path))
>> >      if not os.path.exists(abs_path):
>> >          raise ValueError('File %s not found' % abs_path)
>> > -    return {
>> > +
>> > +    entries = []
>> > +
>> > +    # Create entry for the main source file.
>> > +    main_entry = {
>> >          'directory': root_directory,
>> >          'file': abs_path,
>> >          'command': prefix + file_path,
>> >      }
>> > +    entries.append(main_entry)
>> > +
>> > +    # Find and create entries for included .c files.
>> > +    for included_c_file in find_included_c_files(abs_path, root_directory):
>> > +        # For included .c files, create a compilation command that:
>> > +        # 1. Uses the same compilation flags as the parent file
>> > +        # 2. But compiles the included file directly (not the parent)
>> > +        # 3. Includes necessary headers from the parent file for proper macro resolution
>> > +
>> > +        # Convert absolute path to relative for the command.
>> > +        rel_path = os.path.relpath(included_c_file, root_directory)
>> > +
>> > +        # Extract includes from the parent file to provide proper compilation context.
>> > +        extra_includes = ''
>> > +        try:
>> > +            parent_includes = extract_includes_from_file(abs_path, root_directory)
>> > +            if parent_includes:
>> > +                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
>> > +        except IOError:
>> > +            pass
>> > +
>> > +        included_entry = {
>> > +            'directory': root_directory,
>> > +            'file': included_c_file,
>> > +            # Use the same compilation prefix but target the included file directly.
>> > +            # Add extra headers for proper macro resolution.
>> > +            'command': prefix + extra_includes + ' ' + rel_path,
>> > +        }
>> > +        entries.append(included_entry)
>> > +        logging.debug('Added entry for included file: %s', included_c_file)
>> > +
>> > +    return entries
>> >
>> >
>> >  def main():
>> > @@ -213,9 +329,9 @@ def main():
>> >                  result = line_matcher.match(f.readline())
>> >                  if result:
>> >                      try:
>> > -                        entry = process_line(directory, result.group('command_prefix'),
>> > +                        entries = process_line(directory, result.group('command_prefix'),
>> >                                               result.group('file_path'))
>> > -                        compile_commands.append(entry)
>> > +                        compile_commands.extend(entries)
>> >                      except ValueError as err:
>> >                          logging.info('Could not add line from %s: %s',
>> >                                       cmdfile, err)
>> > --
>> > 2.51.0
>> >
>> >
>>
>> Thanks
>> Justin


  reply	other threads:[~2025-10-08  0:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-07  2:45 [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands Pat Somaru
2025-10-07 16:33 ` Nathan Chancellor
2025-10-07 19:55 ` Justin Stitt
2025-10-07 19:59   ` Justin Stitt
2025-10-08  0:09     ` Pat Somaru [this message]
2025-10-08  0:37       ` Pat Somaru
2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
2025-10-17 17:21           ` Eduard Zingerman
2025-10-31 22:56           ` Nathan Chancellor
2025-11-05 21:30           ` Nicolas Schier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DDCI68JLD5RP.1WAN6YKP7WWNU@likewhatevs.io \
    --to=patso@likewhatevs.io \
    --cc=justinstitt@google.com \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=nick.desaulniers+lkml@gmail.com \
    --cc=nsc@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox