public inbox for llvm@lists.linux.dev
 help / color / mirror / Atom feed
* Re: [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands
       [not found] <20251007024527.1439747-1-patso@likewhatevs.io>
@ 2025-10-07 16:33 ` Nathan Chancellor
       [not found] ` <be6kxu36uphlnhedsi7gsayhuetmqixil4dv2lxe4plvgyqewx@jecdpummb52l>
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-10-07 16:33 UTC (permalink / raw)
  To: Pat Somaru
  Cc: linux-kbuild, Nicolas Schier, Justin Stitt, Bill Wendling,
	Nick Desaulniers, llvm

Hi Pat,

+ Nicolas and other LLVM folks

On Mon, Oct 06, 2025 at 10:45:27PM -0400, Pat Somaru wrote:
> The gen_compile_commands.py script currently only creates entries for the
> primary source files found in .cmd files, but some kernel source files
> text-include others (i.e. kernel/sched/build_policy.c).
> 
> This prevents tools like clangd from working properly on text-included c
> files, such as kernel/sched/ext.c because the generated compile_commands.json
> does not have entries for them.
> 
> Extend process_line() to detect when a source file includes .c files, and
> generate additional compile_commands.json entries for them. For included c
> files, use the same compile flags as their parent and add their parents headers.
> 
> This enables lsp tools like clangd to work properly on files like
> kernel/sched/ext.c
> 
> Signed-off-by: Pat Somaru <patso@likewhatevs.io>

I will try to review this more in-depth after the merge window is over,
as Nicolas will be handling applying patches for the 6.19 cycle.
However, my initial gut reaction is that I do not like this additional
complexity. I do see a number of files that include .c files so it may
be worth supporting this still but it feels like this heuristic could be
fragile, especially since aside from clangd, we have no real way to
validate that these files actually build correctly in the way that
compile_commands.json describes.

> ---
>  scripts/clang-tools/gen_compile_commands.py | 126 +++++++++++++++++++-
>  1 file changed, 121 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> index 96e6e46ad1a70..02791efdc06d0 100755
> --- a/scripts/clang-tools/gen_compile_commands.py
> +++ b/scripts/clang-tools/gen_compile_commands.py
> @@ -149,8 +149,87 @@ def cmdfiles_for_modorder(modorder):
>                      yield to_cmdfile(mod_line.rstrip())
>  
>  
> +def extract_includes_from_file(source_file, root_directory):
> +    """Extract #include statements from a C file.
> +
> +    Args:
> +        source_file: Path to the source .c file to analyze
> +        root_directory: Root directory for resolving relative paths
> +
> +    Returns:
> +        List of header files that should be included (without quotes/brackets)
> +    """
> +    includes = []
> +    if not os.path.exists(source_file):
> +        return includes
> +
> +    try:
> +        with open(source_file, 'r') as f:
> +            for line in f:
> +                line = line.strip()
> +                # Look for #include statements.
> +                # Match both #include "header.h" and #include <header.h>.
> +                match = re.match(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]', line)
> +                if match:
> +                    header = match.group(1)
> +                    # Skip including other .c files to avoid circular includes.
> +                    if not header.endswith('.c'):
> +                        # For relative includes (quoted), resolve path relative to source file.
> +                        if '"' in line:
> +                            src_dir = os.path.dirname(source_file)
> +                            header_path = os.path.join(src_dir, header)
> +                            if os.path.exists(header_path):
> +                                rel_header = os.path.relpath(header_path, root_directory)
> +                                includes.append(rel_header)
> +                            else:
> +                                includes.append(header)
> +                        else:
> +                            # System include like <linux/sched.h>.
> +                            includes.append(header)
> +    except IOError:
> +        pass
> +
> +    return includes
> +
> +
> +def find_included_c_files(source_file, root_directory):
> +    """Find .c files that are included by the given source file.
> +
> +    Args:
> +        source_file: Path to the source .c file
> +        root_directory: Root directory for resolving relative paths
> +
> +    Yields:
> +        Full paths to included .c files
> +    """
> +    if not os.path.exists(source_file):
> +        return
> +
> +    try:
> +        with open(source_file, 'r') as f:
> +            for line in f:
> +                line = line.strip()
> +                # Look for #include "*.c" patterns.
> +                match = re.match(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$', line)
> +                if match:
> +                    included_file = match.group(1)
> +                    # Handle relative paths.
> +                    if not os.path.isabs(included_file):
> +                        src_dir = os.path.dirname(source_file)
> +                        included_file = os.path.join(src_dir, included_file)
> +
> +                    # Normalize the path.
> +                    included_file = os.path.normpath(included_file)
> +
> +                    # Check if the file exists.
> +                    if os.path.exists(included_file):
> +                        yield included_file
> +    except IOError:
> +        pass
> +
> +
>  def process_line(root_directory, command_prefix, file_path):
> -    """Extracts information from a .cmd line and creates an entry from it.
> +    """Extracts information from a .cmd line and creates entries from it.
>  
>      Args:
>          root_directory: The directory that was searched for .cmd files. Usually
> @@ -160,7 +239,8 @@ def process_line(root_directory, command_prefix, file_path):
>              Usually relative to root_directory, but sometimes absolute.
>  
>      Returns:
> -        An entry to append to compile_commands.
> +        A list of entries to append to compile_commands (may include multiple
> +        entries if the source file includes other .c files).
>  
>      Raises:
>          ValueError: Could not find the extracted file based on file_path and
> @@ -176,11 +256,47 @@ def process_line(root_directory, command_prefix, file_path):
>      abs_path = os.path.realpath(os.path.join(root_directory, file_path))
>      if not os.path.exists(abs_path):
>          raise ValueError('File %s not found' % abs_path)
> -    return {
> +
> +    entries = []
> +
> +    # Create entry for the main source file.
> +    main_entry = {
>          'directory': root_directory,
>          'file': abs_path,
>          'command': prefix + file_path,
>      }
> +    entries.append(main_entry)
> +
> +    # Find and create entries for included .c files.
> +    for included_c_file in find_included_c_files(abs_path, root_directory):
> +        # For included .c files, create a compilation command that:
> +        # 1. Uses the same compilation flags as the parent file
> +        # 2. But compiles the included file directly (not the parent)
> +        # 3. Includes necessary headers from the parent file for proper macro resolution
> +
> +        # Convert absolute path to relative for the command.
> +        rel_path = os.path.relpath(included_c_file, root_directory)
> +
> +        # Extract includes from the parent file to provide proper compilation context.
> +        extra_includes = ''
> +        try:
> +            parent_includes = extract_includes_from_file(abs_path, root_directory)
> +            if parent_includes:
> +                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
> +        except IOError:
> +            pass
> +
> +        included_entry = {
> +            'directory': root_directory,
> +            'file': included_c_file,
> +            # Use the same compilation prefix but target the included file directly.
> +            # Add extra headers for proper macro resolution.
> +            'command': prefix + extra_includes + ' ' + rel_path,
> +        }
> +        entries.append(included_entry)
> +        logging.debug('Added entry for included file: %s', included_c_file)
> +
> +    return entries
>  
>  
>  def main():
> @@ -213,9 +329,9 @@ def main():
>                  result = line_matcher.match(f.readline())
>                  if result:
>                      try:
> -                        entry = process_line(directory, result.group('command_prefix'),
> +                        entries = process_line(directory, result.group('command_prefix'),
>                                               result.group('file_path'))
> -                        compile_commands.append(entry)
> +                        compile_commands.extend(entries)
>                      except ValueError as err:
>                          logging.info('Could not add line from %s: %s',
>                                       cmdfile, err)
> -- 
> 2.51.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands
       [not found] ` <be6kxu36uphlnhedsi7gsayhuetmqixil4dv2lxe4plvgyqewx@jecdpummb52l>
@ 2025-10-07 19:59   ` Justin Stitt
  2025-10-08  0:09     ` Pat Somaru
  0 siblings, 1 reply; 8+ messages in thread
From: Justin Stitt @ 2025-10-07 19:59 UTC (permalink / raw)
  To: Pat Somaru
  Cc: Nathan Chancellor, linux-kbuild, Nicolas Schier, Bill Wendling,
	Nick Desaulniers, clang-built-linux

I seem to have dropped some folks from the CC list somehow (probably
wrong mutt keybind).

I've self-quoted myself below to preserve context.

On Tue, Oct 7, 2025 at 12:55 PM Justin Stitt <justinstitt@google.com> wrote:
>
> Hi,
>
> On Mon, Oct 06, 2025 at 10:45:27PM -0400, Pat Somaru wrote:
> > The gen_compile_commands.py script currently only creates entries for the
> > primary source files found in .cmd files, but some kernel source files
> > text-include others (i.e. kernel/sched/build_policy.c).
> >
> > This prevents tools like clangd from working properly on text-included c
> > files, such as kernel/sched/ext.c because the generated compile_commands.json
> > does not have entries for them.
> >
> > Extend process_line() to detect when a source file includes .c files, and
> > generate additional compile_commands.json entries for them. For included c
> > files, use the same compile flags as their parent and add their parents headers.
> >
> > This enables lsp tools like clangd to work properly on files like
> > kernel/sched/ext.c
> >
> > Signed-off-by: Pat Somaru <patso@likewhatevs.io>
>
> FWIW, I tested this out and my clangd was much happier dealing with
> kernel/sched/ext.c. Nathan's points are still valid, I'm just giving
> user feedback.
>
> After benchmarking the script itself, I saw some performance hits:
>
> pre-patch, 5x run average: 0.590 seconds
> post-patch, 5x run average: 2.164 seconds
>
> With this simple invocation:
>
>   ./scripts/clang-tools/gen_compile_commands.py -d build-master -o $TMP/compile_commands.json
>
> I realize this script is not ran often but maybe there is some heurstic
> or speedup that can be made to the regex.
>
> Tested-by: Justin Stitt <justinstitt@google.com>
>
> > ---
> >  scripts/clang-tools/gen_compile_commands.py | 126 +++++++++++++++++++-
> >  1 file changed, 121 insertions(+), 5 deletions(-)
> >
> > diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> > index 96e6e46ad1a70..02791efdc06d0 100755
> > --- a/scripts/clang-tools/gen_compile_commands.py
> > +++ b/scripts/clang-tools/gen_compile_commands.py
> > @@ -149,8 +149,87 @@ def cmdfiles_for_modorder(modorder):
> >                      yield to_cmdfile(mod_line.rstrip())
> >
> >
> > +def extract_includes_from_file(source_file, root_directory):
> > +    """Extract #include statements from a C file.
> > +
> > +    Args:
> > +        source_file: Path to the source .c file to analyze
> > +        root_directory: Root directory for resolving relative paths
> > +
> > +    Returns:
> > +        List of header files that should be included (without quotes/brackets)
> > +    """
> > +    includes = []
> > +    if not os.path.exists(source_file):
> > +        return includes
> > +
> > +    try:
> > +        with open(source_file, 'r') as f:
> > +            for line in f:
> > +                line = line.strip()
> > +                # Look for #include statements.
> > +                # Match both #include "header.h" and #include <header.h>.
> > +                match = re.match(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]', line)
> > +                if match:
> > +                    header = match.group(1)
> > +                    # Skip including other .c files to avoid circular includes.
> > +                    if not header.endswith('.c'):
> > +                        # For relative includes (quoted), resolve path relative to source file.
> > +                        if '"' in line:
> > +                            src_dir = os.path.dirname(source_file)
> > +                            header_path = os.path.join(src_dir, header)
> > +                            if os.path.exists(header_path):
> > +                                rel_header = os.path.relpath(header_path, root_directory)
> > +                                includes.append(rel_header)
> > +                            else:
> > +                                includes.append(header)
> > +                        else:
> > +                            # System include like <linux/sched.h>.
> > +                            includes.append(header)
> > +    except IOError:
> > +        pass
> > +
> > +    return includes
> > +
> > +
> > +def find_included_c_files(source_file, root_directory):
> > +    """Find .c files that are included by the given source file.
> > +
> > +    Args:
> > +        source_file: Path to the source .c file
> > +        root_directory: Root directory for resolving relative paths
> > +
> > +    Yields:
> > +        Full paths to included .c files
> > +    """
> > +    if not os.path.exists(source_file):
> > +        return
> > +
> > +    try:
> > +        with open(source_file, 'r') as f:
> > +            for line in f:
> > +                line = line.strip()
> > +                # Look for #include "*.c" patterns.
> > +                match = re.match(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$', line)
> > +                if match:
> > +                    included_file = match.group(1)
> > +                    # Handle relative paths.
> > +                    if not os.path.isabs(included_file):
> > +                        src_dir = os.path.dirname(source_file)
> > +                        included_file = os.path.join(src_dir, included_file)
> > +
> > +                    # Normalize the path.
> > +                    included_file = os.path.normpath(included_file)
> > +
> > +                    # Check if the file exists.
> > +                    if os.path.exists(included_file):
> > +                        yield included_file
> > +    except IOError:
> > +        pass
> > +
> > +
> >  def process_line(root_directory, command_prefix, file_path):
> > -    """Extracts information from a .cmd line and creates an entry from it.
> > +    """Extracts information from a .cmd line and creates entries from it.
> >
> >      Args:
> >          root_directory: The directory that was searched for .cmd files. Usually
> > @@ -160,7 +239,8 @@ def process_line(root_directory, command_prefix, file_path):
> >              Usually relative to root_directory, but sometimes absolute.
> >
> >      Returns:
> > -        An entry to append to compile_commands.
> > +        A list of entries to append to compile_commands (may include multiple
> > +        entries if the source file includes other .c files).
> >
> >      Raises:
> >          ValueError: Could not find the extracted file based on file_path and
> > @@ -176,11 +256,47 @@ def process_line(root_directory, command_prefix, file_path):
> >      abs_path = os.path.realpath(os.path.join(root_directory, file_path))
> >      if not os.path.exists(abs_path):
> >          raise ValueError('File %s not found' % abs_path)
> > -    return {
> > +
> > +    entries = []
> > +
> > +    # Create entry for the main source file.
> > +    main_entry = {
> >          'directory': root_directory,
> >          'file': abs_path,
> >          'command': prefix + file_path,
> >      }
> > +    entries.append(main_entry)
> > +
> > +    # Find and create entries for included .c files.
> > +    for included_c_file in find_included_c_files(abs_path, root_directory):
> > +        # For included .c files, create a compilation command that:
> > +        # 1. Uses the same compilation flags as the parent file
> > +        # 2. But compiles the included file directly (not the parent)
> > +        # 3. Includes necessary headers from the parent file for proper macro resolution
> > +
> > +        # Convert absolute path to relative for the command.
> > +        rel_path = os.path.relpath(included_c_file, root_directory)
> > +
> > +        # Extract includes from the parent file to provide proper compilation context.
> > +        extra_includes = ''
> > +        try:
> > +            parent_includes = extract_includes_from_file(abs_path, root_directory)
> > +            if parent_includes:
> > +                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
> > +        except IOError:
> > +            pass
> > +
> > +        included_entry = {
> > +            'directory': root_directory,
> > +            'file': included_c_file,
> > +            # Use the same compilation prefix but target the included file directly.
> > +            # Add extra headers for proper macro resolution.
> > +            'command': prefix + extra_includes + ' ' + rel_path,
> > +        }
> > +        entries.append(included_entry)
> > +        logging.debug('Added entry for included file: %s', included_c_file)
> > +
> > +    return entries
> >
> >
> >  def main():
> > @@ -213,9 +329,9 @@ def main():
> >                  result = line_matcher.match(f.readline())
> >                  if result:
> >                      try:
> > -                        entry = process_line(directory, result.group('command_prefix'),
> > +                        entries = process_line(directory, result.group('command_prefix'),
> >                                               result.group('file_path'))
> > -                        compile_commands.append(entry)
> > +                        compile_commands.extend(entries)
> >                      except ValueError as err:
> >                          logging.info('Could not add line from %s: %s',
> >                                       cmdfile, err)
> > --
> > 2.51.0
> >
> >
>
> Thanks
> Justin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-07 19:59   ` Justin Stitt
@ 2025-10-08  0:09     ` Pat Somaru
  2025-10-08  0:37       ` Pat Somaru
  0 siblings, 1 reply; 8+ messages in thread
From: Pat Somaru @ 2025-10-08  0:09 UTC (permalink / raw)
  To: Justin Stitt
  Cc: Nathan Chancellor, linux-kbuild, Nicolas Schier, Bill Wendling,
	Nick Desaulniers, clang-built-linux

Hi folks,

Thanks for the feedback so far!

>> I realize this script is not ran often but maybe there is some heurstic
>> or speedup that can be made to the regex.

I'll send a revision with this in a bit, thank you for the idea and
noticing this! On my machine, v1 of this patch runs in ~900ms, v2 runs
in 500ms and no patch runs in 400ms. I also confirmed v2 outputs an
identical json to that v1 does.

>>> However, my initial gut reaction is that I do not like this additional
>>> complexity. 

I don't see other ways to get LSP working, but admittedly, this is my first 
foray into kernel stuff. I kinda see a way to enable making LSP work via 
ifdef's, but I doubt folks would be OK with that at all. I can try different 
ideas if folks have any.

>>> I do see a number of files that include .c files so it may be worth 
>>> supporting this still but it feels like this heuristic could be
>>> fragile, especially since aside from clangd, we have no real way to
>>> validate that these files actually build correctly in the way that
>>> compile_commands.json describes.

I think the following helps with this, I ran cppcheck (i.e., not clangd)
against my local checkout with both pre-patch and post-patch 
compile_commands.json to get some signal on this:

====
$ cppcheck --project=../old_compile_commands.json 2>&1 | tail
...
Checking virt/lib/irqbypass.c ...
2668/2668 files checked 100% done
$ cppcheck --project=../old_compile_commands.json 2>&1 | grep -E 'error:|warning:' | wc -l
5
$ cppcheck --project=../new_compile_commands.json 2>&1 | tail
...
Checking virt/lib/irqbypass.c ...
2821/2821 files checked 100% done
$ cppcheck --project=../new_compile_commands.json 2>&1 | grep -E 'error:|warning:' | wc -l
7
====

Additionally, I wrapped clangd in a script and collected some data on how
it performs with the various compile_commands.json to help illustrate the 
properties of this patch:

====
script output with newly included files and old compile_commands.json (i.e. what folks encounter working on these files now):
files: 2820
clangd errors: 58452
clangd fatal_too_many_errors: 195
clangd timeouts: 36

script output with new compile_commands.json:
files: 2820
clangd errors: 58763
clangd fatal_too_many_errors: 164
clangd timeouts: 39

script output with old compile_commands.json:
files: 2667
clangd errors: 55995
clangd fatal_too_many_errors: 118
clangd timeouts: 36
====

Thank you for the feedback so far and for maintaining this wonderful
script. LSP makes kernel stuff like 10x more accessible!


On Tue Oct 7, 2025 at 3:59 PM EDT, Justin Stitt wrote:
> I seem to have dropped some folks from the CC list somehow (probably
> wrong mutt keybind).
>
> I've self-quoted myself below to preserve context.
>
> On Tue, Oct 7, 2025 at 12:55 PM Justin Stitt <justinstitt@google.com> wrote:
>>
>> Hi,
>>
>> On Mon, Oct 06, 2025 at 10:45:27PM -0400, Pat Somaru wrote:
>> > The gen_compile_commands.py script currently only creates entries for the
>> > primary source files found in .cmd files, but some kernel source files
>> > text-include others (i.e. kernel/sched/build_policy.c).
>> >
>> > This prevents tools like clangd from working properly on text-included c
>> > files, such as kernel/sched/ext.c because the generated compile_commands.json
>> > does not have entries for them.
>> >
>> > Extend process_line() to detect when a source file includes .c files, and
>> > generate additional compile_commands.json entries for them. For included c
>> > files, use the same compile flags as their parent and add their parents headers.
>> >
>> > This enables lsp tools like clangd to work properly on files like
>> > kernel/sched/ext.c
>> >
>> > Signed-off-by: Pat Somaru <patso@likewhatevs.io>
>>
>> FWIW, I tested this out and my clangd was much happier dealing with
>> kernel/sched/ext.c. Nathan's points are still valid, I'm just giving
>> user feedback.
>>
>> After benchmarking the script itself, I saw some performance hits:
>>
>> pre-patch, 5x run average: 0.590 seconds
>> post-patch, 5x run average: 2.164 seconds
>>
>> With this simple invocation:
>>
>>   ./scripts/clang-tools/gen_compile_commands.py -d build-master -o $TMP/compile_commands.json
>>
>> I realize this script is not ran often but maybe there is some heurstic
>> or speedup that can be made to the regex.
>>
>> Tested-by: Justin Stitt <justinstitt@google.com>
>>
>> > ---
>> >  scripts/clang-tools/gen_compile_commands.py | 126 +++++++++++++++++++-
>> >  1 file changed, 121 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
>> > index 96e6e46ad1a70..02791efdc06d0 100755
>> > --- a/scripts/clang-tools/gen_compile_commands.py
>> > +++ b/scripts/clang-tools/gen_compile_commands.py
>> > @@ -149,8 +149,87 @@ def cmdfiles_for_modorder(modorder):
>> >                      yield to_cmdfile(mod_line.rstrip())
>> >
>> >
>> > +def extract_includes_from_file(source_file, root_directory):
>> > +    """Extract #include statements from a C file.
>> > +
>> > +    Args:
>> > +        source_file: Path to the source .c file to analyze
>> > +        root_directory: Root directory for resolving relative paths
>> > +
>> > +    Returns:
>> > +        List of header files that should be included (without quotes/brackets)
>> > +    """
>> > +    includes = []
>> > +    if not os.path.exists(source_file):
>> > +        return includes
>> > +
>> > +    try:
>> > +        with open(source_file, 'r') as f:
>> > +            for line in f:
>> > +                line = line.strip()
>> > +                # Look for #include statements.
>> > +                # Match both #include "header.h" and #include <header.h>.
>> > +                match = re.match(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]', line)
>> > +                if match:
>> > +                    header = match.group(1)
>> > +                    # Skip including other .c files to avoid circular includes.
>> > +                    if not header.endswith('.c'):
>> > +                        # For relative includes (quoted), resolve path relative to source file.
>> > +                        if '"' in line:
>> > +                            src_dir = os.path.dirname(source_file)
>> > +                            header_path = os.path.join(src_dir, header)
>> > +                            if os.path.exists(header_path):
>> > +                                rel_header = os.path.relpath(header_path, root_directory)
>> > +                                includes.append(rel_header)
>> > +                            else:
>> > +                                includes.append(header)
>> > +                        else:
>> > +                            # System include like <linux/sched.h>.
>> > +                            includes.append(header)
>> > +    except IOError:
>> > +        pass
>> > +
>> > +    return includes
>> > +
>> > +
>> > +def find_included_c_files(source_file, root_directory):
>> > +    """Find .c files that are included by the given source file.
>> > +
>> > +    Args:
>> > +        source_file: Path to the source .c file
>> > +        root_directory: Root directory for resolving relative paths
>> > +
>> > +    Yields:
>> > +        Full paths to included .c files
>> > +    """
>> > +    if not os.path.exists(source_file):
>> > +        return
>> > +
>> > +    try:
>> > +        with open(source_file, 'r') as f:
>> > +            for line in f:
>> > +                line = line.strip()
>> > +                # Look for #include "*.c" patterns.
>> > +                match = re.match(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$', line)
>> > +                if match:
>> > +                    included_file = match.group(1)
>> > +                    # Handle relative paths.
>> > +                    if not os.path.isabs(included_file):
>> > +                        src_dir = os.path.dirname(source_file)
>> > +                        included_file = os.path.join(src_dir, included_file)
>> > +
>> > +                    # Normalize the path.
>> > +                    included_file = os.path.normpath(included_file)
>> > +
>> > +                    # Check if the file exists.
>> > +                    if os.path.exists(included_file):
>> > +                        yield included_file
>> > +    except IOError:
>> > +        pass
>> > +
>> > +
>> >  def process_line(root_directory, command_prefix, file_path):
>> > -    """Extracts information from a .cmd line and creates an entry from it.
>> > +    """Extracts information from a .cmd line and creates entries from it.
>> >
>> >      Args:
>> >          root_directory: The directory that was searched for .cmd files. Usually
>> > @@ -160,7 +239,8 @@ def process_line(root_directory, command_prefix, file_path):
>> >              Usually relative to root_directory, but sometimes absolute.
>> >
>> >      Returns:
>> > -        An entry to append to compile_commands.
>> > +        A list of entries to append to compile_commands (may include multiple
>> > +        entries if the source file includes other .c files).
>> >
>> >      Raises:
>> >          ValueError: Could not find the extracted file based on file_path and
>> > @@ -176,11 +256,47 @@ def process_line(root_directory, command_prefix, file_path):
>> >      abs_path = os.path.realpath(os.path.join(root_directory, file_path))
>> >      if not os.path.exists(abs_path):
>> >          raise ValueError('File %s not found' % abs_path)
>> > -    return {
>> > +
>> > +    entries = []
>> > +
>> > +    # Create entry for the main source file.
>> > +    main_entry = {
>> >          'directory': root_directory,
>> >          'file': abs_path,
>> >          'command': prefix + file_path,
>> >      }
>> > +    entries.append(main_entry)
>> > +
>> > +    # Find and create entries for included .c files.
>> > +    for included_c_file in find_included_c_files(abs_path, root_directory):
>> > +        # For included .c files, create a compilation command that:
>> > +        # 1. Uses the same compilation flags as the parent file
>> > +        # 2. But compiles the included file directly (not the parent)
>> > +        # 3. Includes necessary headers from the parent file for proper macro resolution
>> > +
>> > +        # Convert absolute path to relative for the command.
>> > +        rel_path = os.path.relpath(included_c_file, root_directory)
>> > +
>> > +        # Extract includes from the parent file to provide proper compilation context.
>> > +        extra_includes = ''
>> > +        try:
>> > +            parent_includes = extract_includes_from_file(abs_path, root_directory)
>> > +            if parent_includes:
>> > +                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
>> > +        except IOError:
>> > +            pass
>> > +
>> > +        included_entry = {
>> > +            'directory': root_directory,
>> > +            'file': included_c_file,
>> > +            # Use the same compilation prefix but target the included file directly.
>> > +            # Add extra headers for proper macro resolution.
>> > +            'command': prefix + extra_includes + ' ' + rel_path,
>> > +        }
>> > +        entries.append(included_entry)
>> > +        logging.debug('Added entry for included file: %s', included_c_file)
>> > +
>> > +    return entries
>> >
>> >
>> >  def main():
>> > @@ -213,9 +329,9 @@ def main():
>> >                  result = line_matcher.match(f.readline())
>> >                  if result:
>> >                      try:
>> > -                        entry = process_line(directory, result.group('command_prefix'),
>> > +                        entries = process_line(directory, result.group('command_prefix'),
>> >                                               result.group('file_path'))
>> > -                        compile_commands.append(entry)
>> > +                        compile_commands.extend(entries)
>> >                      except ValueError as err:
>> >                          logging.info('Could not add line from %s: %s',
>> >                                       cmdfile, err)
>> > --
>> > 2.51.0
>> >
>> >
>>
>> Thanks
>> Justin


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-08  0:09     ` Pat Somaru
@ 2025-10-08  0:37       ` Pat Somaru
  2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
  0 siblings, 1 reply; 8+ messages in thread
From: Pat Somaru @ 2025-10-08  0:37 UTC (permalink / raw)
  To: patso
  Cc: justinstitt, linux-kbuild, llvm, morbo, nathan,
	nick.desaulniers+lkml, nsc

The gen_compile_commands.py script currently only creates entries for the
primary source files found in .cmd files, but some kernel source files
text-include others (i.e. kernel/sched/build_policy.c).

This prevents tools like clangd from working properly on text-includedd c
files, such as kernel/sched/ext.c because the generated compile_commands.json
does not have entries for them.

Extend process_line() to detect when a source file includes .c files, and
generate additional compile_commands.json entries for them. For included c
files, use the same compile flags as their parent and add their parents headers.

This enables lsp tools like clangd to work properly on files like
kernel/sched/ext.c

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Tested-by: Justin Stitt <justinstitt@google.com>
---
 scripts/clang-tools/gen_compile_commands.py | 135 +++++++++++++++++++-
 1 file changed, 128 insertions(+), 7 deletions(-)

diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
index 96e6e46ad1a70..6f4afa92a4665 100755
--- a/scripts/clang-tools/gen_compile_commands.py
+++ b/scripts/clang-tools/gen_compile_commands.py
@@ -21,6 +21,12 @@ _DEFAULT_LOG_LEVEL = 'WARNING'
 _FILENAME_PATTERN = r'^\..*\.cmd$'
 _LINE_PATTERN = r'^(saved)?cmd_[^ ]*\.o := (?P<command_prefix>.* )(?P<file_path>[^ ]*\.[cS]) *(;|$)'
 _VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
+
+# Pre-compiled regexes for better performance
+_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]')
+_C_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$')
+_FILENAME_MATCHER = re.compile(_FILENAME_PATTERN)
+
 # The tools/ directory adopts a different build system, and produces .cmd
 # files in a different format. Do not support it.
 _EXCLUDE_DIRS = ['.git', 'Documentation', 'include', 'tools']
@@ -82,7 +88,6 @@ def cmdfiles_in_dir(directory):
         The path to a .cmd file.
     """
 
-    filename_matcher = re.compile(_FILENAME_PATTERN)
     exclude_dirs = [ os.path.join(directory, d) for d in _EXCLUDE_DIRS ]
 
     for dirpath, dirnames, filenames in os.walk(directory, topdown=True):
@@ -92,7 +97,7 @@ def cmdfiles_in_dir(directory):
             continue
 
         for filename in filenames:
-            if filename_matcher.match(filename):
+            if _FILENAME_MATCHER.match(filename):
                 yield os.path.join(dirpath, filename)
 
 
@@ -149,8 +154,87 @@ def cmdfiles_for_modorder(modorder):
                     yield to_cmdfile(mod_line.rstrip())
 
 
+def extract_includes_from_file(source_file, root_directory):
+    """Extract #include statements from a C file.
+
+    Args:
+        source_file: Path to the source .c file to analyze
+        root_directory: Root directory for resolving relative paths
+
+    Returns:
+        List of header files that should be included (without quotes/brackets)
+    """
+    includes = []
+    if not os.path.exists(source_file):
+        return includes
+
+    try:
+        with open(source_file, 'r') as f:
+            for line in f:
+                line = line.strip()
+                # Look for #include statements.
+                # Match both #include "header.h" and #include <header.h>.
+                match = _INCLUDE_PATTERN.match(line)
+                if match:
+                    header = match.group(1)
+                    # Skip including other .c files to avoid circular includes.
+                    if not header.endswith('.c'):
+                        # For relative includes (quoted), resolve path relative to source file.
+                        if '"' in line:
+                            src_dir = os.path.dirname(source_file)
+                            header_path = os.path.join(src_dir, header)
+                            if os.path.exists(header_path):
+                                rel_header = os.path.relpath(header_path, root_directory)
+                                includes.append(rel_header)
+                            else:
+                                includes.append(header)
+                        else:
+                            # System include like <linux/sched.h>.
+                            includes.append(header)
+    except IOError:
+        pass
+
+    return includes
+
+
+def find_included_c_files(source_file, root_directory):
+    """Find .c files that are included by the given source file.
+
+    Args:
+        source_file: Path to the source .c file
+        root_directory: Root directory for resolving relative paths
+
+    Yields:
+        Full paths to included .c files
+    """
+    if not os.path.exists(source_file):
+        return
+
+    try:
+        with open(source_file, 'r') as f:
+            for line in f:
+                line = line.strip()
+                # Look for #include "*.c" patterns.
+                match = _C_INCLUDE_PATTERN.match(line)
+                if match:
+                    included_file = match.group(1)
+                    # Handle relative paths.
+                    if not os.path.isabs(included_file):
+                        src_dir = os.path.dirname(source_file)
+                        included_file = os.path.join(src_dir, included_file)
+
+                    # Normalize the path.
+                    included_file = os.path.normpath(included_file)
+
+                    # Check if the file exists.
+                    if os.path.exists(included_file):
+                        yield included_file
+    except IOError:
+        pass
+
+
 def process_line(root_directory, command_prefix, file_path):
-    """Extracts information from a .cmd line and creates an entry from it.
+    """Extracts information from a .cmd line and creates entries from it.
 
     Args:
         root_directory: The directory that was searched for .cmd files. Usually
@@ -160,7 +244,8 @@ def process_line(root_directory, command_prefix, file_path):
             Usually relative to root_directory, but sometimes absolute.
 
     Returns:
-        An entry to append to compile_commands.
+        A list of entries to append to compile_commands (may include multiple
+        entries if the source file includes other .c files).
 
     Raises:
         ValueError: Could not find the extracted file based on file_path and
@@ -176,11 +261,47 @@ def process_line(root_directory, command_prefix, file_path):
     abs_path = os.path.realpath(os.path.join(root_directory, file_path))
     if not os.path.exists(abs_path):
         raise ValueError('File %s not found' % abs_path)
-    return {
+
+    entries = []
+
+    # Create entry for the main source file.
+    main_entry = {
         'directory': root_directory,
         'file': abs_path,
         'command': prefix + file_path,
     }
+    entries.append(main_entry)
+
+    # Find and create entries for included .c files.
+    for included_c_file in find_included_c_files(abs_path, root_directory):
+        # For included .c files, create a compilation command that:
+        # 1. Uses the same compilation flags as the parent file
+        # 2. But compiles the included file directly (not the parent)
+        # 3. Includes necessary headers from the parent file for proper macro resolution
+
+        # Convert absolute path to relative for the command.
+        rel_path = os.path.relpath(included_c_file, root_directory)
+
+        # Extract includes from the parent file to provide proper compilation context.
+        extra_includes = ''
+        try:
+            parent_includes = extract_includes_from_file(abs_path, root_directory)
+            if parent_includes:
+                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
+        except IOError:
+            pass
+
+        included_entry = {
+            'directory': root_directory,
+            'file': included_c_file,
+            # Use the same compilation prefix but target the included file directly.
+            # Add extra headers for proper macro resolution.
+            'command': prefix + extra_includes + ' ' + rel_path,
+        }
+        entries.append(included_entry)
+        logging.debug('Added entry for included file: %s', included_c_file)
+
+    return entries
 
 
 def main():
@@ -213,9 +334,9 @@ def main():
                 result = line_matcher.match(f.readline())
                 if result:
                     try:
-                        entry = process_line(directory, result.group('command_prefix'),
+                        entries = process_line(directory, result.group('command_prefix'),
                                              result.group('file_path'))
-                        compile_commands.append(entry)
+                        compile_commands.extend(entries)
                     except ValueError as err:
                         logging.info('Could not add line from %s: %s',
                                      cmdfile, err)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-08  0:37       ` Pat Somaru
@ 2025-10-08  0:45         ` Pat Somaru
  2025-10-17 17:21           ` Eduard Zingerman
                             ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Pat Somaru @ 2025-10-08  0:45 UTC (permalink / raw)
  To: patso
  Cc: justinstitt, linux-kbuild, llvm, morbo, nathan,
	nick.desaulniers+lkml, nsc

The gen_compile_commands.py script currently only creates entries for the
primary source files found in .cmd files, but some kernel source files
text-include others (i.e. kernel/sched/build_policy.c).

This prevents tools like clangd from working properly on text-includedd c
files, such as kernel/sched/ext.c because the generated compile_commands.json
does not have entries for them.

Extend process_line() to detect when a source file includes .c files, and
generate additional compile_commands.json entries for them. For included c
files, use the same compile flags as their parent and add their parents headers.

This enables lsp tools like clangd to work properly on files like
kernel/sched/ext.c

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Tested-by: Justin Stitt <justinstitt@google.com>
---
 scripts/clang-tools/gen_compile_commands.py | 135 +++++++++++++++++++-
 1 file changed, 128 insertions(+), 7 deletions(-)

diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
index 96e6e46ad1a70..6f4afa92a4665 100755
--- a/scripts/clang-tools/gen_compile_commands.py
+++ b/scripts/clang-tools/gen_compile_commands.py
@@ -21,6 +21,12 @@ _DEFAULT_LOG_LEVEL = 'WARNING'
 _FILENAME_PATTERN = r'^\..*\.cmd$'
 _LINE_PATTERN = r'^(saved)?cmd_[^ ]*\.o := (?P<command_prefix>.* )(?P<file_path>[^ ]*\.[cS]) *(;|$)'
 _VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
+
+# Pre-compiled regexes for better performance
+_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]')
+_C_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$')
+_FILENAME_MATCHER = re.compile(_FILENAME_PATTERN)
+
 # The tools/ directory adopts a different build system, and produces .cmd
 # files in a different format. Do not support it.
 _EXCLUDE_DIRS = ['.git', 'Documentation', 'include', 'tools']
@@ -82,7 +88,6 @@ def cmdfiles_in_dir(directory):
         The path to a .cmd file.
     """
 
-    filename_matcher = re.compile(_FILENAME_PATTERN)
     exclude_dirs = [ os.path.join(directory, d) for d in _EXCLUDE_DIRS ]
 
     for dirpath, dirnames, filenames in os.walk(directory, topdown=True):
@@ -92,7 +97,7 @@ def cmdfiles_in_dir(directory):
             continue
 
         for filename in filenames:
-            if filename_matcher.match(filename):
+            if _FILENAME_MATCHER.match(filename):
                 yield os.path.join(dirpath, filename)
 
 
@@ -149,8 +154,87 @@ def cmdfiles_for_modorder(modorder):
                     yield to_cmdfile(mod_line.rstrip())
 
 
+def extract_includes_from_file(source_file, root_directory):
+    """Extract #include statements from a C file.
+
+    Args:
+        source_file: Path to the source .c file to analyze
+        root_directory: Root directory for resolving relative paths
+
+    Returns:
+        List of header files that should be included (without quotes/brackets)
+    """
+    includes = []
+    if not os.path.exists(source_file):
+        return includes
+
+    try:
+        with open(source_file, 'r') as f:
+            for line in f:
+                line = line.strip()
+                # Look for #include statements.
+                # Match both #include "header.h" and #include <header.h>.
+                match = _INCLUDE_PATTERN.match(line)
+                if match:
+                    header = match.group(1)
+                    # Skip including other .c files to avoid circular includes.
+                    if not header.endswith('.c'):
+                        # For relative includes (quoted), resolve path relative to source file.
+                        if '"' in line:
+                            src_dir = os.path.dirname(source_file)
+                            header_path = os.path.join(src_dir, header)
+                            if os.path.exists(header_path):
+                                rel_header = os.path.relpath(header_path, root_directory)
+                                includes.append(rel_header)
+                            else:
+                                includes.append(header)
+                        else:
+                            # System include like <linux/sched.h>.
+                            includes.append(header)
+    except IOError:
+        pass
+
+    return includes
+
+
+def find_included_c_files(source_file, root_directory):
+    """Find .c files that are included by the given source file.
+
+    Args:
+        source_file: Path to the source .c file
+        root_directory: Root directory for resolving relative paths
+
+    Yields:
+        Full paths to included .c files
+    """
+    if not os.path.exists(source_file):
+        return
+
+    try:
+        with open(source_file, 'r') as f:
+            for line in f:
+                line = line.strip()
+                # Look for #include "*.c" patterns.
+                match = _C_INCLUDE_PATTERN.match(line)
+                if match:
+                    included_file = match.group(1)
+                    # Handle relative paths.
+                    if not os.path.isabs(included_file):
+                        src_dir = os.path.dirname(source_file)
+                        included_file = os.path.join(src_dir, included_file)
+
+                    # Normalize the path.
+                    included_file = os.path.normpath(included_file)
+
+                    # Check if the file exists.
+                    if os.path.exists(included_file):
+                        yield included_file
+    except IOError:
+        pass
+
+
 def process_line(root_directory, command_prefix, file_path):
-    """Extracts information from a .cmd line and creates an entry from it.
+    """Extracts information from a .cmd line and creates entries from it.
 
     Args:
         root_directory: The directory that was searched for .cmd files. Usually
@@ -160,7 +244,8 @@ def process_line(root_directory, command_prefix, file_path):
             Usually relative to root_directory, but sometimes absolute.
 
     Returns:
-        An entry to append to compile_commands.
+        A list of entries to append to compile_commands (may include multiple
+        entries if the source file includes other .c files).
 
     Raises:
         ValueError: Could not find the extracted file based on file_path and
@@ -176,11 +261,47 @@ def process_line(root_directory, command_prefix, file_path):
     abs_path = os.path.realpath(os.path.join(root_directory, file_path))
     if not os.path.exists(abs_path):
         raise ValueError('File %s not found' % abs_path)
-    return {
+
+    entries = []
+
+    # Create entry for the main source file.
+    main_entry = {
         'directory': root_directory,
         'file': abs_path,
         'command': prefix + file_path,
     }
+    entries.append(main_entry)
+
+    # Find and create entries for included .c files.
+    for included_c_file in find_included_c_files(abs_path, root_directory):
+        # For included .c files, create a compilation command that:
+        # 1. Uses the same compilation flags as the parent file
+        # 2. But compiles the included file directly (not the parent)
+        # 3. Includes necessary headers from the parent file for proper macro resolution
+
+        # Convert absolute path to relative for the command.
+        rel_path = os.path.relpath(included_c_file, root_directory)
+
+        # Extract includes from the parent file to provide proper compilation context.
+        extra_includes = ''
+        try:
+            parent_includes = extract_includes_from_file(abs_path, root_directory)
+            if parent_includes:
+                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
+        except IOError:
+            pass
+
+        included_entry = {
+            'directory': root_directory,
+            'file': included_c_file,
+            # Use the same compilation prefix but target the included file directly.
+            # Add extra headers for proper macro resolution.
+            'command': prefix + extra_includes + ' ' + rel_path,
+        }
+        entries.append(included_entry)
+        logging.debug('Added entry for included file: %s', included_c_file)
+
+    return entries
 
 
 def main():
@@ -213,9 +334,9 @@ def main():
                 result = line_matcher.match(f.readline())
                 if result:
                     try:
-                        entry = process_line(directory, result.group('command_prefix'),
+                        entries = process_line(directory, result.group('command_prefix'),
                                              result.group('file_path'))
-                        compile_commands.append(entry)
+                        compile_commands.extend(entries)
                     except ValueError as err:
                         logging.info('Could not add line from %s: %s',
                                      cmdfile, err)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
@ 2025-10-17 17:21           ` Eduard Zingerman
  2025-10-31 22:56           ` Nathan Chancellor
  2025-11-05 21:30           ` Nicolas Schier
  2 siblings, 0 replies; 8+ messages in thread
From: Eduard Zingerman @ 2025-10-17 17:21 UTC (permalink / raw)
  To: Pat Somaru
  Cc: justinstitt, linux-kbuild, llvm, morbo, nathan,
	nick.desaulniers+lkml, nsc

On Tue, 2025-10-07 at 20:45 -0400, Pat Somaru wrote:
> The gen_compile_commands.py script currently only creates entries for the
> primary source files found in .cmd files, but some kernel source files
> text-include others (i.e. kernel/sched/build_policy.c).
> 
> This prevents tools like clangd from working properly on text-includedd c
> files, such as kernel/sched/ext.c because the generated compile_commands.json
> does not have entries for them.
> 
> Extend process_line() to detect when a source file includes .c files, and
> generate additional compile_commands.json entries for them. For included c
> files, use the same compile flags as their parent and add their parents headers.
> 
> This enables lsp tools like clangd to work properly on files like
> kernel/sched/ext.c
> 
> Signed-off-by: Pat Somaru <patso@likewhatevs.io>
> Tested-by: Justin Stitt <justinstitt@google.com>
> ---

I tested navigation to work in emacs+clangd combination with this
patch for the following files:
- kernel/sched/pelt.c
- kernel/sched/cputime.c
- kernel/sched/deadline.c

All included from the kernel/sched/build_policy.c.
Navigation works for entities declared in the same file, entities
declared in headers included from build_policy.c, entities declared in
headers included from {pelt,cputime,deadline}.c.

Note that sched/ code [ab]uses the pattern with .c file includes a
lot. W/o this patch clangd is completely oblivious to the content of
the above files. Jump to definition is almost useless, who-calls
queries ignore functions declared in these files etc.

Tested-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
  2025-10-17 17:21           ` Eduard Zingerman
@ 2025-10-31 22:56           ` Nathan Chancellor
  2025-11-05 21:30           ` Nicolas Schier
  2 siblings, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-10-31 22:56 UTC (permalink / raw)
  To: Pat Somaru
  Cc: justinstitt, linux-kbuild, llvm, morbo, nick.desaulniers+lkml,
	nsc

Hi Pat,

On Tue, Oct 07, 2025 at 08:45:28PM -0400, Pat Somaru wrote:
> The gen_compile_commands.py script currently only creates entries for the
> primary source files found in .cmd files, but some kernel source files
> text-include others (i.e. kernel/sched/build_policy.c).
> 
> This prevents tools like clangd from working properly on text-includedd c
> files, such as kernel/sched/ext.c because the generated compile_commands.json
> does not have entries for them.
> 
> Extend process_line() to detect when a source file includes .c files, and
> generate additional compile_commands.json entries for them. For included c
> files, use the same compile flags as their parent and add their parents headers.
> 
> This enables lsp tools like clangd to work properly on files like
> kernel/sched/ext.c
> 
> Signed-off-by: Pat Somaru <patso@likewhatevs.io>
> Tested-by: Justin Stitt <justinstitt@google.com>

I have taken a look over this and it seems fine to me from a Python
perspective. I trust that you will be around if there are any problems
that crop up in testing but given that this should only impact files
that include .c files, I expect the fallout to be minimal.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>

Nicolas should be able to pick this up for kbuild-next.

> ---
>  scripts/clang-tools/gen_compile_commands.py | 135 +++++++++++++++++++-
>  1 file changed, 128 insertions(+), 7 deletions(-)
> 
> diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> index 96e6e46ad1a70..6f4afa92a4665 100755
> --- a/scripts/clang-tools/gen_compile_commands.py
> +++ b/scripts/clang-tools/gen_compile_commands.py
> @@ -21,6 +21,12 @@ _DEFAULT_LOG_LEVEL = 'WARNING'
>  _FILENAME_PATTERN = r'^\..*\.cmd$'
>  _LINE_PATTERN = r'^(saved)?cmd_[^ ]*\.o := (?P<command_prefix>.* )(?P<file_path>[^ ]*\.[cS]) *(;|$)'
>  _VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
> +
> +# Pre-compiled regexes for better performance
> +_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*[<"]([^>"]*)[>"]')
> +_C_INCLUDE_PATTERN = re.compile(r'^\s*#\s*include\s*"([^"]*\.c)"\s*$')
> +_FILENAME_MATCHER = re.compile(_FILENAME_PATTERN)
> +
>  # The tools/ directory adopts a different build system, and produces .cmd
>  # files in a different format. Do not support it.
>  _EXCLUDE_DIRS = ['.git', 'Documentation', 'include', 'tools']
> @@ -82,7 +88,6 @@ def cmdfiles_in_dir(directory):
>          The path to a .cmd file.
>      """
>  
> -    filename_matcher = re.compile(_FILENAME_PATTERN)
>      exclude_dirs = [ os.path.join(directory, d) for d in _EXCLUDE_DIRS ]
>  
>      for dirpath, dirnames, filenames in os.walk(directory, topdown=True):
> @@ -92,7 +97,7 @@ def cmdfiles_in_dir(directory):
>              continue
>  
>          for filename in filenames:
> -            if filename_matcher.match(filename):
> +            if _FILENAME_MATCHER.match(filename):
>                  yield os.path.join(dirpath, filename)
>  
>  
> @@ -149,8 +154,87 @@ def cmdfiles_for_modorder(modorder):
>                      yield to_cmdfile(mod_line.rstrip())
>  
>  
> +def extract_includes_from_file(source_file, root_directory):
> +    """Extract #include statements from a C file.
> +
> +    Args:
> +        source_file: Path to the source .c file to analyze
> +        root_directory: Root directory for resolving relative paths
> +
> +    Returns:
> +        List of header files that should be included (without quotes/brackets)
> +    """
> +    includes = []
> +    if not os.path.exists(source_file):
> +        return includes
> +
> +    try:
> +        with open(source_file, 'r') as f:
> +            for line in f:
> +                line = line.strip()
> +                # Look for #include statements.
> +                # Match both #include "header.h" and #include <header.h>.
> +                match = _INCLUDE_PATTERN.match(line)
> +                if match:
> +                    header = match.group(1)
> +                    # Skip including other .c files to avoid circular includes.
> +                    if not header.endswith('.c'):
> +                        # For relative includes (quoted), resolve path relative to source file.
> +                        if '"' in line:
> +                            src_dir = os.path.dirname(source_file)
> +                            header_path = os.path.join(src_dir, header)
> +                            if os.path.exists(header_path):
> +                                rel_header = os.path.relpath(header_path, root_directory)
> +                                includes.append(rel_header)
> +                            else:
> +                                includes.append(header)
> +                        else:
> +                            # System include like <linux/sched.h>.
> +                            includes.append(header)
> +    except IOError:
> +        pass
> +
> +    return includes
> +
> +
> +def find_included_c_files(source_file, root_directory):
> +    """Find .c files that are included by the given source file.
> +
> +    Args:
> +        source_file: Path to the source .c file
> +        root_directory: Root directory for resolving relative paths
> +
> +    Yields:
> +        Full paths to included .c files
> +    """
> +    if not os.path.exists(source_file):
> +        return
> +
> +    try:
> +        with open(source_file, 'r') as f:
> +            for line in f:
> +                line = line.strip()
> +                # Look for #include "*.c" patterns.
> +                match = _C_INCLUDE_PATTERN.match(line)
> +                if match:
> +                    included_file = match.group(1)
> +                    # Handle relative paths.
> +                    if not os.path.isabs(included_file):
> +                        src_dir = os.path.dirname(source_file)
> +                        included_file = os.path.join(src_dir, included_file)
> +
> +                    # Normalize the path.
> +                    included_file = os.path.normpath(included_file)
> +
> +                    # Check if the file exists.
> +                    if os.path.exists(included_file):
> +                        yield included_file
> +    except IOError:
> +        pass
> +
> +
>  def process_line(root_directory, command_prefix, file_path):
> -    """Extracts information from a .cmd line and creates an entry from it.
> +    """Extracts information from a .cmd line and creates entries from it.
>  
>      Args:
>          root_directory: The directory that was searched for .cmd files. Usually
> @@ -160,7 +244,8 @@ def process_line(root_directory, command_prefix, file_path):
>              Usually relative to root_directory, but sometimes absolute.
>  
>      Returns:
> -        An entry to append to compile_commands.
> +        A list of entries to append to compile_commands (may include multiple
> +        entries if the source file includes other .c files).
>  
>      Raises:
>          ValueError: Could not find the extracted file based on file_path and
> @@ -176,11 +261,47 @@ def process_line(root_directory, command_prefix, file_path):
>      abs_path = os.path.realpath(os.path.join(root_directory, file_path))
>      if not os.path.exists(abs_path):
>          raise ValueError('File %s not found' % abs_path)
> -    return {
> +
> +    entries = []
> +
> +    # Create entry for the main source file.
> +    main_entry = {
>          'directory': root_directory,
>          'file': abs_path,
>          'command': prefix + file_path,
>      }
> +    entries.append(main_entry)
> +
> +    # Find and create entries for included .c files.
> +    for included_c_file in find_included_c_files(abs_path, root_directory):
> +        # For included .c files, create a compilation command that:
> +        # 1. Uses the same compilation flags as the parent file
> +        # 2. But compiles the included file directly (not the parent)
> +        # 3. Includes necessary headers from the parent file for proper macro resolution
> +
> +        # Convert absolute path to relative for the command.
> +        rel_path = os.path.relpath(included_c_file, root_directory)
> +
> +        # Extract includes from the parent file to provide proper compilation context.
> +        extra_includes = ''
> +        try:
> +            parent_includes = extract_includes_from_file(abs_path, root_directory)
> +            if parent_includes:
> +                extra_includes = ' ' + ' '.join('-include ' + inc for inc in parent_includes)
> +        except IOError:
> +            pass
> +
> +        included_entry = {
> +            'directory': root_directory,
> +            'file': included_c_file,
> +            # Use the same compilation prefix but target the included file directly.
> +            # Add extra headers for proper macro resolution.
> +            'command': prefix + extra_includes + ' ' + rel_path,
> +        }
> +        entries.append(included_entry)
> +        logging.debug('Added entry for included file: %s', included_c_file)
> +
> +    return entries
>  
>  
>  def main():
> @@ -213,9 +334,9 @@ def main():
>                  result = line_matcher.match(f.readline())
>                  if result:
>                      try:
> -                        entry = process_line(directory, result.group('command_prefix'),
> +                        entries = process_line(directory, result.group('command_prefix'),
>                                               result.group('file_path'))
> -                        compile_commands.append(entry)
> +                        compile_commands.extend(entries)
>                      except ValueError as err:
>                          logging.info('Could not add line from %s: %s',
>                                       cmdfile, err)
> -- 
> 2.51.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] scripts/clang-tools: Handle included .c files in gen_compile_commands
  2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
  2025-10-17 17:21           ` Eduard Zingerman
  2025-10-31 22:56           ` Nathan Chancellor
@ 2025-11-05 21:30           ` Nicolas Schier
  2 siblings, 0 replies; 8+ messages in thread
From: Nicolas Schier @ 2025-11-05 21:30 UTC (permalink / raw)
  To: Pat Somaru
  Cc: justinstitt, linux-kbuild, llvm, morbo, nathan,
	nick.desaulniers+lkml

On Tue, 07 Oct 2025 20:45:28 -0400, Pat Somaru wrote:
> The gen_compile_commands.py script currently only creates entries for the
> primary source files found in .cmd files, but some kernel source files
> text-include others (i.e. kernel/sched/build_policy.c).
> 
> This prevents tools like clangd from working properly on text-includedd c
> files, such as kernel/sched/ext.c because the generated compile_commands.json
> does not have entries for them.
> 
> [...]

Applied to kbuild-next, thanks!

[1/1] scripts/clang-tools: Handle included .c files in gen_compile_commands
      https://git.kernel.org/kbuild/c/4e455b73

Please note that commit hashes might change in case of issues with
kbuild-next.

Best regards,
-- 
Nicolas


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-05 21:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20251007024527.1439747-1-patso@likewhatevs.io>
2025-10-07 16:33 ` [PATCH] scripts/clang-tools: Handle included .c files in gen_compile_commands Nathan Chancellor
     [not found] ` <be6kxu36uphlnhedsi7gsayhuetmqixil4dv2lxe4plvgyqewx@jecdpummb52l>
2025-10-07 19:59   ` Justin Stitt
2025-10-08  0:09     ` Pat Somaru
2025-10-08  0:37       ` Pat Somaru
2025-10-08  0:45         ` [PATCH v2] " Pat Somaru
2025-10-17 17:21           ` Eduard Zingerman
2025-10-31 22:56           ` Nathan Chancellor
2025-11-05 21:30           ` Nicolas Schier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox