public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1][RFC] symlinks: use unsigned int for flags
@ 2026-01-20 15:22 Tian Yuchen
  2026-01-20 15:36 ` Tian Yuchen
  2026-01-21  9:33 ` Patrick Steinhardt
  0 siblings, 2 replies; 4+ messages in thread
From: Tian Yuchen @ 2026-01-20 15:22 UTC (permalink / raw)
  To: git; +Cc: gitster

The 'flags' and 'track_flags' fields in symlinks.c are used
strictly as a collection of bits (using bitwise operators including
&, |, ~). Using a signed integer for bitmasks may lead to undefined
behavior with shift operations and logic errors if the MSB is touched.

Change these fields from 'int' to 'unsigned int' to align with C
standards and typical usage patterns.

Signed-off-by: Tian Yuchen <a3205153416@gmail.com>
---
 symlinks.c | 12 ++++++------
 symlinks.h |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/symlinks.c b/symlinks.c
index 9cc090d42c..ed63891149 100644
--- a/symlinks.c
+++ b/symlinks.c
@@ -74,11 +74,11 @@ static inline void reset_lstat_cache(struct cache_def *cache)
  */
 static int lstat_cache_matchlen(struct cache_def *cache,
 				const char *name, int len,
-				int *ret_flags, int track_flags,
+				unsigned int *ret_flags, unsigned int track_flags,
 				int prefix_len_stat_func)
 {
 	int match_len, last_slash, last_slash_dir, previous_slash;
-	int save_flags, ret, saved_errno = 0;
+	unsigned int save_flags, ret, saved_errno = 0;
 	struct stat st;
 
 	if (cache->track_flags != track_flags ||
@@ -192,10 +192,10 @@ static int lstat_cache_matchlen(struct cache_def *cache,
 	return match_len;
 }
 
-static int lstat_cache(struct cache_def *cache, const char *name, int len,
-		       int track_flags, int prefix_len_stat_func)
+static unsigned int lstat_cache(struct cache_def *cache, const char *name, int len,
+		       unsigned int track_flags, int prefix_len_stat_func)
 {
-	int flags;
+	unsigned int flags;
 	(void)lstat_cache_matchlen(cache, name, len, &flags, track_flags,
 			prefix_len_stat_func);
 	return flags;
@@ -234,7 +234,7 @@ int check_leading_path(const char *name, int len, int warn_on_lstat_err)
 static int threaded_check_leading_path(struct cache_def *cache, const char *name,
 				       int len, int warn_on_lstat_err)
 {
-	int flags;
+	unsigned int flags;
 	int match_len = lstat_cache_matchlen(cache, name, len, &flags,
 			   FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT);
 	int saved_errno = errno;
diff --git a/symlinks.h b/symlinks.h
index 7ae3d5b856..25bf04f54f 100644
--- a/symlinks.h
+++ b/symlinks.h
@@ -5,8 +5,8 @@
 
 struct cache_def {
 	struct strbuf path;
-	int flags;
-	int track_flags;
+	unsigned int flags;
+	unsigned int track_flags;
 	int prefix_len_stat_func;
 };
 #define CACHE_DEF_INIT { \
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v1][RFC] symlinks: use unsigned int for flags
  2026-01-20 15:22 [PATCH v1][RFC] symlinks: use unsigned int for flags Tian Yuchen
@ 2026-01-20 15:36 ` Tian Yuchen
  2026-01-21  9:39   ` Patrick Steinhardt
  2026-01-21  9:33 ` Patrick Steinhardt
  1 sibling, 1 reply; 4+ messages in thread
From: Tian Yuchen @ 2026-01-20 15:36 UTC (permalink / raw)
  To: git; +Cc: gitster

Me as total newbie to the git community (also preparing for GSoC
2026), welcome comments or any possible suggestions!

While preparing v2 to fix the return type of lstat_cache(), a broader
question regarding coding style came to mind:

I realized that even without changing the return type, the code
compiles and runs because of C's implicit integer conversion
 (since the flag values don't exceed INT_MAX).

My question is: In the Git codebase, are such "safe" implicit conversions
generally tolerated to minimize code churn, or is it considered a
best practice to strictly avoid them and match types explicitly whenever
possible?

I want to ensure I have the right standard for type strictness in
 future contributions.

Tian Yuchen <a3205153416@gmail.com> 于2026年1月20日周二 23:22写道:
>
> The 'flags' and 'track_flags' fields in symlinks.c are used
> strictly as a collection of bits (using bitwise operators including
> &, |, ~). Using a signed integer for bitmasks may lead to undefined
> behavior with shift operations and logic errors if the MSB is touched.
>
> Change these fields from 'int' to 'unsigned int' to align with C
> standards and typical usage patterns.
>
> Signed-off-by: Tian Yuchen <a3205153416@gmail.com>
> ---
>  symlinks.c | 12 ++++++------
>  symlinks.h |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/symlinks.c b/symlinks.c
> index 9cc090d42c..ed63891149 100644
> --- a/symlinks.c
> +++ b/symlinks.c
> @@ -74,11 +74,11 @@ static inline void reset_lstat_cache(struct cache_def *cache)
>   */
>  static int lstat_cache_matchlen(struct cache_def *cache,
>                                 const char *name, int len,
> -                               int *ret_flags, int track_flags,
> +                               unsigned int *ret_flags, unsigned int track_flags,
>                                 int prefix_len_stat_func)
>  {
>         int match_len, last_slash, last_slash_dir, previous_slash;
> -       int save_flags, ret, saved_errno = 0;
> +       unsigned int save_flags, ret, saved_errno = 0;
>         struct stat st;
>
>         if (cache->track_flags != track_flags ||
> @@ -192,10 +192,10 @@ static int lstat_cache_matchlen(struct cache_def *cache,
>         return match_len;
>  }
>
> -static int lstat_cache(struct cache_def *cache, const char *name, int len,
> -                      int track_flags, int prefix_len_stat_func)
> +static unsigned int lstat_cache(struct cache_def *cache, const char *name, int len,
> +                      unsigned int track_flags, int prefix_len_stat_func)
>  {
> -       int flags;
> +       unsigned int flags;
>         (void)lstat_cache_matchlen(cache, name, len, &flags, track_flags,
>                         prefix_len_stat_func);
>         return flags;
> @@ -234,7 +234,7 @@ int check_leading_path(const char *name, int len, int warn_on_lstat_err)
>  static int threaded_check_leading_path(struct cache_def *cache, const char *name,
>                                        int len, int warn_on_lstat_err)
>  {
> -       int flags;
> +       unsigned int flags;
>         int match_len = lstat_cache_matchlen(cache, name, len, &flags,
>                            FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT);
>         int saved_errno = errno;
> diff --git a/symlinks.h b/symlinks.h
> index 7ae3d5b856..25bf04f54f 100644
> --- a/symlinks.h
> +++ b/symlinks.h
> @@ -5,8 +5,8 @@
>
>  struct cache_def {
>         struct strbuf path;
> -       int flags;
> -       int track_flags;
> +       unsigned int flags;
> +       unsigned int track_flags;
>         int prefix_len_stat_func;
>  };
>  #define CACHE_DEF_INIT { \
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1][RFC] symlinks: use unsigned int for flags
  2026-01-20 15:22 [PATCH v1][RFC] symlinks: use unsigned int for flags Tian Yuchen
  2026-01-20 15:36 ` Tian Yuchen
@ 2026-01-21  9:33 ` Patrick Steinhardt
  1 sibling, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-21  9:33 UTC (permalink / raw)
  To: Tian Yuchen; +Cc: git, gitster

On Tue, Jan 20, 2026 at 11:22:19PM +0800, Tian Yuchen wrote:
> The 'flags' and 'track_flags' fields in symlinks.c are used
> strictly as a collection of bits (using bitwise operators including
> &, |, ~).

That's one important data point. The other important data point is
whether or not we ever use any negative values here. But these flags are
defined like this:

    #define FL_DIR      (1 << 0)
    #define FL_NOENT    (1 << 1)
    #define FL_SYMLINK  (1 << 2)
    #define FL_LSTATERR (1 << 3)
    #define FL_ERR      (1 << 4)
    #define FL_FULLPATH (1 << 5)

So they indeed are only ever positive.

> Using a signed integer for bitmasks may lead to undefined
> behavior with shift operations and logic errors if the MSB is touched.

Mostly a theoretical issue, but we indeed prefer to use unsigned ints
for bitsets like this.

> diff --git a/symlinks.c b/symlinks.c
> index 9cc090d42c..ed63891149 100644
> --- a/symlinks.c
> +++ b/symlinks.c
> @@ -74,11 +74,11 @@ static inline void reset_lstat_cache(struct cache_def *cache)
>   */
>  static int lstat_cache_matchlen(struct cache_def *cache,
>  				const char *name, int len,
> -				int *ret_flags, int track_flags,
> +				unsigned int *ret_flags, unsigned int track_flags,

`ret_flags` is also a combination of `FL_*` flags, so it's never
expected to be negative, either.

>  				int prefix_len_stat_func)
>  {
>  	int match_len, last_slash, last_slash_dir, previous_slash;
> -	int save_flags, ret, saved_errno = 0;
> +	unsigned int save_flags, ret, saved_errno = 0;

Changing the type of `save_flags` makes sense, but you also change the
type of `ret` and `saved_errno` to become unsigned, which does not make
sense.

> @@ -192,10 +192,10 @@ static int lstat_cache_matchlen(struct cache_def *cache,
>  	return match_len;
>  }
>  
> -static int lstat_cache(struct cache_def *cache, const char *name, int len,
> -		       int track_flags, int prefix_len_stat_func)
> +static unsigned int lstat_cache(struct cache_def *cache, const char *name, int len,
> +		       unsigned int track_flags, int prefix_len_stat_func)
>  {
> -	int flags;
> +	unsigned int flags;
>  	(void)lstat_cache_matchlen(cache, name, len, &flags, track_flags,
>  			prefix_len_stat_func);
>  	return flags;

All callers already pass unsigned flags, so that's good. But there's two
callers that use the return value of this function:

  - `threaded_has_symlink_leading_path()` uses `lstat_cache() & FL_SYMLINK`.

  - `threaded_has_dirs_only_path()` uses `lstat_cache() & FL_DIR`.

Both of these functions really only care about whether or not the
statement evaluates to a truish value. Maybe it would make sense to have
a preparatory commit where convert the return values of those functions
to be booleans?

> @@ -234,7 +234,7 @@ int check_leading_path(const char *name, int len, int warn_on_lstat_err)
>  static int threaded_check_leading_path(struct cache_def *cache, const char *name,
>  				       int len, int warn_on_lstat_err)
>  {
> -	int flags;
> +	unsigned int flags;
>  	int match_len = lstat_cache_matchlen(cache, name, len, &flags,
>  			   FL_SYMLINK|FL_NOENT|FL_DIR, USE_ONLY_LSTAT);
>  	int saved_errno = errno;

Looks good.

> diff --git a/symlinks.h b/symlinks.h
> index 7ae3d5b856..25bf04f54f 100644
> --- a/symlinks.h
> +++ b/symlinks.h
> @@ -5,8 +5,8 @@
>  
>  struct cache_def {
>  	struct strbuf path;
> -	int flags;
> -	int track_flags;
> +	unsigned int flags;
> +	unsigned int track_flags;
>  	int prefix_len_stat_func;
>  };
>  #define CACHE_DEF_INIT { \

There is only one user of `struct cache_def` outside of "symlink.c",
which is "preload-index.c". That user doesn't care about the flag
definitions though, so this conversion looks good to me.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1][RFC] symlinks: use unsigned int for flags
  2026-01-20 15:36 ` Tian Yuchen
@ 2026-01-21  9:39   ` Patrick Steinhardt
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-21  9:39 UTC (permalink / raw)
  To: Tian Yuchen; +Cc: git, gitster

On Tue, Jan 20, 2026 at 11:36:25PM +0800, Tian Yuchen wrote:
> Me as total newbie to the git community (also preparing for GSoC
> 2026), welcome comments or any possible suggestions!
> 
> While preparing v2 to fix the return type of lstat_cache(), a broader
> question regarding coding style came to mind:
> 
> I realized that even without changing the return type, the code
> compiles and runs because of C's implicit integer conversion
>  (since the flag values don't exceed INT_MAX).
> 
> My question is: In the Git codebase, are such "safe" implicit conversions
> generally tolerated to minimize code churn, or is it considered a
> best practice to strictly avoid them and match types explicitly whenever
> possible?

I wouldn't say "strictly". We have lots of cases where we do in fact
rely on implicit conversions. In some cases it's a code smell, in lots
of other cases it's fine. We have tried to become a bit more mindful
around such implicit conversions as those have bitten us in the past,
but we're not on a crusade against them.

So I guess the answer is "it depends". A slow trickle of improvements in
this area does make sense, but we don't want to convert all of our code
base in large patch series just for the sake of it.

Patrick

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-01-21  9:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-20 15:22 [PATCH v1][RFC] symlinks: use unsigned int for flags Tian Yuchen
2026-01-20 15:36 ` Tian Yuchen
2026-01-21  9:39   ` Patrick Steinhardt
2026-01-21  9:33 ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox