public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* thoughts wanted on dead code hunting?
@ 2024-05-09 12:08 Dr. David Alan Gilbert
  2024-05-15  0:14 ` Kees Cook
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2024-05-09 12:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: keescook, nathan, luc.vanoostenryck

Hi,
  I've stumbled into finding various types of dead code in the kernel
and am after peoples thoughts and suggestions.
  Apologies if this is a bit long.

  It started off after I noticed an unused LIST_HEAD in the parallel
code, so I sent some patches to clean that and related code up
(waiting review!), but then wrote a hacky script to find more
(and then mutex's); and there's a handful of patches removing those
on list (please review!).

  But then I noticed associated with those were some unused structs;
so I went struct hunting, and that's where I've got a problem.

  That's found me ~200 candidates; where I guess 150ish are probably
real; but my hacky script is, well trivial and hacky, so they each
need eyeballing, then a git lookup to see why they're unused, and a
compile just to make there's not some subtle macro somewhere.

 ** Questions:
  a) Can anyone think of a better tool than my script (see bottom)?
   The simplicity is a blessing & a curse - it doesn't know about
   #ifdef's so I don't need to try lots of configs, but at the same
   time, it can't tell if the struct actually gets used in a macro
   and I have to eyeball for a struct which is assigned to as
   a variable at declaration time.

  b) The dead structs are all over; so they've mostly been individual
  patches rather than a big patch series - how do people feel about
  another 150ish similar patches ?

 ** other Thoughts
  * The dead code is all over; from incredibly obscure drivers, to
   common stuff like ftrace and iscsi.

  * Most of them seem to be the remains of previous cleanups or
   refactors where someone has removed the function that used
   it but forgot to remove the list or struct.  Sometimes that happened
   prior to the first commit, so it's always been dead in the tree.

  * There's a few cases where people have added 'static' to a variable
   to cleanup compiler warnings, but actually they just needed to
   delete the variable.

  * A harder problem is unused structure members; some I've spotted
   by accident, some follow from what else I delete; e.g. if you
   delete a LIST_HEAD, there's a good chance there's a struct somewhere
   with the list entry in it that's no longer used.

  * It's not just the kernel; I've just mopped up a few struct's in Mesa
    as well; but different coding standards make the script harder in
    places; e.g. X uses typedef struct... everywhere so then you have
    the problem of hunting the use of the typedef name.

Anyway, that's way too long, all thoughts welcome.
(Reviews, even more welcome, as they get merged I'll work through my list
for a few more).

Scripts below,

Dave
(I've cc'd a few people at a guess for people who might suggest tools;
but please copy in anyone else who might)

* hacky script for finding unused LIST_HEAD

  (ie print a count of times the name of the list is used in the same file;
   if it's 1 it's worth looking at)

ag 'static LIST_HEAD'| sed -e 's/[():]/ /'g |
while read FNAME LINE STATIC DEF VARNAME TRAIL
do
echo ">>>" $FNAME ' : ' $VARNAME
echo -n "Count: "
grep $VARNAME $FNAME | wc -l 
grep $VARNAME $FNAME
done

* hacky script for finding unused struct's

  (ie print a count of time the name of the struct is used in the same file;
  only bother with .c files; doesn't spot assignments to initialise the struct
  on later lines, gets confused by struct's with short names etc).

grep -r '^struct [^(=]* {'| tr ':' ' ' |
while read FNAME STRUCT NAME TAIL
do
  echo "$FNAME" | grep -q '[.]c$' || continue
echo ">>>" $FNAME ' : ' $NAME
echo -n "Count: "
grep $NAME $FNAME | wc -l 
#grep $VARNAME $FNAME
done

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: thoughts wanted on dead code hunting?
  2024-05-09 12:08 thoughts wanted on dead code hunting? Dr. David Alan Gilbert
@ 2024-05-15  0:14 ` Kees Cook
  2024-05-16  0:40   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Kees Cook @ 2024-05-15  0:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-kernel, nathan, luc.vanoostenryck

On Thu, May 09, 2024 at 12:08:56PM +0000, Dr. David Alan Gilbert wrote:
>   That's found me ~200 candidates; where I guess 150ish are probably
> real; but my hacky script is, well trivial and hacky, so they each
> need eyeballing, then a git lookup to see why they're unused, and a
> compile just to make there's not some subtle macro somewhere.

Nice finds! People are usually big fans of code removal patches. :)

>  ** Questions:
>   a) Can anyone think of a better tool than my script (see bottom)?
>    The simplicity is a blessing & a curse - it doesn't know about
>    #ifdef's so I don't need to try lots of configs, but at the same
>    time, it can't tell if the struct actually gets used in a macro
>    and I have to eyeball for a struct which is assigned to as
>    a variable at declaration time.

I'm not sure I've seen anything better.

I tend to use stuff like Coccinelle (spatch) for finding specific struct
usage, but it can sometimes be slow when trying to process headers
recursively. e.g.:

// Options: --recursive-includes
@find@
struct to_be_removed INSTANCE;
struct to_be_removed *POINTER;

(
*       INSTANCE
|
*       POINTER
)


(I bet this could be improved, but it should be a usable example.)

So this might very a given struct isn't used.

>   b) The dead structs are all over; so they've mostly been individual
>   patches rather than a big patch series - how do people feel about
>   another 150ish similar patches ?

Generally the smaller patches are preferred. For this kind of thing,
though, I'd probably collect them by individual header files, rather
than one-patch-per-struct.

If you have one giant patch, this tool can help break it up into
per-subsystem patches (it isn't perfect, but does its best):
https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

>   * There's a few cases where people have added 'static' to a variable
>    to cleanup compiler warnings, but actually they just needed to
>    delete the variable.

Hah. Yeah, these are nice to find and remove.

>   * A harder problem is unused structure members; some I've spotted
>    by accident, some follow from what else I delete; e.g. if you
>    delete a LIST_HEAD, there's a good chance there's a struct somewhere
>    with the list entry in it that's no longer used.

This is especially tricky because a giant amount of structs in the
kernel actually describe over-the-wire or on-hardware structures that
maybe the kernel doesn't care about all the members, but they're still
needed to keep the layout correct.

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: thoughts wanted on dead code hunting?
  2024-05-15  0:14 ` Kees Cook
@ 2024-05-16  0:40   ` Dr. David Alan Gilbert
  2024-05-16  3:24     ` Kees Cook
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2024-05-16  0:40 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-kernel, nathan, luc.vanoostenryck

* Kees Cook (keescook@chromium.org) wrote:
> On Thu, May 09, 2024 at 12:08:56PM +0000, Dr. David Alan Gilbert wrote:
> >   That's found me ~200 candidates; where I guess 150ish are probably
> > real; but my hacky script is, well trivial and hacky, so they each
> > need eyeballing, then a git lookup to see why they're unused, and a
> > compile just to make there's not some subtle macro somewhere.
> 
> Nice finds! People are usually big fans of code removal patches. :)

Thanks; removing the LIST_HEADs actually saves bytes in the binary;
just removing the structs themselves still cleans up the source and
occasionally it's noticing something else left along with it.

> >  ** Questions:
> >   a) Can anyone think of a better tool than my script (see bottom)?
> >    The simplicity is a blessing & a curse - it doesn't know about
> >    #ifdef's so I don't need to try lots of configs, but at the same
> >    time, it can't tell if the struct actually gets used in a macro
> >    and I have to eyeball for a struct which is assigned to as
> >    a variable at declaration time.
> 
> I'm not sure I've seen anything better.
> 
> I tend to use stuff like Coccinelle (spatch) for finding specific struct
> usage, but it can sometimes be slow when trying to process headers
> recursively. e.g.:
> 
> // Options: --recursive-includes
> @find@
> struct to_be_removed INSTANCE;
> struct to_be_removed *POINTER;
> 
> (
> *       INSTANCE
> |
> *       POINTER
> )
> 
> 
> (I bet this could be improved, but it should be a usable example.)

Hmm, now if I could use coccinelle it would be more tolerant of coding
style and slight variations than my script.
However, trying that tiny example, I get:
  File "play.cocci", line 10, column 1, charpos = 141
    around = '',
    whole content = )

so it seems to be objecting to something at the end of the file?
I ran that with:

make coccicheck COCCI=play.cocci M=arch/x86
with Fedora 40's coccinelle-1.1.1-30.20230624git0afff7f.fc40.x86_64

> So this might very a given struct isn't used.
> 
> >   b) The dead structs are all over; so they've mostly been individual
> >   patches rather than a big patch series - how do people feel about
> >   another 150ish similar patches ?
> 
> Generally the smaller patches are preferred. For this kind of thing,
> though, I'd probably collect them by individual header files, rather
> than one-patch-per-struct.

Yeh; although note so far I've only been looking for unused structs
that are defined in a .c file rather than ones in headers.
Those are relatively easy to find, because I'm only looking in one .c
at a time (although that does hit corner cases like:

  header:
       struct foo;
       struct baa {
          struct foo *p;
       };

  .c file:
       struct foo {
         stuff
       };

       func(struct baa *b) {
          b->p  something
       }

  so foo is defined in the C file but the symbol 'foo' is never
used again in it.

> If you have one giant patch, this tool can help break it up into
> per-subsystem patches (it isn't perfect, but does its best):
> https://github.com/kees/kernel-tools/blob/trunk/split-on-maintainer

Thanks.

> >   * There's a few cases where people have added 'static' to a variable
> >    to cleanup compiler warnings, but actually they just needed to
> >    delete the variable.
> 
> Hah. Yeah, these are nice to find and remove.
> 
> >   * A harder problem is unused structure members; some I've spotted
> >    by accident, some follow from what else I delete; e.g. if you
> >    delete a LIST_HEAD, there's a good chance there's a struct somewhere
> >    with the list entry in it that's no longer used.
> 
> This is especially tricky because a giant amount of structs in the
> kernel actually describe over-the-wire or on-hardware structures that
> maybe the kernel doesn't care about all the members, but they're still
> needed to keep the layout correct.

Oh yeh; and also I'm not deleting unused struct's if they look like they're
describing some firmware or hardware struct, even if the kernel doesn't currently
use it.

Dave

> -Kees
> 
> -- 
> Kees Cook
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: thoughts wanted on dead code hunting?
  2024-05-16  0:40   ` Dr. David Alan Gilbert
@ 2024-05-16  3:24     ` Kees Cook
  2024-05-16 11:56       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Kees Cook @ 2024-05-16  3:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-kernel, nathan, luc.vanoostenryck

On Thu, May 16, 2024 at 12:40:40AM +0000, Dr. David Alan Gilbert wrote:
> * Kees Cook (keescook@chromium.org) wrote:
> > // Options: --recursive-includes
> > @find@
> > struct to_be_removed INSTANCE;
> > struct to_be_removed *POINTER;

Oops, I missed this line:

@@

> > 
> > (
> > *       INSTANCE
> > |
> > *       POINTER
> > )
> > 
> > 
> > (I bet this could be improved, but it should be a usable example.)
> 
> Hmm, now if I could use coccinelle it would be more tolerant of coding
> style and slight variations than my script.
> However, trying that tiny example, I get:
>   File "play.cocci", line 10, column 1, charpos = 141
>     around = '',
>     whole content = )
> 
> so it seems to be objecting to something at the end of the file?

Sorry, yes, missed the "@@" between the identifiers and the code pattern
to match.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: thoughts wanted on dead code hunting?
  2024-05-16  3:24     ` Kees Cook
@ 2024-05-16 11:56       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2024-05-16 11:56 UTC (permalink / raw)
  To: Kees Cook; +Cc: linux-kernel, nathan, luc.vanoostenryck

* Kees Cook (keescook@chromium.org) wrote:
> On Thu, May 16, 2024 at 12:40:40AM +0000, Dr. David Alan Gilbert wrote:
> > * Kees Cook (keescook@chromium.org) wrote:
> > > // Options: --recursive-includes
> > > @find@
> > > struct to_be_removed INSTANCE;
> > > struct to_be_removed *POINTER;
> 
> Oops, I missed this line:
> 
> @@
> 
> > > 
> > > (
> > > *       INSTANCE
> > > |
> > > *       POINTER
> > > )
> > > 
> > > 
> > > (I bet this could be improved, but it should be a usable example.)
> > 
> > Hmm, now if I could use coccinelle it would be more tolerant of coding
> > style and slight variations than my script.
> > However, trying that tiny example, I get:
> >   File "play.cocci", line 10, column 1, charpos = 141
> >     around = '',
> >     whole content = )
> > 
> > so it seems to be objecting to something at the end of the file?
> 
> Sorry, yes, missed the "@@" between the identifiers and the code pattern
> to match.

Ah right, that's done it - I'd noticed there wasn't a @@ but couldn't
figure out where to put them.

After a bit of guessing, I also added a 

virtual context

at the top and now:

make coccicheck COCCI=play.cocci M=arch/x86 MODE=context

works - thanks!  I'll have a play.

Dave

> -- 
> Kees Cook
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-05-16 11:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-09 12:08 thoughts wanted on dead code hunting? Dr. David Alan Gilbert
2024-05-15  0:14 ` Kees Cook
2024-05-16  0:40   ` Dr. David Alan Gilbert
2024-05-16  3:24     ` Kees Cook
2024-05-16 11:56       ` Dr. David Alan Gilbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox