Re: [RFC 0/8] Introduce an extensible static analyzer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Alberto Faria <afaria@redhat.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	qemu-block@nongnu.org, "Denis V. Lunev" <den@openvz.org>,
	Emanuele Giuseppe Esposito <eesposit@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	Hanna Reitz <hreitz@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Peter Xu <peterx@redhat.com>,
	Alberto Garcia <berto@igalia.com>, John Snow <jsnow@redhat.com>,
	Eric Blake <eblake@redhat.com>, Fam Zheng <fam@euphon.net>,
	Markus Armbruster <armbru@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
	Peter Lieven <pl@kamp.de>
Subject: Re: [RFC 0/8] Introduce an extensible static analyzer
Date: Tue, 5 Jul 2022 17:12:01 +0100	[thread overview]
Message-ID: <YsRi0SfAK3SVwQ2H@redhat.com> (raw)
In-Reply-To: <CAELaAXxdBvtxf2fXu1OxerBH+dTY_iti8CF-GMgGZpaWxgK_Gg@mail.gmail.com>

On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >      for i in `git ls-tree --name-only -r HEAD:`
> >      do
> >         clang-tidy $i 1>/dev/null 2>&1
> >      done
> 
> All of those invocations are probably failing quickly due to missing
> includes and other problems, since the location of the compilation
> database and some other arguments haven't been specified.

Opps yes, I was waaaay too minimalist in testing that.

> 
> Accounting for those problems (and enabling just one random C++ check):
> 
>     $ time clang-tidy -p build \
>         --extra-arg-before=-Wno-unknown-warning-option \
>         --extra-arg='-isystem [...]' \
>         --checks='-*,clang-analyzer-cplusplus.Move' \
>         $( find block -name '*.c' )
>     [...]
> 
>     real    3m0.260s
>     user    2m58.041s
>     sys     0m1.467s

Only analysing the block tree, but if we consider a static analysis
framework is desirable to use for whole of qemu.git, lets see the
numbers for everything.

What follows was done on  my P1 Gen2 thinkpad with 6 core / 12 threads,
where I use 'make -j 12' normally.

First as a benchmark, lets see a full compile of whole of QEMU (with
GCC) on Fedora 36 x86_64

    => 14 minutes

I find this waaaaay too slow though, so I typically configure QEMU with
--target-list=x86_64-softmmu since that suffices 90% of the time.

   => 2 minutes

A 'make check' on this x86_64-only build takes another 2 minutes.

Now, a static analysis baseline across the whole tree with default
tests enabled

 $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep '\.c$')

  => 45 minutes

wow, wasn't expecting it to be that slow !

Lets restrict to just the block/ dir

 $ clang-tidy --quiet -p build $(find block -name '*.c')

  => 4 minutes

And further restrict to just 1 simple check

 $ clang-tidy --quiet   --checks='-*,clang-analyzer-cplusplus.Move'  -p build $(find block -name '*.c')
  => 2 minutes 30

So extrapolated just that single check would probably work out
at 30 mins for the whole tree.

Overall this isn't cheap, and in the same order of magnitude
as a full compile. I guess this shouldn't be that surprising
really.

> Single-threaded static-analyzer.py without any checks:
> 
>     $ time ./static-analyzer.py build block -j 1
>     Analyzed 79 translation units in 16.0 seconds.
> 
>     real    0m16.665s
>     user    0m15.967s
>     sys     0m0.604s
> 
> And with just the 'return-value-never-used' check enabled for a
> somewhat fairer comparison:
> 
>     $ time ./static-analyzer.py build block -j 1 \
>         -c return-value-never-used
>     Analyzed 79 translation units in 61.5 seconds.
> 
>     real    1m2.080s
>     user    1m1.372s
>     sys     0m0.513s
> 
> Which is good news!

On my machine, a whole tree analysis allowing parallel execution
(iiuc no -j arg means use all cores):

  ./static-analyzer.py build  $(git ls-tree -r --name-only HEAD: | grep '\.c$'

   => 13 minutes

Or just the block layer

  ./static-analyzer.py build  $(find block -name '*.c')

   => 45 seconds

So your script is faster than clang-tidy, which suggests python probably
isn't the major dominating factor in speed, at least at this point in
time.

Still, a full tree analysis time of 13 minutes, compared to  my normal
'make' time of 2 minutes is an order of magnitude.

One thing that I noticed when doing this is that we can only really
do static analysis of files that we can actually compile on the host.
IOW, if on a Linux host, we don't get static analysis of code that
is targetted at FreeBSD / Windows platforms. Obvious really, since
libclang has to do a full parse and this will lack header files
for those platforms. That's just the tradeoff you have to accept
when using a compiler for static analysis, vs a tool that does
"dumb" string based regex matching to detect mistakes. Both kinds
of tools likely have their place for different tasks.

Overall I think a libclang based analysis tool will be useful, but
I can't see us enabling it as a standard part of 'make check'
given the time penalty.

Feels like something that'll have to be opt-in to a large degree
for regular contributors. In terms of gating CI though, it is less
of an issue, since we massively parallelize jobs. As long as we
have a dedicated build job just for running this static analysis
check in isolation, and NOT as 'make check' in all existing jobs,
it can happen in parallel with all the other build jobs, and we
won't notice the speed.

In summary, I think this approach is viable despite the speed
penalty provided we dont wire it into 'make check' by default.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

next prev parent reply	other threads:[~2022-07-05 16:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-02 11:33 [RFC 0/8] Introduce an extensible static analyzer Alberto Faria
2022-07-02 11:33 ` [RFC 1/8] Add " Alberto Faria
2022-07-02 11:33 ` [RFC 2/8] Drop some unused static function return values Alberto Faria
2022-07-02 11:33 ` [RFC 3/8] static-analyzer: Enforce coroutine_fn restrictions for direct calls Alberto Faria
2022-07-02 11:33 ` [RFC 4/8] Fix some direct calls from non-coroutine_fn to coroutine_fn Alberto Faria
2022-07-02 14:13   ` Paolo Bonzini
2022-07-03 22:20     ` Alberto Faria
2022-07-02 11:33 ` [RFC 5/8] static-analyzer: Enforce coroutine_fn restrictions on function pointers Alberto Faria
2022-07-04 14:16   ` Víctor Colombo
2022-07-04 16:57     ` Alberto Faria
2022-07-04 17:46       ` Víctor Colombo
2022-07-04 18:04         ` Alberto Faria
2022-07-04 19:06           ` Víctor Colombo
2022-07-02 11:33 ` [RFC 6/8] Fix some coroutine_fn indirect calls and pointer assignments Alberto Faria
2022-07-02 11:33 ` [RFC 7/8] block: Add no_coroutine_fn marker Alberto Faria
2022-07-02 11:33 ` [RFC 8/8] Avoid calls from coroutine_fn to no_coroutine_fn Alberto Faria
2022-07-02 14:17 ` [RFC 0/8] Introduce an extensible static analyzer Paolo Bonzini
2022-07-04 16:28 ` Daniel P. Berrangé
2022-07-04 19:30   ` Alberto Faria
2022-07-05  7:16     ` Daniel P. Berrangé
2022-07-05 11:28       ` Alberto Faria
2022-07-05 16:12         ` Daniel P. Berrangé [this message]
2022-07-06  9:54           ` Alberto Faria
2022-07-06 10:15             ` Daniel P. Berrangé
2022-07-08 17:18               ` Alberto Faria
2022-07-05 16:13 ` Daniel P. Berrangé
2022-07-06  9:56   ` Alberto Faria

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YsRi0SfAK3SVwQ2H@redhat.com \
    --to=berrange@redhat.com \
    --cc=afaria@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berto@igalia.com \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.