* -fanalyzer thoughts
@ 2022-09-14 12:43 Kees Cook
2022-09-26 17:11 ` Koschel, J. (Jakob)
2022-09-26 21:11 ` David Malcolm
0 siblings, 2 replies; 3+ messages in thread
From: Kees Cook @ 2022-09-14 12:43 UTC
To: David Malcolm; +Cc: linux-hardening, j.koschel
Hi!
Thanks for the talk today! I sent a patch for the aic79xx_osm.c issue
you mentioned:
https://lore.kernel.org/linux-hardening/20220914115953.3854029-1-keescook@chromium.org/
I didn't have a chance to add some more comments and ask a question
before the session ended, so here I am in email, CCing the kernel
hardening list in case other folks want to chime in. :)
You asked, "Should I try to have GCC type-check __user vs __kernel,
or leave it to sparse?" I would *love* to get this in the compiler
proper. Not nearly enough people are running sparse, so its output has
become quite noisy, which means more and more regressions are slipping
into the kernel. I was surprised a while back to discover that the kernel's
use of the address_space and noderef attributes wasn't supported by
GCC. It does seem like it'd make a good attribute (for which there
is existing precedent), rather than polluting the global namespace,
as AVR does:
https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Clang seems to support the address_space and noderef attributes:
https://clang.llvm.org/docs/LanguageExtensions.html#memory-references-to-specified-segments
https://clang.llvm.org/docs/AttributeReference.html#noderef
But when I tried a while back to make it work, it fell over:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=clang/address_space&id=beff911c13390a71b3f7921fd82ec6a71ca75c02
If these get implemented in GCC, it'd be good to coordinate with Clang
too, to make sure it works sanely in the kernel.
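To make the __user discussion concrete, here is a minimal C sketch of how such annotations can be wired up so that a checker sees them while an ordinary compiler ignores them. It loosely follows the pattern the kernel uses for sparse (attributes enabled only when __CHECKER__ is defined). The names demo_user, demo_force, demo_copy_from_user, struct demo_req and demo_handle are hypothetical and exist only for this illustration; this is not a statement of how GCC would implement the feature.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumption: sparse defines __CHECKER__, so the attributes are only
 * visible to the checker. A regular compiler sees empty macros.
 * demo_user / demo_force stand in for the kernel's __user / __force.
 */
#ifdef __CHECKER__
# define demo_user  __attribute__((noderef, address_space(1)))
# define demo_force __attribute__((force))
#else
# define demo_user
# define demo_force
#endif

/* Hypothetical request layout passed in from "user space". */
struct demo_req {
	int cmd;
	int arg;
};

/*
 * Hypothetical stand-in for copy_from_user(): the one place that is
 * allowed to strip the annotation (via demo_force) and touch the data.
 * Returns the number of bytes NOT copied, so 0 means success.
 */
static unsigned long demo_copy_from_user(void *dst,
					 const void demo_user *src,
					 unsigned long n)
{
	memcpy(dst, (const void demo_force *)src, n);
	return 0;
}

/*
 * Handler that never dereferences the annotated pointer directly.
 * Writing "ureq->cmd" here is the kind of access a noderef-aware
 * checker is meant to reject.
 */
int demo_handle(const struct demo_req demo_user *ureq, int *cmd_out)
{
	struct demo_req req;

	if (cmd_out == NULL)
		return -1;
	if (demo_copy_from_user(&req, ureq, sizeof(req)) != 0)
		return -1;

	*cmd_out = req.cmd;
	return 0;
}
```

Built with a plain compiler the macros vanish and this is ordinary C. Under a checker that understands the attributes, a direct dereference of ureq should be flagged while the copy helper remains allowed.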
The question I had was if you had seen this LPC presentation:
https://lpc.events/event/16/contributions/1211/
"How I started chasing speculative type confusion bugs in the kernel and
ended up with 'real' ones"
The authors used Clang's "Data Flow Sanitizer" and built a working
taint/sink system that seems like it could be used for MUCH more analysis
than just what they were looking at (as they hint at too).
https://clang.llvm.org/docs/DataFlowSanitizer.html
https://github.com/vusec/kdfsan-linux/commit/45614ee1a3a0d7b98c5cecb1b747184279bc615c
I wonder if DFSan could be ported to GCC? It seems to overlap logically
with some of the -fanalyzer work, but I don't know the internals for
either, so I likely have no idea what I'm talking about. ;)
Related, I wonder if LTO builds would help with -fanalyzer's control
flow analysis? (DFSan requires LTO.) Getting the kernel built with LTO
under GCC seems to be an on-going project, but no pull requests have
been sent lately:
https://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/linux.git/log/?h=lto
Maybe poking them from your side might help that get landed? I think
people are interested in having LTO for the kernel for the performance
gains it can provide.
The second-to-last slide in my presentation (in the "bonus slides"
section) has slightly more context about LTO and the kernel:
https://lpc.events/event/16/contributions/1173/
https://outflux.net/slides/2022/lpc/features.pdf
Thanks!
-Kees
--
Kees Cook
* Re: -fanalyzer thoughts
From: Koschel, J. (Jakob) @ 2022-09-26 17:11 UTC
To: Kees Cook; +Cc: David Malcolm, linux-hardening@vger.kernel.org
Hi,
> On 14. Sep 2022, at 14:43, Kees Cook <keescook@chromium.org> wrote:
>
> Hi!
>
> [...]
>
>
> The question I had was if you had seen this LPC presentation:
> https://lpc.events/event/16/contributions/1211/
> "How I started chasing speculative type confusion bugs in the kernel and
> ended up with 'real' ones"
>
> The authors used Clang's "Data Flow Sanitizer" and built a working
> taint/sink system that seems like it could be used for MUCH more analysis
> than just what they were looking at (as they hint at too).
> https://clang.llvm.org/docs/DataFlowSanitizer.html
> https://github.com/vusec/kdfsan-linux/commit/45614ee1a3a0d7b98c5cecb1b747184279bc615c
>
> I wonder if DFSan could be ported to GCC? It seems to overlap logically
> with some of the -fanalyzer work, but I don't know the internals for
> either, so I likely have no idea what I'm talking about. ;)
I'm not sure how much dynamic and static taint analysis would actually
share, but I bet they would still have a few things in common.
David, your talk on static taint analysis with GCC in the kernel was
really great!
I think the attributes you were talking about for marking
user-controlled input (e.g. syscall inputs & user copies) are something
any taint analysis would benefit from! We added custom markers for this
in our fork of the Linux repo, but if they were upstreamed, any type of
taint analysis tool could benefit from them (GCC & Clang, static &
dynamic analysis, or something else entirely).
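As one illustration of such a marker, GCC 12 and later provide a tainted_args function attribute that tells -fanalyzer's taint checker to treat every argument of a function as attacker-controlled. The sketch below assumes that attribute and guards it with __has_attribute so the code still builds on compilers without it; TAINTED_ARGS, TABLE_SIZE, table and lookup_entry are hypothetical names, not kernel or GCC identifiers.

```c
#include <stddef.h>

/*
 * Use the attribute only where the compiler advertises it; elsewhere
 * the macro expands to nothing and the code is plain C.
 */
#if defined(__has_attribute)
# if __has_attribute(tainted_args)
#  define TAINTED_ARGS __attribute__((tainted_args))
# endif
#endif
#ifndef TAINTED_ARGS
# define TAINTED_ARGS
#endif

#define TABLE_SIZE 16

/* Zero-initialised lookup table standing in for some kernel state. */
static int table[TABLE_SIZE];

/*
 * Hypothetical entry point whose arguments come from an untrusted
 * caller. The bounds check is what sanitises idx before it is used as
 * an array index; without it, a taint-aware analysis would be expected
 * to complain about the table access.
 */
TAINTED_ARGS int lookup_entry(size_t idx, int *out)
{
	if (out == NULL)
		return -1;
	if (idx >= TABLE_SIZE)
		return -1;

	*out = table[idx];
	return 0;
}
```

With GCC this would typically be analysed using something like "gcc -fanalyzer -fanalyzer-checker=taint"; whether the taint checker needs to be enabled explicitly depends on the GCC version, so treat the exact flags as an assumption to check against the documentation.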
DFSan supports at most 256 taint labels, if I remember correctly.
I believe this was also mentioned by someone in the audience:
Different taint labels, for example, for MSR reads, syscall arguments
& network input would definitely be useful instead of a generic __tainted
argument.
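A toy model of what distinct labels buy you, written as a small C sketch: each value carries a bitmask of sources, and combining two values takes the union of their labels, so a sink can ask where the data came from instead of only whether it is tainted. This is not DFSan's API or GCC's implementation; taint_t, the TAINT_* constants, struct tval and the tval_* helpers are all hypothetical.

```c
#include <stdint.h>

/* One bit per taint source, so labels can be combined with OR. */
typedef uint8_t taint_t;

enum {
	TAINT_NONE    = 0,
	TAINT_SYSCALL = 1u << 0,	/* syscall arguments */
	TAINT_MSR     = 1u << 1,	/* values read from MSRs */
	TAINT_NET     = 1u << 2,	/* network input */
};

/* A value together with the set of sources that influenced it. */
struct tval {
	long value;
	taint_t taint;
};

/* Attach an initial label set to a raw value. */
static struct tval tval_make(long value, taint_t taint)
{
	struct tval v = { value, taint };
	return v;
}

/* Propagation rule: the result is tainted by every source of either input. */
static struct tval tval_add(struct tval a, struct tval b)
{
	struct tval r = { a.value + b.value, (taint_t)(a.taint | b.taint) };
	return r;
}

/* Sink-side query: does this value depend on the given source? */
static int tval_has(struct tval v, taint_t label)
{
	return (v.taint & label) != 0;
}
```

For example, adding a syscall-derived value to a network-derived one yields a result that answers yes for both TAINT_SYSCALL and TAINT_NET but no for TAINT_MSR, which a single generic label could not distinguish.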
>
>
> Related, I wonder if LTO builds would help with -fanalyzer's control
> flow analysis? (DFSan requires LTO.) Getting the kernel built with LTO
> under GCC seems to be an on-going project, but no pull requests have
> been sent lately:
> https://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/linux.git/log/?h=lto
> Maybe poking them from your side might help that get landed? I think
> people are interested in having LTO for the kernel for the performance
> gains it can provide.
I *think* DFSan (like ASan) does not depend on LTO; it can also run at
compile time. In our project LTO was helpful because it was simply
easier to run LLVM passes at that stage and we didn't have to
recompile when adjusting our passes (only relink).
Regardless, seeing LTO with GCC as well would be really great :)
>
> The second-to-last slide in my presentation (in the "bonus slides"
> section) has slightly more context about LTO and the kernel:
> https://lpc.events/event/16/contributions/1173/
> https://outflux.net/slides/2022/lpc/features.pdf
>
> Thanks!
>
> -Kees
>
> --
> Kees Cook
Btw, this is the paper [1] I mentioned that looked at an attack vector
on the hardware-OS boundary. It might have some interesting pointers
on what could be tainted besides system call arguments.
- Jakob
[1] https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04A-1_Song_paper.pdf
* Re: -fanalyzer thoughts
From: David Malcolm @ 2022-09-26 21:11 UTC
To: Kees Cook; +Cc: linux-hardening, j.koschel
On Wed, 2022-09-14 at 05:43 -0700, Kees Cook wrote:
> Hi!
>
> Thanks for the talk today! I sent a patch for the aic79xx_osm.c issue
> you mentioned:
> https://lore.kernel.org/linux-hardening/20220914115953.3854029-1-keescook@chromium.org/
Thanks!
>
> I didn't have a chance to add some more comments and ask a question
> before the session ended, so here I am in email, CCing the kernel
> hardening list in case other folks want to chime in. :)
Sorry for the belated response (back-to-back conferences and travel).
>
> You asked, "Should I try to have GCC type-check __user vs __kernel,
> or leave it to sparse?" I would *love* to get this in the compiler
> proper. Not nearly enough people are running sparse, so its output has
> become quite noisy, which means more and more regressions are slipping
> into the kernel. I was surprised a while back to discover that the
> kernel's use of the address_space and noderef attributes wasn't
> supported by GCC. It does seem like it'd make a good attribute (for
> which there is existing precedent), rather than polluting the global
> namespace, as AVR does:
> https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
>
> Clang seems to support the address_space and noderef attributes:
> https://clang.llvm.org/docs/LanguageExtensions.html#memory-references-to-specified-segments
> https://clang.llvm.org/docs/AttributeReference.html#noderef
> But when I tried a while back to make it work, it fell over:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=clang/address_space&id=beff911c13390a71b3f7921fd82ec6a71ca75c02
> If these get implemented in GCC, it'd be good to coordinate with Clang
> too, to make sure it works sanely in the kernel.
I've been experimenting with implementing this in GCC.
It turned out that GCC's bugzilla had a bunch of existing RFE bugs for
sparse support filed back in 2014, so I've created a tracker bug to
make it easier to find them; see:
https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=sparse
and I'm hoping to get at least some of this done for GCC 13 (though
feature freeze is about 5 weeks away...)
>
>
> The question I had was if you had seen this LPC presentation:
> https://lpc.events/event/16/contributions/1211/
> "How I started chasing speculative type confusion bugs in the kernel
> and ended up with 'real' ones"
>
> The authors used Clang's "Data Flow Sanitizer" and built a working
> taint/sink system that seems like it could be used for MUCH more
> analysis than just what they were looking at (as they hint at too).
> https://clang.llvm.org/docs/DataFlowSanitizer.html
> https://github.com/vusec/kdfsan-linux/commit/45614ee1a3a0d7b98c5cecb1b747184279bc615c
>
> I wonder if DFSan could be ported to GCC? It seems to overlap logically
> with some of the -fanalyzer work, but I don't know the internals for
> either, so I likely have no idea what I'm talking about. ;)
Thanks for the links; both Kasper and DFSan look really interesting.
If I'm reading things right, DFSan seems to be a run-time thing,
modifying the generated code to sanitize it, whereas GCC's -fanalyzer
is a compile-time thing, so I don't think it's directly compatible.
>
>
> Related, I wonder if LTO builds would help with -fanalyzer's control
> flow analysis? (DFSan requires LTO.) Getting the kernel built with LTO
> under GCC seems to be an on-going project, but no pull requests have
> been sent lately:
> https://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/linux.git/log/?h=lto
> Maybe poking them from your side might help that get landed? I think
> people are interested in having LTO for the kernel for the performance
> gains it can provide.
Unfortunately, building with LTO tends to break -fanalyzer by exploding
the complexity of the analysis: I have an implementation of call
summarization to try to tame this, but it's buggy. So a fair amount of
work would need to happen on the -fanalyzer side in addition to getting
the kernel to just build with LTO, so it's not been a priority for me.
>
> The second-to-last slide in my presentation (in the "bonus slides"
> section) has slightly more context about LTO and the kernel:
> https://lpc.events/event/16/contributions/1173/
> https://outflux.net/slides/2022/lpc/features.pdf
>
Thanks; this is all very helpful.
Dave