All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Rasmus Villemoes <rv-qQsb+v5E8BnlAoU/VqSP6n9LOBIZ5rWg@public.gmane.org>
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] check_protypes.pl: semi-automatic consistency checks
Date: Thu, 22 May 2014 08:39:03 +0200	[thread overview]
Message-ID: <537D9B87.1070200@gmail.com> (raw)
In-Reply-To: <1400544046-4503-1-git-send-email-rv-qQsb+v5E8BnlAoU/VqSP6n9LOBIZ5rWg@public.gmane.org>

On 05/20/2014 02:00 AM, Rasmus Villemoes wrote:
> Most SYNOPSES are almost compilable as-is. Doing so may cause the
> compiler to barf something about conflicting prototypes, which means
> that we've found an inconsistency between the man-pages and the
> installed headers. One then needs to manually check (e.g., consulting
> POSIX, the linux kernel or some other source) who's right.
> 
> This script is an attempt at automating the task of extracting the
> synopsis, removing non-code phrases which are present in many
> synopses, and running gcc on the result. All temporary files are
> created in /tmp/somedir; if a particular man-page passes with no
> remarks, those are automatically cleaned up. Otherwise, we leave them
> for the user to inspect.
> 
> I'm not sure whether it is worth including in the git repository, but
> since I just had an rm -rf accident followed by a successful first
> experience with extundelete, I want to make sure that these bits reach
> a machine where my stupid fingers can't touch them.

Thanks for this script Rasmus. I'll save it for later use.
A question (which you may or may not be interested in ;-)):
How feasible do you think it would be to write a script that
tells me the FTM requirements for a given API? (Note that the
answer will very across glibc versions, so such a script would 
need as input both header files and and some glibc-version-specific
table mapping the __USE internal macros back to FTM settings.

Cheers,

Michael


> Signed-off-by: Rasmus Villemoes <rv-qQsb+v5E8BnlAoU/VqSP6n9LOBIZ5rWg@public.gmane.org>
> ---
>  scripts/check_proto_arch.txt |   3 +
>  scripts/check_proto_skip.txt |  44 +++++++++
>  scripts/check_prototypes.pl  | 223 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 270 insertions(+)
>  create mode 100644 scripts/check_proto_arch.txt
>  create mode 100644 scripts/check_proto_skip.txt
>  create mode 100755 scripts/check_prototypes.pl
> 
> diff --git a/scripts/check_proto_arch.txt b/scripts/check_proto_arch.txt
> new file mode 100644
> index 0000000..568557e
> --- /dev/null
> +++ b/scripts/check_proto_arch.txt
> @@ -0,0 +1,3 @@
> +perfmonctl.2	IA-64
> +spu_create.2	ppc
> +spu_run.2	ppc
> diff --git a/scripts/check_proto_skip.txt b/scripts/check_proto_skip.txt
> new file mode 100644
> index 0000000..7d4b4ff
> --- /dev/null
> +++ b/scripts/check_proto_skip.txt
> @@ -0,0 +1,44 @@
> +# The usual conventions: Empty lines and lines starting with # are
> +# ignored. Other lines are supposed to contain key-value pairs for the
> +# %skip hash. The key is a man-page to skip, the value is optional and
> +# can for example be a comment explaining why we skip it.
> +
> +_syscall.2
> +arch_prctl.2
> +bdflush.2
> +
> +eventfd.2	<http://thread.gmane.org/gmane.comp.lib.glibc.alpha/41725>
> +
> +# nmask is not const, and addr is unsigned long, not void*.
> +get_mempolicy.2	numactl
> +
> +# keyutils.h should include <sys/types.h>, since it uses uid_t, gid_t,
> +# size_t
> +add_key.2	libkeyutils 
> +request_key.2	libkeyutils 
> +keyctl.2	libkeyutils
> +
> +# prctl() is really a varargs function
> +prctl.2
> +# ptrace() is really a varargs function
> +ptrace.2
> +reboot.2
> +# There's no reasonable way to check setpgid.2 automatically...
> +setpgid.2
> +recvmmsg.2	<https://bugzilla.kernel.org/show_bug.cgi?id=75371>
> +
> +# Can't check pseudo-prototypes for macros assert{,_perror}
> +assert.3
> +assert_perror.3
> +
> +cfree.3
> +# Can't check pseudo-prototypes for macros CMSG_*
> +cmsg.3
> +
> +# htobe16 and friends are really macros
> +endian.3
> +
> +# macros
> +fpclassify.3
> +
> +finite.3
> diff --git a/scripts/check_prototypes.pl b/scripts/check_prototypes.pl
> new file mode 100755
> index 0000000..f5d00c4
> --- /dev/null
> +++ b/scripts/check_prototypes.pl
> @@ -0,0 +1,223 @@
> +#!/usr/bin/perl
> +#
> +# File: check_prototypes.pl
> +# Time-stamp: <2014-05-20 01:30:47 villemoes>
> +# Author: Rasmus Villemoes
> +#
> +# Usage: ./check_prototypes.pl ../man[23]/some_man_pages
> +#
> +# The basic idea behind the script is rather simple: Extract the
> +# SYNOPSIS from the man-page, remove text which is often present, and
> +# hope that the remainder is valid C. Try to compile it, and if gcc
> +# complains, it may be because the prototypes in the SYNOPSIS does not
> +# match those provided by the #included header files.
> +
> +use strict;
> +use warnings;
> +
> +use File::Temp qw/ tempfile tempdir /;
> +use File::Basename;
> +use File::Slurp;
> +use List::Util qw/max/;
> +
> +my $verbose = 2;
> +my $tmpd = tempdir("manpagecheck_XXXXXX", TMPDIR => 1, CLEANUP => 0);
> +my $CC = "gcc";
> +
> +
> +my %has_header_cache = ();
> +sub has_header {
> +    my $h = shift;
> +    return $has_header_cache{$h} if exists $has_header_cache{$h};
> +
> +    # Check the obvious place first.
> +    if (-r "/usr/include/$h") {
> +	$has_header_cache{$h} = 1;
> +	return 1;
> +    }
> +    # Now ask gcc.
> +    my $cfile = "${tmpd}/check_header.c";
> +    write_file($cfile, "#include <${h}>\n")
> +	or die "error writing temporary file $cfile: $!";
> +    system("${CC} -E ${cfile} > /dev/null 2> /dev/null");
> +    $has_header_cache{$h} = ($? == 0);
> +    unlink $cfile;
> +    return $has_header_cache{$h};
> +}
> +
> +
> +sub msg {
> +    my $pri = shift;
> +    return if $verbose < $pri;
> +    my $fmt = shift;
> +    my $s = sprintf $fmt, @_;
> +    $s .= "\n" unless $s =~ m/\n$/;
> +    print STDOUT $s;
> +};
> +
> +
> +sub read_hash {
> +    my $href = shift;
> +    my $file = shift;
> +    return unless -e $file;
> +    open(my $fh, '<', $file)
> +	or die "unable to open $file: $!";
> +    while (<$fh>) {
> +	chomp;
> +	s/^\s+//;
> +	next if $_ eq '';
> +	next if m/^#/;
> +	my ($key, $val) = split /\s+/, $_, 2;
> +	$href->{$key} = $val;
> +    }
> +}
> +
> +# I skip some pages: In some cases, the interface is so messy
> +# (e.g. conflicting definitions by multiple standards, or some
> +# mysterious varargs function) that automatic checking is
> +# pointless. But it may also be the header files which are wrong; in
> +# some of those cases I've submitted a bug report to the appropriate
> +# instance.
> +my %skip;
> +my $skipfile = 'check_proto_skip.txt';
> +read_hash(\%skip, $skipfile);
> +
> +# Also hardcode a few arch-only syscalls.
> +# fixme: figure out a way to ensure $arch is "normalized" to one of "ia-64", "ppc", "x86_64", ...
> +my $arch = lc(qx(uname -p));
> +my %arch_only;
> +my $archfile = 'check_proto_arch.txt';
> +read_hash(\%arch_only, $archfile);
> +
> +
> +# Some synopses need a little tweaking before they are valid C.
> +my %tweaks;
> +
> +# remove the raw syscall prototype
> +$tweaks{'clone.2'} = sub { $_[0] =~ s/long clone\([^()]*\);//; };
> +
> +# remove partial struct definition.
> +$tweaks{'sched_setparam.2'} = sub { $_[0] =~ s/struct sched_param \{[^{}]+\};//; };
> +$tweaks{'swapon.2'} = sub { $_[0] =~ s/^\s*#include <asm\/page\.h>.*$//m; };
> +$tweaks{'open.2'} = sub {
> +    # open and openat are actually varargs functions, but creat is not.
> +    $_[0] =~ s/(open(?:at)?\(.*)mode_t mode/$1.../g;
> +    $_[0] =~ s/int open(?:at)?\(.*flags\)//g;
> +};
> +$tweaks{'open_by_handle_at.2'} = sub { $_[0] =~ s/^/struct file_handle;\n/; };
> +
> +# Remove the pseudo-prototypes of the function-like macros FD_*.
> +$tweaks{'select.2'} = $tweaks{'select_tut.2'}
> +    = sub { $_[0] =~ s/^\s*(int|void)\s+FD_[A-Z]+\(.*\);\s*$//mg; };
> +
> +$tweaks{'des_crypt.3'} = sub { $_[0] =~ s/^\s*int\s+DES_FAILED.*//m; };
> +
> +$tweaks{'exec.3'} = sub { $_[0] =~ s/\Q..., char * const envp[]\E/.../; };
> +
> +# Some interfaces are defined in terms of e.g. __pid_t, and only if
> +# sys/types.h is included does one get the appropriate typedefs. To
> +# avoid cluttering the man-pages with #include <sys/types.h>, we just
> +# fake it.
> +sub include_sys_types { $_[0] =~ s@^@#include <sys/types.h>\n@; }
> +$tweaks{'getrlimit.2'} = \&include_sys_types;
> +$tweaks{'getdirentries.3'} = \&include_sys_types;
> +
> +
> +my @trouble = ();
> +
> +for my $f (@ARGV) {
> +    my $base = basename($f);
> +
> +    next if (-s $f < 100); # crude check for a man link
> +    if (exists $skip{$base}) {
> +	msg(2, "skipping %s: %s", $f, $skip{$base} // "explicitly excluded");
> +	next;
> +    }
> +    if (exists $arch_only{$base} && $arch ne lc($arch_only{$base})) {
> +	msg(2, "skipping %s: %s only", $f, $arch_only{$base});
> +	next;
> +    }
> +
> +    # fixme: is there a better way to get the man-page stripped of all formatting?
> +    my $manpage = qx/MANWIDTH=2000 man $f/;
> +    if (!($manpage =~ m/SYNOPSIS(.*?)DESCRIPTION/s)) {
> +	msg(1, "skipping %s: missing SYNOPSIS\n", $f);
> +	next;
> +    };
> +    my $synops = $1;
> +
> +    # Remove text which is present in some synopses. Matching against
> +    # rather specific strings helps to ensure that a consistent style
> +    # is used throughout the man-pages (because if some text is not
> +    # removed by this, gcc will complain).
> +    $synops =~ s/^\s*Feature Test Macro Requirements for.*//sm;
> +    $synops =~ s/^\s*Link with -l.*\.\s*$//m;
> +    $synops =~ s/^\s*Each of these requires linking with -l.*\.\s*$//m; # encrypt.3 uses this wording
> +    $synops =~ s/^\s*Note: There (is|are) no glibc wrappers? for th(is|ese) system calls?; see NOTES\.\s*$//m;
> +    $synops =~ s/^\s*See NOTES for information on feature test macro requirements\.\s*$//m;
> +
> +    # If the synopsis mentions _GNU_SOURCE, we define it to check as
> +    # much as possible. But we don't unconditionally define it: We get
> +    # a sort-of false positive for some files (getitimer.2,
> +    # getrlimit.2 etc.), since various headers play a game with
> +    # typedef'ing __foobar_t as an enum type if __USE_GNU, and int
> +    # otherwise.
> +    my $gnu_source = ($synops =~ s/^\s*#define _GNU_SOURCE\b.*//m) ? '-D_GNU_SOURCE' : '';
> +    my $xopen_source = 0;
> +    while ($synops =~ s/^\s*#define _XOPEN_SOURCE (\b[0-9]+\b)?.*//m) {
> +	$xopen_source = max($xopen_source, defined $1 ? $1 : 1);
> +    }
> +    $xopen_source = $xopen_source ? "-D_XOPEN_SOURCE=${xopen_source}" : '';
> +    my $bsd_source = ($synops =~ s/^\s*#define _BSD_SOURCE\b.*//m) ? '-D_BSD_SOURCE' : '';
> +
> +    # Apply individual tweaks.
> +    $tweaks{$base}($synops) if exists $tweaks{$base};
> +
> +    # Find all needed headers.
> +    my @headers = ($synops =~ m/#include <([^>]+)>/g);
> +    if (!@headers) {
> +	msg(1, "skipping %s: no header files mentioned in SYNOPSIS\n", $f);
> +	next;
> +    }
> +    my @missing_headers = grep {!has_header($_)} @headers;
> +    if (@missing_headers) {
> +	msg(1, "skipping %s: missing header file(s) %s\n", $f, join(",", @missing_headers));
> +	next;
> +    }
> +
> +    my $cfile = "${tmpd}/${base}.c";
> +    my $auxfile = "${tmpd}/${base}.aux";
> +    my $outfile = "${tmpd}/${base}.out";
> +    my $errfile = "${tmpd}/${base}.err";
> +    my $cmdfile = "${tmpd}/${base}.cmdline";
> +
> +    my $cmdline = "${CC} ${gnu_source} ${xopen_source} ${bsd_source} " .
> +	"-aux-info ${auxfile} -c -o /dev/null ${cfile} > ${outfile} 2> ${errfile}";
> +    write_file($cfile, $synops)
> +	or die "error writing temporary file $cfile: $!";
> +    write_file($cmdfile, $cmdline . "\n")
> +	or die "error writing temporary file $cmdfile: $!";
> +
> +    system($cmdline);
> +
> +    if ($? == 0 && -s $errfile == 0) {
> +    	unlink $cfile, $auxfile, $outfile, $errfile, $cmdfile
> +    	    or die "error cleaning up: $!";
> +    }
> +    else {
> +	push @trouble, $base;
> +    }
> +}
> +
> +if (@trouble) {
> +    print "Problems encountered with the following files:\n";
> +    for (@trouble) {
> +	print "  $_\n";
> +    }
> +    print "The files in the directory ${tmpd} contain the details.\n";
> +}
> +else {
> +    rmdir $tmpd;
> +}
> +
> +
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2014-05-22  6:39 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-20  0:00 [PATCH] check_protypes.pl: semi-automatic consistency checks Rasmus Villemoes
     [not found] ` <1400544046-4503-1-git-send-email-rv-qQsb+v5E8BnlAoU/VqSP6n9LOBIZ5rWg@public.gmane.org>
2014-05-22  6:39   ` Michael Kerrisk (man-pages) [this message]
     [not found]     ` <537D9B87.1070200-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-05-22  9:09       ` Rasmus Villemoes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=537D9B87.1070200@gmail.com \
    --to=mtk.manpages-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
    --cc=linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=rv-qQsb+v5E8BnlAoU/VqSP6n9LOBIZ5rWg@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.