* [PATCH 0/10] Sparse linker
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse; +Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu
Hello.
I've been working on a "sparse linker" this summer as my Google
Summer of Code project. I wasn't nearly as productive as I had hoped,
but I've got some results that I would like to share. Moreover,
I plan to continue this work, and would like to hear comments on
what was done so far.
The design didn't change much from what was proposed. We run
sparse to generate a "sparse object" file containing a list of
symbols, then run the "linker" to unite those object files into
bigger ones. This way, in the end we get a file containing all
the global symbols appearing in the program. After learning
more on the subject, I now agree that we should include the
intermediate code representation in the object files.
The implementation is built around a generic serialization
mechanism [PATCH 01]. It handles many sorts of complex data
structures, with pointers, cycles, unions, etc. E.g. it is able
to serialize beasts like the sparse pointer lists. The price
for this is a four-byte overhead prepended to every
serializable structure by the allocation wrapper. Also, you
have to use a macro when declaring a serializable structure
(or an array of such) statically. One limitation I was unable
to overcome is the inability to work with structures used both
stand-alone and embedded into bigger ones. Luckily, we have no
such cases in the sparse codebase. The serializer produces C
code containing the data structures being serialized. For the
structure definitions, the generated code includes the original
headers that define the structures. After serializing a bunch of
possibly interconnected structures, and running cc over the
generated code, one might get a static or dynamic library
containing the copies of the serialized data structures, with
all the pointer interconnections included. This way loading
the data is trivial and very memory efficient, and the whole
dump-restore process should be totally transparent, e.g. it
should be possible to serialize the sparse() output, and run
check_symbols() after loading the data from another program.
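To make the per-object overhead concrete, here is a minimal sketch of the
wrapper idea. It is not the code from PATCH 01: struct item, alloc_item(),
item_wrapper_of() and free_item() are hypothetical names used only for
illustration, and only the layout of the metadata bitfield follows
serialization_mdata from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bookkeeping bits, laid out like serialization_mdata in PATCH 01. */
struct mdata {
	unsigned int index:30;
	unsigned int declared:1;
	unsigned int defined:1;
};

/* Hypothetical payload type, used only for this illustration. */
struct item {
	int value;
	struct item *next;
};

/* The wrapper places the metadata directly in front of the payload. */
struct item_wrapper {
	struct mdata meta;
	struct item payload;
};

/* Allocate a zeroed wrapper, but hand out a pointer to the payload only. */
static struct item *alloc_item(void)
{
	struct item_wrapper *w = calloc(1, sizeof(*w));

	return w ? &w->payload : NULL;
}

/* Recover the wrapper (and its metadata) from a payload pointer. */
static struct item_wrapper *item_wrapper_of(struct item *p)
{
	return (struct item_wrapper *)
		((char *)p - offsetof(struct item_wrapper, payload));
}

/* Free through the wrapper, since that is what was allocated. */
static void free_item(struct item *p)
{
	if (p)
		free(item_wrapper_of(p));
}
```

Callers only ever see the payload pointer, so existing code does not need to
know about the wrapper. Note that when the payload contains pointers,
alignment padding on 64-bit targets can make the effective overhead larger
than the four bytes of the bitfield itself.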
One thing that bothers me is whether gcc will be able to process
the huge data files containing all the "code" of bigger
projects like the Linux kernel. We will see.
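For a sense of what the compiler has to digest, the following is a
hand-written approximation of the generated data for a two-object cycle. The
shape follows the format strings in DO_WRAP from PATCH 01 and the struct a /
struct b types from serialization-test.h, but it is only a sketch: the real
output is split into a declarations file and a definitions file and pulls the
type definitions in via #include rather than repeating them.

```c
#include <assert.h>
#include <stddef.h>

/* Types as in serialization-test.h, repeated here to stay self-contained. */
struct b;
struct a { int d; struct b *b_ptr; };
struct b { int k; char c; struct a *a_ptr; };

struct mdata { unsigned int index:30, declared:1, defined:1; };
struct a_wrapper { struct mdata meta; struct a payload; };
struct b_wrapper { struct mdata meta; struct b payload; };

/*
 * Declarations come first (as tentative definitions), so the objects
 * can point at each other regardless of definition order.
 * The double-underscore names mirror what the patch emits.
 */
static struct a_wrapper __a_0;
static struct b_wrapper __b_0;

/* Definitions: one designated initializer per serialized object. */
static struct a_wrapper __a_0 = {
	.payload = {
		.d = 1,
		.b_ptr = &__b_0.payload,
	},
};
static struct b_wrapper __b_0 = {
	.payload = {
		.k = 11,
		.a_ptr = &__a_0.payload,
	},
};

/* The named entry point added when an object is labelled. */
struct a *a1_entry = &__a_0.payload;
```

Since every pointer is resolved by the compiler and linker, loading amounts to
looking up the entry symbol. Nothing is parsed or fixed up at run time.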
With the ability to serialize any data, generating the symbol
lists becomes as trivial as defining the data structures
corresponding to source files and symbols [PATCH 06], deriving
a symbol list from the sparse output, joining it into a ptr
list and serializing it [PATCH 07]. The linker needs to dlopen
the input "sparse objects", merge the symbol lists, and
serialize the result [PATCH 08]. The generated code compilation
is handled by the cgcc, cld and car wrappers [PATCH 09]. To
look up symbols in sparse object files, a simple program is
included [PATCH 10].
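The merge and lookup steps can be sketched roughly as below. This is an
assumption-laden illustration, not the code from PATCH 08 or PATCH 10: it uses
plain arrays instead of sparse ptr lists, skips the dlopen step entirely, and
merge_symbols() and find_symbol() are hypothetical helpers. The struct layouts
follow link.h from PATCH 06, except that the strings are const here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layouts follow link.h in PATCH 06 (strings made const for the sketch). */
struct sold_srcfile {
	const char *source;
	const char *buildroot;
	const char *cflags;
};

enum sold_sym_type {
	SOLD_SYM_FUNCTION,
	SOLD_SYM_DATA,
	SOLD_SYM_OTHER,
};

struct sold_symbol {
	struct sold_srcfile *source;
	enum sold_sym_type type;
	const char *name;
};

/*
 * Hypothetical helper: append the symbols of one input object to the
 * merged table, stopping when the table is full. Returns the new count.
 */
static size_t merge_symbols(struct sold_symbol *out, size_t n_out, size_t cap,
			    const struct sold_symbol *in, size_t n_in)
{
	size_t i;

	for (i = 0; i < n_in && n_out < cap; i++)
		out[n_out++] = in[i];
	return n_out;
}

/* Hypothetical helper: linear lookup by name, as a "where"-style tool needs. */
static const struct sold_symbol *find_symbol(const struct sold_symbol *tab,
					     size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}
```

Each symbol keeps a pointer to its source file record, so a hit can report
where the symbol was defined and with which flags it was built.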
The plan is now to proceed with dumping the linearized code.
Please take a look at the code, ask if anything needs
clarification, and don't hesitate to criticize. If you've got
ideas on how the linker might be extended and used, or
have a different approach to the problem, please drop a message.
You can also look at the code at
http://svcs.cs.pdx.edu/gitweb?p=sparse-soc2008.git;a=shortlog;h=gsoc2008-linker
or grab it from
git://svcs.cs.pdx.edu/git/sparse-soc2008 branch gsoc2008-linker
For those brave enough to actually want to see how it works,
this is how I'd run the thing over the sparse codebase:
make CC="cgcc -v -emit-code" LD=cld AR=car
and then
./where sparse.sparse.so linearize_statement
And no, the patches are not meant for mainline inclusion right
now.
P.S:
If you'd rather not be on the CC list, just drop me a message.
I'd miss your opinion, but will leave you out of any further
notifications on the project.
^ permalink raw reply [flat|nested] 23+ messages in thread* [PATCH 01/10] Serialization engine 2008-09-03 21:55 [PATCH 0/10] Sparse linker alexey.zaytsev @ 2008-09-03 21:55 ` alexey.zaytsev 2008-09-03 21:55 ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev [not found] ` <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com> 1 sibling, 1 reply; 23+ messages in thread From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw) To: linux-sparse Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev From: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> --- Makefile | 11 ++- serialization-test.c | 120 +++++++++++++++++++++++++ serialization-test.h | 32 +++++++ serialization.c | 99 +++++++++++++++++++++ serialization.h | 240 ++++++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 499 insertions(+), 3 deletions(-) create mode 100644 serialization-test.c create mode 100644 serialization-test.h create mode 100644 serialization.c create mode 100644 serialization.h diff --git a/Makefile b/Makefile index 077003c..721979e 100644 --- a/Makefile +++ b/Makefile @@ -27,7 +27,7 @@ INCLUDEDIR=$(PREFIX)/include PKGCONFIGDIR=$(LIBDIR)/pkgconfig PROGRAMS=test-lexing test-parsing obfuscate compile graph sparse test-linearize example \ - test-unssa test-dissect ctags + test-unssa test-dissect ctags serialization-test INST_PROGRAMS=sparse cgcc @@ -40,12 +40,13 @@ endif LIB_H= token.h parse.h lib.h symbol.h scope.h expression.h target.h \ linearize.h bitmap.h ident-list.h compat.h flow.h allocate.h \ - storage.h ptrlist.h dissect.h + storage.h ptrlist.h dissect.h serialization.h LIB_OBJS= target.o parse.o tokenize.o pre-process.o symbol.o lib.o scope.o \ expression.o show-parse.o evaluate.o expand.o inline.o linearize.o \ sort.o allocate.o compat-$(OS).o ptrlist.o \ - flow.o cse.o simplify.o memops.o liveness.o storage.o unssa.o dissect.o + flow.o cse.o simplify.o memops.o 
liveness.o storage.o unssa.o dissect.o \ + serialization.o LIB_FILE= libsparse.a SLIB_FILE= libsparse.so @@ -135,6 +136,9 @@ ctags: ctags.o $(LIBS) c2xml: c2xml.o $(LIBS) $(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS) `pkg-config --libs libxml-2.0` +serialization-test: serialization-test.o $(LIBS) + $(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS) + $(LIB_FILE): $(LIB_OBJS) $(QUIET_AR)$(AR) rcs $@ $(LIB_OBJS) @@ -185,6 +189,7 @@ compat-linux.o: compat/strtold.c compat/mmap-blob.c \ compat-solaris.o: compat/mmap-blob.c $(LIB_H) compat-mingw.o: $(LIB_H) compat-cygwin.o: $(LIB_H) +serialization-test.o: $(LIB_H) pre-process.h: $(QUIET_GEN)echo "#define GCC_INTERNAL_INCLUDE \"`$(CC) -print-file-name=`\"" > pre-process.h diff --git a/serialization-test.c b/serialization-test.c new file mode 100644 index 0000000..f9bb3d4 --- /dev/null +++ b/serialization-test.c @@ -0,0 +1,120 @@ +/* + * Copyright (C) 2008 Alexey Zaytsev + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Alternatively, this program may be distributed under the + * Open Software License version 1.1. 
+ */ + +#include <stdio.h> +#include <stdlib.h> +#include <assert.h> +#include <string.h> + + +#include "serialization-test.h" + +static struct a *__alloc_a_core(int n) +{ + return malloc(sizeof(struct a) + n); +} + +static struct b *__alloc_b_core(int n) +{ + return malloc(sizeof(struct b) + n); +} + + +void __free_a_core(struct a *t) +{ + +} + +void __free_b_core(struct b *t) +{ + +} + +int dump_a(struct serialization_stream *s, struct a *w) +{ + emit_int(s, w, d); + emit_ptr(s, w, b, b_ptr); + + return 0; +} + +int dump_b(struct serialization_stream *s, struct b *w) +{ + emit_int(s, w, k); + emit_ptr(s, w, a, a_ptr); + + return 0; +} + +WRAP(a, "serialization-test.h", dump_a); +WRAP(b, "serialization-test.h", dump_b); + +int main(int argc, char **argv) +{ + struct serialization_stream *s; + + struct a *a1 = __alloc_a(0); + struct a *a2 = __alloc_a(0); + struct a *a3 = __alloc_a(0); + struct a *a4 = __alloc_a(0); + + struct b *b1 = __alloc_b(0); + struct b *b2 = __alloc_b(0); + struct b *b3 = __alloc_b(0); + struct b *b4 = __alloc_b(0); + struct b *b5 = __alloc_b(0); + + a1->d = 1; + a2->d = 2; + a3->d = 3; + a4->d = 4; + + a1->b_ptr = b1; + a2->b_ptr = b2; + a3->b_ptr = b3; + a4->b_ptr = b4; + + + b1->k = 11; + b2->k = 12; + b3->k = 13; + b4->k = 14; + b5->k = 15; + + b1->a_ptr = a2; + b2->a_ptr = a3; + b3->a_ptr = a1; + b4->a_ptr = a2; + b5->a_ptr = a1; + + s = new_serialization_stream("test"); + if (!s) { + perror("Failed to open serialization stream"); + exit(1); + } + + printf("Dumping:\n"); + + printf("a1 = %p\n", a1); + serialize_a(s, a1, "a1_entry"); + /* Note that we serializaed only one object, a1. + * All the other objects appering in the generated file + * were serializaed as it's dependencies. If you try to + * serialize any of them again, only the global pointer + * will be added. 
*/ + + fini_serialization_stream(s); + + return 0; + +} + diff --git a/serialization-test.h b/serialization-test.h new file mode 100644 index 0000000..a1980e7 --- /dev/null +++ b/serialization-test.h @@ -0,0 +1,32 @@ +/* + * Copyright (C) 2008 Alexey Zaytsev + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Alternatively, this program may be distributed under the + * Open Software License version 1.1. + */ + +#ifndef SERIALIZATION_TEST_H +#define SERIALIZATION_TEST_H + +#include "serialization.h" + +struct a { + int d; + struct b *b_ptr; +}; + +struct b { + int k; + char c; + struct a *a_ptr; +}; + +DECLARE_WRAPPER(a); +DECLARE_WRAPPER(b); + +#endif diff --git a/serialization.c b/serialization.c new file mode 100644 index 0000000..79d4fda --- /dev/null +++ b/serialization.c @@ -0,0 +1,99 @@ +/* + * Copyright (C) 2008 Alexey Zaytsev + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Alternatively, this program may be distributed under the + * Open Software License version 1.1. 
+ */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <errno.h> +#include <limits.h> +#include <libgen.h> +#include "serialization.h" + + +int serialization_stream_enqueue(struct serialization_stream *s, void *unit, + int (*serializer) (struct serialization_stream *s, void *unit)) +{ + struct serialization_sched_work *w; + + w = malloc(sizeof(w[0])); + if (!w) + return -ENOMEM; + + w->stream = s; + w->unit = unit; + w->serializer = serializer; + w->next = s->queue; + s->queue = w; + + return 0; +} + +int process_serialization_queue(struct serialization_stream *s) +{ + int ret = 0; + struct serialization_sched_work *w; + + while (s->queue) { + w = s->queue; + s->queue = s->queue->next; + ret = w->serializer(s, w->unit); + free(w); + } + return ret; +} + +struct serialization_stream *new_serialization_stream(const char *file) +{ + struct serialization_stream *s; + char tmp[PATH_MAX+1]; + + s = malloc(sizeof(s[0])); + s->queue = NULL; + + strncpy(tmp, file, PATH_MAX); + s->definition_f = fopen(strncat(tmp, ".sparse.c", PATH_MAX), "w+"); + if (!s->definition_f) + goto out_definitions; + + strncpy(tmp, file, PATH_MAX); + s->declaration_f = fopen(strncat(tmp, ".sparse_declarations.c", PATH_MAX), "w+"); + if (!s->declaration_f) + goto out_declarations; + + fprintf(s->definition_f, "#include \"%s\"\n\n", basename(tmp)); + fprintf(s->definition_f, "#define NULL ((void *)0)\n"); + + return s; + +out_declarations: + fclose(s->definition_f); +out_definitions: + free(s); + + return NULL; +} + +void fini_serialization_stream(struct serialization_stream *s) +{ + + /* Just in case something was left over */ + process_serialization_queue(s); + + fprintf(s->declaration_f, "\n"); + fprintf(s->definition_f, "\n"); + + fclose(s->definition_f); + fclose(s->declaration_f); + + free(s); +} + diff --git a/serialization.h b/serialization.h new file mode 100644 index 0000000..7805f50 --- /dev/null +++ b/serialization.h @@ -0,0 +1,240 @@ +/* + * Copyright (C) 2008 
Alexey Zaytsev + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Alternatively, this program may be distributed under the + * Open Software License version 1.1. + */ + +#ifndef SERIALIZATION_H +#define SERIALIZATION_H + +#include <stdio.h> +#include <stddef.h> +#include <ctype.h> + +#define wrapper_overhead(w) (sizeof(w[0]) - sizeof(w->payload)) +#define container(ptr, type, member) \ + (type *)((void *)(ptr) - offsetof(type, member)) + +/* + * This structure is prepended to every object of serializable + * type. Do not bloat. + */ +struct serialization_mdata { + unsigned int index:30; + unsigned int declared:1; + unsigned int defined:1; +}; + +struct serialization_stream { + FILE *declaration_f; + FILE *definition_f; + struct serialization_sched_work *queue; +}; +struct serialization_sched_work { + struct serialization_stream *stream; + void *unit; + int (*serializer) (struct serialization_stream *s, + void *unit); + struct serialization_sched_work *next; +}; + + +int serialization_stream_enqueue(struct serialization_stream *s, void *unit, + int (serializaer) (struct serialization_stream *s, void *unit)); +int process_serialization_queue(struct serialization_stream *s); +struct serialization_stream *new_serialization_stream(const char *file); +void fini_serialization_stream(struct serialization_stream *s); + + +#define DO_WRAP(type, type_name, type_header, allocator, allocator_name, \ + deallocator, deallocator_name, serializer) \ + static int type_name##_index = 0; \ + type *allocator_name(int n) \ + { \ + struct type_name##_wrapper *w; \ + w = (struct type_name##_wrapper *) \ + allocator(n + wrapper_overhead(w)); \ + if (!w) \ + return NULL; \ + return &w->payload; \ + } \ + void deallocator_name(type *t) \ + { \ + struct type_name##_wrapper *w; \ 
+ w = container(t, struct type_name##_wrapper, payload); \ + deallocator((type *)w); \ + } \ + static int do_serialize_##type_name(struct serialization_stream *s, \ + void *unit) \ + { \ + struct type_name##_wrapper *w = unit; \ + int ret; \ + fprintf(s->definition_f, "static struct " #type_name "_wrapper "\ + "__" #type_name "_%d = {\n\t.payload = {\n", \ + w->meta.index); \ + ret = serializer(s, &w->payload); \ + fprintf(s->definition_f, "\t},\n};\n"); \ + if (ret) \ + fprintf(stderr, "Warning: Failed to serialize a " #type \ + ": %d\n", ret); \ + return ret; \ + } \ + int schedule_##type_name##_serialization(struct serialization_stream *s,\ + type *t) \ + { \ + struct type_name##_wrapper *w; \ + if (!t) \ + return 0; /* Tried to serialize a NULL pointer */ \ + w = container(t, struct type_name##_wrapper, payload); \ + if (w->meta.declared) \ + return 0; /* Either already serialized or waiting \ + * in the queue */ \ + if (!type_name##_index) \ + fprintf(s->declaration_f, \ + "\n#include %s\n", #type_header); \ + \ + w->meta.index = type_name##_index++; \ + fprintf(s->declaration_f, \ + "static struct " #type_name "_wrapper " \ + "__" #type_name "_%d;\n", \ + w->meta.index); \ + w->meta.declared = 1; \ + return serialization_stream_enqueue(s, w, \ + do_serialize_##type_name); \ + } \ + int serialize_##type_name(struct serialization_stream *s, \ + type *t, const char *name) \ + { \ + int ret; \ + schedule_##type_name##_serialization(s, t); \ + ret = process_serialization_queue(s); \ + if (!ret && name) \ + ret = label_##type_name##_entry(s, t, name); \ + return ret; \ + } \ + int label_##type_name##_entry(struct serialization_stream *s, type *t, \ + const char *name) \ + { \ + struct type_name##_wrapper *w; \ + if (!t) { \ + fprintf(s->definition_f, #type " *%s = NULL;", name); \ + return 0; \ + } \ + w = container(t, struct type_name##_wrapper, payload); \ + if (!w->meta.declared) { \ + fprintf(stderr, "Warning: Trying to label an undefined" \ + " '" #type 
"'\n"); \ + return -1; \ + } \ + fprintf(s->definition_f, #type " *%s = &" \ + "__" #type_name "_%d.payload;\n", name, w->meta.index); \ + return 0; \ + } + + +#define WRAP(type_name, type_header, serializer) \ + DO_WRAP(struct type_name, type_name, type_header, \ + __alloc_##type_name##_core, __alloc_##type_name, \ + __free_##type_name##_core, __free_##type_name, serializer) + +#define emit_int(struam, parent, field) \ + do { \ + int __i = parent->field; \ + fprintf(s->definition_f, "\t\t." #field " = %d,\n", __i); \ + } while (0) + +#define emit_cstring(stream, parent, field) \ + do { \ + char *__tmp = parent->field; \ + if (!__tmp) { \ + fprintf(stream->definition_f, \ + "\t\t." #field " = NULL,\n"); \ + break; \ + } \ + fprintf(stream->definition_f, "\t\t." #field " = \""); \ + while (*__tmp) { \ + if (isprint(*__tmp)) \ + fputc(*__tmp, stream->definition_f); \ + else \ + fprintf(stream->definition_f, "\\x%2x", \ + *__tmp); \ + __tmp++; \ + } \ + fprintf(stream->definition_f, "\",\n"); \ + } while (0); + +#define do_emit_ptr(stream, parent, type, type_name, field) \ + do { \ + struct type_name##_wrapper *__w; \ + void *__ptr = parent->field; \ + if (!__ptr) { \ + fprintf(stream->definition_f, \ + "\t\t." #field " = NULL,\n"); \ + break; \ + } \ + schedule_##type_name##_serialization(stream, __ptr); \ + __w = container(__ptr, struct type_name##_wrapper, payload); \ + fprintf(stream->definition_f, "\t\t." #field " = &" \ + "__" #type_name "_%d.payload,\n", __w->meta.index); \ + } while (0) + +#define emit_ptr(stream, parent, type_name, field) \ + do_emit_ptr(stream, parent, struct type_name, type_name, \ + field) + +#define emit_ptr_array(stream, parent, type, type_name, field, nr) \ + do { \ + struct type_name##_wrapper *__w; \ + void *__ptr = parent->field; \ + int __i ; \ + fprintf(stream->definition_f, "\t\t." 
#field " = {\n"); \ + for (__i = 0; __i < nr; __i++) { \ + __ptr = parent->field[__i]; \ + if (__ptr) { \ + __w = container(__ptr, \ + struct type_name##_wrapper, payload); \ + schedule_##type_name##_serialization(s, __ptr); \ + fprintf(stream->definition_f, \ + "\t\t\t&" "__" #type_name \ + "_%d.payload,\n", __w->meta.index); \ + } \ + } \ + fprintf(stream->definition_f, "\t\t},\n"); \ + } while (0) + +#define DO_DECLARE_WRAPPER(type, type_name, allocator_name, deallocator_name) \ + struct type_name##_wrapper { \ + struct serialization_mdata meta; \ + type payload; \ + }; \ + type *allocator_name(int x); \ + void deallocator_name(type *t); \ + int schedule_##type_name##_serialization(struct serialization_stream *s,\ + type *ptr); \ + int serialize_##type_name(struct serialization_stream *s, \ + type *t, const char *name); \ + int emit_##type_name##_ptr(struct serialization_stream *s, \ + type *t); \ + int label_##type_name##_entry(struct serialization_stream *s, \ + type *t, const char *name); + +#define DECLARE_WRAPPER(type_name) \ + DO_DECLARE_WRAPPER(struct type_name, type_name, \ + __alloc_##type_name, __free_##type_name) + +/* If you need to serialize statically-allocated data, use these. */ +#define DO_DECLARE_SERIALIZABLE(type, type_name, name) \ + struct type_name##_wrapper __##name_wrapped; \ + type *name = &__##wrapped.payload; + +#define DECLARE_SERIALIZABLE(type, name) \ + DO_DECLARE_SERIALIZABLE(struct type, type, name) + +#endif + -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
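The reason cyclic structures do not loop forever is that an object is marked
as declared and given an index when it is scheduled, before its fields are
emitted. A stripped-down sketch of that idea follows. It is not the patch
code: struct node, schedule() and drain() are hypothetical, the metadata lives
inside the node instead of a wrapper, and unlike process_serialization_queue()
this version stops at the first error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	int value;
	struct node *next;	/* may point back, forming a cycle */
	unsigned int index;	/* assigned when the node is scheduled */
	unsigned int declared;	/* set once scheduled, never cleared */
};

struct work {
	struct node *unit;
	struct work *link;
};

struct stream {
	FILE *out;
	struct work *queue;	/* LIFO list of pending nodes */
	unsigned int next_index;
	int emitted;
};

/* Queue a node unless it is NULL or has been seen already. */
static int schedule(struct stream *s, struct node *n)
{
	struct work *w;

	if (!n || n->declared)
		return 0;
	w = malloc(sizeof(*w));
	if (!w)
		return -1;
	n->index = s->next_index++;
	n->declared = 1;
	w->unit = n;
	w->link = s->queue;
	s->queue = w;
	return 0;
}

/* Emit queued nodes; emitting a pointer field schedules its target. */
static int drain(struct stream *s)
{
	while (s->queue) {
		struct work *w = s->queue;
		struct node *n = w->unit;

		s->queue = w->link;
		free(w);
		if (schedule(s, n->next))
			return -1;
		if (n->next)
			fprintf(s->out, "node_%u = { %d, &node_%u }\n",
				n->index, n->value, n->next->index);
		else
			fprintf(s->out, "node_%u = { %d, NULL }\n",
				n->index, n->value);
		s->emitted++;
	}
	return 0;
}
```

Because the index is fixed at scheduling time, a reference to an object that
has not been emitted yet can already be written out by name.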
* [PATCH 02/10] Handle -emit_code and the -o file options. 2008-09-03 21:55 ` [PATCH 01/10] Serialization engine alexey.zaytsev @ 2008-09-03 21:55 ` alexey.zaytsev 2008-09-03 21:55 ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev 0 siblings, 1 reply; 23+ messages in thread From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw) To: linux-sparse Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev From: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> --- lib.c | 19 ++++++++++++++++--- lib.h | 3 +++ 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/lib.c b/lib.c index e274750..edb8ac9 100644 --- a/lib.c +++ b/lib.c @@ -214,6 +214,8 @@ int dbg_entry = 0; int dbg_dead = 0; int preprocess_only; +int emit_code = 0; +const char *output_file = "a.out"; static enum { STANDARD_C89, STANDARD_C94, @@ -347,11 +349,14 @@ static char **handle_switch_m(char *arg, char **next) static char **handle_switch_o(char *arg, char **next) { - if (!strcmp (arg, "o")) { // "-o foo" - if (!*++next) + if (!strcmp (arg, "o")) { // "-o foo" + next++; + if (!*next) die("argument to '-o' is missing"); + output_file = *next; + } else { // "-ofoo" + output_file = ++arg; } - // else "-ofoo" return next; } @@ -587,6 +592,12 @@ static char **handle_dirafter(char *arg, char **next) return next; } +static char **handle_emit_code(char *arg, char **next) +{ + emit_code = 1; + return next; +} + struct switches { const char *name; char **(*fn)(char *, char **); @@ -597,6 +608,7 @@ static char **handle_switch(char *arg, char **next) static struct switches cmd[] = { { "nostdinc", handle_nostdinc }, { "dirafter", handle_dirafter }, + { "emit-code", handle_emit_code }, { NULL, NULL } }; struct switches *s; @@ -849,6 +861,7 @@ struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list continue; } add_ptr_list_notag(filelist, arg); + } handle_switch_W_finalize(); 
handle_switch_v_finalize(); diff --git a/lib.h b/lib.h index b22fa93..b74be7c 100644 --- a/lib.h +++ b/lib.h @@ -112,6 +112,9 @@ extern int Wdeclarationafterstatement; extern int dbg_entry; extern int dbg_dead; +extern int emit_code; +extern const char *output_file; + extern void declare_builtin_functions(void); extern void create_builtin_stream(void); extern struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list **files); -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
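The two accepted spellings of the output option, "-o foo" and "-ofoo", can be
illustrated in isolation as follows. This is a sketch under assumptions, not
the patched handle_switch_o(): parse_output_option() is a hypothetical
stand-alone function that reports a missing operand by returning NULL, where
the patch calls die().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * arg points just past the leading '-', next points at the argv slot
 * holding the option. Returns the slot of the last consumed argument,
 * or NULL if "-o" is not followed by an operand.
 */
static char **parse_output_option(char *arg, char **next, const char **output)
{
	if (strcmp(arg, "o") == 0) {	/* "-o foo": operand is the next slot */
		next++;
		if (!*next)
			return NULL;
		*output = *next;
	} else {			/* "-ofoo": operand follows the 'o' */
		*output = arg + 1;
	}
	return next;
}
```

Returning the advanced slot lets the caller's argument loop skip the operand,
so it is not mistaken for an input file.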
* [PATCH 03/10] Check stdin if no input files given, like cc1. 2008-09-03 21:55 ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev @ 2008-09-03 21:55 ` alexey.zaytsev 2008-09-03 21:55 ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev 0 siblings, 1 reply; 23+ messages in thread From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw) To: linux-sparse Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev From: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> --- lib.c | 42 +++++++++++++++++++++--------------------- lib.h | 6 +++--- sparse.c | 3 +++ 3 files changed, 27 insertions(+), 24 deletions(-) diff --git a/lib.c b/lib.c index edb8ac9..0e30424 100644 --- a/lib.c +++ b/lib.c @@ -215,7 +215,7 @@ int dbg_dead = 0; int preprocess_only; int emit_code = 0; -const char *output_file = "a.out"; +const char *output_file = NULL; static enum { STANDARD_C89, STANDARD_C94, @@ -856,38 +856,38 @@ struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list if (!arg) break; - if (arg[0] == '-' && arg[1]) { - args = handle_switch(arg+1, args); + if (arg[0] == '-') { + if (arg[1]) + args = handle_switch(arg+1, args); continue; } add_ptr_list_notag(filelist, arg); } + handle_switch_W_finalize(); handle_switch_v_finalize(); - list = NULL; - if (!ptr_list_empty(filelist)) { - // Initialize type system - init_ctype(); + // Initialize type system + init_ctype(); - create_builtin_stream(); - add_pre_buffer("#define __CHECKER__ 1\n"); - if (!preprocess_only) - declare_builtin_functions(); + create_builtin_stream(); + add_pre_buffer("#define __CHECKER__ 1\n"); + if (!preprocess_only) + declare_builtin_functions(); - list = sparse_initial(); + list = sparse_initial(); + + /* + * Protect the initial token allocations, since + * they need to survive all the others + */ + protect_token_alloc(); - /* - * Protect the initial token allocations, since - * they 
need to survive all the others - */ - protect_token_alloc(); - } return list; } -struct symbol_list * sparse_keep_tokens(char *filename) +struct symbol_list * sparse_keep_tokens(const char *filename) { struct symbol_list *res; @@ -902,7 +902,7 @@ struct symbol_list * sparse_keep_tokens(char *filename) } -struct symbol_list * __sparse(char *filename) +struct symbol_list * __sparse(const char *filename) { struct symbol_list *res; @@ -915,7 +915,7 @@ struct symbol_list * __sparse(char *filename) return res; } -struct symbol_list * sparse(char *filename) +struct symbol_list * sparse(const char *filename) { struct symbol_list *res = __sparse(filename); diff --git a/lib.h b/lib.h index b74be7c..19a724f 100644 --- a/lib.h +++ b/lib.h @@ -118,9 +118,9 @@ extern const char *output_file; extern void declare_builtin_functions(void); extern void create_builtin_stream(void); extern struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list **files); -extern struct symbol_list *__sparse(char *filename); -extern struct symbol_list *sparse_keep_tokens(char *filename); -extern struct symbol_list *sparse(char *filename); +extern struct symbol_list *__sparse(const char *filename); +extern struct symbol_list *sparse_keep_tokens(const char *filename); +extern struct symbol_list *sparse(const char *filename); static inline int symbol_list_size(struct symbol_list *list) { diff --git a/sparse.c b/sparse.c index 785a6f6..b7a1f8b 100644 --- a/sparse.c +++ b/sparse.c @@ -602,6 +602,9 @@ int main(int argc, char **argv) // Expand, linearize and show it. check_symbols(sparse_initialize(argc, argv, &filelist)); + if (ptr_list_empty(filelist)) + check_symbols(sparse("-")); + FOR_EACH_PTR_NOTAG(filelist, file) { check_symbols(sparse(file)); } END_FOR_EACH_PTR_NOTAG(file); -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 04/10] Add char *first_string(struct string_list *) 2008-09-03 21:55 ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev @ 2008-09-03 21:55 ` alexey.zaytsev 2008-09-03 21:55 ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev 0 siblings, 1 reply; 23+ messages in thread From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw) To: linux-sparse Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev From: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> --- lib.h | 5 +++++ ptrlist.h | 17 ++++++++++++++++- 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/lib.h b/lib.h index 19a724f..532e7a4 100644 --- a/lib.h +++ b/lib.h @@ -186,6 +186,11 @@ static inline pseudo_t first_pseudo(struct pseudo_list *head) return first_ptr_list((struct ptr_list *)head); } +static inline char *first_string(struct string_list *head) +{ + return first_ptr_list_notag((struct ptr_list *)head); +} + static inline void concat_symbol_list(struct symbol_list *from, struct symbol_list **to) { concat_ptr_list((struct ptr_list *)from, (struct ptr_list **)to); diff --git a/ptrlist.h b/ptrlist.h index dae0906..fe43de1 100644 --- a/ptrlist.h +++ b/ptrlist.h @@ -73,15 +73,30 @@ static inline void *first_ptr_list(struct ptr_list *list) return PTR_ENTRY(list, 0); } -static inline void *last_ptr_list(struct ptr_list *list) +static inline void *first_ptr_list_notag(struct ptr_list *list) { + if (!list) + return NULL; + return PTR_ENTRY_NOTAG(list, 0); +} +static inline void *last_ptr_list(struct ptr_list *list) +{ if (!list) return NULL; list = list->prev; return PTR_ENTRY(list, list->nr-1); } +static inline void *last_ptr_list_notag(struct ptr_list *list) +{ + if (!list) + return NULL; + list = list->prev; + return PTR_ENTRY_NOTAG(list, list->nr-1); +} + + #define DO_PREPARE(head, ptr, __head, __list, __nr, PTR_ENTRY) \ do { \ struct ptr_list *__head = (struct ptr_list *) (head); \ -- 
1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 05/10] Serializable ptr lists. 2008-09-03 21:55 ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev @ 2008-09-03 21:55 ` alexey.zaytsev 2008-09-03 21:55 ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev 0 siblings, 1 reply; 23+ messages in thread From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw) To: linux-sparse Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev From: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> --- ptrlist.c | 30 +++++++++++++++++++++++++++--- ptrlist.h | 7 +++++++ 2 files changed, 34 insertions(+), 3 deletions(-) diff --git a/ptrlist.c b/ptrlist.c index 2620412..fb6b6db 100644 --- a/ptrlist.c +++ b/ptrlist.c @@ -12,9 +12,33 @@ #include "ptrlist.h" #include "allocate.h" #include "compat.h" +#include "lib.h" +#include "serialization.h" + +static int ptr_list_serializer(struct serialization_stream *s, struct ptr_list *w) +{ + die("Don't serialize abstract ptr lists, serialize your custom ones."); + return 0; +} + +__DECLARE_ALLOCATOR(struct ptr_list, ptr_list_core); +__ALLOCATOR(struct ptr_list, "ptr list", ptr_list_core); +DO_WRAP(struct ptr_list, ptr_list, "ptrlist.h", __alloc_ptr_list_core, + __alloc_ptrlist, __free_ptr_list_core, __free_ptrlist, + ptr_list_serializer); + + +struct ptr_list *fail_ptrlist_allocation(int i) +{ + die("Don't try to allocate ptr_list instances directly, use __add_ptr_list instead."); + return NULL; +} + +void fail_ptrlist_free(void *p) +{ + die("Don't free ptr list instances directly, use free_ptr_list instead."); +} -__DECLARE_ALLOCATOR(struct ptr_list, ptrlist); -__ALLOCATOR(struct ptr_list, "ptr list", ptrlist); int ptr_list_size(struct ptr_list *head) { @@ -95,7 +119,7 @@ restart: entry = next; } while (entry != head); } -} +} void split_ptr_list_head(struct ptr_list *head) { diff --git a/ptrlist.h b/ptrlist.h index fe43de1..1a16819 100644 --- a/ptrlist.h +++ 
b/ptrlist.h @@ -7,6 +7,8 @@ * (C) Copyright Linus Torvalds 2003-2005 */ +#include "serialization.h" + #define container(ptr, type, member) \ (type *)((void *)(ptr) - offsetof(type, member)) @@ -32,6 +34,8 @@ struct ptr_list { void *list[LIST_NODE_NR]; }; +DO_DECLARE_WRAPPER(struct ptr_list, ptr_list, __alloc_ptrlist, __free_ptrlist); + #define ptr_list_empty(x) ((x) == NULL) void * undo_ptr_list_last(struct ptr_list **head); @@ -46,6 +50,9 @@ extern void __free_ptr_list(struct ptr_list **); extern int ptr_list_size(struct ptr_list *); extern int linearize_ptr_list(struct ptr_list *, void **, int); +/* To be used by custom ptr list serializers. */ +struct ptr_list *fail_ptrlist_allocation(int); +void fail_ptrlist_free(void *); /* * Hey, who said that you can't do overloading in C? * -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 06/10] Linker core, serialization and helper functions.
2008-09-03 21:55 ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev
@ 2008-09-03 21:55 ` alexey.zaytsev
2008-09-03 21:55 ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse
Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev
From: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
---
 Makefile | 6 +++-
 link.c | 57 ++++++++++++++++++++++++++++++++++++++++
 link.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 2 deletions(-)
 create mode 100644 link.c
 create mode 100644 link.h
diff --git a/Makefile b/Makefile
index 721979e..dd1fe8a 100644
--- a/Makefile
+++ b/Makefile
@@ -40,13 +40,13 @@ endif
 LIB_H= token.h parse.h lib.h symbol.h scope.h expression.h target.h \
 	linearize.h bitmap.h ident-list.h compat.h flow.h allocate.h \
-	storage.h ptrlist.h dissect.h serialization.h
+	storage.h ptrlist.h dissect.h serialization.h link.h
 LIB_OBJS= target.o parse.o tokenize.o pre-process.o symbol.o lib.o scope.o \
 	expression.o show-parse.o evaluate.o expand.o inline.o linearize.o \
 	sort.o allocate.o compat-$(OS).o ptrlist.o \
 	flow.o cse.o simplify.o memops.o liveness.o storage.o unssa.o dissect.o \
-	serialization.o
+	serialization.o link.o
 LIB_FILE= libsparse.a
 SLIB_FILE= libsparse.so
@@ -189,7 +189,9 @@ compat-linux.o: compat/strtold.c compat/mmap-blob.c \
 compat-solaris.o: compat/mmap-blob.c $(LIB_H)
 compat-mingw.o: $(LIB_H)
 compat-cygwin.o: $(LIB_H)
+serialization.o: $(LIB_H)
 serialization-test.o: $(LIB_H)
+link.o: $(LIB_H)
 pre-process.h:
 	$(QUIET_GEN)echo "#define GCC_INTERNAL_INCLUDE \"`$(CC) -print-file-name=`\"" > pre-process.h
diff --git a/link.c b/link.c
new file mode 100644
index 0000000..8d2eadf
--- /dev/null
+++ b/link.c
@@ -0,0 +1,57 @@
+
+#define _GNU_SOURCE
+#include <string.h>
+
+#include "lib.h"
+#include "link.h"
+
+__ALLOCATOR(struct sold_srcfile, "sold_srcfile", sold_srcfile_core);
+__ALLOCATOR(struct sold_symbol, "sold_smbol", sold_symbol_core);
+
+int sold_srcfile_serialize(struct serialization_stream *s,
+		struct sold_srcfile *w);
+
+int sold_symbol_serialize(struct serialization_stream *s,
+		struct sold_symbol *w);
+
+int sold_symbol_list_serialize(struct serialization_stream *s,
+		struct ptr_list *w);
+
+WRAP(sold_srcfile, <sparse/link.h>, sold_srcfile_serialize);
+WRAP(sold_symbol, <sparse/link.h>, sold_symbol_serialize);
+DO_WRAP(struct ptr_list, sold_symbol_list, <sparse/link.h>,
+	fail_ptrlist_allocation, __alloc_sold_symbol_list,
+	fail_ptrlist_free, __free_sold_symbol_list,
+	sold_symbol_list_serialize);
+
+int sold_srcfile_serialize(struct serialization_stream *s,
+		struct sold_srcfile *w)
+{
+	emit_cstring(s, w, source);
+	emit_cstring(s, w, buildroot);
+	emit_cstring(s, w, cflags);
+
+	return 0;
+}
+
+int sold_symbol_serialize(struct serialization_stream *s,
+		struct sold_symbol *w)
+{
+	emit_cstring(s, w, name);
+	emit_ptr(s, w, sold_srcfile, source);
+	emit_int(s, w, type);
+
+	return 0;
+}
+
+int sold_symbol_list_serialize(struct serialization_stream *s,
+		struct ptr_list *w)
+{
+	emit_int(s, w, nr);
+	emit_ptr(s, w, sold_symbol_list, prev);
+	emit_ptr(s, w, sold_symbol_list, next);
+	emit_ptr_array(s, w, struct sold_symbol, sold_symbol, list, w->nr);
+	return 0;
+}
+
+
diff --git a/link.h b/link.h
new file mode 100644
index 0000000..eed83ff
--- /dev/null
+++ b/link.h
@@ -0,0 +1,87 @@
+
+#ifndef LINK_H
+#define LINK_H
+
+#include <string.h>
+#include <errno.h>
+
+
+#include "ptrlist.h"
+#include "allocate.h"
+#include "symbol.h"
+#include "serialization.h"
+
+
+struct sold_srcfile {
+	char *source;
+	char *buildroot;
+	char *cflags;
+};
+
+enum sold_sym_type {
+	SOLD_SYM_FUNCTION,
+	SOLD_SYM_DATA,
+	SOLD_SYM_OTHER,
+};
+
+struct sold_symbol {
+	struct sold_srcfile *source;
+	enum sold_sym_type type;
+	char *name;
+};
+
+
+__DECLARE_ALLOCATOR(struct sold_srcfile, sold_srcfile_core);
+__DECLARE_ALLOCATOR(struct sold_symbol, sold_symbol_core);
+
+DECLARE_PTR_LIST(sold_symbol_list, struct sold_symbol);
+
+DO_DECLARE_WRAPPER(struct sold_srcfile, sold_srcfile,
+		__alloc_sold_srcfile, __free_sold_srcfile);
+DO_DECLARE_WRAPPER(struct sold_symbol, sold_symbol,
+		__alloc_sold_symbol, __free_sold_symbol);
+DO_DECLARE_WRAPPER(struct ptr_list, sold_symbol_list,
+		__alloc_sold_symbol_list, __free_sold_symbol_list);
+
+static inline void do_add_sold_sym_list(struct sold_symbol_list **list,
+		struct sold_symbol *sym)
+{
+	add_ptr_list(list, sym);
+}
+
+static inline int add_sold_sym_list(struct sold_symbol_list **list,
+		struct symbol *sym, struct sold_srcfile *file)
+{
+	struct sold_symbol *sold_sym;
+
+	if (!sym->ident)
+		error_die(sym->pos, "Trying to serialize non-bound symbol");
+
+	sold_sym = __alloc_sold_symbol(0);
+	if (!sold_sym)
+		return -ENOMEM;
+
+	sold_sym->name = sym->ident->name;
+	sold_sym->source = file;
+	sold_sym->type = SOLD_SYM_OTHER;
+	if (sym->type == SYM_NODE) {
+		if (sym->ctype.base_type->type == SYM_FN)
+			sold_sym->type = SOLD_SYM_FUNCTION;
+		else
+			sold_sym->type = SOLD_SYM_DATA;
+	}
+
+	do_add_sold_sym_list(list, sold_sym);
+
+	return 0;
+}
+
+static inline void concat_sold_sym_list(struct sold_symbol_list *from,
+		struct sold_symbol_list **to)
+{
+	concat_ptr_list((struct ptr_list *)from, (struct ptr_list **)to);
+}
+
+#endif
+
+
-- 
1.5.6.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/10] Let sparse serialize the symbol table of the checked file
2008-09-03 21:55 ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev
@ 2008-09-03 21:55 ` alexey.zaytsev
2008-09-03 21:55 ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse
Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev
From: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
---
 sparse.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 102 insertions(+), 3 deletions(-)
diff --git a/sparse.c b/sparse.c
index b7a1f8b..1b25d7e 100644
--- a/sparse.c
+++ b/sparse.c
@@ -8,6 +8,8 @@
  *
  * Licensed under the Open Software License version 1.1
  */
+#define _GNU_SOURCE
+
 #include <stdarg.h>
 #include <stdlib.h>
 #include <stdio.h>
@@ -17,6 +19,7 @@
 #include <fcntl.h>
 #include "lib.h"
+#include "link.h"
 #include "allocate.h"
 #include "token.h"
 #include "parse.h"
@@ -595,18 +598,114 @@ static void check_symbols(struct symbol_list *list)
 	} END_FOR_EACH_PTR(sym);
 }
+/* Yeah, that's how cc1 seems to handle this */
+static char *out_name(const char *file)
+{
+	char *lastdot;
+	char *out_file;
+
+	out_file = strdup(file);
+	lastdot = rindex(out_file, '.');
+
+	if (lastdot && *lastdot+1 == '\0') { /* file.c. */
+		*lastdot = '\0';
+		lastdot = rindex(out_file, '.');
+	}
+	if (!lastdot) /* file */
+		return out_file;
+
+	*lastdot = '\0'; /* Remove everything after the last dot */
+	strcat(out_file, ".o");
+
+	return out_file;
+}
+
+static void accumulate_symbols(struct symbol_list *list, const char *file,
+		char *buildroot, char *cflags, struct sold_symbol_list **sold_list)
+{
+	struct sold_srcfile *src;
+	struct symbol *sym;
+
+	src = __alloc_sold_srcfile(0);
+	src->buildroot = buildroot;
+	src->cflags = cflags;
+	src->source = (char *) file;
+
+	FOR_EACH_PTR(list, sym) {
+		add_sold_sym_list(sold_list, sym, src);
+	} END_FOR_EACH_PTR(sym);
+}
+
+static char *concat_args(int argc, char **argv)
+{
+	int i;
+	int len = 0;
+	char *args;
+	char *pos;
+
+	for (i = 1; i < argc; i++)
+		len += strlen(argv[i]) + 1;
+
+	args = malloc(len + 1);
+	pos = args;
+	for (i = 1; i < argc; i++) {
+		len = strlen(argv[i]);
+		memcpy(pos, argv[i], len);
+		pos += (len);
+		*(pos++) = ' ';
+	}
+	*(pos++) = '\0';
+
+	return args;
+}
+
+
 int main(int argc, char **argv)
 {
 	struct string_list *filelist = NULL;
+	struct sold_symbol_list *sold_symbols = NULL;
+	struct symbol_list *symbols;
+	struct serialization_stream *s;
 	char *file;
+	char *args = concat_args(argc, argv);
+	char *buildroot = get_current_dir_name();
 	// Expand, linearize and show it.
 	check_symbols(sparse_initialize(argc, argv, &filelist));
-	if (ptr_list_empty(filelist))
-		check_symbols(sparse("-"));
+	file = first_string(filelist);
+	if (!file || !strcmp(file, "-")) {
+		symbols = sparse("-");
+		check_symbols(symbols);
+		if (emit_code)
+			accumulate_symbols(symbols, "stdin", buildroot,
+					args, &sold_symbols);
+		/* cc1 too doesn't handle "file.c -" well.*/
+	}
+
+	if (!output_file) {
+		if (file && strcmp(file, "-"))
+			output_file = out_name(file);
+		else
+			output_file = "sparsedump";
+	}
 	FOR_EACH_PTR_NOTAG(filelist, file) {
-		check_symbols(sparse(file));
+		symbols = sparse(file);
+		check_symbols(symbols);
+		if (emit_code)
+			accumulate_symbols(symbols, file, buildroot,
+					args, &sold_symbols);
 	} END_FOR_EACH_PTR_NOTAG(file);
+
+	if (emit_code) {
+		s = new_serialization_stream(output_file);
+		if (!s) {
+			perror("Failed to open the serialization stream");
+			exit(1);
+		}
+		serialize_sold_symbol_list(s, (struct ptr_list *) sold_symbols, "symbols");
+		fini_serialization_stream(s);
+	}
+
 	return 0;
 }
-- 
1.5.6.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
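A note on out_name() in the patch above: in C, `*lastdot+1 == '\0'` parses as `(*lastdot) + 1 == '\0'`, which cannot hold for a '.' character, so the "file.c." branch is never taken. For such a name the later strcat() would then write one byte past the strdup()ed buffer ("file.c." becomes "file.c.o"). Below is a minimal sketch of what the comments in the patch seem to intend, assuming the three cases named there ("file.c", "file.c.", "file") are the whole specification. derive_out_name is a hypothetical name, not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Derive an output name from a source name:
 *   "file.c"  -> "file.o"
 *   "file.c." -> "file.o"   (trailing dot dropped first)
 *   "file"    -> "file"     (no extension, left alone)
 * Returns a malloc()ed string, or NULL if allocation fails.
 */
static char *derive_out_name(const char *file)
{
	size_t len = strlen(file);
	/* Room for the name, a possible ".o" suffix and the terminator. */
	char *out = malloc(len + 3);
	char *lastdot;

	if (!out)
		return NULL;
	memcpy(out, file, len + 1);

	lastdot = strrchr(out, '.');
	/* Note the parentheses: test the character after the dot. */
	if (lastdot && *(lastdot + 1) == '\0') {
		*lastdot = '\0';
		lastdot = strrchr(out, '.');
	}
	if (!lastdot)
		return out;

	/* Overwrite the extension, dot included, with ".o". */
	strcpy(lastdot, ".o");
	return out;
}
```

Allocating len + 3 bytes rather than relying on strdup() keeps the result in bounds even when the new suffix is longer than the old one; strrchr() is the standard C spelling of the rindex() call used in the patch.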
* [PATCH 08/10] Sparse Object Link eDitor
2008-09-03 21:55 ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
@ 2008-09-03 21:55 ` alexey.zaytsev
2008-09-03 21:55 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse
Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev
From: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
---
 Makefile | 9 +++-
 sold.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+), 3 deletions(-)
 create mode 100644 sold.c
diff --git a/Makefile b/Makefile
index dd1fe8a..4fa2f82 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,6 @@ VERSION=0.4.1
 OS = linux
-
 CC = gcc
 CFLAGS = -O2 -finline-functions -fno-strict-aliasing -g
 CFLAGS += -Wall -Wwrite-strings
@@ -27,10 +26,10 @@ INCLUDEDIR=$(PREFIX)/include
 PKGCONFIGDIR=$(LIBDIR)/pkgconfig
 PROGRAMS=test-lexing test-parsing obfuscate compile graph sparse test-linearize example \
-	 test-unssa test-dissect ctags serialization-test
+	 test-unssa test-dissect ctags serialization-test sold
-INST_PROGRAMS=sparse cgcc
+INST_PROGRAMS=sparse cgcc sold
 INST_MAN1=sparse.1 cgcc.1
 ifeq ($(HAVE_LIBXML),yes)
@@ -139,6 +138,9 @@ c2xml: c2xml.o $(LIBS)
 serialization-test: serialization-test.o $(LIBS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS)
+sold: sold.o $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS) -ldl
+
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(AR) rcs $@ $(LIB_OBJS)
@@ -192,6 +194,7 @@ compat-cygwin.o: $(LIB_H)
 serialization.o: $(LIB_H)
 serialization-test.o: $(LIB_H)
 link.o: $(LIB_H)
+sold.o: $(LIB_H)
 pre-process.h:
 	$(QUIET_GEN)echo "#define GCC_INTERNAL_INCLUDE \"`$(CC) -print-file-name=`\"" > pre-process.h
diff --git a/sold.c b/sold.c
new file mode 100644
index 0000000..3da0cad
--- /dev/null
+++ b/sold.c
@@ -0,0 +1,127 @@
+
+#include <stdio.h>
+#include <limits.h>
+#include <dlfcn.h>
+#include <stdlib.h>
+
+#include "lib.h"
+#include "link.h"
+#include "ptrlist.h"
+
+static const char *output = NULL;
+
+static char **handle_switch(char *arg, char **next)
+{
+	switch(*arg) {
+	case 'o':
+		if (!strcmp (arg, "o")) { // "-o foo"
+			next++;
+			if (!*next)
+				die("argument to '-o' is missing");
+			output = *next;
+		} else { // "-ofoo"
+			output = ++arg;
+		}
+		break;
+	case 'm':
+		if (!strcmp (arg, "m")) { // "-m foo"
+			next++;
+			if (!*next)
+				die("argument to '-m' is missing");
+		}
+		break;
+	case 'T':
+		if (!strcmp (arg, "m")) { // "-T foo"
+			next++;
+			if (!*next)
+				die("argument to '-T' is missing");
+		}
+		break;
+	}
+
+	return next;
+}
+
+char *input_name(const char *file)
+{
+	static char buf[PATH_MAX+1];
+	snprintf(buf, PATH_MAX, "./%s.sparse.so", file);
+	return buf;
+}
+
+int main(int argc, char **argv)
+{
+	char **args;
+	const char sym_type_tbl[] = {'F', 'D', 'O'}; /* Func, Data, Other */
+	struct string_list *file_list = NULL;
+	struct sold_symbol_list *out_symbols = NULL;
+	struct sold_symbol *sym;
+	struct serialization_stream *s;
+	char *file;
+	char *input_file;
+
+
+	args = argv;
+	for (;;) {
+		char *arg = *++args;
+		if (!arg)
+			break;
+
+		if (arg[0] == '-') {
+			if (arg[1])
+				args = handle_switch(arg+1, args);
+			continue;
+		}
+		add_ptr_list_notag(&file_list, arg);
+	}
+
+	output = output ? output : "a.out";
+
+	printf("output_file = %s\n", output);
+
+	FOR_EACH_PTR_NOTAG(file_list, file) {
+		void *handle;
+		struct sold_symbol_list **symbols;
+
+		printf("Input file: %s\n", file);
+		input_file = input_name(file);
+		printf("Sparse object: %s\n", input_file);
+
+		handle = dlopen(input_file, RTLD_NOW);
+		if (!handle) {
+			fprintf(stderr, "%s: Can't open input file %s. Ignoring it.\n",
+				argv[0], input_file);
+			continue;
+		}
+		symbols = dlsym(handle, "symbols");
+		if (!symbols) {
+			fprintf(stderr, "%s: %s: this input file does not "
+				"look like a sparse object file. Ignoring it.\n",
+				argv[0], input_file);
+			continue;
+		}
+
+		FOR_EACH_PTR(*symbols, sym) {
+			printf ("%c %s from %s\n", sym_type_tbl[sym->type],
+				sym->name, sym->source->source);
+		} END_FOR_EACH_PTR (sym);
+
+		concat_sold_sym_list(*symbols, &out_symbols);
+	} END_FOR_EACH_PTR_NOTAG(file);
+
+	printf("Resulting object file (%s):\n", output);
+	FOR_EACH_PTR (out_symbols, sym) {
+		printf ("%c %s from %s\n", sym_type_tbl[sym->type],
+			sym->name, sym->source->source);
+	} END_FOR_EACH_PTR (sym);
+
+	s = new_serialization_stream(output);
+	if (!s) {
+		perror("Failed to open the serialization stream");
+		exit(1);
+	}
+	serialize_sold_symbol_list(s, (struct ptr_list *) out_symbols, "symbols");
+	fini_serialization_stream(s);
+
+	return 0;
+}
-- 
1.5.6.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
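A note on handle_switch() in the patch above: the 'T' case compares arg against "m", so for a separate "-T foo" the argument is not skipped and foo would end up in the input file list. The following is a hedged sketch of the same parsing with that comparison adjusted. parse_switch and output_name are hypothetical names, and returning NULL instead of calling die() is an assumption made only to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Set by -o; stands in for the static "output" variable of sold.c. */
static const char *output_name;

/*
 * arg points just past the leading '-'; next points at the argv slot
 * that holds the current option. Returns the slot of the last argument
 * consumed, or NULL if a required argument is missing.
 */
static char **parse_switch(const char *arg, char **next)
{
	switch (*arg) {
	case 'o':
		if (!strcmp(arg, "o")) {	/* "-o foo" */
			if (!*++next)
				return NULL;
			output_name = *next;
		} else {			/* "-ofoo" */
			output_name = arg + 1;
		}
		break;
	case 'm':
	case 'T':
		/* "-m foo" and "-T foo": skip the argument, ignore its value. */
		if (arg[1] == '\0' && !*++next)
			return NULL;
		break;
	}
	return next;
}
```

Folding 'm' and 'T' into one case avoids the copy-and-paste comparison altogether; the attached forms ("-mfoo", "-Tfoo") fall through without consuming anything, as in the original.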
* [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar
2008-09-03 21:55 ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
@ 2008-09-03 21:55 ` alexey.zaytsev
2008-09-03 21:55 ` [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files alexey.zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse
Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev
From: Alexey Zaytsev <alexey.zaytsev@gmail.com>
cgcc now compiles the serialized data produced by sparse and
also now it is able to handle multiple source files well.
cld and car are there to integrate the sparse linker into
your build environment, wrapping ld and ar, and compiling
the data serialized by the sparse linker.
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
---
 Makefile | 2 +-
 car | 72 ++++++++++++++++++
 cgcc | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 cld | 84 +++++++++++++++++++++
 4 files changed, 369 insertions(+), 41 deletions(-)
 create mode 100755 car
 create mode 100755 cld
diff --git a/Makefile b/Makefile
index 4fa2f82..877634c 100644
--- a/Makefile
+++ b/Makefile
@@ -29,7 +29,7 @@ PROGRAMS=test-lexing test-parsing obfuscate compile graph sparse test-linearize
 	 test-unssa test-dissect ctags serialization-test sold
-INST_PROGRAMS=sparse cgcc sold
+INST_PROGRAMS=sparse sold cgcc cld car
 INST_MAN1=sparse.1 cgcc.1
 ifeq ($(HAVE_LIBXML),yes)
diff --git a/car b/car
new file mode 100755
index 0000000..5da892d
--- /dev/null
+++ b/car
@@ -0,0 +1,72 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+my $ar = $ENV{'REAL_AR'} || 'ar';
+my $linker = $ENV{'CLD'} || 'cld';
+
+my $verbose = 1;
+my $do_real_ar = 1;
+my $do_ar = 1;
+
+my $options = "";
+
+# We assume the first argument be the option string,
+# the second - the output name and all other - input names.
+
+die "Not enough arguments" if @ARGV < 2;
+
+foreach(@ARGV) {
+	if ($_ eq '-no-archive') {
+		$do_ar = 0;
+		next;
+	}
+
+	if ($_ eq '-no-real-archive') {
+		$do_real_ar = 0;
+		next;
+	}
+
+	if ($_ eq '-check-verbose') {
+		$verbose = 1;
+		next;
+	}
+
+	$options .= ' ' . &quote_arg ($_);
+}
+
+if ($do_real_ar) {
+	print STDERR "$ar $options\n" if $verbose;
+	my $res = system("$ar $options");
+	$res >>= 8;
+
+	exit $res if $res;
+}
+
+# We use our existing linker wrapper, as it basically does
+# the same thing.
+if ($do_ar) {
+	shift @ARGV; # remove the option string.
+	my $out = shift @ARGV;
+	my $inputs = join(" ", @ARGV); # everything else is an input file
+
+	if ($inputs) {
+		print STDERR "$linker -no-real-link -o $out $inputs";
+		system ("$linker -no-real-link -o $out $inputs");
+	}
+}
+
+exit 0;
+
+# -----------------------------------------------------------------------------
+# Simple arg-quoting function. Just adds backslashes when needed.
+
+sub quote_arg {
+	my ($arg) = @_;
+	return "''" if $arg eq '';
+	return join ('',
+		map {
+			m|^[-a-zA-Z0-9._/,=]+$| ? $_ : "\\" . $_;
+		} (split (//, $arg)));
+}
+
diff --git a/cgcc b/cgcc
index 89adbed..4484d0c 100755
--- a/cgcc
+++ b/cgcc
@@ -1,75 +1,242 @@
 #!/usr/bin/perl -w
 # -----------------------------------------------------------------------------
+use strict;
+
 my $cc = $ENV{'REAL_CC'} || 'cc';
+my $host_cc = $ENV{'HOST_CC'} || 'cc';
 my $check = $ENV{'CHECK'} || 'sparse';
+my $linker = $ENV{'SOLD'} || 'sold';
+
+my $cc_args = "";
+my $link_args = "";
+
+my @check_args_array;
 my $m32 = 0;
 my $m64 = 0;
 my $has_specs = 0;
-my $gendeps = 0;
-my $do_check = 0;
+my $do_check = 1;
+my $follow_check_status = 1;
 my $do_compile = 1;
+my $do_emit_code = 0;
+my $do_link = 1;
 my $verbose = 0;
-foreach (@ARGV) {
-    # Look for a .c file. We don't want to run the checker on .o or .so files
-    # in the link run. (This simplistic check knows nothing about options
-    # with arguments, but it seems to do the job.)
-    $do_check = 1 if /^[^-].*\.c$/;
+my @inputs;
+my $output;
-    # Ditto for stdin.
-    $do_check = 1 if $_ eq '-';
+my $compile_res = 0;
+my $check_res = 0;
-    $m32 = 1 if /^-m32$/;
-    $m64 = 1 if /^-m64$/;
-    $gendeps = 1 if /^-M$/;
+# split the gcc and sparse options.
+while (@ARGV) {
+	my $quote_arg;
+	$_ = shift @ARGV;
-    if (/^-specs=(.*)$/) {
-	$check .= &add_specs ($1);
-	$has_specs = 1;
-	next;
+	if ($_ eq '-no-compile') {
+		$do_compile = 0;
+		next;
     }
-    if ($_ eq '-no-compile') {
-	$do_compile = 0;
-	next;
+	if ($_ eq '-no-check') {
+		$do_check = 0;
+		next;
+	}
+
+	# Don't abort the build if the checker fails.
+	if ($_ eq '-no-follow-check-status') {
+		$follow_check_status = 0;
+		next;
+	}
+
+	if (/^-v+$/) {
+		$verbose = 1;
+		next; # You probably didn't mean passing -v to gcc, right?
+		      # Run with REAL_CC='cc -v' in such case.
     }
     # If someone adds "-E", don't pre-process twice.
-    $do_compile = 0 if $_ eq '-E';
+	# We can't disable compilation, because the caller may be asking
+	# gcc for its version (e.g. the Linux kernel does this).
+	$do_check = 0 if ($_ eq '-E' && $do_compile);
-    $verbose = 1 if $_ eq '-v';
+	# Don't check if called to generate dependencies.
+	$do_check = 0 if /^-M[MFGPQT]?$/;
     my $this_arg = ' ' . &quote_arg ($_);
-    $cc .= $this_arg unless &check_only_option ($_);
-    $check .= $this_arg unless &cc_only_option ($_);
+	$cc_args .= $this_arg unless &check_only_option ($_);
+	push @check_args_array, $_ unless &cc_only_option ($_);
 }
-if ($gendeps) {
-    $do_compile = 1;
-    $do_check = 0;
+if ($do_compile) {
+	print STDERR "cgcc: $cc $cc_args\n" if $verbose;
+	$compile_res = system ("$cc $cc_args") >> 8;
 }
 if ($do_check) {
-    if (!$has_specs) {
-	$check .= &add_specs ('host_arch_specs');
-	$check .= &add_specs ('host_os_specs');
+	my $check_args = "";
+
+	my @src_inputs;
+	my @link_inputs;
+	my $src_out;
+	my $link_out;
+
+	while(@check_args_array) {
+		$_ = shift @check_args_array;
+
+		# There are some options that cause gcc to only print some
+		# information, an not actually compile anything.
+
+		if (/^--(targer-)?help$/ ||
+		    /^-dump(specs|version|machine)$/ ||
+		    /^-print-*/) {
+
+			exit $compile_res;
+		}
+
+		if (/-arch=(\.*)$/) {
+			$check_args .= &add_specs ($1);
+			$has_specs = 1;
+			next;
+		}
+
+		# Everything that does not start with a dash, or is a dash itself is an input
+		if (!/^-+\w+/) {
+			push @inputs, $_;
+			next; # We do not pass input files to sparse, instead
+			      # we run sparse for each input.
+		}
+
+		$m32 = 1 if /^-m32$/;
+		$m64 = 1 if /^-m64$/;
+
+		if ($_ eq '-emit-code') {
+			$do_emit_code = 1;
+		}
+
+		$do_link = 0 if $_ eq '-c' or $_ eq '-S';
+
+		# Output file.
+		if (/^-o(.*)/) {
+			if ($1) { # -ofoo
+				$output = $1;
+			} else { # -o foo
+				$output = shift @check_args_array;
+				if (!$output) { # terminal -o
+					die("$0: argument to '-o' is missing");
+				}
+			}
+			next;
+		}
+
+		if (/^-Wl,/) {
+			my @args = split /,/;
+			shift @args;
+			if ($do_link) {
+				$link_args .= join(" ", @args) . " ";
+			} else {
+				push @inputs, @args; # Yes! Gcc interprets the -Wl options
+						     # as inputs files, when runnign with -c.
+			}
+			next;
+		}
+
+		# Now there are some options that take arguments possibly separated
+		# with spaces. We have to recognise them to destinguish between
+		# option arguments and input files. Not really tested...
+		if (/^-D(.*)$/ ||
+		    /^-I(.*)$/ ||
+		    /^-B(.*)$/ ||
+		    /^-MF(.*)$/ ||
+		    /^-MT(.*)$/ ||
+		    /^-MQ(.*)$/ ||
+		    /^-include$/ ||
+		    /^-imacros$/ ||
+		    /^-isystem$/) {
+			$check_args .= ' ' . &quote_arg ($_);
+			if (!$1) {
+				$check_args .= ' ' . &quote_arg (shift @check_args_array);
+				# sparse should warn is the argument is actually missing.
+			}
+			next;
+		}
+
+		# Ok, just a usual gcc option.
+		$check_args .= ' ' . &quote_arg ($_);
     }
-    print "$check\n" if $verbose;
-    if ($do_compile) {
-	system ($check);
-    } else {
-	exec ($check);
+
+	# gcc can have two types of inputs, sources and objects files.
+	# The sources we compile, the objects we link if needed
+	foreach(@inputs) {
+		if (/(\.[ch]|^-)$/) {
+			push @src_inputs, $_;
+		} else {
+			if ($do_link) {
+				push @link_inputs, $_;
+			} else {
+				warn "$0: $_: input file ignored because linking not performed";
+			}
+		}
     }
-}
-if ($do_compile) {
-    print "$cc\n" if $verbose;
-    exec ($cc);
+	if ($output) {
+		if ($do_link) {
+			$link_out = $output;
+		} else {
+			$src_out = $output;
+			die "$0: cannot specify -o with -c or -S with multiple files" if @src_inputs > 1;
+		}
+	}
+
+	foreach (@src_inputs) {
+		my $in = $_;
+		my $out = $_;
+
+		# Won't feed stdin to both gcc and sparse.
+		if ($in =~ s/^-$//) {
+			next if $do_compile;
+		}
+
+		$out =~ s/\.[ch]$/\.o/;
+		$out =~ s/^-$/sparsedump/;
+
+		# In case output was explicitly specified with -o and no linking
+		# is performed, there can't be more than one source.
+
+		$out = $src_out ? $src_out : $out;
+
+		$check .= &add_specs ('host_arch_specs');
+		$check .= &add_specs ('host_os_specs');
+
+		print STDERR "cgcc: $check $check_args $in -o $out\n" if $verbose;
+
+		$check_res = system ("$check $check_args $in -o $out") >> 8;
+		if ($do_emit_code && !$check_res) {
+			# compile the code generated by sparse.
+			print "$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so\n" if $verbose;
+			system ("$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so");
+		}
+
+		# Add the checked file to the link list.
+		push @link_inputs, "$out";
+	}
+
+	if ($do_emit_code && !$check_res && $do_link && @link_inputs > 0) {
+		my $out = $link_out ? $link_out : "a.out";
+		my $link_cmd = "$linker $link_args -o $out " . join(" ", @link_inputs);
+		print STDERR "cgcc: $link_cmd\n" if $verbose;
+		my $res = system($link_cmd);
+		if ($do_emit_code && !($res >> 8)) {
+			# compile the code generated by the linker.
+			print "$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so\n" if $verbose;
+			system ("$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so");
+		}
+	}
 }
-exit 0;
+my $res = $follow_check_status ? ($check_res | $compile_res ) : $compile_res;
+exit $res;
 # -----------------------------------------------------------------------------
 # Check if an option is for "check" only.
@@ -78,6 +245,9 @@ sub check_only_option {
     my ($arg) = @_;
     return 1 if $arg =~ /^-W(no-?)?(default-bitfield-sign|one-bit-signed-bitfield|cast-truncate|bitwise|typesign|context|undef|ptr-subtraction-blows|cast-to-as|decl|transparent-union|address-space|enum-mismatch|do-while|old-initializer|non-pointer-null|paren-string|return-void)$/;
     return 1 if $arg =~ /^-v(no-?)?(entry|dead)$/;
+    return 1 if $arg =~ /^-emit-code$/;
+    return 1 if $arg =~ /^-arch=.*$/;
+    return 1 if $arg =~ /^-v+$/; # You almost certainly don't want to pass -v to gcc.
     return 0;
 }
@@ -90,6 +260,8 @@ sub cc_only_option {
     # ones. Don't include it just because a project wants to pass -Wall to cc.
     # If you really want cgcc to run sparse with -Wall, use
     # CHECK="sparse -Wall".
+
+    return 1 if $arg =~ /^-specs=.*$/;
     return 1 if $arg =~ /^-Wall$/;
     return 0;
 }
@@ -166,7 +338,7 @@ sub float_types {
 	    'EPSILON' => '1.92592994438723585305597794258492732e-34',
 	    'DENORM_MIN' => '6.47517511943802511092443895822764655e-4966',
 	},
-	);
+    );
     my @types = (['FLT','F'], ['DBL',''], ['LDBL','L']);
     while (@types) {
diff --git a/cld b/cld
new file mode 100755
index 0000000..95196f4
--- /dev/null
+++ b/cld
@@ -0,0 +1,84 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+my $host_cc = $ENV{'HOST_CC'} || 'cc';
+my $real_linker = $ENV{'REAL_LD'} || 'ld';
+my $linker = $ENV{'SOLD'} || 'sold';
+
+my $verbose = 1;
+my $linker_options = "";
+
+my $do_link = 1;
+my $do_real_link = 1;
+
+my $out;
+
+while(@ARGV) {
+	$_ = shift @ARGV;
+
+	if ($_ eq '-no-link') {
+		$do_link = 0;
+		next;
+	}
+
+	if ($_ eq '-no-real-link') {
+		$do_real_link = 0;
+		next;
+	}
+
+	if ($_ eq '-check-verbose') {
+		$verbose = 1;
+		next;
+	}
+
+	# We need to know the output file name to compile the
+	# generated code.
+	if (/^-o(.*)/) {
+		if ($1) { # -ofoo
+			$out = $1;
+		} else { # -o foo
+			$out = shift @ARGV;
+			if (!$out) { # terminal -o
+				die("$0: argument to '-o' is missing");
+			}
+		}
+		next;
+	}
+
+	$linker_options .= ' ' . &quote_arg ($_);
+}
+
+$out = 'a.out' if !$out;
+
+if ($do_real_link) {
+	print STDERR "$real_linker $linker_options -o $out\n" if $verbose;
+	my $res = system ("$real_linker $linker_options -o $out");
+	$res >>= 8;
+	exit $res if $res;
+}
+
+if ($do_link) {
+
+	print STDERR "$linker $linker_options -o $out\n" if $verbose;
+	my $res = system ("$linker $linker_options -o $out");
+	exit 0 if $res >> 8; # Don't fail the whole build.
+
+	# compile the code generated by sold.
+	print STDERR "$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so\n" if $verbose;
+	system ("$host_cc -shared -fPIC $out.sparse.c -o $out.sparse.so");
+}
+
+exit 0;
+
+# -----------------------------------------------------------------------------
+# Simple arg-quoting function. Just adds backslashes when needed.
+
+sub quote_arg {
+	my ($arg) = @_;
+	return "''" if $arg eq '';
+	return join ('',
+		map {
+			m|^[-a-zA-Z0-9._/,=]+$| ? $_ : "\\" . $_;
+		} (split (//, $arg)));
+}
-- 
1.5.6.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files.
2008-09-03 21:55 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
@ 2008-09-03 21:55 ` alexey.zaytsev
0 siblings, 0 replies; 23+ messages in thread
From: alexey.zaytsev @ 2008-09-03 21:55 UTC (permalink / raw)
To: linux-sparse
Cc: Josh Triplett, Chris Li, Codrin Alexandru Grajdeanu, Alexey Zaytsev
From: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
---
 Makefile | 6 +++++-
 where.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 1 deletions(-)
 create mode 100644 where.c
diff --git a/Makefile b/Makefile
index 877634c..446e053 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ INCLUDEDIR=$(PREFIX)/include
 PKGCONFIGDIR=$(LIBDIR)/pkgconfig
 PROGRAMS=test-lexing test-parsing obfuscate compile graph sparse test-linearize example \
-	 test-unssa test-dissect ctags serialization-test sold
+	 test-unssa test-dissect ctags serialization-test sold where
 INST_PROGRAMS=sparse sold cgcc cld car
@@ -141,6 +141,9 @@ serialization-test: serialization-test.o $(LIBS)
 sold: sold.o $(LIBS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS) -ldl
+where: where.o $(LIBS)
+	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS) -ldl
+
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(AR) rcs $@ $(LIB_OBJS)
@@ -195,6 +198,7 @@ serialization.o: $(LIB_H)
 serialization-test.o: $(LIB_H)
 link.o: $(LIB_H)
 sold.o: $(LIB_H)
+where: $(LIB_H)
 pre-process.h:
 	$(QUIET_GEN)echo "#define GCC_INTERNAL_INCLUDE \"`$(CC) -print-file-name=`\"" > pre-process.h
diff --git a/where.c b/where.c
new file mode 100644
index 0000000..03241f9
--- /dev/null
+++ b/where.c
@@ -0,0 +1,61 @@
+
+#include <stdio.h>
+#include <dlfcn.h>
+#include <stdlib.h>
+#include <limits.h>
+
+#include "lib.h"
+#include "link.h"
+
+int main(int argc, char **argv)
+{
+	void *handle;
+	char *err;
+	struct sold_symbol *sym;
+	struct sold_symbol_list **symbols;
+	const char sym_type_tbl[] = {'F', 'D', 'O'};
+	char input[PATH_MAX];
+
+
+	if (argc < 2) {
+		fprintf(stderr, "usage: %s <sparse_object_file> [symbol_name]\n", argv[0]);
+		exit(1);
+	}
+
+	snprintf(input, PATH_MAX, "./%s", argv[1]);
+
+	dlerror();
+	handle = dlopen(input, RTLD_NOW);
+	if (!handle) {
+		fprintf(stderr, "%s: Can't open input file %s: %s\n",
+			argv[0], input, dlerror());
+		exit(1);
+	}
+
+	symbols = dlsym(handle, "symbols");
+	err = dlerror();
+	if (!symbols) {
+		if (err) {
+			fprintf(stderr, "%s: Can't process the input file %s: %s\n",
+				argv[0], input, err);
+			exit(1);
+		}
+		/* empty symbol list. */
+		exit(0);
+	}
+
+	if (argc > 2) {/* Look up the symbol */
+		FOR_EACH_PTR(*symbols, sym) {
+			if (!strcmp(argv[2], sym->name))
+				printf("%c %s %s\n", sym_type_tbl[sym->type],
+					sym->name, sym->source->source);
+		} END_FOR_EACH_PTR(sym);
+	} else {/* Just list all symbols */
+		FOR_EACH_PTR(*symbols, sym) {
+			printf("%c %s %s\n", sym_type_tbl[sym->type],
+				sym->name, sym->source->source);
+		} END_FOR_EACH_PTR(sym);
+	}
+
+	return 0;
+}
-- 
1.5.6.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
[parent not found: <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com>]
* Fwd: [PATCH 0/10] Sparse linker
[not found] ` <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com>
@ 2008-09-04 1:16 ` Christopher Li
2008-09-04 1:54 ` Tommy Thorn
2008-09-04 4:03 ` Alexey Zaytsev
0 siblings, 2 replies; 23+ messages in thread
From: Christopher Li @ 2008-09-04 1:16 UTC (permalink / raw)
To: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu, alexey.zaytsev

Oops, forgot to CC the list.

Chris

---------- Forwarded message ----------
From: Chris Li <sparse@chrisli.org>
Date: Wed, Sep 3, 2008 at 6:08 PM
Subject: Re: [PATCH 0/10] Sparse linker
To: alexey.zaytsev@gmail.com

On Wed, Sep 3, 2008 at 2:55 PM, <alexey.zaytsev@gmail.com> wrote:
> more on the subject, I now agree that we should include the
> intermediate code representation into the object files.

Good.

> for this is a four byte overhead prepended to every
> serializable structure by the allocation wrapper. Also, you

I would rather not have those 4 bytes prepended to every
structure. Serialization is just one short stage in the life cycle
of those C structures. Permanently reserving extra space
just for that is unnecessary. The 4 bytes of metadata also
limit which C structures you can work on. All you need
is to be able to map a pointer to a serialization object,
to keep track of which objects are tracked and which are not.

After you have serialized the data, the metadata can be dropped
completely. So the price to pay is a dictionary lookup for every
unknown object pointer, and only during the dumping stage.
That price is actually very small: when you are dumping objects,
you are mostly limited by the disk anyway. The plus side is
that you can work with any objects, you don't waste extra
memory on serialization when you are not serializing, and you
can leave the object allocation code unchanged.

> have to use a macro when declaring a serializable structure
> (or an array of such) statically. One limitation I was unable
> to overcome is the inability to work with structures used both
> stand-alone and embedded into bigger ones. Luckily, we have no

Like the ctype member inside the "struct symbol"?

> list and serializing it [PATCH 07]. The linker needs to dlopen

Do you use standard shared libraries for dlopen and dynamic
linking of the sparse objects?

> the input "sparse objects", merge the symbol lists, and
> serialize the result [PATCH 08]. The generated code compilation
> is handled by the cgcc, cld and car wrappers [PATCH 09]. To
> look up symbols in sparse object files, a simple program is
> included [PATCH 10].

Do you dump your sparse objects in ELF format?

Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Fwd: [PATCH 0/10] Sparse linker
2008-09-04 1:16 ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
@ 2008-09-04 1:54 ` Tommy Thorn
2008-09-04 4:03 ` Alexey Zaytsev
1 sibling, 0 replies; 23+ messages in thread
From: Tommy Thorn @ 2008-09-04 1:54 UTC (permalink / raw)
To: Christopher Li
Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu, alexey.zaytsev

Christopher Li wrote:
> I would rather not have those 4 bytes prepended to every
> structure. Serialization is just one short stage in the life cycle
> of those C structures. Permanently reserving extra space
> just for that is unnecessary. The 4 bytes of metadata also
> limit which C structures you can work on. All you need
> is to be able to map a pointer to a serialization object,
> to keep track of which objects are tracked and which are not.
>
> After you have serialized the data, the metadata can be dropped
> completely. So the price to pay is a dictionary lookup for every
> unknown object pointer, and only during the dumping stage.
> That price is actually very small: when you are dumping objects,
> you are mostly limited by the disk anyway. The plus side is
> that you can work with any objects, you don't waste extra
> memory on serialization when you are not serializing, and you
> can leave the object allocation code unchanged.
>

I concur, and just wanted to point out that this technique has been used
in garbage collectors for functional languages for the same
reason: the type information is very small and almost completely static,
so there is no need to replicate it all over the data. It does make
marshaling (the common term for what Alex calls
"serialization") slightly more complicated.

Tommy
^ permalink raw reply [flat|nested] 23+ messages in thread
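The side-table approach described in the two messages above can be sketched as follows: a pointer-to-id dictionary that exists only while dumping, so the objects themselves carry no header. This is a minimal illustration under stated assumptions, not code from sparse or from the patch series. struct ptr_table, ptr_table_init, ptr_table_id and ptr_table_free are hypothetical names, and open addressing with linear probing is just one possible choice of dictionary.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical side table: object address -> serial id. */
struct ptr_table {
	const void **keys;	/* NULL marks an empty slot */
	unsigned *ids;
	size_t cap;		/* must be a power of two */
	size_t used;
};

static size_t ptr_slot(const void *p, size_t cap)
{
	uintptr_t v = (uintptr_t)p;

	/* Drop the low alignment bits, then scramble a little. */
	return (size_t)((v >> 4) * 2654435761u) & (cap - 1);
}

/* cap must be a power of two. Returns 0 on success, -1 on failure. */
static int ptr_table_init(struct ptr_table *t, size_t cap)
{
	t->keys = calloc(cap, sizeof(*t->keys));
	t->ids = calloc(cap, sizeof(*t->ids));
	t->cap = cap;
	t->used = 0;
	if (!t->keys || !t->ids) {
		free(t->keys);
		free(t->ids);
		return -1;
	}
	return 0;
}

static void ptr_table_free(struct ptr_table *t)
{
	free(t->keys);
	free(t->ids);
}

/* Double the capacity and re-insert every entry, keeping its id. */
static int ptr_table_grow(struct ptr_table *t)
{
	struct ptr_table bigger;
	size_t i, j;

	if (ptr_table_init(&bigger, t->cap * 2) < 0)
		return -1;
	for (i = 0; i < t->cap; i++) {
		if (!t->keys[i])
			continue;
		j = ptr_slot(t->keys[i], bigger.cap);
		while (bigger.keys[j])
			j = (j + 1) & (bigger.cap - 1);
		bigger.keys[j] = t->keys[i];
		bigger.ids[j] = t->ids[i];
	}
	bigger.used = t->used;
	ptr_table_free(t);
	*t = bigger;
	return 0;
}

/*
 * Return the id for a non-NULL ptr, assigning the next free id the first
 * time the pointer is seen. *is_new tells the caller whether the object
 * still has to be emitted. Returns -1 if the table cannot grow.
 */
static long ptr_table_id(struct ptr_table *t, const void *ptr, int *is_new)
{
	size_t i;

	/* Keep the table at most half full so probing stays short. */
	if ((t->used + 1) * 2 > t->cap && ptr_table_grow(t) < 0)
		return -1;

	for (i = ptr_slot(ptr, t->cap); t->keys[i]; i = (i + 1) & (t->cap - 1)) {
		if (t->keys[i] == ptr) {
			*is_new = 0;
			return t->ids[i];
		}
	}
	t->keys[i] = ptr;
	t->ids[i] = (unsigned)t->used++;
	*is_new = 1;
	return t->ids[i];
}
```

A dumper built this way would call ptr_table_id() for every pointer it follows, emit the object only when *is_new is set, write the returned id in place of the address, and free the whole table once the dump is done. That is the trade described above: one lookup per pointer during dumping, in exchange for no per-object overhead the rest of the time.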
* Re: [PATCH 0/10] Sparse linker
2008-09-04 1:16 ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
2008-09-04 1:54 ` Tommy Thorn
@ 2008-09-04 4:03 ` Alexey Zaytsev
2008-09-04 7:27 ` Christopher Li
1 sibling, 1 reply; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-04 4:03 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 5:16 AM, Christopher Li <sparse@chrisli.org> wrote:
> Oops, forget to CC the list.
>
> Chris
>
> ---------- Forwarded message ----------
> From: Chris Li <sparse@chrisli.org>
> Date: Wed, Sep 3, 2008 at 6:08 PM
> Subject: Re: [PATCH 0/10] Sparse linker
> To: alexey.zaytsev@gmail.com
>
>
> On Wed, Sep 3, 2008 at 2:55 PM, <alexey.zaytsev@gmail.com> wrote:
>
>> more on the subject, I now agree that we should include the
>> intermediate code representation into the object files.
>
> Good.
>
>> for this is a four byte overhead prepended to every
>> serializable structure by the allocation wrapper. Also, you
>
> I would rather not have that 4 byte prepended to every
> structure. Serialize is just one short stage of the life cycle
> of those c structures. Having the permanent extra space
> for just that is unnecessary. That 4 bytes meta data also
> limits what C structure you can work on. All you need
> is being able to map a point into some serialize object
> to keep track which object is tracked and which one is not.
>
> After you serialized the data. The meta data can be drop
> completely. So the price to pay is for every unknown object
> pointer, you need to do a dictionary look up. Only during
> the dumping stage. But that price is actually very small,
> when you dumping objects. You are mostly limit by the disk
> any way. The plus side is: you can work with any objects.
> You don't need to waste extra memory for serialization
> when you are not doing serialization. You can leave the
> object allocation code unchanged.

Thanks for the comment, I will look into this idea.

I just realized that I'm actually unable to serialize stand-alone
arrays right now. The array members would all be there, but not
laid out sequentially.

>
>> have to use a macro when declaring a serializable structure
>> (or an array of such) statically. One limitation I was unable
>> to overcome is the inability to work with structures used both
>> stand-alone and embedded into bigger ones. Luckily, we have no
>
> Like the ctype member inside the "struct symbol"?

I still hope I made no mistake in assuming that struct ctype is used
stand-alone only as a temporary variable, and is never pointed to
from another data structure that we would like to serialize.

>
>> list and serializing it [PATCH 07]. The linker needs to dlopen
>
> Do you use the standard shared library for dlopen and dynamic linking
> the sparse objects?

If I understand the question right, no. Every "sparse object" .so has a
"struct ptr_list *symbols" entry (in fact, the only non-static entry) that
points to the serialized ptr list of the "struct sold_symbol". The linker
dlopen()'s the .so and hooks to the entry, for every input object file.
After that, it simply calls ptr_list_concat() on the opened symbol lists,
and serializes the resulting combined list. There is of course nothing
wrong if we modify the data obtained from the .so, as it is cow-mmaped.

>
>
>> the input "sparse objects", merge the symbol lists, and
>> serialize the result [PATCH 08]. The generated code compilation
>> is handled by the cgcc, cld and car wrappers [PATCH 09]. To
>> look up symbols in sparse object files, a simple program is
>> included [PATCH 10].
>
> Do you dump your sparse object in ELF format?

Well, I serialize the data into C, and then compile it into .so, if
that was the question. You might want to apply the first patch
and look at the serialization-test output.

>
> Chris
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker
2008-09-04 4:03 ` Alexey Zaytsev
@ 2008-09-04 7:27 ` Christopher Li
2008-09-04 9:41 ` Alexey Zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: Christopher Li @ 2008-09-04 7:27 UTC (permalink / raw)
To: Alexey Zaytsev; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Wed, Sep 3, 2008 at 9:03 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> If I understand the question right, no. Every "sparse object" .so has a
> "struct ptr_list *symbols" entry (in fact, the only non-static entry) that
> points to the serialized ptr list of the "struct sold_symbol". The linker
> dlopen()'s the .so and hooks to the entry, for every input object file.
> After that, it simply calls ptr_list_concat() on the opened symbol lists,
> and serializes the resulting combined list. There is of course nothing
> wrong if we modify the data obtained from the .so, as it is cow-mmaped.
...
> Well, I serialize the data into C, and then compile it into .so, if
> that was the question. You might want to apply the first patch
> and look at the serialization-test output.

OK. I just realized that you are building a completely different kind
of "linker" than I have in mind.

Generating a C source file and letting gcc compile and link it is an
interesting idea. But I think it is a step backwards.

For starters, how do you handle the case where a symbol from your input
file conflicts with a function defined in the loader itself?

If I understand your plan correctly, I don't see how it can handle
the following case:

file a.c:
	void bar(void);
	void foo(void) { printf("%p\n", &bar); }

file b.c:
	void foo(void);
	void bar(void) { printf("%p\n", &foo); }

So do you put the extern symbol in your .so as a symbol as well?
If you don't, how do you link the extern symbol to where it is defined?

If you do, then you can't load a.so alone because the "bar" symbol is
not resolved. And you can't load b.so because the "foo" symbol can't
be resolved.

If you link a.o and b.o together into ab.so, then you pretty much need
to link the whole Linux kernel (except the modules) into one flat file.
That defeats the purpose of having the linker and loader load a single
sparse object file at a time. Ideally the checker should be able to
dynamically load a sparse object file only when it is needed.

Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker
2008-09-04 7:27 ` Christopher Li
@ 2008-09-04 9:41 ` Alexey Zaytsev
2008-09-04 10:35 ` Christopher Li
0 siblings, 1 reply; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-04 9:41 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 11:27 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Sep 3, 2008 at 9:03 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> If I understand the question right, no. Every "sparse object" .so has a
>> "struct ptr_list *symbols" entry (in fact, the only non-static entry) that
>> points to the serialized ptr list of the "struct sold_symbol". The linker
>> dlopen()'s the .so and hooks to the entry, for every input object file.
>> After that, it simply calls ptr_list_concat() on the opened symbol lists,
>> and serializes the resulting combined list. There is of course nothing
>> wrong if we modify the data obtained from the .so, as it is cow-mmaped.
> ...
>> Well, I serialize the data into C, and then compile it into .so, if
>> that was the question. You might want to apply the first patch
>> and look at the serialization-test output.
>
> OK. I just realized that you are building a completely different kind
> of "linker" than I have in mind.
>
> Generate C source file and let gcc to compile and link it is an
> interesting idea. But I think it is a step backwards.
>

No, that's not how it works. ;)
Please compile and run the code. And look at what is actually generated.
Or wait a bit, I'll try to describe the serialization process in more detail.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker
2008-09-04 9:41 ` Alexey Zaytsev
@ 2008-09-04 10:35 ` Christopher Li
2008-09-04 13:29 ` Alexey Zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: Christopher Li @ 2008-09-04 10:35 UTC (permalink / raw)
To: Alexey Zaytsev; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> No, that's not how it works. ;)
> Please compile and run the code. And look at what is actually generated.
> Or wait a bit, I'll try to describe the serialization process in more detail.
>

I did. It generates C *source* code like this:

=============cut =============
#include "test.sparse_declarations.c"

#define NULL ((void *)0)
static struct a_wrapper __a_0 = {
        .payload = {
                .d = 1,
                .b_ptr = &__b_0.payload,
        },
};
static struct b_wrapper __b_0 = {
        .payload = {
                .k = 11,
                .a_ptr = &__a_1.payload,
        },
};
============ paste ===========

I assume you intend to use a real compiler (gcc) to compile
and link that code, no?

I haven't fully understood how you use that piece of C code. But my
gut feeling is that we shouldn't need to do that C source code
generation at all.

Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker
2008-09-04 10:35 ` Christopher Li
@ 2008-09-04 13:29 ` Alexey Zaytsev
2008-09-04 13:35 ` Alexey Zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-04 13:29 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 2:35 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> No, that's not how it works. ;)
>> Please compile and run the code. And look at what is actually generated.
>> Or wait a bit, I'll try to describe the serialization process in more detail.
>>
>
> I did. It generate C *source* code like this:
>
> =============cut =============
> #include "test.sparse_declarations.c"
>
> #define NULL ((void *)0)
> static struct a_wrapper __a_0 = {
>         .payload = {
>                 .d = 1,
>                 .b_ptr = &__b_0.payload,
>         },
> };
> static struct b_wrapper __b_0 = {
>         .payload = {
>                 .k = 11,
>                 .a_ptr = &__a_1.payload,
>         },
> };
> ============ paste ===========
>
> I assume you intend to use a real compiler(gcc) to compile
> and link that code, no?
>
> I haven't fully understand how you use that piece of C code. But my
> gut feeling is that we shouldn't need to do that C source code
> generation at all.

Ok, let me try to explain how the stuff works. Please note that in
fact two files are generated, output.sparse.c and
output.sparse_declarations.c. This is required to have only one pass
over the serialized data. When we are in the process of serializing
"struct a1", and it points to a "struct b2", we can add b2 to the
"serialization queue" and dump it after we finish with a1, but we need
to have the declaration somewhere before a1, so we add it to the
_declarations.c file, and #include it near the start of
output.sparse.c.

Now let's look at an example (a simplified version of serialization-test):

===== test.h =====
struct a {
        int d;
        struct a *a_ptr;
};
DECLARE_WRAPPER(a);
^-- Declares struct a_wrapper {struct serialization_mdata meta;
    struct a payload;}; and allocation wrapper prototypes.

====== test.c =====
[... helper functions ...]

.- That's the actual user-defined serialization function.
v  All that is needed to serialize any "struct a".
int dump_a(struct serialization_stream *s, struct a *w)
{
        emit_int(s, w, d);         <-- dump the int a.d field.
        emit_ptr(s, w, a, a_ptr);  <-- dump the struct a * a.a_ptr field.
        /* We could choose to not dump some fields, or choose to dump
         * them conditionally, etc */
        return 0;
}
WRAP(a, "test.h", dump_a);
^-- Defines the allocation wrappers, so that when you call
    __alloc_a(0), a "struct a_wrapper" is allocated, and a pointer
    to its payload field (of type "struct a") is returned. Also
    defines the serialization functions for this type. The second
    argument is the header that contains the "struct a" and
    "struct a_wrapper" definitions. It's #included into the
    generated file.

Now we allocate a few "struct a" instances, cross-reference them, and
call the serialization function on one of them:

int main(int argc, char **argv)
{
        struct serialization_stream *s;
        struct a *aa = __alloc_a(0);
        struct a *ab = __alloc_a(0);
        struct a *ac = __alloc_a(0);
        aa->d = 1;
        ab->d = 2;
        ac->d = 3;
        aa->a_ptr = ab;
        ab->a_ptr = ac;
        ac->a_ptr = aa;
        s = new_serialization_stream("test");
        serialize_a(s, aa, "aa");
        ^- This function was defined through WRAP(a, ...);

Look at the DO_WRAP monster from serialization.h:
serialize_a() does the following:

1 It calls schedule_a_serialization, that:

            int schedule_##type_name##_serialization(struct serialization_stream *s, \
                            type *t) \
            { \
                    struct type_name##_wrapper *w; \
                    if (!t) \
                            return 0; /* Tried to serialize a NULL pointer */ \
                    w = container(t, struct type_name##_wrapper, payload); \
(1.1)               if (w->meta.declared) \
                            return 0; /* Either already serialized or waiting \
                                       * in the queue */ \
(1.2)               if (!type_name##_index) \
                            fprintf(s->declaration_f, \
                                    "\n#include %s\n", #type_header); \
 \
                    w->meta.index = type_name##_index++; \
(1.3)               fprintf(s->declaration_f, \
                            "static struct " #type_name "_wrapper " \
                            "__" #type_name "_%d;\n", \
                            w->meta.index); \
                    w->meta.declared = 1; \
(1.4)               return serialization_stream_enqueue(s, w, \
                                    do_serialize_##type_name); \
            }

1.1 Checks the metadata associated with this instance to see if it
    was already serialized.

1.2 If not, checks whether any "struct a" has been serialized before,
    and if none has, adds #include "test.h" to
    output.sparse_declarations.c. a_index is a global counter
    associated with "struct a" that is incremented every time a
    "struct a" is serialized. Its values are assigned to the
    serialized instances. This results in:

    test.sparse_declarations.c:2 #include "serialization-test.h"

1.3 Declares a struct a_wrapper instance in the declarations file
    and marks the aa instance as being serialized. This results in:

    test.sparse_declarations.c:3 static struct a_wrapper __a_0;

1.4 Calls the serialization_stream_enqueue() function, which
    allocates a struct serialization_sched_work that binds the aa
    instance and the dump_a() user-supplied dumper function (through
    do_serialize_a()) to the "serialization stream" s. Returns to
    serialize_a().

2 Calls process_serialization_queue(), which, for every enqueued data
  instance, calls the do_serialize_##type_name that was bound to it
  at step 1.4. The idea is that if your structure references numerous
  other structures, they will all be added to the queue by your
  user-supplied serializer (transparently, through the emit_ptr
  function) and serialized before the loop exits:

            int process_serialization_queue(struct serialization_stream *s)
            {
                    int ret = 0;
                    struct serialization_sched_work *w;

                    while (s->queue) {
                            w = s->queue;
                            s->queue = s->queue->next;
                            ret = w->serializer(s, w->unit);  <- Here new structures might get added
                            free(w);        ^                    to the queue as dependencies.
                    }                        \ calls do_serialize_a(...)

  ret = w->serializer(s, w->unit) points at do_serialize_a(), which
  does:

            static int do_serialize_##type_name(struct serialization_stream *s, \
                            void *unit) \
            { \
                    struct type_name##_wrapper *w = unit; \
                    int ret; \
(2.1)               fprintf(s->definition_f, "static struct " #type_name "_wrapper " \
                            "__" #type_name "_%d = {\n\t.payload = {\n", \
                            w->meta.index); \
(2.2)               ret = serializer(s, &w->payload);  <-- dump_a() \
(2.3)               fprintf(s->definition_f, "\t},\n};\n"); \
                    if (ret) \
                            fprintf(stderr, "Warning: Failed to serialize a " #type \
                                    ": %d\n", ret); \
                    return ret; \
            } \

2.1 In the output.sparse.c file it adds an a_wrapper instance,
    numbered according to the index associated with the structure
    (step 1.2). This results in:

    test.sparse.c:4 static struct a_wrapper __a_0 = {
    test.sparse.c:5         .payload = {

    Note that __a_0 is derived not from the serialized instance's
    name (aa), but from the type name and the instance's index.

2.2 Finally runs the user-supplied function (dump_a), which does the
    following:

2.2.1 Calls emit_int to dump the a.d field:

            emit_int(s, w, d);

      emit_int being: *

            #define emit_int(s, parent, field) \
                    do { \
                            int __i = parent->field; \
                            fprintf(s->definition_f, "\t\t." #field " = %d,\n", __i); \
                    } while (0)

      and resulting in:

      test.sparse.c:6                 .d = 1,

2.2.2 Calls emit_ptr(s, w, a, a_ptr), which does:

                                                .-----.- Here type being the name of
                                                v     v  the pointed-to type.
            #define do_emit_ptr(stream, parent, type, type_name, field) ** \
                    do { \
                            struct type_name##_wrapper *__w; \
                            void *__ptr = parent->field; \
(2.2.2.1)                   if (!__ptr) { \
                                    fprintf(stream->definition_f, \
                                            "\t\t." #field " = NULL,\n"); \
                                    break; \
                            } \
(2.2.2.2)                   schedule_##type_name##_serialization(stream, __ptr); \
                            __w = container(__ptr, struct type_name##_wrapper, payload); \
(2.2.2.3)                   fprintf(stream->definition_f, "\t\t." #field " = &" \
                                    "__" #type_name "_%d.payload,\n", __w->meta.index); \
                    } while (0)

2.2.2.1 Checks the pointer for NULL.

2.2.2.2 Schedules the pointed-to structure for serialization. The
        pointed-to structure's type name is passed as the third
        argument to emit_ptr(). See point 1 on how
        schedule_##type_name##_serialization works. This results in
        the pointed-to structure being added to the declaration file
        and to the serialization queue:

        test.sparse_declarations.c:4 static struct a_wrapper __a_1;

2.2.2.3 Dumps the requested field (a_ptr), resulting in:

        test.sparse.c:7                 .a_ptr = &__a_1.payload,

        __a_1 being the pointed-to structure's wrapper, declared, but
        not dumped yet.

2.3 After the user-supplied function returns, closes the now
    serialized structure:

    test.sparse.c:8         },
    test.sparse.c:9 };

2.4 Now the process_serialization_queue loop iterates again, as a
    new instance (ab) was added at step 2.2.2.2, and again, as this
    instance references a third struct (ac). ac references the first
    struct, aa, but schedule_a_serialization() sees at step 1.1 that
    it was already serialized and returns right away, leading to the
    loop termination.

After this, we should have the following data:

test.sparse_declarations.c:2 #include "serialization-test.h"
test.sparse_declarations.c:3 static struct a_wrapper __a_0;
test.sparse_declarations.c:4 static struct a_wrapper __a_1;
test.sparse_declarations.c:5 static struct a_wrapper __a_2;

test.sparse.c:1  #include "test.sparse_declarations.c"
test.sparse.c:2
test.sparse.c:3  #define NULL ((void *)0)
test.sparse.c:4  static struct a_wrapper __a_0 = {
test.sparse.c:5          .payload = {
test.sparse.c:6                  .d = 1,
test.sparse.c:7                  .a_ptr = &__a_1.payload,
test.sparse.c:8          },
test.sparse.c:9  };
test.sparse.c:10 static struct a_wrapper __a_1 = {
test.sparse.c:11         .payload = {
test.sparse.c:12                 .d = 2,
test.sparse.c:13                 .a_ptr = &__a_2.payload,
test.sparse.c:14         },
test.sparse.c:15 };
test.sparse.c:16 static struct a_wrapper __a_2 = {
test.sparse.c:17         .payload = {
test.sparse.c:18                 .d = 3,
test.sparse.c:19                 .a_ptr = &__a_0.payload,
test.sparse.c:20         },
test.sparse.c:21 };

3 At this point, we've got all the data, except it's all static. One
  final touch is to add a global reference to the structure that we
  serialized (aa):

                    if (!ret && name) \
                            ret = label_##type_name##_entry(s, t, name); \
                    return ret; \
            } \
            int label_##type_name##_entry(struct serialization_stream *s, type *t, \
                            const char *name) \
            { \
                    struct type_name##_wrapper *w; \
                    if (!t) { \
                            fprintf(s->definition_f, #type " *%s = NULL;", name); \
                            return 0; \
                    } \
                    w = container(t, struct type_name##_wrapper, payload); \
                    if (!w->meta.declared) { \
                            fprintf(stderr, "Warning: Trying to label an undefined" \
                                    " '" #type "'\n"); \
                            return -1; \
                    } \
                    fprintf(s->definition_f, #type " *%s = &" \
                            "__" #type_name "_%d.payload;\n", name, w->meta.index); \
                    return 0; \
            }

  label_a_entry() doing the job:

  test.sparse.c:22 struct a *aa = &__a_0.payload;

If we decide to call serialize_a() on the other two structures, only
the global pointers would be added, as the structure's metadata
contains both the definition flag and the instance's index.

*  emit_int was fixed, it worked only occasionally. ;)
** seems like we don't need to pass the "type" any more.

Uff. Seems like that's how it works. Now (or after a bit more looking
at the code), it should be clear that if in the program we have a
"struct a", wrapped into a "struct a_wrapper" and being serialized,
you would see exactly the same struct appearing in the output file,
with the fields you have chosen to serialize.

Q: You tried to look smart or what?
A: Yes, the work was inspired by the ptr lists, and I hope I managed
   to beat Linus here, as ptr lists are perfectly serializable. ;)

>
> Chris
>
^ permalink raw reply [flat|nested] 23+ messages in thread
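The walkthrough above boils down to a FIFO of instances plus a
"declared" flag per instance. The sketch below is a hand-written
reduction of that scheme for the single type "struct a", written
without the DO_WRAP macros. It is an assumption-laden illustration
rather than the code from [PATCH 01]: struct stream, struct out_buf,
out_printf(), schedule_a() and wrapper_of() are hypothetical names,
the metadata is reduced to two ints, and output goes to in-memory
buffers instead of the two generated files.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct a {
	int d;
	struct a *a_ptr;
};

/* Simplified stand-in for what DECLARE_WRAPPER(a) is described as
 * producing: bookkeeping first, payload after it. */
struct a_wrapper {
	struct { int declared; int index; } meta;
	struct a payload;
};

#define wrapper_of(p) \
	((struct a_wrapper *)((char *)(p) - offsetof(struct a_wrapper, payload)))

struct out_buf { char text[2048]; size_t len; };	/* in-memory "file" */
struct work { struct a_wrapper *unit; struct work *next; };

struct stream {
	struct out_buf declarations, definitions;
	struct work *head, *tail;	/* FIFO of instances still to dump */
	int next_index;
};

static void out_printf(struct out_buf *b, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(b->text + b->len, sizeof(b->text) - b->len, fmt, ap);
	va_end(ap);
	b->len = strlen(b->text);	/* vsnprintf always NUL-terminates */
}

/* Step 1: declare the instance once, give it an index, queue it. */
static int schedule_a(struct stream *s, struct a *t)
{
	struct a_wrapper *w;
	struct work *job;

	if (!t)
		return 0;
	w = wrapper_of(t);
	if (w->meta.declared)
		return 0;	/* already dumped, or waiting in the queue */
	job = malloc(sizeof(*job));
	if (!job)
		return -1;
	w->meta.index = s->next_index++;
	w->meta.declared = 1;
	out_printf(&s->declarations, "static struct a_wrapper __a_%d;\n",
		   w->meta.index);
	job->unit = w;
	job->next = NULL;
	if (s->tail)
		s->tail->next = job;
	else
		s->head = job;
	s->tail = job;
	return 0;
}

/* Step 2: drain the queue; dumping one instance may queue more. */
int serialize_a(struct stream *s, struct a *root)
{
	int ret;

	assert(s != NULL);
	ret = schedule_a(s, root);
	while (s->head) {
		struct work *job = s->head;
		struct a *p = &job->unit->payload;

		s->head = job->next;
		if (!s->head)
			s->tail = NULL;
		if (!ret) {	/* after an error, only drain the queue */
			out_printf(&s->definitions,
				   "static struct a_wrapper __a_%d = {\n"
				   "\t.payload = {\n\t\t.d = %d,\n",
				   job->unit->meta.index, p->d);
			if (!p->a_ptr)
				out_printf(&s->definitions, "\t\t.a_ptr = NULL,\n");
			else if (!(ret = schedule_a(s, p->a_ptr)))
				out_printf(&s->definitions,
					   "\t\t.a_ptr = &__a_%d.payload,\n",
					   wrapper_of(p->a_ptr)->meta.index);
			out_printf(&s->definitions, "\t},\n};\n");
		}
		free(job);
	}
	return ret;
}
```

Because every instance is declared before anything refers to it, a
cycle such as aa -> ab -> ac -> aa needs only one pass: the last
pointer is emitted as a reference to an instance that is already
marked as declared.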
* Re: [PATCH 0/10] Sparse linker
2008-09-04 13:29 ` Alexey Zaytsev
@ 2008-09-04 13:35 ` Alexey Zaytsev
2008-09-04 19:04 ` Christopher Li
0 siblings, 1 reply; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-04 13:35 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 5:29 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Thu, Sep 4, 2008 at 2:35 PM, Christopher Li <sparse@chrisli.org> wrote:
>> On Thu, Sep 4, 2008 at 2:41 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>>> No, that's not how it works. ;)
>>> Please compile and run the code. And look at what is actually generated.
>>> Or wait a bit, I'll try to describe the serialization process in more detail.
>>>
>>
>> I did. It generate C *source* code like this:
>>
>> =============cut =============
>> #include "test.sparse_declarations.c"
>>
>> #define NULL ((void *)0)
>> static struct a_wrapper __a_0 = {
>>         .payload = {
>>                 .d = 1,
>>                 .b_ptr = &__b_0.payload,
>>         },
>> };
>> static struct b_wrapper __b_0 = {
>>         .payload = {
>>                 .k = 11,
>>                 .a_ptr = &__a_1.payload,
>>         },
>> };
>> ============ paste ===========
>>
>> I assume you intend to use a real compiler(gcc) to compile
>> and link that code, no?
>>
>> I haven't fully understand how you use that piece of C code. But my
>> gut feeling is that we shouldn't need to do that C source code
>> generation at all.
> Ok, let me try to explain how the stuff works. Please note that in

Ugh, my pretty code listings got corrupted by the bloody gmail.
Here is a better version: http://zaytsev.su/explanation.txt
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker
2008-09-04 13:35 ` Alexey Zaytsev
@ 2008-09-04 19:04 ` Christopher Li
2008-09-04 20:21 ` Alexey Zaytsev
0 siblings, 1 reply; 23+ messages in thread
From: Christopher Li @ 2008-09-04 19:04 UTC (permalink / raw)
To: Alexey Zaytsev; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 6:35 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:

>> Ok, let me try to explain how the stuff works. Please note that in
>
> Ugh, my pretty code listings got corrupted by the bloody gmail.
> Here is a better version: http://zaytsev.su/explanation.txt

Thanks for your detailed explanation. It just confirms my reading of
your code. I stand by my original feedback:

- Using C source code as the output format is bad and unnecessary.
  It depends on gcc to process the intermediate C source file.

- Using dlopen to load the module does not give fine-grained control
  over which symbols need to be resolved and which don't. The linked
  sparse object code for the whole Linux kernel will be huge. Dynamic
  loading of 300M bytes of .so file is not fun.

- I can see you link all the defined symbols together that way. In
  order to do inter-function checks effectively, we need to have the
  reverse mapping as well. It needs to perform tasks like this:
  "Get me a list of the functions that reference spin_lock()".

  If I am writing a spin_lock checker, I can look at who used
  spin_lock and only load those functions as needed.
  That is much better than scanning every single kernel function to
  search for the spin_lock function call.

- The extra 4 bytes per structure storage on disk can be eliminated.
  I agree you need some meta data to track the objects before you
  dump them to the file. But they don't need to be in the disk object
  at all.

  If you group objects of the same type together as an array, the
  index of the object is implicit as the array index. If the C struct
  is fixed size, it is trivial to locate the object.

  If the C struct is variable size, currently in sparse each object
  knows what size it is. You do need an index array to look it up.
  But this array can be built at object loading time. It doesn't
  have to be on the disk either.

  Then you can get rid of the wrapper structure in the disk file
  format altogether.

The writer patch I sent out uses those tricks already. You are
welcome to poke around it.

Chris
^ permalink raw reply [flat|nested] 23+ messages in thread
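For the fixed-size case, the "array index instead of pointer" layout
described above might be sketched as follows. This is not taken from
the writer patch mentioned in the message: struct node, struct
disk_node and the two conversion functions are hypothetical, and the
sketch assumes that all objects of the type live in one contiguous
array, so that a pointer can be turned into an index by subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory form: ordinary C struct with a pointer. */
struct node {
	int d;
	struct node *next;
};

/* On-disk form: no per-object header; the pointer becomes the array
 * index of its target, and the object's own index is implicit. */
struct disk_node {
	int32_t d;
	int32_t next;	/* array index of the target, -1 for NULL */
};

/* Dump: replace every pointer by the index of the node it points to. */
void nodes_to_disk(const struct node *mem, struct disk_node *disk, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		disk[i].d = mem[i].d;
		if (!mem[i].next) {
			disk[i].next = -1;
		} else {
			ptrdiff_t idx = mem[i].next - mem;

			/* the target must live in the same array */
			assert(idx >= 0 && (size_t)idx < n);
			disk[i].next = (int32_t)idx;
		}
	}
}

/* Load: turn the indices back into pointers into the new array. */
void nodes_from_disk(const struct disk_node *disk, struct node *mem, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		mem[i].d = disk[i].d;
		if (disk[i].next < 0) {
			mem[i].next = NULL;
		} else {
			assert((size_t)disk[i].next < n);
			mem[i].next = &mem[disk[i].next];
		}
	}
}
```

With variable-size objects the same idea would need the index array
mentioned above, built at load time, to map an index to an offset.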
* Re: [PATCH 0/10] Sparse linker
2008-09-04 19:04 ` Christopher Li
@ 2008-09-04 20:21 ` Alexey Zaytsev
2008-09-04 21:24 ` Christopher Li
0 siblings, 1 reply; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-04 20:21 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu
On Thu, Sep 4, 2008 at 11:04 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 6:35 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>
>>> Ok, let me try to explain how the stuff works. Please note that in
>>
>> Ugh, my pretty code listings got corrupted by the bloody gmail.
>> Here is a better version: http://zaytsev.su/explanation.txt
>
> Thanks for your detail explain. It just confirm my reading of your
> code. I stand by my original feedback:
>
> - Using C source code as the output format is bad and unnecessary.
> It depend on gcc to process the intermediate C source file.
>

Mostly ack here, but I still think the C code has two advantages over
binaries: It's easy to read, and it's an easy way to get the shared
library filled with the data, see below.

The huge disadvantage is the time and the memory it takes to compile
the C code.

> - Using dlopen to load the module does not have the fine grain control
> of the which symbol need to resolve and which is doesn't. The linked
> sparse object code for the whole linux kernel will be huge. Dynamic
> loading of 300M bytes of .so file is not fun.

Here I have to disagree. Loading the data from an .so might actually be
the most efficient method. See, the bulk of data of the .so is simply mmap'ed
read-only, with only the GOT being read-write, and when mapping with
RTLD_LAZY, the pointers are resolved only when you follow them, completely
transparently to us. You don't need the fine-grained control, the OS just does
the right thing for you. And if the checker needs to look at the bulk
of the data, it can dlopen with RTLD_NOW. When multiple different
checkers are being run over the .so, the bulk of memory is shared
between the processes, which I think matters a lot. The memory is
cheap, but now the number of cores is growing. E.g. if you've got
4 cores and 4 gigs of RAM, it's only one gig per core, and wasting
300 megabytes per process just to load the data doesn't look like
a good idea.

> - I can see you link all the define symbol together that way. In order to do
> inter-function check effectively, we need the have the reverse mapping
> as well. It need to perform task like this:
> "Get me a list of the function who has reference to spin_lock()".
>
> If I am writing a spin_lock checker. I can look at who used spin_lock
> and only load those functions as needed.
> It is much better than scanning every single one of the kernel function to
> search for the spin_lock function call.

That should be completely possible with both approaches. I don't see any
difference here.

>
> - The extra 4 bytes per structure storage on disk can be eliminated.
> I agree you need some meta data to track the object before you dump
> them to the file. But they don't need to be on the disk object at all.

Agreed. I'll rethink the implementation.

>
> If you group same type of object together as an array. The index of the
> object is implicit as the array index. If the C struct is fixed size. It is
> trivial to locate the object.

This way, you don't have the transparency. You either need to load all the
data into memory, one structure after the other, and link them together,
basically doing the same stuff dlopen() does for you, or you'll need to
use special functions/macros to access the data from your checker.

>
> If the C struct is variable size, currently on sparse each object knows
> what size it is. You do need any index array to look it up. But this
> array can be build on object loading time. They don't have to be on
> the disk either.
>
> Then you can get ride of the wrapper structure on the disk file format
> all together.
>
> The writer patch I send out use those tricks already. You are welcome to
> poke around it.

I'm looking into it now. Thank you for sharing.

One crazy idea is... why can't we actually produce shared object binaries
directly... Maybe it won't be all that hard to generate valid ELF...
Just crazy probably.

>
> Chris
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/10] Sparse linker 2008-09-04 20:21 ` Alexey Zaytsev @ 2008-09-04 21:24 ` Christopher Li 2008-09-05 9:49 ` Alexey Zaytsev 0 siblings, 1 reply; 23+ messages in thread From: Christopher Li @ 2008-09-04 21:24 UTC (permalink / raw) To: Alexey Zaytsev; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu On Thu, Sep 4, 2008 at 1:21 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote: > Mostly ack here, but I still think the C code has two advantages over > binaries: It's easy to read, and it's an easy way to get the shared > library filled with the data, see below. It does not stop you to have some parsing tool to generate readable format from the object dump. But using the C source as primary way to dump object is letting the tail whack the dog. The on disk format should be optimized towards easy for checker rather than human to read it. > The huge disadvantage is the time and the memory it takes to compile > the C code. And the run time dependency of gcc. > Here I have to disagree. Loading the data from an .so might actually the > most evfficient method. See, the bulk of data of the .so is simply mmap'ed > read-only, with only the GOT being read-write, and when mapping with > RTLD_LAZY, the pointers are resolved only when you follow them, completely > transparently to us. You don't need the fine-grained control, the OS just does > the right thing for you. And if the checker needs to look at the bulk > of the data, Are you sure? Quote the man page: =================== RTLD_LAZY Perform lazy binding. Only resolve symbols as the code that references them is executed. If the symbol is never referenced, then it is never resolved. (Lazy binding is only performed for function references; references to variables are always immediately bound when the library is loaded.) =================== Your symbol is store as DATA nodes. Not functions. You never EXECUTE your sparse object code. The RTLD_LAZY has ZERO effect on them. 
All the symbol has to be immediately bounded. How can you tell which data pointer is lazy bound given that all the data value is possible in the pointer? > it cat dlopen with RTLD_NOW. When multiple different checkers are being run > over the .so, the bulk of memory is shared between the processes, which I > think matters a lot. The memory is cheap, but now the number of cores > is growing. > E.g. if you've got 4 cores and 4 gigs of RAM, it's only one gig per > core, and wasting > 300 megabytes per process just to load the data doasn't look like a good idea. Even they are mmaped. Every symbol have to be touch up. So they need to swap in and COW. The COW memory can't be shared between process at all. This is against the tradition of sparse being a small and neat tools. I have to NACK this approach especially I know there is alternative better way to do it. My laptop does not have 4 gigs of ram and it only have one core, but I still want to run the checker as fast as possible on it. > That should be completely possible with both approaches. I don't see any > difference here. I don't think so. See above comment about RTLD_LAZY. > > This way, you don't have the transparency. You either need to load all the > data into memory, one structure after the other, and link them together, > basically going the same stuff dlopen() does for you, or you'll need to > use special functions/macros to access the data from your checker. Yes, it need one bit of information of this symbol has been resolve or not. That does not need to test inside the checker though. The loader can make sure the symbol that the specific checker want are all resolved before it hand it over to the checker. On the typical checking path, there is only very small percent of the data checker care about. Spending CPU and memory on those structure that the checker don't care is a big waste. 
We don't really need to link them into one big object as long as we can
efficiently look up which object contains the symbol I want. I see linking
into one big object, and having to load everything together, as a serious
disadvantage. After the checker is done with an object, ideally it can
release it. I don't want the checker to load every object into memory
before it can work on them. This obviously does not scale.

> I'm looking into it now. Thank you for sharing.
>
> One crazy idea is... why can't we actually produce shared object binaries
> directly... Maybe it won't be all that hard to generate valid ELF...
> Just crazy probably.

I don't mind using the ELF format as long as it is simple and easy to use.
Keep in mind that the object file format used by sparse has slightly
different design goals. I did try ELF a little bit, but I did not get very far.

Chris

^ permalink raw reply [flat|nested] 23+ messages in thread
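The point above about looking up which object contains a given symbol,
instead of linking everything into one big object, could be served by a
small sorted index. The sketch below is an assumption, not something from
the patch series: struct index_entry, index_sort() and index_lookup() are
hypothetical names, and the index is kept in memory for simplicity. It uses
the standard qsort() and bsearch() so that a lookup costs O(log n) string
comparisons.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: which object file defines a global symbol. */
struct index_entry {
    const char *symbol;
    const char *object;
};

/* Order entries by symbol name; shared by qsort() and bsearch(). */
static int cmp_entry(const void *a, const void *b)
{
    const struct index_entry *ea = a, *eb = b;

    return strcmp(ea->symbol, eb->symbol);
}

/* Sort the index once so that lookups can use binary search. */
static void index_sort(struct index_entry *idx, size_t n)
{
    qsort(idx, n, sizeof(*idx), cmp_entry);
}

/* Return the object file defining symbol, or NULL if it is not indexed. */
static const char *index_lookup(const struct index_entry *idx, size_t n,
                                const char *symbol)
{
    struct index_entry key = { symbol, NULL };
    const struct index_entry *hit;

    assert(symbol != NULL);
    hit = bsearch(&key, idx, n, sizeof(*idx), cmp_entry);
    return hit ? hit->object : NULL;
}
```

With such an index the checker would open only the object returned by
index_lookup(), work on it, and release it again before moving on to the
next one.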
* Re: [PATCH 0/10] Sparse linker
2008-09-04 21:24 ` Christopher Li
@ 2008-09-05 9:49 ` Alexey Zaytsev
0 siblings, 0 replies; 23+ messages in thread
From: Alexey Zaytsev @ 2008-09-05 9:49 UTC (permalink / raw)
To: Christopher Li; +Cc: linux-sparse, Josh Triplett, Codrin Alexandru Grajdeanu

On Fri, Sep 5, 2008 at 1:24 AM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 1:21 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> Mostly ack here, but I still think the C code has two advantages over
>> binaries: It's easy to read, and it's an easy way to get the shared
>> library filled with the data, see below.
>
> That does not stop you from having a parsing tool that generates a
> readable format from the object dump. But using C source as the primary
> way to dump objects is letting the tail wag the dog. The on-disk format
> should be optimized to be easy for the checker to read, not for humans.
>
>> The huge disadvantage is the time and the memory it takes to compile
>> the C code.
>
> And the run-time dependency on gcc.
>
>> Here I have to disagree. Loading the data from an .so might actually be the
>> most efficient method. See, the bulk of data of the .so is simply mmap'ed
>> read-only, with only the GOT being read-write, and when mapping with
>> RTLD_LAZY, the pointers are resolved only when you follow them, completely
>> transparently to us. You don't need the fine-grained control, the OS just does
>> the right thing for you. And if the checker needs to look at the bulk
>> of the data,
>
> Are you sure?
>
> Quoting the man page:
> ===================
> RTLD_LAZY
>     Perform lazy binding. Only resolve symbols as the code that
>     references them is executed. If the symbol is never referenced, then
>     it is never resolved. (Lazy binding is only performed for function
>     references; references to variables are always immediately bound when
>     the library is loaded.)
> ===================
>
> Your symbols are stored as DATA nodes, not functions.
> You never EXECUTE your sparse object code. RTLD_LAZY has ZERO effect on
> them. All the symbols have to be bound immediately. How can you tell which
> data pointer is lazily bound, given that every data value is possible in a
> pointer?
>

Confirmed, I was wrong.

>> it can dlopen with RTLD_NOW. When multiple different checkers are being run
>> over the .so, the bulk of memory is shared between the processes, which I
>> think matters a lot. The memory is cheap, but now the number of cores
>> is growing.
>> E.g. if you've got 4 cores and 4 gigs of RAM, it's only one gig per
>> core, and wasting
>> 300 megabytes per process just to load the data doesn't look like a good idea.
>
> Even if they are mmapped, every symbol has to be touched up, so the pages
> need to be swapped in and COWed. COW memory can't be shared between
> processes at all. This is against the tradition of sparse being a small
> and neat tool.

And also here.

^ permalink raw reply [flat|nested] 23+ messages in thread
Thread overview: 23+ messages
2008-09-03 21:55 [PATCH 0/10] Sparse linker alexey.zaytsev
2008-09-03 21:55 ` [PATCH 01/10] Serialization engine alexey.zaytsev
2008-09-03 21:55 ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev
2008-09-03 21:55 ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev
2008-09-03 21:55 ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev
2008-09-03 21:55 ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev
2008-09-03 21:55 ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev
2008-09-03 21:55 ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
2008-09-03 21:55 ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
2008-09-03 21:55 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
2008-09-03 21:55 ` [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files alexey.zaytsev
[not found] ` <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com>
2008-09-04 1:16 ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
2008-09-04 1:54 ` Tommy Thorn
2008-09-04 4:03 ` Alexey Zaytsev
2008-09-04 7:27 ` Christopher Li
2008-09-04 9:41 ` Alexey Zaytsev
2008-09-04 10:35 ` Christopher Li
2008-09-04 13:29 ` Alexey Zaytsev
2008-09-04 13:35 ` Alexey Zaytsev
2008-09-04 19:04 ` Christopher Li
2008-09-04 20:21 ` Alexey Zaytsev
2008-09-04 21:24 ` Christopher Li
2008-09-05 9:49 ` Alexey Zaytsev