From: alexey.zaytsev@gmail.com
To: linux-sparse@vger.kernel.org
Cc: Josh Triplett <josh@kernel.org>, Chris Li <christ.li@gmail.com>,
Codrin Alexandru Grajdeanu <grcodal@gmail.com>
Subject: [PATCH 0/10] Sparse linker
Date: Thu, 4 Sep 2008 01:55:44 +0400
Message-ID: <1220478954-22678-1-git-send-email-alexey.zaytsev@gmail.com>

Hello.

I've been working on a "sparse linker" this summer as my Google
Summer of Code project. It wasn't nearly as productive as I had
hoped, but I've got some results that I would like to share. I
also plan to continue this work, and would like to hear comments
on what has been done so far.

The design didn't change much from what was proposed. We run
sparse to generate a "sparse object" file containing a list of
symbols, then run the "linker" to combine those object files
into bigger ones. In the end we get a file containing all the
global symbols appearing in the program. After learning more
about the subject, I now agree that we should also include the
intermediate code representation in the object files.

The implementation is built around a generic serialization
mechanism [PATCH 01]. It handles many kinds of complex data
structures, with pointers, cycles, unions, etc.; e.g. it is able
to serialize beasts like the sparse pointer lists. The price
for this is a four-byte overhead that the allocation wrapper
prepends to every serializable structure. Also, you have to use
a macro when statically declaring a serializable structure (or
an array of such). One limitation I was unable to overcome is
that it cannot handle structures that are used both stand-alone
and embedded in bigger ones. Luckily, there are no such cases in
the sparse codebase.

The serializer produces C code containing the data structures
being serialized. For the structure definitions, the generated
code includes the original headers that define them. After
serializing a set of possibly interconnected structures and
running cc over the generated code, one gets a static or dynamic
library containing copies of the serialized data structures,
with all the pointer interconnections preserved. This makes
loading the data trivial and very memory-efficient, and the
whole dump-restore process should be completely transparent:
e.g. it should be possible to serialize the sparse() output and
run check_symbols() after loading the data from another program.
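
To make the idea more concrete, here is a toy sketch of the two
ingredients described above: an allocation wrapper that puts a
small header in front of every serializable object, and an
emitter that walks a possibly cyclic graph and prints C
definitions whose initializers reproduce the pointers. It is not
the code from [PATCH 01]. The names ser_alloc(), ser_emit(),
struct node, "node.h", the obj_N naming scheme, and the use of
the header as a "visited" index are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical serializable type: nodes may point back into the graph. */
struct node {
	int value;
	struct node *next;
};

/* Hypothetical per-object header. Only `id` carries information (the
 * 4 bytes mentioned above); the max_align_t member just pads the header
 * so the object that follows stays correctly aligned in this sketch. */
typedef union {
	uint32_t id;          /* 0 = not numbered yet, else 1-based index */
	max_align_t align;
} ser_header;

/* Allocation wrapper: reserve room for the header in front of the object. */
static void *ser_alloc(size_t size)
{
	ser_header *h = calloc(1, sizeof(*h) + size);
	return h ? (void *)(h + 1) : NULL;
}

static ser_header *header_of(void *obj)
{
	return (ser_header *)obj - 1;
}

static void ser_free(void *obj)
{
	if (obj)
		free(header_of(obj));
}

#define SER_MAX 64  /* fixed limit, to keep the sketch short */

/* Number every node reachable from root. The walk stops at NULL or at
 * an already numbered node, which is how cycles are detected. */
static int ser_number(struct node *root, struct node **order, uint32_t *count)
{
	uint32_t n = 0;

	for (struct node *p = root; p && header_of(p)->id == 0; p = p->next) {
		if (n == SER_MAX)
			return -1;
		header_of(p)->id = ++n;
		order[n - 1] = p;
	}
	*count = n;
	return 0;
}

/* Emit C source that recreates the graph as static data; pointers
 * become addresses of the corresponding obj_N variables. */
static int ser_emit(FILE *out, struct node *root)
{
	struct node *order[SER_MAX];
	uint32_t n;

	assert(out != NULL);
	if (ser_number(root, order, &n) < 0)
		return -1;

	fprintf(out, "#include \"node.h\"\n"); /* hypothetical header */
	/* Tentative definitions first, so forward references compile. */
	for (uint32_t i = 1; i <= n; i++)
		fprintf(out, "static struct node obj_%u;\n", (unsigned)i);
	for (uint32_t i = 1; i <= n; i++) {
		struct node *p = order[i - 1];
		if (p->next)
			fprintf(out, "static struct node obj_%u = { %d, &obj_%u };\n",
				(unsigned)i, p->value, (unsigned)header_of(p->next)->id);
		else
			fprintf(out, "static struct node obj_%u = { %d, 0 };\n",
				(unsigned)i, p->value);
	}
	if (n)
		fprintf(out, "struct node *serialized_root = &obj_1;\n");
	else
		fprintf(out, "struct node *serialized_root = 0;\n");
	return 0;
}
```

For a two-node cycle a -> b -> a with values 1 and 2, the emitted
file initializes obj_1 as { 1, &obj_2 } and obj_2 as { 2, &obj_1 }.
Once that file is compiled into a shared object, the pointers are
fixed up when the library is loaded, with no parsing step.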

One thing that bothers me is whether gcc will be able to process
the huge data files containing all the "code" of bigger projects
like the Linux kernel. We will see.

Since arbitrary data can be serialized, generating the symbol
lists becomes as trivial as defining the data structures
corresponding to source files and symbols [PATCH 06], deriving
a symbol list from the sparse output, joining it into a ptr
list and serializing it [PATCH 07]. The linker needs to dlopen
the input "sparse objects", merge their symbol lists, and
serialize the result [PATCH 08]. Compilation of the generated
code is handled by the cgcc, cld and car wrappers [PATCH 09].
Finally, a simple program for looking up symbols in sparse
object files is included [PATCH 10].
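
The core of the linker step is merging symbol lists. The sketch
below shows one way the merge could look once each input's list
has been loaded (e.g. via dlopen() on the sparse object). The
struct sym_entry layout, the sym_merge() helper and the
duplicate-definition count are hypothetical and not taken from
[PATCH 06] or [PATCH 08]; the real code works on sparse ptr lists
rather than plain arrays.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record, one per declaration or definition. */
struct sym_entry {
	const char *name;
	const char *file;
	int line;
	int is_definition;
};

struct sym_list {
	struct sym_entry *v;
	size_t n;
};

/* Sort by name; for equal names, definitions come before declarations. */
static int by_name(const void *a, const void *b)
{
	const struct sym_entry *x = a, *y = b;
	int c = strcmp(x->name, y->name);

	return c ? c : y->is_definition - x->is_definition;
}

/* Merge two symbol lists into one newly allocated, sorted list and count
 * extra definitions of the same name. Returns 0, or -1 if out of memory. */
static int sym_merge(const struct sym_list *a, const struct sym_list *b,
		     struct sym_list *out, size_t *dup_defs)
{
	size_t n = a->n + b->n;
	struct sym_entry *v = malloc(n ? n * sizeof(*v) : 1);

	assert(out != NULL && dup_defs != NULL);
	if (!v)
		return -1;
	if (a->n)
		memcpy(v, a->v, a->n * sizeof(*v));
	if (b->n)
		memcpy(v + a->n, b->v, b->n * sizeof(*v));
	qsort(v, n, sizeof(*v), by_name);

	/* Definitions of one name are adjacent after sorting. */
	*dup_defs = 0;
	for (size_t i = 1; i < n; i++)
		if (v[i].is_definition && v[i - 1].is_definition &&
		    strcmp(v[i].name, v[i - 1].name) == 0)
			(*dup_defs)++;

	out->v = v;
	out->n = n;
	return 0;
}
```

Keeping the merged list sorted by name is a design choice of this
sketch: it makes later lookups and duplicate checks cheap, at the
cost of a sort per link step.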

The plan now is to proceed with dumping the linearized code.

Please take a look at the code, ask if anything needs
clarification, and don't hesitate to criticize. If you have
ideas on how the linker might be extended and used, or have a
different approach to the problem, please drop me a message.

You can also look at the code at
http://svcs.cs.pdx.edu/gitweb?p=sparse-soc2008.git;a=shortlog;h=gsoc2008-linker
or grab it from
git://svcs.cs.pdx.edu/git/sparse-soc2008 branch gsoc2008-linker

For the brave who would actually like to see it in action, this
is how I run it over the sparse codebase:

    make CC="cgcc -v -emit-code" LD=cld AR=car

and then

    ./where sparse.sparse.so linearize_statement
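
Once a merged list is loaded, a lookup like the one above is a
plain search by name. A minimal sketch, assuming the same
hypothetical struct sym_entry record (name, file, line,
definition flag); where() here is an illustrative helper, and the
output format of the real program in [PATCH 10] may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol record, one per declaration or definition. */
struct sym_entry {
	const char *name;
	const char *file;
	int line;
	int is_definition;
};

/* Print every occurrence of `name` as "file:line: kind" and return
 * how many occurrences were found. */
static size_t where(FILE *out, const struct sym_entry *v, size_t n,
		    const char *name)
{
	size_t found = 0;

	assert(out != NULL && name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(v[i].name, name) != 0)
			continue;
		fprintf(out, "%s:%d: %s\n", v[i].file, v[i].line,
			v[i].is_definition ? "definition" : "declaration");
		found++;
	}
	return found;
}
```

Calling where(stdout, list, n, "linearize_statement") would print
one line per declaration or definition of that symbol found in
the list.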

And no, the patches are not meant for mainline inclusion right
now.

P.S.:
If you don't want to be on the CC list, just drop me a message
and I'll leave you out of any further notifications about the
project, though I'd miss your opinion.

Thread overview: 23+ messages
2008-09-03 21:55 alexey.zaytsev [this message]
2008-09-03 21:55 ` [PATCH 01/10] Serialization engine alexey.zaytsev
2008-09-03 21:55 ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev
2008-09-03 21:55 ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev
2008-09-03 21:55 ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev
2008-09-03 21:55 ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev
2008-09-03 21:55 ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev
2008-09-03 21:55 ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
2008-09-03 21:55 ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
2008-09-03 21:55 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
2008-09-03 21:55 ` [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files alexey.zaytsev
2008-09-04 1:16 ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
2008-09-04 1:54 ` Tommy Thorn
2008-09-04 4:03 ` Alexey Zaytsev
2008-09-04 7:27 ` Christopher Li
2008-09-04 9:41 ` Alexey Zaytsev
2008-09-04 10:35 ` Christopher Li
2008-09-04 13:29 ` Alexey Zaytsev
2008-09-04 13:35 ` Alexey Zaytsev
2008-09-04 19:04 ` Christopher Li
2008-09-04 20:21 ` Alexey Zaytsev
2008-09-04 21:24 ` Christopher Li
2008-09-05 9:49 ` Alexey Zaytsev