linux-sparse.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Alexey Zaytsev" <alexey.zaytsev@gmail.com>
To: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org, Josh Triplett <josh@kernel.org>,
	Codrin Alexandru Grajdeanu <grcodal@gmail.com>
Subject: Re: [PATCH 0/10] Sparse linker
Date: Fri, 5 Sep 2008 00:21:10 +0400	[thread overview]
Message-ID: <f19298770809041321p53e81b93i6a6f62395b53af88@mail.gmail.com> (raw)
In-Reply-To: <70318cbf0809041204y75fa8f58vd6d1cfc7317b4fff@mail.gmail.com>

On Thu, Sep 4, 2008 at 11:04 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Sep 4, 2008 at 6:35 AM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>
>>> Ok, let me try to explain how the stuff works. Please note that in
>>
>> Ugh, my pretty code listings got corrupted by the bloody gmail.
>> Here is a better version: http://zaytsev.su/explanation.txt
>
> Thanks for your detail explain. It just confirm my reading of your
> code. I stand by my original feedback:
>
> - Using C source code as the output format is bad and unnecessary.
>  It depend on gcc to process the intermediate C source file.
>
Mostly ack here, but I still think the C code has two advantages over
binaries: It's easy to read, and it's an easy way to get the shared
library filled with the data, see below.

The huge disadvantage is the time and the memory it takes to compile
the C code.

> - Using dlopen to load the module does not have the fine grain control
>  of the which symbol need to resolve and which is doesn't. The linked
>  sparse object code for the whole linux kernel will be huge. Dynamic
>  loading of 300M bytes of .so file is not fun.

Here I have to disagree. Loading the data from an .so might actually the
most evfficient method. See, the bulk of data of the .so is simply mmap'ed
read-only, with only the GOT being read-write, and when mapping with
RTLD_LAZY, the pointers are resolved only when you follow them, completely
transparently to us. You don't need the fine-grained control, the OS just does
the right thing for you. And if the checker needs to look at the bulk
of the data,
it cat dlopen with RTLD_NOW. When multiple different checkers are being run
over the .so, the bulk of memory is shared between the processes, which I
think matters a lot. The memory is cheap, but now the number of cores
is growing.
E.g. if you've got 4 cores and 4 gigs of RAM, it's only one gig per
core, and wasting
300 megabytes per process just to load the data doasn't look like a good idea.

> - I can see you link all the define symbol together that way. In order to do
>  inter-function check effectively, we need the have the reverse mapping
>  as well. It need to perform task like this:
>  "Get me a list of the function who has reference to spin_lock()".
>
>  If I am writing a spin_lock checker.  I can look at who used spin_lock
>  and only load those functions as needed.
>  It is much better than scanning every single one of the kernel function to
>  search for the spin_lock function call.

That should be completely possible with both approaches. I don't see any
difference here.

>
> - The extra 4 bytes per structure storage on disk can be eliminated.
>  I agree you need some meta data to track the object before you dump
>  them to the file. But they don't need to be on the disk object at all.

Agreed. I'll rethink the implementation.

>
>  If you group same type of object together as an array. The index of the
>  object is implicit as the array index. If the C struct is fixed size. It is
>  trivial to locate the object.

This way, you don't have the transparency. You either need to load all the
data into memory, one structure after the other, and link them together,
basically going the same stuff dlopen() does for you,  or you'll need to
use special functions/macros to access the data from your checker.

>
>  If the C struct is variable size, currently on sparse each object knows
>  what size it is. You do need any index array to look it up. But this
>  array can be build on object loading time. They don't have to be on
>  the disk either.
>
>  Then you can get ride of the wrapper structure on the disk file format
>  all together.
>
>  The writer patch I send out use those tricks already. You are welcome to
>  poke around it.

I'm looking into it now. Thank you for sharing.

One crazy idea is... why can't we actually produce shared object binaries
directly... Maybe it won't be all that hard to generate valid ELF...
Just crazy probably.

>
> Chris
>

  reply	other threads:[~2008-09-04 20:21 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-03 21:55 [PATCH 0/10] Sparse linker alexey.zaytsev
2008-09-03 21:55 ` [PATCH 01/10] Serialization engine alexey.zaytsev
2008-09-03 21:55   ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev
2008-09-03 21:55     ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev
2008-09-03 21:55       ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev
2008-09-03 21:55         ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev
2008-09-03 21:55           ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev
2008-09-03 21:55             ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
2008-09-03 21:55               ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
2008-09-03 21:55                 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
2008-09-03 21:55                   ` [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files alexey.zaytsev
     [not found] ` <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com>
2008-09-04  1:16   ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
2008-09-04  1:54     ` Tommy Thorn
2008-09-04  4:03     ` Alexey Zaytsev
2008-09-04  7:27       ` Christopher Li
2008-09-04  9:41         ` Alexey Zaytsev
2008-09-04 10:35           ` Christopher Li
2008-09-04 13:29             ` Alexey Zaytsev
2008-09-04 13:35               ` Alexey Zaytsev
2008-09-04 19:04                 ` Christopher Li
2008-09-04 20:21                   ` Alexey Zaytsev [this message]
2008-09-04 21:24                     ` Christopher Li
2008-09-05  9:49                       ` Alexey Zaytsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f19298770809041321p53e81b93i6a6f62395b53af88@mail.gmail.com \
    --to=alexey.zaytsev@gmail.com \
    --cc=grcodal@gmail.com \
    --cc=josh@kernel.org \
    --cc=linux-sparse@vger.kernel.org \
    --cc=sparse@chrisli.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).