Re: [PATCH 0/10] Sparse linker

linux-sparse.vger.kernel.org archive mirror

From: "Christopher Li" <sparse@chrisli.org>
To: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Cc: linux-sparse@vger.kernel.org, Josh Triplett <josh@kernel.org>,
	Codrin Alexandru Grajdeanu <grcodal@gmail.com>
Subject: Re: [PATCH 0/10] Sparse linker
Date: Thu, 4 Sep 2008 14:24:57 -0700
Message-ID: <70318cbf0809041424n1a773e0t3a68414a44ce79f3@mail.gmail.com>
In-Reply-To: <f19298770809041321p53e81b93i6a6f62395b53af88@mail.gmail.com>

On Thu, Sep 4, 2008 at 1:21 PM, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> Mostly ack here, but I still think the C code has two advantages over
> binaries: It's easy to read, and it's an easy way to get the shared
> library filled with the data, see below.

That does not stop you from having a parsing tool that generates a
readable format from the object dump. But using C source as the primary
way to dump objects is letting the tail wag the dog. The on-disk format
should be optimized for the checker to read, not for humans.
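For illustration only, a minimal C sketch of that split, assuming a hypothetical record layout (`struct sobj_sym`, `sobj_format_sym` and all field names are made up here, not sparse's actual object format): the on-disk record is fixed-size and pointer-free so a checker can read it directly, and a separate formatter turns it into readable text only when a human wants to look.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk symbol record: fixed size, no pointers, so a
 * checker can read it straight out of a mapped file. Names live in a
 * separate string table and are referenced by offset. */
enum sobj_kind { SOBJ_FUNC = 1, SOBJ_VAR = 2 };

struct sobj_sym {
	uint32_t name_off;   /* offset of the NUL-terminated name in strtab */
	uint16_t kind;       /* enum sobj_kind */
	uint16_t flags;      /* bit 0 = static (hypothetical) */
	uint32_t type_idx;   /* index into a hypothetical type table */
};

/* Optional, separate tool: render one record as human-readable text.
 * Returns what snprintf returns. */
int sobj_format_sym(const struct sobj_sym *s, const char *strtab,
		    char *buf, size_t len)
{
	const char *kind = s->kind == SOBJ_FUNC ? "func" :
			   s->kind == SOBJ_VAR  ? "var"  : "?";

	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s %s type#%u%s", kind,
			strtab + s->name_off, (unsigned)s->type_idx,
			(s->flags & 1) ? " static" : "");
}
```

The readable view is derived from the binary dump, not the other way around, which is the ordering argued for above.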

> The huge disadvantage is the time and the memory it takes to compile
> the C code.

And the runtime dependency on gcc.

> Here I have to disagree. Loading the data from an .so might actually be the
> most efficient method. See, the bulk of data of the .so is simply mmap'ed
> read-only, with only the GOT being read-write, and when mapping with
> RTLD_LAZY, the pointers are resolved only when you follow them, completely
> transparently to us. You don't need the fine-grained control, the OS just does
> the right thing for you. And if the checker needs to look at the bulk
> of the data,

Are you sure?

Quoting the man page:
===================
RTLD_LAZY
    Perform lazy binding. Only resolve symbols as the code that
references them is executed. If the symbol is never referenced, then
it is never resolved. (Lazy binding is only performed for function
references; references to variables are always immediately bound when
the library is loaded.)
===================

Your symbols are stored as DATA nodes, not functions. You never EXECUTE
your sparse object code, so RTLD_LAZY has ZERO effect on them. All the
symbols have to be bound immediately. How could you even tell which data
pointer is lazily bound, given that any data value is possible in the pointer?
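To make the RTLD_LAZY point concrete, here is a toy model in plain C. It is not glibc's actual PLT/GOT machinery, and every name in it is made up for illustration: a function call goes through a slot that initially points at a resolver stub, so the first call can run code that patches the slot. A data read through a pointer executes no code at all, so there is nowhere to hook lazy resolution, and the pointer must already be final when the library is loaded.

```c
#include <assert.h>

/* Toy model of lazy binding (not glibc's real mechanism). */
static int resolve_count = 0;

static int real_double(int x) { return 2 * x; }
static int lazy_stub(int x);

/* Analogous to a GOT entry that initially points back into the PLT. */
static int (*double_slot)(int) = lazy_stub;

static int lazy_stub(int x)
{
	resolve_count++;            /* the "symbol lookup" happens here, once */
	double_slot = real_double;  /* patch the slot so later calls skip the stub */
	return double_slot(x);
}

/* A data reference has no stub: dereferencing the pointer runs no loader
 * code, so the slot has to hold the final address before first use. */
static const int answer = 42;
static const int *answer_slot = &answer;  /* filled "at load time" */

int call_double(int x) { return double_slot(x); }
int read_answer(void) { return *answer_slot; }
int resolutions(void) { return resolve_count; }
```

Only the function path ever gets a chance to resolve lazily; the data path has to be resolved eagerly, which matches the man page text quoted above.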

> it can dlopen with RTLD_NOW. When multiple different checkers are being run
> over the .so, the bulk of memory is shared between the processes, which I
> think matters a lot. The memory is cheap, but now the number of cores
> is growing.
> E.g. if you've got 4 cores and 4 gigs of RAM, it's only one gig per
> core, and wasting
> 300 megabytes per process just to load the data doesn't look like a good idea.

Even if they are mmapped, every symbol has to be touched up (relocated),
so those pages need to be swapped in and COWed. COW memory can't be shared
between processes at all. This goes against the tradition of sparse being
a small and neat tool.
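A small POSIX sketch of the copy-on-write effect, assuming a Linux-like system (`relocate_private` is a hypothetical helper, not part of any loader): it maps a file `MAP_PRIVATE`, "relocates" one pointer-sized word the way a dynamic loader patches a data pointer, and checks that the file itself is unchanged. The store lands on a private copy of the page; any other process mapping the same file and patching the same word would end up with its own separate copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps `len` bytes of fd privately, adds `base` to the first word (a
 * stand-in for applying a relocation) and stores the patched value in
 * *out. Returns 1 if the file on disk still holds the unpatched value,
 * 0 if not, -1 on error. */
int relocate_private(int fd, size_t len, uintptr_t base, uintptr_t *out)
{
	uintptr_t *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	uintptr_t on_disk;

	if (p == MAP_FAILED)
		return -1;
	p[0] += base;                 /* write fault: kernel makes a private COW copy */
	*out = p[0];
	munmap(p, len);
	if (pread(fd, &on_disk, sizeof on_disk, 0) != (ssize_t)sizeof on_disk)
		return -1;
	return (int)(on_disk == *out - base);
}
```

Every page holding a pointer that needs patching goes through this path, so with many relocated symbols most of the data pages stop being shareable.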

I have to NACK this approach, especially since I know there is a better
alternative. My laptop does not have 4 gigs of RAM and it only has one
core, but I still want to run the checker as fast as possible on it.

> That should be completely possible with both approaches. I don't see any
> difference here.

I don't think so. See above comment about RTLD_LAZY.

>
> This way, you don't have the transparency. You either need to load all the
> data into memory, one structure after the other, and link them together,
> basically doing the same stuff dlopen() does for you, or you'll need to
> use special functions/macros to access the data from your checker.

Yes, it needs one bit of information per symbol recording whether it has
been resolved or not.

That does not need to be tested inside the checker, though.
The loader can make sure the symbols a specific checker wants are all
resolved before it hands them over to the checker. On the typical checking
path, the checker cares about only a very small percentage of the data.
Spending CPU and memory on structures the checker doesn't care about is
a big waste.
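A minimal C sketch of that idea, with entirely hypothetical names (`struct lsym`, `loader_get`): each symbol carries one "resolved" bit, and the loader resolves only the requested symbol plus whatever it transitively points to before handing it over, so the checker never tests the bit itself and unrelated symbols are never touched. A real loader would use a proper index instead of the linear `find` used here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory symbol: an unresolved reference (by name) plus,
 * once resolved, a direct pointer to the target. */
struct lsym {
	const char *name;
	const char *ref;        /* name this symbol refers to, or NULL */
	struct lsym *target;    /* valid only when resolved */
	unsigned resolved : 1;  /* the one bit of state per symbol */
};

static struct lsym *find(struct lsym *tab, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Loader side: resolve the requested symbol and the chain it references,
 * stopping at anything already resolved. *work counts resolutions done. */
struct lsym *loader_get(struct lsym *tab, size_t n, const char *name, int *work)
{
	struct lsym *s, *cur;

	assert(name != NULL && work != NULL);
	s = find(tab, n, name);
	for (cur = s; cur && !cur->resolved; cur = cur->target) {
		cur->target = cur->ref ? find(tab, n, cur->ref) : NULL;
		cur->resolved = 1;
		(*work)++;
	}
	return s;
}
```

Symbols that no checker asks for stay unresolved and cost nothing beyond their raw storage.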

We don't really need to link them into one big object, as long as we can
efficiently look up which object contains the symbol we want. I see
linking everything into one big object, and having to load it all
together, as a serious disadvantage.
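One possible shape for that lookup, sketched in C under the assumption of a small global index (hypothetical `struct sym_index_entry`, `index_sort`, `index_lookup`; the object paths are placeholders): a table mapping symbol name to the object file that defines it, kept sorted so lookup is a binary search with the standard `qsort`/`bsearch`. Finding where a symbol lives then requires loading no object at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: which object file defines a symbol. */
struct sym_index_entry {
	const char *name;
	const char *object;     /* path of the defining object file */
};

static int cmp_entry(const void *a, const void *b)
{
	const struct sym_index_entry *x = a, *y = b;
	return strcmp(x->name, y->name);
}

/* Sort once after building the index. */
void index_sort(struct sym_index_entry *idx, size_t n)
{
	qsort(idx, n, sizeof *idx, cmp_entry);
}

/* Binary search; returns the defining object, or NULL if unknown. */
const char *index_lookup(const struct sym_index_entry *idx, size_t n,
			 const char *name)
{
	struct sym_index_entry key = { name, NULL };
	const struct sym_index_entry *e;

	assert(name != NULL);
	e = bsearch(&key, idx, n, sizeof *idx, cmp_entry);
	return e ? e->object : NULL;
}
```

A hash table would work just as well; the sorted array is used here only because it needs nothing beyond the C standard library.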

After the checker is done with an object, ideally it can release it.
I don't want the checker to load every object into memory before it
can work on it. That obviously does not scale.
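A minimal reference-counting sketch of load-on-demand and release-when-done, assuming hypothetical `sobj_get`/`sobj_put` helpers (`malloc` stands in for actually reading and decoding the object file). Memory in use then tracks the objects the checker is currently working on rather than the whole tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle for one sparse object file. */
struct sobj {
	int refcount;
	void *data;       /* stand-in for the object's decoded contents */
	size_t size;
};

static size_t bytes_live;   /* bytes currently held by loaded objects */

/* Load on first use; later users just take another reference. */
void sobj_get(struct sobj *o, size_t size)
{
	if (o->refcount++ == 0) {
		o->data = malloc(size);   /* stand-in for reading the file */
		assert(o->data != NULL);
		o->size = size;
		bytes_live += size;
	}
}

/* Drop a reference; free the object as soon as nobody needs it. */
void sobj_put(struct sobj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		free(o->data);
		o->data = NULL;
		bytes_live -= o->size;
	}
}

size_t sobj_bytes_live(void) { return bytes_live; }
```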

> I'm looking into it now. Thank you for sharing.
>
> One crazy idea is... why can't we actually produce shared object binaries
> directly... Maybe it won't be all that hard to generate valid ELF...
> Just crazy probably.

I don't mind using the ELF format as long as it is simple and easy to
use. Keep in mind that the object file format used by sparse has
slightly different design goals. I did try ELF a little, but I did not
get very far.

Chris

Thread overview: 23+ messages
2008-09-03 21:55 [PATCH 0/10] Sparse linker alexey.zaytsev
2008-09-03 21:55 ` [PATCH 01/10] Serialization engine alexey.zaytsev
2008-09-03 21:55   ` [PATCH 02/10] Handle -emit_code and the -o file options alexey.zaytsev
2008-09-03 21:55     ` [PATCH 03/10] Check stdin if no input files given, like cc1 alexey.zaytsev
2008-09-03 21:55       ` [PATCH 04/10] Add char *first_string(struct string_list *) alexey.zaytsev
2008-09-03 21:55         ` [PATCH 05/10] Serializable ptr lists alexey.zaytsev
2008-09-03 21:55           ` [PATCH 06/10] Linker core, serialization and helper functions alexey.zaytsev
2008-09-03 21:55             ` [PATCH 07/10] Let sparse serialize the symbol table of the checked file alexey.zaytsev
2008-09-03 21:55               ` [PATCH 08/10] Sparse Object Link eDitor alexey.zaytsev
2008-09-03 21:55                 ` [PATCH 09/10] Rewrite cgcc, add cld and car to wrap ld and ar alexey.zaytsev
2008-09-03 21:55                   ` [PATCH 10/10] A simple demonstrational program that looks up symbols in sparse object files alexey.zaytsev
     [not found] ` <70318cbf0809031808u8610f3h4b3d53a7b76a7799@mail.gmail.com>
2008-09-04  1:16   ` Fwd: [PATCH 0/10] Sparse linker Christopher Li
2008-09-04  1:54     ` Tommy Thorn
2008-09-04  4:03     ` Alexey Zaytsev
2008-09-04  7:27       ` Christopher Li
2008-09-04  9:41         ` Alexey Zaytsev
2008-09-04 10:35           ` Christopher Li
2008-09-04 13:29             ` Alexey Zaytsev
2008-09-04 13:35               ` Alexey Zaytsev
2008-09-04 19:04                 ` Christopher Li
2008-09-04 20:21                   ` Alexey Zaytsev
2008-09-04 21:24                     ` Christopher Li [this message]
2008-09-05  9:49                       ` Alexey Zaytsev