* Disassembly of 00000
@ 2013-02-02 4:53 horseriver
2013-02-02 18:37 ` Brian Raiter
2013-02-03 16:42 ` Robert Plantz
0 siblings, 2 replies; 8+ messages in thread
From: horseriver @ 2013-02-02 4:53 UTC (permalink / raw)
To: linux-assembly
hi:)
I have a question about disassembly utilities.
If I fill an ELF file's text section with random data, then
how does the disas command handle that data?
Are there cases where a sequence of bytes cannot be translated
into legal instructions?
thanks!
* Re: Disassembly of 00000
2013-02-02 4:53 Disassembly of 00000 horseriver
@ 2013-02-02 18:37 ` Brian Raiter
2013-02-02 18:54 ` horseriver
2013-02-03 16:42 ` Robert Plantz
1 sibling, 1 reply; 8+ messages in thread
From: Brian Raiter @ 2013-02-02 18:37 UTC (permalink / raw)
To: linux-assembly
> I have a question about disassemblly utility .
> If I fill an elf's text section with some random data,then
> how does the disas command work for these data?
> Is there occasion that several sequence of bytes can not be translated
> into legal instructions?
Yes, definitely. In those cases a typical disassembler will just mark
the first byte as being literally emitted and try to resume
disassembly at the next byte. For example, using ndisasm v2.07:
$ echo -e '\017zz' | ndisasm -
00000000  0F                db 0x0f
00000001  7A7A              jpe 0x7d
00000003  0A                db 0x0a
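As a rough illustration of that fallback, here is a minimal Python sketch of a linear sweep that emits a literal "db" for any byte it cannot decode and resumes at the next byte. The KNOWN table is a hypothetical three-entry toy, not a real x86 opcode map, and sweep() is a name invented for this sketch; it is not how ndisasm itself is implemented.

```python
# Hypothetical, deliberately tiny opcode table: NOT a real x86 decoder.
# Maps a first byte to (mnemonic, total instruction length in bytes).
KNOWN = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0x7A: ("jpe", 2),  # short conditional jump: opcode + 8-bit relative offset
}


def sweep(data: bytes):
    """Linear sweep: decode what the table knows, emit 'db' for the rest."""
    out = []
    pos = 0
    while pos < len(data):
        entry = KNOWN.get(data[pos])
        if entry is not None and pos + entry[1] <= len(data):
            name, length = entry
            out.append((pos, data[pos:pos + length].hex().upper(), name))
            pos += length
        else:
            # Unknown or truncated: emit the byte literally, resume at the next one.
            out.append((pos, f"{data[pos]:02X}", f"db 0x{data[pos]:02x}"))
            pos += 1
    return out


# Same four bytes as the ndisasm run above: 0F 7A 7A 0A.
for offset, raw, text in sweep(b"\x0fzz\n"):
    print(f"{offset:08X}  {raw:<8}  {text}")
```

The point of the sketch is only the else branch: a failed decode costs exactly one byte, so the sweep always makes progress.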
b
* Re: Disassembly of 00000
2013-02-02 18:37 ` Brian Raiter
@ 2013-02-02 18:54 ` horseriver
2013-02-03 5:44 ` Sofiane Akermoun
[not found] ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
0 siblings, 2 replies; 8+ messages in thread
From: horseriver @ 2013-02-02 18:54 UTC (permalink / raw)
To: linux-assembly
On Sat, Feb 02, 2013 at 10:37:32AM -0800, Brian Raiter wrote:
> > I have a question about disassemblly utility .
> > If I fill an elf's text section with some random data,then
> > how does the disas command work for these data?
> > Is there occasion that several sequence of bytes can not be translated
> > into legal instructions?
>
> Yes, definitely. In those cases a typical disassembler will just mark
> the first byte as being literally emitted and try to resume
> disassembly at the next byte. For example, using ndisasm v2.07:
Thanks!
What does "literally emitted" mean here?
I guess you mean a constant value definition.
How does a disassembler determine the number of bytes that make up an instruction?
Can a single byte alone tell the instruction's length?
>
> $ echo -e '\017zz' | ndisasm -
> 00000000 0F db 0x0f
> 00000001 7A7A jpe 0x7d
> 00000003 0A db 0x0a
>
> b
> --
> To unsubscribe from this list: send the line "unsubscribe linux-assembly" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Disassembly of 00000
2013-02-02 18:54 ` horseriver
@ 2013-02-03 5:44 ` Sofiane Akermoun
[not found] ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
1 sibling, 0 replies; 8+ messages in thread
From: Sofiane Akermoun @ 2013-02-03 5:44 UTC (permalink / raw)
To: horseriver; +Cc: linux-assembly
Hello,
Answers to your previous questions:
> How does a disassembler determine the number of bytes that make up an instruction?
A disassembler basically uses a table. The Intel documentation
describes every instruction, along with how to decode it
(though not the actual method, which you have to write yourself, of course).
A disassembler should know where to start disassembling; otherwise it will
start by disassembling bad data and will probably fail on an unknown or
illogical instruction or, in the worst case, everything will appear to be OK.
When I wrote "a disassembler just uses a table", that is not really
true; it is not true at all. A disassembler also uses a heuristic engine:
some data or functions could be stored in a data section, for example, and
of course data could be stored in the .text section. A good disassembler
knows how to interpret them, which is how a disassembler knows whether it is
reading a string or executable code.
> Can a single byte alone tell the instruction's length?
For a few instructions, yes; for most of them, no.
On Intel, I sort all opcodes into about 16 families. I mean that I need 16
different methods just to get the size of the current instruction (I
did this for my personal position-independent length disassembler,
written in assembly of course :)).
Some instructions are just 1 byte, such as push ES, pushf, and RETN.
Some are 1 byte + 1 byte, or 1 byte + 1 dword/word.
Others follow the template: prefix bytes (0 to 4), opcode bytes (1 to 3),
a ModR/M byte, an optional SIB (scale-index-base) byte, and a displacement
and/or an immediate value, each of which is 0 to 4 bytes.
In theory, you have to know whether an opcode uses a ModR/M byte and, if so,
parse it to know whether the following fields are present.
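To make the ModR/M step concrete, here is a minimal Python sketch that computes how many bytes (SIB plus displacement) follow a ModR/M byte. It assumes 32-bit addressing only, and it does not cover the immediate, whose size depends on the opcode. The function name modrm_trailing_bytes is made up for this illustration.

```python
from typing import Optional


def modrm_trailing_bytes(modrm: int, sib: Optional[int] = None) -> int:
    """Count SIB + displacement bytes after a ModR/M byte (32-bit addressing only)."""
    mod = modrm >> 6        # top two bits
    rm = modrm & 0b111      # bottom three bits
    if mod == 0b11:
        return 0            # register operand: nothing follows
    has_sib = rm == 0b100   # rm=100 means a SIB byte follows
    extra = 1 if has_sib else 0
    if mod == 0b01:
        extra += 1          # 8-bit displacement
    elif mod == 0b10:
        extra += 4          # 32-bit displacement
    elif rm == 0b101:
        extra += 4          # mod=00, rm=101: disp32 with no base register
    elif has_sib:
        if sib is None:
            raise ValueError("mod=00 with a SIB byte: the SIB value is needed")
        if sib & 0b111 == 0b101:
            extra += 4      # mod=00, SIB base=101: disp32 with no base register
    return extra


print(modrm_trailing_bytes(0xC0))  # register-to-register form
print(modrm_trailing_bytes(0x84))  # SIB byte plus 32-bit displacement
```

Note the one case where the ModR/M byte alone is not enough: with mod=00 and a SIB byte, the SIB's base field decides whether a displacement follows.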
Sofiane Akermoun
akersof@gmail.com
* RE: Disassembly of 00000
2013-02-02 4:53 Disassembly of 00000 horseriver
2013-02-02 18:37 ` Brian Raiter
@ 2013-02-03 16:42 ` Robert Plantz
1 sibling, 0 replies; 8+ messages in thread
From: Robert Plantz @ 2013-02-03 16:42 UTC (permalink / raw)
To: horseriver, linux-assembly@vger.kernel.org
Section 9.3 of my textbook (www.lulu.com/spotlight/bobplantz) provides a *very brief* introduction to x86-64 machine language (0s and 1s) coding. If it would help you, I would be happy to send you a pdf copy of this section. A free preview copy of my book is available at bob.cs.sonoma.edu.
Please note that the discussion in my book only describes the general nature of how instructions are coded in bit patterns. But once you understand this, I think it is obvious that trying to "disassemble" a set of random bit patterns would be meaningless. One could make some educated guesses, but there is no way to know for sure.
Back in the 70s, I heard of programmers who would write code such that a particular group of bits would be treated as (a) an instruction, (b) constant data, or (c) an address, depending on where it was accessed in the program flow. Back in those days, we used assembly language to write self-modifying code. I once did it for a driver on a CT scanner. Memory was expensive in those days.
Bob
* Re: Disassembly of 00000
[not found] ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
@ 2013-02-04 6:24 ` horseriver
2013-02-04 19:25 ` Sofiane Akermoun
0 siblings, 1 reply; 8+ messages in thread
From: horseriver @ 2013-02-04 6:24 UTC (permalink / raw)
To: Sofiane Akermoun; +Cc: linux-assembly
>A disassembler just uses a table. The intel documentation describes all
>the instruction, with the method on how do decode it (not really the method
>you have to write it of course).
>A disassembler should know where to start to disassemble, else he will
>start by disassembling bad data.. and probably fail on an unknown or
>unlogical instruction, or the worst case everything will be ok.
Thanks!
What condition decides whether a byte is a legal start of an instruction?
Are there cases where a disassembler needs to scan more than one byte to work out
what the instruction is?
* Re: Disassembly of 00000
2013-02-04 6:24 ` horseriver
@ 2013-02-04 19:25 ` Sofiane Akermoun
[not found] ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
0 siblings, 1 reply; 8+ messages in thread
From: Sofiane Akermoun @ 2013-02-04 19:25 UTC (permalink / raw)
To: horseriver; +Cc: linux-assembly
Almost any byte could be the start of a legal instruction. A disassembler needs
to know where to start disassembling the code.
When you pass a binary to a disassembler, it knows that it has to
start at the beginning of the code (a disassembler finds the code
section by checking some information in the header of the binary).
The condition is to start disassembling at the beginning of the code;
otherwise you cannot find the offset and length of each instruction.
An example:
Imagine you have this:
66 83 84 98 00 10 00 00 03 12 34 56 54 31
The disassembler starts by checking the first byte and continues until
it has found a complete instruction.
66: This is a prefix (the operand-size override). It means the instruction
uses 16-bit operands instead of the default 32-bit ones.
83: This instruction takes two operands, and the second operand is a byte.
This opcode is followed by a ModR/M byte.
84: The ModR/M byte, which is 10 000 100 in binary.
  10: the first operand is in memory, with a dword displacement
  000: the instruction is ADD
  100: this ModR/M byte is followed by a SIB byte
98: The SIB byte, which is 10 011 000 in binary.
It means scale = 4, index = ebx, base = eax.
So now we know:
1 byte prefix = 66
+ 1 byte opcode = 83
+ 1 byte ModR/M = 84
+ 1 byte SIB = 98
+ 4 bytes displacement = 00 10 00 00 (little-endian, i.e. 0x1000)
+ 1 byte immediate operand = 03
The instruction is 9 bytes long.
In other words: add word ptr ds:[ebx*4 + eax + 0x1000], 3
Here 98 is a SIB byte, not the CBW/CWDE instruction, and 03 is a 1-byte
operand, not the ADD instruction (yes, 03 is also an ADD opcode).
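The walk-through above can be reproduced with a short Python sketch. It is a toy that handles only the exact shape used in this example (optional 66 prefix, opcode 83, a ModR/M byte with mod=10 and rm=100, a SIB byte, a 32-bit displacement, and an 8-bit immediate) and rejects everything else. The names decode_example, REGS32 and GROUP1 are invented for the illustration.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# For opcode 0x83, the reg field of ModR/M selects the operation.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_example(code: bytes):
    """Decode only the shape: [66] 83 ModR/M(mod=10, rm=100) SIB disp32 imm8."""
    pos = 0
    size = "dword"
    if code[pos] == 0x66:              # operand-size override prefix
        size = "word"
        pos += 1
    if code[pos] != 0x83:
        raise ValueError("this sketch only handles opcode 0x83")
    pos += 1
    modrm = code[pos]
    pos += 1
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm != 0b100:
        raise ValueError("this sketch only handles mod=10, rm=100 (SIB + disp32)")
    sib = code[pos]
    pos += 1
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if index == 0b100:
        raise ValueError("index=100 means 'no index'; not handled here")
    disp = int.from_bytes(code[pos:pos + 4], "little")   # little-endian disp32
    pos += 4
    imm = int.from_bytes(code[pos:pos + 1], "little", signed=True)  # imm8
    pos += 1
    text = (f"{GROUP1[reg]} {size} ptr "
            f"[{REGS32[index]}*{scale} + {REGS32[base]} + 0x{disp:x}], {imm}")
    return pos, text


stream = bytes([0x66, 0x83, 0x84, 0x98, 0x00, 0x10, 0x00, 0x00, 0x03,
                0x12, 0x34, 0x56, 0x54, 0x31])
length, text = decode_example(stream)
print(length, text)  # prints: 9 add word ptr [ebx*4 + eax + 0x1000], 3
```

The returned length is what tells the caller where the next instruction starts, which is why the trailing bytes 12 34 56 54 31 are left untouched.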
As a last word: you need context to disassemble, and to get
that context you have to start at the beginning of the code section,
or at least at an instruction that you have checked and
confirmed.
regards,
Sofiane Akermoun
akersof@gmail.com
* re: Disassembly of 00000
[not found] ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
@ 2013-02-04 22:54 ` Hendrik Visage
0 siblings, 0 replies; 8+ messages in thread
From: Hendrik Visage @ 2013-02-04 22:54 UTC (permalink / raw)
To: linux-assembly
On Mon, Feb 4, 2013 at 9:25 PM, Sofiane Akermoun <akersof@gmail.com> wrote:
>
> almost every bytes could be a legal instruction. A disassembler need
> to know where to start to disassemble the code.
> When you pass a binary to a disassembler, it knows that it has to
> start at the begining of the code. (a disassembler find the code
> section by checking some informtion in the header of the binary).
> The condition is to start to disassemble at the begining of the code
> else you can not find the offset and length for each instruction.
And this would also be true if the CPU itself were given that code to
execute with the PC (program counter) pointed at that address :)
This is one of the "issues" I have with ia32/x86_64, with their
variable-length instructions and CISC model. Compare this with
SPARC RISC, where the 32-bit (up to v8) instructions are all aligned on
32-bit word boundaries (i.e. the PC increments in 4s, else an
alignment error/exception is raised), and each instruction is one 32-bit
word, no exception :)
>
> <snip-variable-length-explanation>
>
> > What is the condition that decide one byte is or not a legal instruction's start ?
Whether that sequence of bytes is a supported instruction on the
given CPU it is meant to execute on, as an SSE instruction might not be
valid on an old AMD with only 3DNow! available.
So in all cases, you'll also need to know the CPU as well as the
specific mode (real, protected, long, etc.) that it was targeted for,
as an instruction in long mode, for example, won't be valid in real mode.
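One concrete case of this mode dependence: the bytes 0x40 to 0x4F are one-byte INC/DEC register instructions in 16-bit and 32-bit code, but REX prefixes in 64-bit long mode. A minimal Python sketch of that single case follows; the function name and the bits parameter (the default operand size of the code being disassembled) are assumptions made for the illustration.

```python
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_40_to_4f(byte: int, bits: int) -> str:
    """Interpret a byte in 0x40..0x4F for 16-, 32- or 64-bit code."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("this sketch only covers bytes 0x40..0x4F")
    if bits == 64:
        # In long mode these bytes are REX prefixes, not instructions.
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"REX prefix (W={w} R={r} X={x} B={b})"
    if bits in (16, 32):
        op = "inc" if byte < 0x48 else "dec"
        regs = REGS16 if bits == 16 else REGS32
        return f"{op} {regs[byte & 7]}"
    raise ValueError("bits must be 16, 32 or 64")


for bits in (16, 32, 64):
    print(bits, decode_40_to_4f(0x48, bits))
```

The same byte 0x48 therefore comes out as "dec ax", "dec eax", or a REX.W prefix, depending only on what the disassembler is told about the target mode.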
>
> > Are there some occasions that disassembler need scan more than one byte to detect
> >
> > its logic ?
This has been answered in the previous example. The other answer
here is to RTFM: the manuals for the CPU you are disassembling for ;)
Thread overview: 8+ messages
2013-02-02 4:53 Disassembly of 00000 horseriver
2013-02-02 18:37 ` Brian Raiter
2013-02-02 18:54 ` horseriver
2013-02-03 5:44 ` Sofiane Akermoun
[not found] ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
2013-02-04 6:24 ` horseriver
2013-02-04 19:25 ` Sofiane Akermoun
[not found] ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
2013-02-04 22:54 ` Hendrik Visage
2013-02-03 16:42 ` Robert Plantz