* Disassembly of 00000
@ 2013-02-02  4:53 horseriver
  2013-02-02 18:37 ` Brian Raiter
  2013-02-03 16:42 ` Robert Plantz
  0 siblings, 2 replies; 8+ messages in thread
From: horseriver @ 2013-02-02  4:53 UTC (permalink / raw)
  To: linux-assembly

hi:)

   I have a question about disassembly utilities.
   If I fill an ELF's text section with random data, how does the
   disas command handle that data?
   Are there cases where some sequence of bytes cannot be translated
   into legal instructions?

thanks!

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disassembly of 00000
  2013-02-02  4:53 Disassembly of 00000 horseriver
@ 2013-02-02 18:37 ` Brian Raiter
  2013-02-02 18:54   ` horseriver
  2013-02-03 16:42 ` Robert Plantz
  1 sibling, 1 reply; 8+ messages in thread
From: Brian Raiter @ 2013-02-02 18:37 UTC (permalink / raw)
  To: linux-assembly

> I have a question about disassemblly utility .
> If I fill an elf's text section with some random data,then
> how does the disas command work for these data?
> Is there occasion that several sequence of bytes can not be translated
> into legal instructions?

Yes, definitely. In those cases a typical disassembler will just mark
the first byte as being literally emitted and try to resume
disassembly at the next byte. For example, using ndisasm v2.07:

  $ echo -e '\017zz' | ndisasm -
  00000000  0F                db 0x0f
  00000001  7A7A              jpe 0x7d
  00000003  0A                db 0x0a

b

^ permalink raw reply [flat|nested] 8+ messages in thread
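The "emit the byte literally and resume at the next byte" behaviour described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function name `sweep` and its four-entry opcode table are hypothetical and cover a tiny fraction of x86; it is not how ndisasm is implemented. Anything the toy table does not recognise is printed as `db` and decoding restarts one byte later.

```python
def sweep(code: bytes):
    """Linear sweep with a toy, deliberately incomplete opcode table.

    Returns (offset, hex bytes, text) rows. Unknown bytes are emitted
    as 'db' and decoding resumes at the very next byte.
    """
    rows = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x90:
            length, text = 1, "nop"
        elif op == 0xC3:
            length, text = 1, "ret"
        elif op == 0x7A and pc + 1 < len(code):
            # jpe rel8: target is relative to the end of the instruction
            rel = code[pc + 1]
            if rel >= 0x80:
                rel -= 0x100
            length, text = 2, f"jpe {hex(pc + 2 + rel)}"
        elif op == 0x0F and code[pc + 1:pc + 2] == b"\x05":
            # 0F is an escape byte; this toy table knows only 0F 05
            length, text = 2, "syscall"
        else:
            # not in the toy table: emit one literal byte and move on
            length, text = 1, f"db 0x{op:02x}"
        rows.append((pc, code[pc:pc + length].hex().upper(), text))
        pc += length
    return rows


if __name__ == "__main__":
    # same input as the echo command above: 0F 7A 7A 0A
    for offset, raw, text in sweep(b"\x0fzz\n"):
        print(f"{offset:08X}  {raw:<16}  {text}")
```

Fed the same four bytes as the ndisasm run, the sketch produces the same three rows, but only because the toy table happens not to know `0F 7A` or `0A`; a real table would make different decisions for other inputs.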
* Re: Disassembly of 00000
  2013-02-02 18:37 ` Brian Raiter
@ 2013-02-02 18:54   ` horseriver
  2013-02-03  5:44     ` Sofiane Akermoun
       [not found]     ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: horseriver @ 2013-02-02 18:54 UTC (permalink / raw)
  To: linux-assembly

On Sat, Feb 02, 2013 at 10:37:32AM -0800, Brian Raiter wrote:
> > I have a question about disassemblly utility .
> > If I fill an elf's text section with some random data,then
> > how does the disas command work for these data?
> > Is there occasion that several sequence of bytes can not be translated
> > into legal instructions?
>
> Yes, definitely. In those cases a typical disassembler will just mark
> the first byte as being literally emitted and try to resume
> disassembly at the next byte. For example, using ndisasm v2.07:

  Thanks!
  What does "literally emitted" mean here?
  I guess you mean a constant value definition.
  How does a disassembler determine the number of bytes that make up an instruction?
  Can a single byte tell the instruction's length?

>
>   $ echo -e '\017zz' | ndisasm -
>   00000000  0F                db 0x0f
>   00000001  7A7A              jpe 0x7d
>   00000003  0A                db 0x0a
>
> b
> --
> To unsubscribe from this list: send the line "unsubscribe linux-assembly" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disassembly of 00000
  2013-02-02 18:54   ` horseriver
@ 2013-02-03  5:44     ` Sofiane Akermoun
       [not found]     ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
  1 sibling, 0 replies; 8+ messages in thread
From: Sofiane Akermoun @ 2013-02-03  5:44 UTC (permalink / raw)
  To: horseriver; +Cc: linux-assembly

Hello,

Answers to your previous questions:

How does a disassembler determine the number of bytes that make up an
instruction?

A disassembler basically uses a table. The Intel documentation describes
every instruction along with how it is encoded (it does not hand you a
decoding method as such; you have to write that yourself, of course).
A disassembler needs to know where to start disassembling; otherwise it
will start by disassembling bad data and will probably fail on an
unknown or illogical instruction or, in the worst case, everything will
appear to be fine.

When I wrote "a disassembler just uses a table", that is not the whole
truth. Disassemblers also use heuristics: some data or functions could
be stored in a data section, for example, and of course data can be
stored in the .text section. A good disassembler knows how to interpret
them; this is how a disassembler knows whether it is reading a string or
executable code.

Can only one byte tell the instruction's length?

For a few instructions yes, but not for most of them. On Intel, I sort
all opcodes into about 16 families. I mean that I need 16 different
methods just to get the size of the current instruction (I did this for
my personal position-independent length disassembler, written in
assembly of course :)).

Some instructions are just 1 byte, like push ES, pushf, RETN.
Some are 1 byte + 1 byte, or 1 byte + 1 word/dword.
Others use the template: prefix bytes (0 to 4), opcode bytes (1 to 3),
ModR/M byte, optional SIB (scale-index-base) byte, then a displacement
and/or an immediate value, each of which is 0 to 4 bytes.

In theory you have to know whether an opcode uses a ModR/M byte and, if
so, parse it to know whether the following fields are present.

Sofiane Akermoun
akersof@gmail.com

^ permalink raw reply [flat|nested] 8+ messages in thread
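The template in the message above (prefixes, opcode, ModR/M, optional SIB, displacement, immediate) can be turned into a small length decoder. The following is a minimal Python sketch, assuming 32-bit mode with 32-bit addressing; the names `PREFIXES`, `OPCODES` and `insn_length` are hypothetical, the opcode table holds only ten one-byte opcodes, and the `67` address-size prefix and two-byte opcodes are deliberately not handled. It is not the 16-family scheme the author mentions, just one way to show why the ModR/M byte has to be parsed before the length is known.

```python
# Legacy prefixes: segment overrides, operand/address size, lock, rep
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65, 0x66, 0x67, 0xF0, 0xF2, 0xF3}

# opcode -> (has ModR/M byte, immediate kind); "b" = 1 byte,
# "v" = 4 bytes, or 2 bytes under a 66 prefix. Toy subset only.
OPCODES = {
    0x06: (False, None),  # push es
    0x9C: (False, None),  # pushf
    0xC3: (False, None),  # ret
    0x6A: (False, "b"),   # push imm8
    0xEB: (False, "b"),   # jmp rel8
    0x68: (False, "v"),   # push imm16/32
    0xE8: (False, "v"),   # call rel16/32
    0x89: (True, None),   # mov r/m, r
    0x83: (True, "b"),    # group 1 (add/or/...) r/m, imm8
    0x81: (True, "v"),    # group 1 r/m, imm16/32
}


def insn_length(code: bytes, pc: int = 0) -> int:
    """Length in bytes of the instruction starting at code[pc]."""
    start = pc
    opsize16 = False
    while code[pc] in PREFIXES:
        if code[pc] == 0x67:
            raise NotImplementedError("16-bit addressing is not sketched here")
        if code[pc] == 0x66:
            opsize16 = True
        pc += 1

    has_modrm, imm = OPCODES[code[pc]]  # KeyError: opcode outside the toy table
    pc += 1

    if has_modrm:
        modrm = code[pc]
        pc += 1
        mod, rm = modrm >> 6, modrm & 7
        if mod != 3 and rm == 4:
            # rm = 100 with a memory operand: a SIB byte follows
            base = code[pc] & 7
            pc += 1
            if mod == 0 and base == 5:
                pc += 4  # no base register, disp32 instead
        elif mod == 0 and rm == 5:
            pc += 4      # absolute disp32, no register
        if mod == 1:
            pc += 1      # disp8
        elif mod == 2:
            pc += 4      # disp32

    if imm == "b":
        pc += 1
    elif imm == "v":
        pc += 2 if opsize16 else 4
    return pc - start


if __name__ == "__main__":
    for hexbytes in ("06", "6A05", "6812345678", "66681234", "89D8", "890578563412"):
        print(hexbytes, insn_length(bytes.fromhex(hexbytes)))
```

A table-driven design like this keeps the per-opcode knowledge in data and the ModR/M walk in one place, which is roughly what "a disassembler uses a table" amounts to for length decoding.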
[parent not found: <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>]
* Re: Disassembly of 00000
       [not found]     ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
@ 2013-02-04  6:24       ` horseriver
  2013-02-04 19:25         ` Sofiane Akermoun
  0 siblings, 1 reply; 8+ messages in thread
From: horseriver @ 2013-02-04  6:24 UTC (permalink / raw)
  To: Sofiane Akermoun; +Cc: linux-assembly

>A disassembler just uses a table. The intel documentation describes all
>the instruction, with the method on how do decode it (not really the method
>you have to write it of course).
>A disassembler should know where to start to disassemble, else he will
>start by disassembling bad data.. and probably fail on an unknown or
>unlogical instruction, or the worst case everything will be ok.

  Thanks!

  What condition decides whether a byte is a legal start of an instruction?

  Are there cases where a disassembler needs to scan more than one byte
  to work this out?

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disassembly of 00000
  2013-02-04  6:24       ` horseriver
@ 2013-02-04 19:25         ` Sofiane Akermoun
       [not found]           ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Sofiane Akermoun @ 2013-02-04 19:25 UTC (permalink / raw)
  To: horseriver; +Cc: linux-assembly

Almost every byte could be a legal instruction. A disassembler needs
to know where to start disassembling the code.
When you pass a binary to a disassembler, it knows that it has to
start at the beginning of the code (a disassembler finds the code
section by checking some information in the header of the binary).
The condition is to start disassembling at the beginning of the code;
otherwise you cannot find the offset and length of each instruction.

An example: imagine you have this:

66 83 84 98 00 10 00 00 03 12 34 56 54 31

The disassembler starts by checking the first byte and continues until
it has found a complete instruction.

66: a prefix. It means that the instruction uses 16-bit operands
    instead of the default 32-bit ones.
83: this instruction takes 2 operands, and the second operand is a
    byte. This opcode is followed by a ModR/M byte.
84: the ModR/M byte, which is 10 000 100 in binary.
    10:  the first operand is in memory, with a dword displacement
    000: the instruction is an ADD
    100: this ModR/M byte is followed by a SIB byte
98: the SIB byte, which is 10 011 000 in binary. It means
    scale = 4, index = ebx, base = eax.

So now we know:

  1 byte prefix          = 66
+ 1 byte opcode          = 83
+ 1 byte ModR/M          = 84
+ 1 byte SIB             = 98
+ 4 bytes displacement   = 00 10 00 00 (0x1000 for humans)
+ 1 byte operand         = 03

The instruction is 9 bytes long.

In other words:
add word ptr ds:[ebx*4 + eax + 0x1000], 3

Here 98 is a SIB byte, not the instruction CBW/CWDE, and 03 is a 1-byte
operand, not the ADD instruction (yes, 03 is also an ADD opcode).

As a last word: you need context to disassemble, and to get that context
you have to start at the beginning of the code section, or at least at
an instruction that you have checked and confirmed.

regards,
Sofiane Akermoun
akersof@gmail.com

^ permalink raw reply [flat|nested] 8+ messages in thread
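The field-by-field walk through `66 83 84 98 00 10 00 00 03` in the message above can be checked mechanically. Below is a small Python sketch that handles only this one instruction shape (a `66` prefix, opcode `83`, a ModR/M byte with mod = 10 and rm = 100, a SIB byte, a 32-bit displacement and an 8-bit immediate). The names `split_2_3_3` and `decode_example` are hypothetical, and the output formatting is an arbitrary choice rather than the syntax of any particular disassembler.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# For opcode 83 the 3-bit reg field of ModR/M selects the operation
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def split_2_3_3(byte: int):
    """Split a ModR/M or SIB byte into its 2-bit, 3-bit and 3-bit fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def decode_example(code: bytes):
    """Decode '66 83 modrm sib disp32 imm8'; return (text, length)."""
    if code[0] != 0x66 or code[1] != 0x83:
        raise ValueError("only the 66 83 form is sketched here")
    mod, reg, rm = split_2_3_3(code[2])
    if mod != 0b10 or rm != 0b100:
        raise ValueError("only mod=10, rm=100 (SIB + disp32) is sketched here")
    scale_bits, index, base = split_2_3_3(code[3])
    if index == 0b100:
        raise ValueError("index=100 means 'no index'; not handled")
    disp = int.from_bytes(code[4:8], "little")            # stored little-endian
    imm = int.from_bytes(code[8:9], "little", signed=True)  # imm8 is sign-extended
    text = (f"{GROUP1[reg]} word ptr "
            f"[{REGS32[base]}+{REGS32[index]}*{1 << scale_bits}+{disp:#x}], {imm:#x}")
    return text, 9  # prefix + opcode + ModR/M + SIB + 4 disp + 1 imm


if __name__ == "__main__":
    stream = bytes.fromhex("6683849800100000031234565431")
    print(decode_example(stream))
```

The trailing bytes `12 34 56 54 31` are left untouched: they belong to whatever instruction starts at offset 9.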
[parent not found: <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>]
* re: Disassembly of 00000
       [not found]           ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
@ 2013-02-04 22:54             ` Hendrik Visage
  0 siblings, 0 replies; 8+ messages in thread
From: Hendrik Visage @ 2013-02-04 22:54 UTC (permalink / raw)
  To: linux-assembly

On Mon, Feb 4, 2013 at 9:25 PM, Sofiane Akermoun <akersof@gmail.com> wrote:
>
> almost every bytes could be a legal instruction. A disassembler need
> to know where to start to disassemble the code.
> When you pass a binary to a disassembler, it knows that it has to
> start at the begining of the code. (a disassembler find the code
> section by checking some informtion in the header of the binary).
> The condition is to start to disassemble at the begining of the code
> else you can not find the offset and length for each instruction.

And this would also be true if the CPU itself were given that code to
execute with the PC (program counter) pointed at that address :)

This is one of the "issues" I have with ia32/x86_64, with their
variable-length instructions and CISC model. Compare this with SPARC
RISC, where the 32-bit (up to v8) instructions are all aligned on 32-bit
word boundaries (i.e. the PC increments in 4s, else an alignment
error/exception is raised), and each instruction is one 32-bit word, no
exception :)

>
> <snip-variable-length-explanation>
>
> > What is the condition that decide one byte is or not a legal instruction's start ?

Whether that sequence of bytes is a supported instruction on the CPU it
is meant to execute on, as an SSE instruction might not be valid on an
old AMD with only 3DNow! available.

So in all cases you'll also need to know the CPU as well as the specific
mode (real, protected, long, etc.) that the code was targeted for, as an
instruction in long mode, for example, won't be valid in real mode.

> > > Are there some occasions that disassembler need scan more than one byte to detect
> > >
> > > its logic ?

This has been answered in the previous example.

The other answer here is to RTFM: read the manuals for the CPU you are
disassembling for ;)

^ permalink raw reply [flat|nested] 8+ messages in thread
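The point about the starting address can be made concrete with a short Python sketch. The helper names `decode_from` and `fixed_width_starts` are hypothetical, and the decoder knows only three x86 opcodes (`nop`, `ret`, `mov eax, imm32`), so treat it as a toy under those assumptions. The same six bytes decode to two different instruction streams depending on where decoding starts, whereas with a fixed 4-byte encoding the candidate instruction starts are simply the multiples of 4.

```python
def decode_from(code: bytes, pc: int):
    """Decode `code` starting at offset `pc` with a three-opcode toy table."""
    rows = []
    while pc < len(code):
        op = code[pc]
        if op == 0x90:
            length, text = 1, "nop"
        elif op == 0xC3:
            length, text = 1, "ret"
        elif op == 0xB8 and pc + 5 <= len(code):
            # mov eax, imm32: the opcode swallows the next four bytes
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            length, text = 5, f"mov eax, {imm:#x}"
        else:
            length, text = 1, f"db {op:#04x}"
        rows.append((pc, text))
        pc += length
    return rows


def fixed_width_starts(size: int, width: int = 4):
    """Instruction start offsets for a fixed-width encoding (SPARC-like)."""
    return list(range(0, size, width))


if __name__ == "__main__":
    code = bytes.fromhex("B8909090C3C3")
    print(decode_from(code, 0))   # one mov followed by one ret
    print(decode_from(code, 1))   # three nops and two rets
    print(fixed_width_starts(12))
```

Both decodings of the x86 bytes are "legal"; only knowledge of the real entry point tells you which one the program means.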
* RE: Disassembly of 00000
  2013-02-02  4:53 Disassembly of 00000 horseriver
  2013-02-02 18:37 ` Brian Raiter
@ 2013-02-03 16:42 ` Robert Plantz
  1 sibling, 0 replies; 8+ messages in thread
From: Robert Plantz @ 2013-02-03 16:42 UTC (permalink / raw)
  To: horseriver, linux-assembly@vger.kernel.org

Section 9.3 of my textbook (www.lulu.com/spotlight/bobplantz) provides a
*very brief* introduction to x86-64 machine language (0s and 1s) coding.
If it would help you, I would be happy to send you a pdf copy of this
section. A free preview copy of my book is available at
bob.cs.sonoma.edu.

Please note that the discussion in my book only describes the general
nature of how instructions are coded in bit patterns. But once you
understand this, I think it is obvious that trying to "disassemble" a
set of random bit patterns would be meaningless. One could make some
educated guesses, but there is no way to know for sure.

Back in the 70s, I heard of programmers who would write code such that a
particular group of bits would be treated as (a) an instruction,
(b) constant data, or (c) an address, depending on where it was accessed
in the program flow. Back in those days, we used assembly language to
write self-modifying code. I once did it for a driver on a CT scanner.
Memory was expensive in those days.

Bob

^ permalink raw reply [flat|nested] 8+ messages in thread
Thread overview: 8+ messages
2013-02-02  4:53 Disassembly of 00000 horseriver
2013-02-02 18:37 ` Brian Raiter
2013-02-02 18:54   ` horseriver
2013-02-03  5:44     ` Sofiane Akermoun
     [not found]     ` <CAN0_x-Kw9TxbD4EHmD1Ns5BJtBXToW_qsXSZWRzEtKxObXMhSQ@mail.gmail.com>
2013-02-04  6:24       ` horseriver
2013-02-04 19:25         ` Sofiane Akermoun
     [not found]           ` <CADtGFvnaEgacU-FvtZZnOxkX4tbFxj6tYhqb-1ok6jwrMm=39Q@mail.gmail.com>
2013-02-04 22:54             ` Hendrik Visage
2013-02-03 16:42 ` Robert Plantz