* [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
@ 2010-11-14 6:24 F. Zhang
2010-11-14 8:02 ` Mulyadi Santosa
2010-11-15 4:43 ` F. Zhang
0 siblings, 2 replies; 8+ messages in thread
From: F. Zhang @ 2010-11-14 6:24 UTC (permalink / raw)
To: qemu-devel mailing list
Hi,
I am new to QEMU. I want to use QEMU for dynamic malware analysis, specifically what is usually called “taint analysis”. The main idea is to tag data from specific sources (for example, network packets, files on a hard disk, and user input) and then trace how the tagged data propagates through the system. When tagged data is used maliciously, an alarm is raised.
To build an analysis environment, I need to solve the following problems:
(1) Create shadow memory for each process under analysis. How can I create shadow memory in QEMU? I think I could partition QEMU's memory into two blocks, one for the process under analysis and the other for that process’s shadow memory. Is that right?
(2) Propagation of tagged data is traced at the instruction level. For example, if the source operand of an instruction is tagged, then the destination operand is tagged as well. How can I implement this? Should I modify the instruction translation functions to add tagging code and then recompile QEMU?
(3) When analyzing malware, two types of semantic information have to be combined: one from the OS (process, stack, and heap information, and so on) and the other from QEMU (mostly tag propagation information). How can I write code that relates the two? That is, how can QEMU receive information from the OS, and how can the OS receive information from QEMU?
Sorry for writing so much, and thank you very much for your time! Since I am new to QEMU, if the answer is too complex you need ONLY BRIEFLY tell me what to read, where to search, or what to try. Of course, detailed instructions are VERY WELCOME!
Thank you very much in advance!
Best regards
F. Zhang
* Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-14 6:24 [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU? F. Zhang
@ 2010-11-14 8:02 ` Mulyadi Santosa
2010-11-15 4:43 ` F. Zhang
1 sibling, 0 replies; 8+ messages in thread
From: Mulyadi Santosa @ 2010-11-14 8:02 UTC (permalink / raw)
To: F. Zhang; +Cc: qemu-devel mailing list
Hi Zhang...
Please consider this a casual user trying to share some simple ideas with you....
2010/11/14 F. Zhang <qemustudy@163.com>:
> Hi,
>
> I am a newbie of QEMU. I want to use the QEMU for the dynamic analysis of
> malware, usually called “taint analysis”.
Before this goes too far, have you checked a quite similar project,
Argos (http://www.few.vu.nl/argos/)?
>The main idea is to tag data from
> some specific sources, for example, network packets, files in a harddisk,
> and user inputs and so on, and then trace the propagation of the tagged data
> in the system. Once the tagged data is maliciously used, an alarm is raised.
Quite like valgrind in a general sense, don't you think? Perhaps you
could adopt its architecture (and possibly its code too)?
> To build an analysis environment, I need to solve the following problems:
>
> (1) Make the shadow memory for each process under analysis. How can I
> make the shadow memory in QEMU? I think I can partition the memory of QEMU
> into two blocks, one for the process under analysis, the other for the
> process’s shadow memory. Is that right?
>
Are you saying you want to mimic the way shadow page tables work?
> (2) Tracing propagation of tagged data is implemented in the instruction
> level. That is to say, for example, if the source operand of an instruction
> is tagged, then the destination operand of the instruction is also tagged.
> How can I implement the idea? Should I modify the instruction translation
> functions to add code for tagging and recompile QEMU?
How about using one of the unused PTE flags for such a tag?
> (3) In the process of analyzing malware, two types of semantic
> information should be combined. One from the OS, including process
> information, stack information, heap information and so on; the other from
> the QEMU, including mostly the tag propagation information. The question is,
> how can I code to relate both of the information? That is to say, how to
> make QEMU receive information from OS, and how to make OS receive
> information from QEMU?
Now that's the real "bomb"... I was thinking about creating a pseudo
device... oh wait, maybe using QMP (QEMU Monitor Protocol)? Or maybe
you can use the trace framework recently introduced in QEMU?
This is assuming you want to "make QEMU cooperate with the host OS"...
--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
* Re:Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-14 6:24 [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU? F. Zhang
2010-11-14 8:02 ` Mulyadi Santosa
@ 2010-11-15 4:43 ` F. Zhang
2010-11-15 8:38 ` Mulyadi Santosa
2010-11-16 11:39 ` Re:Re: " F. Zhang
1 sibling, 2 replies; 8+ messages in thread
From: F. Zhang @ 2010-11-15 4:43 UTC (permalink / raw)
To: Mulyadi Santosa; +Cc: qemu-devel mailing list
>Please consider it a casual user trying to share simple ideas with you....
I am very pleased to share ideas with you. My English is poor, but I’ll try my best to make things clear. :)
>
>Just before it goes too far, have you check quite similar project
>which is Argos (http://www.few.vu.nl/argos/)?
Yes, I have read that paper, it’s wonderful!
Besides Argos, the BitBlaze group, led by Dawn Song at Berkeley, has achieved great success in taint analysis. The website for their dynamic analysis work (called TEMU) is at: http://bitblaze.cs.berkeley.edu/temu.html
And TEMU is now open-source.
>
>Are you going to say you wanna mimic the way shadow page table works?
Yes. For each process’s memory space A, I want to create a shadow memory B. The shadow memory stores the tags of the data. In other words, if addr in memory A is tainted, then the corresponding byte in B is marked to indicate that addr in A is tainted.
The question is: I do not know how to create the shadow memory for a process in QEMU.
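As a rough illustration of the A/B layout described above, shadow memory can be modelled as a sparse map that holds one tag byte per guest byte. This is a minimal Python sketch and not QEMU code: `ShadowMemory`, the `asid` key (standing for whatever identifies a process's address space), and the 4 KiB page size are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size; shadow pages are allocated lazily


class ShadowMemory:
    """Byte-granular taint tags for one or more guest address spaces.

    Hypothetical helper: 'asid' is any value that identifies an address
    space, for example the page-table base of the process.
    """

    def __init__(self):
        # (asid, page number) -> bytearray with one tag byte per guest byte
        self._pages = {}

    def set_taint(self, asid, addr, length, tag=1):
        for a in range(addr, addr + length):
            key = (asid, a // PAGE_SIZE)
            page = self._pages.setdefault(key, bytearray(PAGE_SIZE))
            page[a % PAGE_SIZE] = tag

    def get_taint(self, asid, addr):
        page = self._pages.get((asid, addr // PAGE_SIZE))
        return page[addr % PAGE_SIZE] if page is not None else 0

    def is_tainted(self, asid, addr, length=1):
        return any(self.get_taint(asid, a) for a in range(addr, addr + length))
```

Only pages that actually contain tainted bytes consume shadow storage here, which matters because most of a process's memory is usually untainted.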
>
>
>How about using unused one of unused PTE flags for such tag?
Sorry, what is the PTE flag?
In fact, the tag is stored in the shadow memory of the process.
Let us consider the following instruction:
mov eax, [esi]
If the data at [esi] is tainted, then eax is tainted, too.
For this instruction, we should first check whether [esi] is tainted. This is done by checking the tag in the shadow memory. If [esi] is tainted, then the tag for eax in the shadow memory is set, too.
The question is: how do I implement the operations above? Maybe I should modify the instruction-translation functions to trace the propagation of tainted data?
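The propagation rule for the `mov eax, [esi]` example can be written down independently of any emulator. The sketch below is plain Python with made-up names (`taint_load`, `taint_store`, and the two sets); it only shows the rule itself, not how it would be wired into QEMU's translation code.

```python
# Taint state: the set of tainted guest byte addresses and tainted registers.
tainted_mem = set()
tainted_regs = set()


def taint_load(dst_reg, addr, size):
    """Rule for 'mov dst_reg, [addr]': a load of 'size' bytes."""
    if any(a in tainted_mem for a in range(addr, addr + size)):
        tainted_regs.add(dst_reg)
    else:
        # Overwriting a register with clean data clears its old tag.
        tainted_regs.discard(dst_reg)


def taint_store(addr, src_reg, size):
    """Rule for 'mov [addr], src_reg': a store of 'size' bytes."""
    for a in range(addr, addr + size):
        if src_reg in tainted_regs:
            tainted_mem.add(a)
        else:
            tainted_mem.discard(a)
```

For example, after marking the four bytes at 0x1000 as tainted, `taint_load("eax", 0x1000, 4)` leaves `"eax"` in `tainted_regs`, and a later load from clean memory removes it again.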
>
>Now that's the real "bomb"... I was thinking about creating pseudo
>device...oh wait, maybe using QMP (Qemu monitoring protocol)? Maybe
>you can use the trace framework introduced in Qemu lately?
>
>This is assuming, you wanna "make Qemu cooperate with host OS"...
Yes, I want QEMU to cooperate with the GUEST OS. The malware under analysis runs inside the guest OS. The guest OS collects “higher-level” semantics at the OS level, and QEMU collects “lower-level” semantics at the instruction level. Combining both is necessary for the analysis.
The question is: how can QEMU and the guest OS communicate so that they can cooperate with each other?
Maybe I should read the TEMU code. Er…, that’s a huge amount of work for me.
Best regards
F. Zhang
* Re: Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-15 4:43 ` F. Zhang
@ 2010-11-15 8:38 ` Mulyadi Santosa
2010-11-15 12:01 ` Lluís
2010-11-16 12:10 ` F. Zhang
2010-11-16 11:39 ` Re:Re: " F. Zhang
1 sibling, 2 replies; 8+ messages in thread
From: Mulyadi Santosa @ 2010-11-15 8:38 UTC (permalink / raw)
To: F. Zhang; +Cc: qemu-devel mailing list
Hi....
OK, it's getting interesting... perhaps this will lead into the
instrumentation topic, which has been quite hot on qemu-devel
recently, so you are jumping on the wagon at just the right time :)
2010/11/15 F. Zhang <qemustudy@163.com>:
> I am very pleased to share ideas with you. But my English is too poor, er…,
> I’ll try my best to make it clear. :)
Same here. How fluently do you expect an Indonesian like me to write
English, after all? :D heheh, just joking :)
OK, one thing is for sure: I think you can implement your idea on
top of several (not so complete) existing frameworks in QEMU.
Tracing is one of them... not sure about the rest...
> Yes, I have read that paper, it’s wonderful!
>
> Besides the Argos, the bitblaze group, led by Dawn Song in Berkeley, has
> achieved great success in the taint analysis. The website about their
> dynamic analysis work (called TEMU) can be found at:
> http://bitblaze.cs.berkeley.edu/temu.html
>
> And TEMU is now open-source.
Thanks for sharing that... it's new to me. So why don't you just
pick up TEMU and improve it instead of... uhm... sorry if I am wrong,
working from scratch? After all, I believe Argos and TEMU (and
maybe other similar projects) share common code here and there.
But ehm... CMIIW, it seems TEMU is based on QEMU 0.9.x, right? So
it's... sorry, I forgot the name... the generated code is mostly
constructed from small code fragments generated by gcc. Now QEMU
generates the code by itself, so a lot of things have changed
(substantially).
> Yes. For each process’s memory space A, I wanna make a shadow memory B. The
> shadow memory is used to store the tag of data. In other words, if addr in
> memory A is tainted, then the corresponding byte in B should be marked to
> indicate that addr in A is tainted.
I agree that should be the way it works....but..... (see below)
>>How about using unused one of unused PTE flags for such tag?
>
> Sorry, what is the PTE flag?
Page Table Entry... I believe not all of the flags are actually used
by OSes nowadays, so I guess you can use 1 or 2 bits there whenever
possible...
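To make the PTE idea concrete: on 32-bit x86, bits 9-11 of a page table entry are ignored by the MMU and left to system software. The Python sketch below uses bit 9 as a page-level taint flag. That choice is an assumption for illustration only, since a guest OS may already use these bits for its own bookkeeping, and such a tag can only describe a whole page.

```python
# Bit 9 as a taint flag is an assumption of this sketch; the guest OS may
# already use the software-available bits (9-11) of a 32-bit x86 PTE.
PTE_PRESENT = 1 << 0
PTE_TAINT = 1 << 9
PTE_FRAME_MASK = 0xFFFFF000  # physical frame address of a 4 KiB page


def pte_set_taint(pte):
    return pte | PTE_TAINT


def pte_clear_taint(pte):
    return pte & ~PTE_TAINT


def pte_is_tainted(pte):
    # Only a present mapping can carry a meaningful tag.
    return bool(pte & PTE_PRESENT) and bool(pte & PTE_TAINT)
```

Setting or clearing the flag leaves the frame address and the hardware-defined bits untouched, so address translation is unaffected.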
>
> In fact, the tag is stored in the shadow memory of the process.
>
> Let us consider the following instruction:
>
> mov eax, [esi]
>
> If data in [esi] is tainted, then eax is tained, too.
May we know what kind of information you plan to store in such a tag?
> In this instruction, we should first consider whether [esi] is tainted or
> not. This is done by checking the tag in the shadow memory. If [esi] is
> tainted, then the tag for eax in the shadow memory is set, too.
>
> The question is: how to implement the upper functions? maybe I should modify
> the instruction-translation functions to implement the trace of tainted data
> propagation?
I think you should hook all the opcodes related to memory operations
(or, to be precise, the QEMU opcodes). That way, you won't miss any.
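The point of hooking at the level of the emulator's internal opcodes is that every guest instruction that touches memory funnels through a small number of load/store ops. The toy interpreter below illustrates this; the op names (`movi`, `ld`, `st`) and the tuple layout are invented for the sketch and are not QEMU's actual opcodes.

```python
# Hypothetical intermediate ops; names and tuple layout are assumptions.
def run(ops, memory, regs, on_load=None, on_store=None):
    """Execute a list of ops, calling the hooks on every memory access."""
    for op, a, b in ops:
        if op == "movi":      # regs[a] = constant b
            regs[a] = b
        elif op == "ld":      # regs[a] = memory[regs[b]]
            addr = regs[b]
            if on_load:
                on_load(addr, a)
            regs[a] = memory.get(addr, 0)
        elif op == "st":      # memory[regs[a]] = regs[b]
            addr = regs[a]
            if on_store:
                on_store(addr, b)
            memory[addr] = regs[b]
        else:
            raise ValueError(f"unknown op {op!r}")
```

Because the hooks sit on the two memory ops rather than on individual guest instructions, a new guest instruction that is translated into these ops is covered automatically.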
> Yes, I wanna make QEMU cooperate with the GUEST OS. In fact, malware under
> analysis is run within the GUEST OS.
Hm, I thought it would be host OS + QEMU... Don't you think that with
guest OS + QEMU, if the guest OS is compromised first, we would get
unreliable data? Or am I missing something here?
>The guest os collects “higher” semantic
> from the OS level, and the QEMU collects “lower” semantic from the
> instruction level. Combination of both semantics is necessary in the
> analysis process.
The question is, in a situation where malware has already compromised
"the higher semantic", can we trust the analysis?
> The question is: how to communicate between the QEMU and the guest OS, so
> that they can cooperate with each other?
OK, so let's assume it's really guest OS + QEMU... I think, uhm, it's
better to create a pseudo device, quite similar to virtio... or you can
think of it like /dev/sda, /dev/rtc, etc. The guest OS must have a
driver installed that knows how to read from and talk to this
device.
The guest feeds any analysis results through the driver... QEMU
collects them... and finally passes them to the host OS.
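A pseudo device of this kind boils down to a few registers that the guest driver writes and the emulator interprets. The following Python sketch models only that register protocol; the class name, the register offsets, and the message format are all invented for illustration and do not correspond to any existing QEMU device.

```python
class TaintChannelDevice:
    """Hypothetical pseudo device: the guest driver writes, the emulator reads.

    The register offsets below are made up for this sketch.
    """
    REG_DATA = 0x0     # guest writes one 32-bit word of a message
    REG_COMMIT = 0x4   # guest writes any value to finish the message

    def __init__(self):
        self._pending = []
        self.messages = []   # completed messages, consumed on the emulator side

    def mmio_write(self, offset, value):
        if offset == self.REG_DATA:
            self._pending.append(value & 0xFFFFFFFF)
        elif offset == self.REG_COMMIT:
            self.messages.append(tuple(self._pending))
            self._pending = []
        else:
            raise ValueError(f"unmapped register offset {offset:#x}")
```

A guest driver could, for instance, send a process ID followed by a load address and then commit, and the emulator would see the pair as one message.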
Another possibility is to reserve a certain memory region (kind of
like BIOS reserved memory space), mmap it inside the guest OS, then
treat it like System V shared memory. Put the data in it, and QEMU
checks it regularly...
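The reserved-region variant can be sketched as a small ring buffer that the guest fills and the emulator drains when it polls. The layout below (two 32-bit indices followed by eight 32-bit slots) is an assumption for the example; a real implementation would also need memory barriers and index wrap-around handling, which this sketch ignores.

```python
import struct

HEADER = struct.Struct("<II")   # write index, read index (little-endian u32)
SLOT = struct.Struct("<I")      # one 32-bit record per slot
NSLOTS = 8


def region_new():
    """Allocate the shared region: header followed by NSLOTS record slots."""
    return bytearray(HEADER.size + NSLOTS * SLOT.size)


def guest_put(region, value):
    """Guest side: append a record; returns False if the ring is full."""
    w, r = HEADER.unpack_from(region, 0)
    if w - r == NSLOTS:
        return False
    SLOT.pack_into(region, HEADER.size + (w % NSLOTS) * SLOT.size, value)
    HEADER.pack_into(region, 0, w + 1, r)
    return True


def emulator_poll(region):
    """Emulator side: drain and return all records written since the last poll."""
    w, r = HEADER.unpack_from(region, 0)
    out = []
    while r != w:
        out.append(SLOT.unpack_from(region, HEADER.size + (r % NSLOTS) * SLOT.size)[0])
        r += 1
    HEADER.pack_into(region, 0, w, r)
    return out
```

Polling adds latency between the guest writing a record and the emulator seeing it, which is the main trade-off against a device whose register writes are noticed immediately.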
What do you think?
PS: Eduardo Cruz might be an interesting person to talk to... he did
instrumentation work lately too....
--
regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
* Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-15 8:38 ` Mulyadi Santosa
@ 2010-11-15 12:01 ` Lluís
2010-11-16 12:10 ` F. Zhang
1 sibling, 0 replies; 8+ messages in thread
From: Lluís @ 2010-11-15 12:01 UTC (permalink / raw)
To: qemu-devel
Mulyadi Santosa writes:
>> Yes, I have read that paper, it’s wonderful!
>>
>> Besides the Argos, the bitblaze group, led by Dawn Song in Berkeley, has
>> achieved great success in the taint analysis. The website about their
>> dynamic analysis work (called TEMU) can be found at:
>> http://bitblaze.cs.berkeley.edu/temu.html
>>
>> And TEMU is now open-source.
> Thanks for sharing that...it's new stuff for me. So, why don't you
> just pick TEMU and improve it instead of...uhm...sorry if I am wrong,
> working from scratch? After all, I believe in both Argos and TEMU (and
> maybe other similar projects), they share common codes here and there.
> But ehm...CMIIW, seems like TEMU is based on Qemu 0.9,x, right? So
> it's.... sorry I forgot the name, the generated code is mostly a
> constructed by fragments of small codes generated by gcc. Now, it is
> qemu which does it by itself. So, a lot of things change
> (substantially).
I haven't read the TEMU work, but from the problem description I think
you want something similar to "Practical Taint-Based Protection using
Demand Emulation" or many others (I remember reading some of them a few
years ago at the ISCA, MICRO and/or ASPLOS conferences).
>> Yes. For each process’s memory space A, I wanna make a shadow memory B. The
>> shadow memory is used to store the tag of data. In other words, if addr in
>> memory A is tainted, then the corresponding byte in B should be marked to
>> indicate that addr in A is tainted.
The main question here is... at what granularity do you want to
track? Bytes? Words? Pages? This will greatly influence which approach
is best for you.
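To see why granularity matters so much, it helps to compute the shadow storage a flat tag array would need. The small Python calculation below assumes a fully populated 32-bit (4 GiB) address space; the function name is made up for the example.

```python
def shadow_bytes(guest_bytes, granule_bytes, tag_bits):
    """Shadow storage needed for a flat, fully populated tag array."""
    granules = guest_bytes // granule_bytes
    return granules * tag_bits // 8


GiB = 1 << 30
space = 4 * GiB   # a full 32-bit address space, as an example
```

With one tag bit per granule this gives 512 MiB for byte granularity, 128 MiB for 4-byte words, and 128 KiB for 4 KiB pages; a full tag byte per guest byte would need as much shadow storage as the address space itself (4 GiB).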
Now that I think of it, you could use the tracing points I sent for
guest virtual memory accesses, and instrument them instead of calling a
file-tracing backend (this should provide a hook for an arbitrary
granularity). Then, simply also keep track of address-space changes, so
that your instrumentation code always knows when to activate propagation.
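The combination described here, a memory-access hook gated by the current address space, can be sketched as follows. The class and method names are hypothetical and are not the interface of the tracing patches mentioned above; `asid` stands for whatever identifies the active address space (on x86, for example, the value loaded into CR3).

```python
class Instrumentation:
    """Hypothetical hook object; the method names are assumptions."""

    def __init__(self, watched_asids):
        self.watched = set(watched_asids)  # address spaces under analysis
        self.current = None
        self.accesses = []

    def on_address_space_change(self, asid):
        # Called when the guest switches page tables.
        self.current = asid

    def on_guest_mem_access(self, vaddr, size, is_write):
        # Called for every guest virtual memory access.
        if self.current not in self.watched:
            return  # process not under analysis: do nothing
        self.accesses.append((self.current, vaddr, size, is_write))
```

Accesses made while another process is scheduled are dropped at the first check, so propagation work is only done for the processes being analyzed.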
This, together with the optimization I sent for dynamic control of trace
generation in TCG emulation code, should get you on track.
Of course, you should still modify all register-accessing instructions
to propagate information passing through the register set. For that,
maybe you could start with the "fetch" tracing/instrumentation point I
sent a long time ago, which keeps track of general-purpose register
usage/definition on x86 (although I'm sure I left some stray usages due
to the decoding complexity in x86).
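Given per-instruction register usage/definition information, the basic register rule is that every defined register inherits the union of the taint of the used registers. A minimal Python sketch, with an invented function name and a plain dict as the register shadow state:

```python
def propagate(reg_taint, uses, defs):
    """Defined registers inherit the union of the used registers' taint.

    reg_taint maps register name -> bool; uses/defs are the registers an
    instruction reads and writes, as a decoder might report them.
    """
    tainted = any(reg_taint.get(r, False) for r in uses)
    for r in defs:
        reg_taint[r] = tainted
    return tainted
```

This generic rule over-taints some idioms. An instruction such as `xor eax, eax` always produces zero regardless of its input, so a real implementation would typically special-case it to clear the tag.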
>> The guest os collects “higher” semantic
>> from the OS level, and the QEMU collects “lower” semantic from the
>> instruction level. Combination of both semantics is necessary in the
>> analysis process.
> The question is, in a situation where malware already compromise "the
> higher semantic", could we trust the analysis?
Beware, I've read about exactly this kind of scheme at previous top-tier
conferences (but I think the tests used an architectural simulator, so
it's not for a current production environment).
I've found it :)
Secure program execution via dynamic information flow tracking
ASPLOS 2004
>> The question is: how to communicate between the QEMU and the guest OS, so
>> that they can cooperate with each other?
A few choices here, but you should first define whether the communication
must be based just on control signals and/or must provide memory storage:
* virtual device : If you need some kind of storage that the guest OS
must access, you could look at the ivshmem device
* backdoor instruction : It's the simplest option; I sent some patch
series recently with two different implementations for x86.
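The backdoor-instruction option can be pictured as an opcode that the emulator intercepts and treats as a call from the guest. The sketch below is a toy model: the opcode value is made up (it is not a real x86 encoding and not the one used in the patch series mentioned above), and the register convention is an assumption.

```python
BACKDOOR_OPCODE = 0xF1F1   # made-up encoding, not a real x86 instruction


def step(opcode, regs, handlers):
    """Execute one 'instruction'; the backdoor opcode calls into the emulator."""
    if opcode == BACKDOOR_OPCODE:
        # Convention assumed here: eax selects the request, ebx is the argument.
        handler = handlers.get(regs["eax"])
        regs["eax"] = handler(regs["ebx"]) if handler else 0xFFFFFFFF
        return True
    return False   # an ordinary instruction, handled elsewhere
```

This carries control signals well (for example "start tracking process N") but offers no bulk storage, which is where a shared-memory device fits better.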
Lluis
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth
* Re:Re: Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-15 4:43 ` F. Zhang
2010-11-15 8:38 ` Mulyadi Santosa
@ 2010-11-16 11:39 ` F. Zhang
1 sibling, 0 replies; 8+ messages in thread
From: F. Zhang @ 2010-11-16 11:39 UTC (permalink / raw)
To: Mulyadi Santosa; +Cc: qemu-devel mailing list
Hi!
>Hi....
>
>OK it's getting interesting.... perhaps it would lead into
>instrumentation topic, which is quite hot topic in qemu-devel quite
>recently, so you jump into the wagon just about the right time :)
>
>
>OK, one thing for sure here is, I think you can implement your idea on
>top of several (not so complete) existing frameworks in Qemu.
>Tracing...is one of them...not sure about the rest...
>
You are right! I have considered implementing my idea on top of an existing framework such as TEMU.
On the other hand, I think I should first understand how QEMU works (TEMU is based on QEMU), so that when I want to add other features I know, at least, where to modify or add code. That’s also the reason why I started this topic here.
>Thanks for sharing that...it's new stuff for me. So, why don't you
>just pick TEMU and improve it instead of...uhm...sorry if I am wrong,
>working from scratch? After all, I believe in both Argos and TEMU (and
>maybe other similar projects), they share common codes here and there.
>
>But ehm...CMIIW, seems like TEMU is based on Qemu 0.9,x, right? So
>it's.... sorry I forgot the name, the generated code is mostly a
>constructed by fragments of small codes generated by gcc. Now, it is
>qemu which does it by itself. So, a lot of things change
>(substantially).
>
:( If things have changed a lot, it is even more important for me to figure out how things work in QEMU, especially the things discussed in this topic.
>
>I agree that should be the way it works....but..... (see below)
>
>
>>>How about using unused one of unused PTE flags for such tag?
>>
>> Sorry, what is the PTE flag?
>
>Page Table Entry...i believe not all flags are really used by the OS
>nowadays, so I guess you can utilize 1 or 2 bits there whenever
>possible...
This is a problem of granularity. One tag per page is too coarse; one tag per byte, per word, or per double word is more common. Usually only a few bytes in a page are malicious, and the alarm should be raised only when those bytes are used, not whenever the page is used. So fine-grained tags are necessary, which, as far as I know, requires shadow memory.
>May we know, what kind of information do you plan to store in such tag?
Since the shadow memory exists, you can store any information you want in the tag. For example, you can use a single bit as the tag to indicate whether a byte is tainted, or you can use a C-like structure as the tag to hold more information, such as stack and heap information. This depends on your analysis requirements.
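The two tag styles mentioned here look quite different in practice. The Python sketch below shows both: a packed bitmap with one bit per guest byte, and a structured tag whose fields (`source`, `offset`) are examples chosen for illustration rather than a fixed format.

```python
from dataclasses import dataclass


# Option 1: one bit per guest byte, packed eight to a shadow byte.
def bit_set(bitmap, addr):
    bitmap[addr // 8] |= 1 << (addr % 8)


def bit_test(bitmap, addr):
    return bool(bitmap[addr // 8] & (1 << (addr % 8)))


# Option 2: a structured tag; the fields are examples only.
@dataclass(frozen=True)
class Tag:
    source: str      # e.g. "network", "file", "keyboard"
    offset: int      # position of the byte within that source
```

The bitmap is compact and fast to test, while the structured tag can answer "where did this byte come from?" at the cost of far more shadow storage per tainted byte.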
>
>I think you should hook all the memory operation related opcode (or to
>be precise, Qemu opcode). That way, you won't miss any..
:) That’s the critical problem. Could you tell me how to hook the opcodes? I previously thought of modifying the instruction-translation functions and recompiling QEMU; is that right? Is there a better way?
>
>> Yes, I wanna make QEMU cooperate with the GUEST OS. In fact, malware under
>> analysis is run within the GUEST OS.
>
>Hm, I thought it would be host OS + qemu....don't you think, if it is
>guest OS +qemu, while there is a chance guest OS is compromised first,
>then we get such unreliable data? Or am I missing something here?
>
This is a good question!
We use the information provided by the guest OS only in the following scenario: “the main purpose of the software being analyzed is NOT to attack the OS kernel, but to carry out malicious behavior in user space”. Since the software under analysis stays in user space and does not touch the kernel, we can safely use the information provided by the OS kernel.
However, if the main purpose of the software is to attack the OS (for example, rootkits), the information obtained from the OS kernel is not reliable. Fortunately, in this case the target of taint analysis is NOT the user-space software but the kernel itself. That is, we do not need any information provided by the kernel.
For kernel-related dynamic analysis, you can refer to the paper “HookScout: Proactive Binary-Centric Hook Detection”.
>>The guest os collects “higher” semantic
>> from the OS level, and the QEMU collects “lower” semantic from the
>> instruction level. Combination of both semantics is necessary in the
>> analysis process.
>
>The question is, in a situation where malware already compromise "the
>higher semantic", could we trust the analysis?
>
>> The question is: how to communicate between the QEMU and the guest OS, so
>> that they can cooperate with each other?
>
>OK, so let's assume it's really guest OS +qemu...i think, uhm, better
>create pseudo device, quite similar with virtio....or you can think
>it's like /dev/sda, /dev/rtc etc... the guest OS must somewhat be
>installed with a driver which knows how to read and talk to this
>device.
>
>Via the driver, fed any analysis result....qemu collects it...and
>finally pass it to host OS.
>
>Other possibilty is to reserve certain memory region (kinda BIOS
>reserved memory space), mmap it inside the guest OS, then treat it
>like System V shared memory. Put the data in it, Qemu regularly checks
>it...
>
>What do you think?
>
Good idea!
I am not very familiar with System V shared memory, but I guess it may be hard to implement real-time communication between the guest OS and QEMU that way. A pseudo device may be a better choice.
>PS: eduardo cruz might be an interesting person to talk to..he did
>instrumention work lately too....
Thanks! But what is his e-mail address? Is he on the mailing list?
Thank you very much! I have learned a lot from this discussion!
>
>--
Best regards!
F. Zhang
* Re:Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-15 8:38 ` Mulyadi Santosa
2010-11-15 12:01 ` Lluís
@ 2010-11-16 12:10 ` F. Zhang
2010-11-16 13:49 ` Lluís
1 sibling, 1 reply; 8+ messages in thread
From: F. Zhang @ 2010-11-16 12:10 UTC (permalink / raw)
To: Lluís; +Cc: qemu-devel
>Mulyadi Santosa writes:
>
>>> Yes, I have read that paper, it’s wonderful!
>>>
>>> Besides the Argos, the bitblaze group, led by Dawn Song in Berkeley, has
>>> achieved great success in the taint analysis. The website about their
>>> dynamic analysis work (called TEMU) can be found at:
>>> http://bitblaze.cs.berkeley.edu/temu.html
>>>
>>> And TEMU is now open-source.
>
>> Thanks for sharing that...it's new stuff for me. So, why don't you
>> just pick TEMU and improve it instead of...uhm...sorry if I am wrong,
>> working from scratch? After all, I believe in both Argos and TEMU (and
>> maybe other similar projects), they share common codes here and there.
>
>> But ehm...CMIIW, seems like TEMU is based on Qemu 0.9,x, right? So
>> it's.... sorry I forgot the name, the generated code is mostly a
>> constructed by fragments of small codes generated by gcc. Now, it is
>> qemu which does it by itself. So, a lot of things change
>> (substantially).
>
>I haven't read the TEMU work, but from the problem description I think
>you want something similar to "Practical Taint-Based Protection using
>Demand Emulation" or many others (I remember reading some of them a few
>years ago on the ISCA, MICRO and/or ASPLOS conferences).
Yes! That is just what I want: a practical taint-analysis environment plus demand emulation.
This topic includes the things that I consider critical. Do you have any suggestions?
>>> Yes. For each process’s memory space A, I wanna make a shadow memory B. The
>>> shadow memory is used to store the tag of data. In other words, if addr in
>>> memory A is tainted, then the corresponding byte in B should be marked to
>>> indicate that addr in A is tainted.
>
>The main question here is... what is the granularity that you want to
>track with? Bytes? Words? Pages? This will greatly influence which is
>your best approach.
I think one tag per byte is necessary for malware analysis in most cases, because only a few bytes are needed to launch an attack. For example, a few tainted bytes loaded into the EIP register can make the CPU execute attacker-chosen code.
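The EIP example corresponds to a simple policy check: before an indirect jump, look at the tag of the register that supplies the target. A minimal Python sketch with invented names (`TaintAlarm`, `check_indirect_jump`):

```python
class TaintAlarm(Exception):
    """Raised when tainted data is about to be used as a control-flow target."""


def check_indirect_jump(target_reg, reg_taint):
    """Raise an alarm if the register supplying the jump target is tainted."""
    if reg_taint.get(target_reg, False):
        raise TaintAlarm(f"tainted value in {target_reg} used as jump target")
```

The same kind of check would apply to return addresses popped from the stack, using the memory tags of the bytes being loaded into EIP.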
>
>Now that I think of it, you could use the tracing points I sent for
>guest virtual memory accesses, and instrument them instead of calling a
>file-tracing backend (this should provide a hook for an arbitrary
>granularity). Then, simply keep track also of address-space changes and
>your instrumentation code can always know when to activate propagation.
>
Sorry, what is “a file-tracing backend”? Could you give a little more detail? I think I need byte-level granularity. Thanks!
>This, together with the optimization I sent for dynamic control of trace
>generation in TCG emulation code should get you on tracks.
>
>Of course, you should still modify all register-accessing instructions
>to propagate information passing through the register set. For that,
>maybe you could start with the "fetch" tracing/instrumentation point I
>sent long time ago, which keeps track of general-purpose register
>usage/definition on x86 (although I'm sure I left some astray usages due
>to the decoding complexity in x86).
Thanks! I will read that code first, though I am currently just a newbie. :(
>
>
>>> The guest os collects “higher” semantic
>>> from the OS level, and the QEMU collects “lower” semantic from the
>>> instruction level. Combination of both semantics is necessary in the
>>> analysis process.
>
>> The question is, in a situation where malware already compromise "the
>> higher semantic", could we trust the analysis?
>
>Beware, I've read exactly this kind of scheme on previous top-tier
>conferences (but I think tests were using an architectural simulator, so
>it's not for a current production environment).
>
>I've found it :)
>
> Secure program execution via dynamic information flow tracking
> ASPLOS 2004
>
That is a significant paper, which has been cited more than 300 times!
>>> The question is: how to communicate between the QEMU and the guest OS, so
>>> that they can cooperate with each other?
>
>A few choices here, but you should first define if the communication
>must be based just on control signals, and/or providing memory storage:
> * virtual device : If you need some kind of storage that the guest OS
> must access, you could look at the ivshmem device
> * backdoor instruction : It's the simplest option; I sent some patch
> series recently with two different implementations for x86.
>
>
Both control signals and (shadow) memory storage are required, so the virtual device may be the right choice.
At this year’s top security conferences (Oakland, CCS, USENIX Security, NDSS and so on), many works are based on virtualization technology. So I think QEMU is a good choice for future academic research.
Thank you very much for your time and help!
Best regards!
F. Zhang
* Re: [Qemu-devel] How to make shadow memory for a process? and how to trace the data propation from the instruction level in QEMU?
2010-11-16 12:10 ` F. Zhang
@ 2010-11-16 13:49 ` Lluís
0 siblings, 0 replies; 8+ messages in thread
From: Lluís @ 2010-11-16 13:49 UTC (permalink / raw)
To: qemu-devel
F Zhang writes:
> This topic includes things that I recognized as critical. Have you any
> suggestions?
Sorry, I don't understand what you want suggestions about.
>>>> Yes. For each process’s memory space A, I wanna make a shadow memory B. The
>>>> shadow memory is used to store the tag of data. In other words, if addr in
>>>> memory A is tainted, then the corresponding byte in B should be marked to
>>>> indicate that addr in A is tainted.
>>
>> The main question here is... what is the granularity that you want to
>> track with? Bytes? Words? Pages? This will greatly influence which is
>> your best approach.
> I think one byte per tag is necessary for malware analysis in most cases,
> because only a few bytes are used to launch an attack. For example, a few
> tainted bytes sent to EIP register will cause CPU to do bad things.
[...]
>> Now that I think of it, you could use the tracing points I sent for
>> guest virtual memory accesses, and instrument them instead of calling a
>> file-tracing backend (this should provide a hook for an arbitrary
>> granularity). Then, simply keep track also of address-space changes and
>> your instrumentation code can always know when to activate propagation.
>>
> Sorry, what is “a file-tracing backend”? Could you be a little more detailed? I
> think I need byte-level granularity. Thanks!
Well, the initial patch series I sent was based on macros, so that you
can place any code you want in these macros, not only tracing.
In its current form (sorry, I don't have spare time right now to finish
it), the points generate code for tracing, but there is a patch series
that lets the user redefine some of the trace points to call any
function provided by the user (look for the "trace-instrument" series).
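The idea of a trace point whose action can be replaced by user code can be modelled with a small registry. This is a generic Python sketch; the function names are invented and do not reflect the actual interface of the "trace-instrument" series.

```python
_trace_points = {}


def define_trace_point(name, default):
    """Register a trace point together with its default (tracing) action."""
    _trace_points[name] = default


def instrument(name, func):
    """Replace the default action of an existing trace point with user code."""
    if name not in _trace_points:
        raise KeyError(name)
    _trace_points[name] = func


def fire(name, *args):
    """Invoke whatever action is currently attached to the trace point."""
    return _trace_points[name](*args)
```

The call sites stay the same whether a point writes a trace record or runs taint propagation; only the attached function changes.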
>> This, together with the optimization I sent for dynamic control of trace
>> generation in TCG emulation code should get you on tracks.
>>
>> Of course, you should still modify all register-accessing instructions
>> to propagate information passing through the register set. For that,
>> maybe you could start with the "fetch" tracing/instrumentation point I
>> sent long time ago, which keeps track of general-purpose register
>> usage/definition on x86 (although I'm sure I left some astray usages due
>> to the decoding complexity in x86).
> Thanks! I will read that code first, though I am currently just a newbie. :(
>>>> The guest OS collects “higher” semantics
>>>> from the OS level, and QEMU collects “lower” semantics from the
>>>> instruction level. Combining both is necessary in the
>>>> analysis process.
>>
>>> The question is, in a situation where malware has already compromised
>>> "the higher semantics", could we trust the analysis?
>>
>> Beware, I've read about exactly this kind of scheme at previous top-tier
>> conferences (but I think tests were using an architectural simulator, so
>> it's not for a current production environment).
>>
>> I've found it :)
>>
>> Secure program execution via dynamic information flow tracking
>> ASPLOS 2004
>>
> That is a significant paper, which has been cited more than 300 times!
That's why I said you should be careful. Porting this kind of analysis
into QEMU is not significant by itself, although I suppose it should
gain some extra relevance if you implement it in such a way that it can
be used on a production system.
You could start with guest OS taint propagation, and through the "guest
OS to QEMU" channel, activate taint propagation checks when a process
gains access to tainted information coming from the outer world (e.g.,
socket read) [*]. Then, you can conditionally generate taint checks like
I did in the "trace-gen" series, so that programs without access to
tainted information will have no checks at all.
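As a sketch of that conditional scheme: keep a small set of address
spaces known to hold tainted data, and let the translator consult it
when it starts generating code. The names (taint_track, TranslateCtx,
translate_begin) and the use of a 64-bit address-space identifier are
assumptions for the illustration, not the mechanism used by the
"trace-gen" series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; not an existing QEMU interface. */
#define MAX_TRACKED 64

static uint64_t tracked_asid[MAX_TRACKED];
static size_t n_tracked;

/* True if this address space has received tainted data. */
static bool taint_is_tracked(uint64_t asid)
{
    for (size_t i = 0; i < n_tracked; i++) {
        if (tracked_asid[i] == asid) {
            return true;
        }
    }
    return false;
}

/* Start tracking an address space; returns false if the table is full. */
static bool taint_track(uint64_t asid)
{
    if (taint_is_tracked(asid)) {
        return true;
    }
    if (n_tracked == MAX_TRACKED) {
        return false;
    }
    tracked_asid[n_tracked++] = asid;
    return true;
}

/* Stop tracking an address space (e.g. when the process exits). */
static void taint_untrack(uint64_t asid)
{
    for (size_t i = 0; i < n_tracked; i++) {
        if (tracked_asid[i] == asid) {
            tracked_asid[i] = tracked_asid[--n_tracked];
            return;
        }
    }
}

/* Per-translation state of an assumed translator front end. */
typedef struct {
    bool emit_taint_checks;
} TranslateCtx;

/* Decide once per translated block whether to emit taint checks. */
static void translate_begin(TranslateCtx *ctx, uint64_t current_asid)
{
    assert(ctx != NULL);
    ctx->emit_taint_checks = taint_is_tracked(current_asid);
}
```

Code that was already translated without checks would have to be
invalidated or regenerated when an address space becomes tracked; that
part is not shown.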
Better still, the optimal solution would be to run in KVM mode when no
instruction-based taint checking is needed, and use QEMU emulation
otherwise. The downside is that I was told this is not currently
possible with multiple CPUs, and only theoretically possible with one
CPU.
[*] This is just a rough summary of what I remember from the ASPLOS
paper
>>>> The question is: how to communicate between the QEMU and the guest OS, so
>>>> that they can cooperate with each other?
>>
>> A few choices here, but you should first decide whether the communication
>> needs only control signals, or also memory storage:
>> * virtual device : If you need some kind of storage that the guest OS
>> must access, you could look at the ivshmem device
>> * backdoor instruction : It's the simplest option; I sent some patch
>> series recently with two different implementations for x86.
>>
>>
> Both control signals and (shadow) memory storage are required. So, the
> virtual device may be the right choice.
Shadow memory is not a problem here, as it can be handled by the
instrumented trace points and the guest OS has no need to access it. So
from my understanding, just using an instruction-based backdoor is
sufficient for the guest OS to tell QEMU when taint analysis propagation
must be performed, and on which memory addresses it must start
propagating.
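Roughly, the QEMU side of such a backdoor could be a small dispatcher:
the guest puts a command number and two arguments in registers, executes
the backdoor instruction, and the handler updates the taint-control
state. The command numbers, the TaintControl structure and
backdoor_dispatch are all hypothetical; the actual backdoor patches may
pass arguments quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command numbers agreed between guest OS and emulator. */
enum {
    BD_TAINT_ENABLE  = 1,   /* start propagating taint */
    BD_TAINT_DISABLE = 2,   /* stop propagating taint */
    BD_TAINT_RANGE   = 3    /* arg0 = guest address, arg1 = length in bytes */
};

#define MAX_RANGES 32

typedef struct {
    uint32_t base;
    uint32_t len;
} TaintRange;

typedef struct {
    bool enabled;
    size_t n_ranges;
    TaintRange ranges[MAX_RANGES];
} TaintControl;

/*
 * Handle one backdoor request from the guest.
 * Returns 0 on success, -1 for an unknown command or a full range table.
 */
static int backdoor_dispatch(TaintControl *tc, uint32_t cmd,
                             uint32_t arg0, uint32_t arg1)
{
    assert(tc != NULL);
    switch (cmd) {
    case BD_TAINT_ENABLE:
        tc->enabled = true;
        return 0;
    case BD_TAINT_DISABLE:
        tc->enabled = false;
        return 0;
    case BD_TAINT_RANGE:
        if (tc->n_ranges == MAX_RANGES) {
            return -1;
        }
        tc->ranges[tc->n_ranges].base = arg0;
        tc->ranges[tc->n_ranges].len = arg1;
        tc->n_ranges++;
        return 0;
    default:
        return -1;
    }
}
```

A guest OS hook on, say, a socket read would issue BD_TAINT_RANGE for
the receive buffer and then BD_TAINT_ENABLE; the recorded ranges are the
addresses whose shadow tags get set before propagation starts.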
> In this year’s top security conferences (Oakland, CCS, Usenix Security, NDSS and
> so on), many papers are based on virtualization technology. So I think QEMU is a
> good choice for future academic research.
I'm not much of a security expert, so I don't know what the current
state of the art is, but if you embark on the journey of implementing
this in QEMU, first make sure you can provide novel ideas and features
on top of this infrastructure.
Coding this kind of thing is fun, but if it's just for the sake of
coding, this won't get you publications (I remember reading that you are
doing a PhD); believe me, I've gone through this before :)
Lluis
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth