Re: [Qemu-devel] [PATCH] Document Qemu coding style

From: David Turner <digit@google.com>
To: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Document Qemu coding style
Date: Tue, 31 Mar 2009 23:48:21 +0200
Message-ID: <60cad3f0903311448y7e826b6blecb015140fa09901@mail.gmail.com>
In-Reply-To: <f43fc5580903310918g1275420eie4496fd6bc8089e@mail.gmail.com>

On Tue, Mar 31, 2009 at 6:18 PM, Blue Swirl <blauwirbel@gmail.com> wrote:

>
> True enough for the comments part. There are still some areas that I
> don't understand well, for example TB handling and its inherent
> limitations.
>
> Which part of the source you find subtle and magical but not commented
> enough?
>

OK, here are a few things that come to mind; there are probably many others:

The software MMU implementation in system mode is *really* hard to
understand at first. It took me a long time to grasp its various aspects,
including these:

   - loads/stores in kernel or userspace map to different translated code
   fragments
   - loads/stores in different emulated CPUs also map to different
   translated code.
   - the way I/O memory access is controlled in fine detail, and how it
   relates to the rest of the MMU.

Most of what happens in exec.c is black magic :-)

How the audio subsystem works, and how it relates to hardware emulation, is
subtle. For my work on the Android emulator, I have modified three audio
backends and written one from scratch. It took me several tries to get things
to an acceptable state. What I didn't understand at first is that there is no
common timebase used by all the components involved.

dyngen used to be pretty radical too. Thank god it is gone for every target
and host platform combination I care about :-)

slirp is a hideous pile of goo which mixes network-ordered and host-ordered
fields, including pointers, in the same structures, liberally and at
different times, depending on context. That is probably why it took so long
to fix the 64-bit issues in it. I would like to add IPv6 support to this code
but, sadly, I have the feeling that rewriting most of it from scratch would
be slightly easier.

The CharDriverState interface takes some time to fully understand. In the
Android emulator, I sometimes need to connect two CharDriverState users
together, and had to write a special-purpose object just to do that, but was
surprised by how hard it was to write a bug-free one.

I also think that the event loop implementation is confusing compared to the
more common interfaces provided by things like libevent. It is also extremely
tied to select(), which prevents using better mechanisms on various
platforms, and even performs rather poorly on Windows, but I digress.

qemu_get_clock(vm_clock) returns time in nanoseconds, but
qemu_get_clock(rt_clock) returns time in milliseconds. This is totally
undocumented, and the code that uses the result tends to use magic numbers
like 1000000 or 1000 to perform conversions, which are never clear at first
sight. Maybe this is fixed in upstream QEMU but, my, how this has hurt me in
the past.
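
One way to avoid such magic numbers is to name the units once and convert
through small helpers. The sketch below is in Python for brevity and is not
QEMU code: only the units (nanoseconds for vm_clock, milliseconds for
rt_clock) come from the paragraph above, and every name in it is hypothetical.

```python
# Hypothetical helper names; only the clock units come from the text above.
NS_PER_MS = 1_000_000   # nanoseconds in one millisecond
MS_PER_S = 1_000        # milliseconds in one second


def vm_clock_ns_to_ms(ns: int) -> int:
    """Convert a vm_clock-style value (nanoseconds) to whole milliseconds."""
    return ns // NS_PER_MS


def rt_clock_ms_to_ns(ms: int) -> int:
    """Convert an rt_clock-style value (milliseconds) to nanoseconds."""
    return ms * NS_PER_MS


def ms_to_seconds(ms: int) -> float:
    """Convert milliseconds to (possibly fractional) seconds."""
    return ms / MS_PER_S
```

With named constants, a reader can tell at the call site which unit a value
is in, which the bare 1000000 and 1000 literals do not convey.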

For the record, attached are a few documents I wrote to detail what I've
"discovered" so far. I hope someone finds them useful (sorry if some of them
focus on ARM system emulation only).

Regards

[-- Attachment #2: AUDIO.TXT --]
[-- Type: text/plain, Size: 8296 bytes --]

HOW AUDIO EMULATION WORKS IN QEMU:
==================================

Things are a bit tricky, but here's a rough description:

  QEMUSoundCard: models a given emulated sound card
  SWVoiceOut:    models an audio output from a QEMUSoundCard
  SWVoiceIn:     models an audio input from a QEMUSoundCard

  HWVoiceOut:    models an audio output (backend) on the host.
  HWVoiceIn:     models an audio input (backend) on the host.

Each voice can have its own settings in terms of sample size, endianness, rate, etc.

Emulation for a given soundcard typically does:

  1/ Create a QEMUSoundCard object and register it with AUD_register_card()
  2/ For each emulated output, call AUD_open_out() to create a SWVoiceOut object.
  3/ For each emulated input, call AUD_open_in() to create a SWVoiceIn object.

  Note that you must pass a callback function to AUD_open_out() and AUD_open_in();
  more on this later.

  Each SWVoiceOut is associated with a single HWVoiceOut, and each SWVoiceIn is
  associated with a single HWVoiceIn.

  However, you can have several SWVoiceOut associated with the same HWVoiceOut
  (same thing for SWVoiceIn/HWVoiceIn).

SOUND PLAYBACK DETAILS:
=======================

Each HWVoiceOut has the following:

  - A fixed-size circular buffer of stereo samples, where each sample is stored
    either as a float or as an int64_t (depending on build configuration).

  - A 'samples' field giving the (constant) number of sample pairs in the stereo buffer.

  - A target conversion function, called 'clip()', that is used to read from the stereo
    buffer and write into platform-specific sound buffers (e.g. WinWave-managed buffers
    on Windows).

  - A 'rpos' offset into the circular buffer which tells where to read the next samples
    from the stereo buffer for the next conversion through 'clip'.

            |<----------------- samples ----------------------->|
            |                                                   |
                    rpos
                    |
             _______v___________________________________________
            |       |                                           |
            |       |                                           |
            |_______|___________________________________________|

  - A 'run_out' method that is called periodically to tell the output backend to
    send samples from the stereo buffer to the host sound card/server. This method
    must also update 'rpos', and returns the number of samples 'played'. A more detailed
    description of this process appears below.

  - A 'write' method callback used to write a buffer of emulated sound samples from
    a SWVoiceOut into the stereo buffer. *All* backends simply call the generic
    function audio_pcm_sw_write() to implement this, so it is difficult to see why
    it is needed at all.

    (Similarly, all backends have a 'read' method which simply calls audio_pcm_sw_read().)
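
To make the roles of 'samples', 'rpos', 'clip()' and 'run_out' concrete, here
is a small Python model of the circular stereo buffer. It is only a sketch
under stated assumptions, not QEMU's implementation: the class name, the
'backend_room' parameter, the float sample format and the signed 16-bit
target format are all chosen for illustration.

```python
def clip(pair):
    """Convert one stereo pair of floats (nominally in [-1.0, 1.0]) to
    clamped signed 16-bit integers, as a host backend might expect."""
    def to_s16(x):
        return max(-32768, min(32767, int(x * 32767)))
    left, right = pair
    return (to_s16(left), to_s16(right))


class HWVoiceOutModel:
    """Toy model of a HWVoiceOut's circular stereo buffer (not QEMU's struct)."""

    def __init__(self, samples):
        self.samples = samples                  # constant number of stereo pairs
        self.mix_buf = [(0.0, 0.0)] * samples   # the circular stereo buffer
        self.rpos = 0                           # where the next read starts

    def run_out(self, live, backend_room):
        """Send up to `live` pairs to a backend that has room for
        `backend_room` pairs. Returns (played, converted_pairs)."""
        played = min(live, backend_room)
        out = []
        for _ in range(played):
            out.append(clip(self.mix_buf[self.rpos]))
            self.rpos = (self.rpos + 1) % self.samples   # wrap around
        return played, out
```

The modulo on 'rpos' is what makes the buffer circular: a read that starts
near the end of the buffer continues from index 0.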

Each SWVoiceOut has the following:

  - a 'conv()' function used to read sound samples from the emulated sound card and
    copy/mix them to the corresponding HWVoiceOut's stereo buffer.

  - a 'total_hw_samples_mixed' field, which corresponds to the number of samples that have
    already been mixed into the target HWVoiceOut stereo buffer (starting from the
    HWVoiceOut's 'rpos' offset). NOTE: this is a count of samples in the HWVoiceOut
    stereo buffer, not of emulated hardware sound samples, which can have different
    properties (frequency, size, endianness).
                                         ______________
                                        |              |
                                        |  SWVoiceOut2 |
                                        |______________|
                  ______________           |
                 |              |          |
                 |  SWVoiceOut1 |          |     thsm<N> := total_hw_samples_mixed
                 |______________|          |                for SWVoiceOut<N>
                           |               |
                           |               |
                    |<-----|------------thsm2-->|
                    |      |                    |
                    |<---thsm1-------->|        |
             _______|__________________v________|_______________ 
            |       |111111111111111111|        v               |
            |       |222222222222222222222222222|               |
            |_______|___________________________________________|
                    ^
                    |         HWVoiceOut stereo buffer
                    rpos

  - a 'ratio' value, which is the ratio of the target HWVoiceOut's frequency to
    the SWVoiceOut's frequency, multiplied by (1 << 32), as a 64-bit integer.

    So, if the HWVoiceOut has a frequency of 44kHz, and the SWVoiceOut has a frequency
    of 11kHz, then ratio will be (44/11*(1 << 32)) = 0x4_0000_0000

  - a callback provided by the emulated hardware when the SWVoiceOut is created.
    This function is used to mix the SWVoiceOut's samples into the target
    HWVoiceOut stereo buffer (it must also perform frequency interpolation,
    volume adjustment, etc.).

    This callback normally calls another helper function in the audio subsystem,
    AUD_write(), to do the mixing/volume adjustment from emulated hardware sample
    buffers.
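
The 'ratio' field is a 32.32 fixed-point number. The Python sketch below
reproduces the 44kHz/11kHz example and shows how such a ratio could be used
to convert sample counts between the two rates. The helper names are
hypothetical, and the two conversion functions are an assumption about how
the value would be used, not a copy of QEMU code.

```python
FRAC_BITS = 32   # the ratio is scaled by (1 << 32)


def make_ratio(hw_freq, sw_freq):
    """Ratio of HWVoiceOut frequency to SWVoiceOut frequency, in 32.32 fixed point."""
    return (hw_freq << FRAC_BITS) // sw_freq


def sw_to_hw_samples(sw_samples, ratio):
    """Number of hardware-rate samples produced by `sw_samples` emulated samples."""
    return (sw_samples * ratio) >> FRAC_BITS


def hw_to_sw_samples(hw_samples, ratio):
    """Number of emulated samples needed to produce `hw_samples` hardware-rate samples."""
    return (hw_samples << FRAC_BITS) // ratio
```

For example, make_ratio(44000, 11000) gives 0x4_0000_0000, so 100 emulated
samples at 11kHz expand to 400 samples in the 44kHz stereo buffer.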

Here's a small diagram that explains it better:

   SWVoiceOut:  emulated hardware sound buffers:

          |
          |   (mixed through AUD_write() from user-provided callback
          |    which is called on each audio timer tick).
          v

   HWVoiceOut: stereo sample circular buffer

          |
          |   (through HWVoiceOut's 'clip' function, invoked from the
          |    'run_out' method)
          v

   backend-specific sound buffers

THERE IS NO COMMON TIMEBASE BETWEEN ALL LAYERS. DON'T EXPECT ANY HIGH-ACCURACY /
LOW-LATENCY IN THIS IMPLEMENTATION.

The function audio_timer() in audio/audio.c is called periodically and it is used as
a pulse to perform sound buffer transfers and mixing. More specifically for audio
output voices:

- For each HWVoiceOut, find the number of active SWVoiceOut, and the minimum value
  of 'total_hw_samples_mixed' among them, i.e. the number of samples that every
  active voice has already written to the buffer. We will call this value the
  number of 'live' samples in the stereo buffer.

- If 'live' is 0, call the callback of each active SWVoiceOut to fill the stereo
  buffer, if needed, then exit.

- Otherwise, call the 'run_out' method of the HWVoiceOut object. This will change
  the value of 'rpos' and return the number of samples played. Then the
  'total_hw_samples_mixed' field of all active SWVoiceOuts is decremented by
  'played', and the callback is called to re-fill the stereo buffer.

It's important to note that the SWVoiceOut callback:

- takes a 'free' parameter which is the number of stereo sound samples that can
  be sent to the hardware stereo buffer (before rate adjustment, i.e. not the number
  of sound samples in the SWVoiceOut emulated hardware sound buffer).

- must call AUD_write(sw, buff, count), where 'buff' points to the emulated sound
  samples and 'count' is their number, which must be <= the 'free' parameter.

- the implementation of AUD_write() will call the 'write' method of the target
  HWVoiceOut, which in turn calls the function audio_pcm_sw_write(), which does
  standard rate/volume adjustment before mixing the converted samples into the
  target stereo buffer. It also increases the 'total_hw_samples_mixed' value of the
  SWVoiceOut.

- audio_pcm_sw_write() returns the number of sound sample *bytes* that have
  been mixed into the stereo buffer, and so does AUD_write().

So, in the end, we have the pseudo-code:

    on every sound timer tick:
        for hw in list_HWVoiceOut:
            live = MIN([sw.total_hw_samples_mixed for sw in hw.list_SWVoiceOut])
            if live > 0:
                played = hw.run_out(live)
                for sw in hw.list_SWVoiceOut:
                    sw.total_hw_samples_mixed -= played

            for sw in hw.list_SWVoiceOut:
                free = hw.samples - sw.total_hw_samples_mixed
                if free > 0:
                    sw.callback(sw, free)
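
The pseudo-code can be turned into a small runnable Python simulation to see
how 'live', 'played' and 'free' evolve over a few ticks. This is a toy model
under stated assumptions: the class names, the 'chunk' value (how many
samples a voice supplies per callback) and 'backend_room' (how many samples
the host backend accepts per tick) are hypothetical knobs, and no actual
sample data is moved.

```python
class SWVoiceModel:
    def __init__(self, chunk):
        self.chunk = chunk                 # hypothetical: samples supplied per callback
        self.total_hw_samples_mixed = 0

    def callback(self, free):
        """Stand-in for the device callback calling AUD_write()."""
        written = min(free, self.chunk)
        self.total_hw_samples_mixed += written
        return written


class HWVoiceModel:
    def __init__(self, samples, voices, backend_room):
        self.samples = samples             # size of the stereo buffer
        self.voices = voices               # active SWVoiceModel objects
        self.backend_room = backend_room   # hypothetical: samples the host takes per tick
        self.rpos = 0

    def run_out(self, live):
        played = min(live, self.backend_room)
        self.rpos = (self.rpos + played) % self.samples
        return played


def audio_timer_tick(hw_list):
    for hw in hw_list:
        if not hw.voices:
            continue
        # 'live' is what every active voice has already mixed in.
        live = min(sw.total_hw_samples_mixed for sw in hw.voices)
        if live > 0:
            played = hw.run_out(live)
            for sw in hw.voices:
                sw.total_hw_samples_mixed -= played
        # Let each voice refill whatever room it has left.
        for sw in hw.voices:
            free = hw.samples - sw.total_hw_samples_mixed
            if free > 0:
                sw.callback(free)
```

With two voices that supply 100 and 30 samples per callback, the slower
voice bounds 'live' at 30 on every tick, while the faster one keeps mixing
ahead until its count reaches the buffer size.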

SOUND RECORDING DETAILS:
========================

Things are similar but in reverse order.

[-- Attachment #3: CHAR-DEVICES.TXT --]
[-- Type: text/plain, Size: 8264 bytes --]

QEMU CHARACTER "DEVICES" MANAGEMENT

I. CharDriverState objects:
---------------------------

One of the strangest abstractions in QEMU is the "CharDriverState"
(abbreviated here as "CS").

The CS is essentially an object used to model a character stream that
can be connected to things like a host serial port, a host network socket,
an emulated device, etc...

What's really unusual is its interface though, which comes from the fact
that QEMU implements a big event loop with no blocking i/o allowed. You
can see "qemu-char.h" for the full interface, but here we will only describe
a few important functions:

  - qemu_chr_write() is used to try to write data into a CS object. Note that
    success is not guaranteed: the function returns the number of bytes that
    were really written (which can be 0) and the caller must deal with it.
    This is very similar to writing to a non-blocking BSD socket on Unix.

       int  qemu_chr_write( CharDriverState*  cs,
                            const uint8_t*    data,
                            int               datalen );

    This function may return -1 in case of error, but this depends entirely
    on the underlying implementation (some of them will just return 0 instead).
    In practice, this means it's not possible to reliably differentiate between
    a "connection reset by peer" and an "operation in progress" :-(

    There is no way to know in advance how many bytes a given CharDriverState
    can accept, nor to be notified when its underlying implementation is ready
    to accept data again.

  - qemu_chr_add_handlers() is used to add "read" and "event" handlers
    to a CS object. We will ignore "events" here and focus on the
    "read" part.

    Thing is, you cannot directly read from a CS object. Instead, you provide
    two functions that will be called whenever the object has something for
    you:

        - a 'can_read' function that shall return the number of bytes
          that you are ready to accept from the CharDriverState. Its
          interface is:

             typedef int  IOCanRWHandler (void*  opaque);

        - a 'read' function that will send you bytes from the CharDriverState

             typedef void IOReadHandler  (void*           opaque,
                                          const uint8_t*  data,
                                          int             datalen);

          normally, the value of 'datalen' cannot be larger than the result
          of a previous 'can_read' call.

    For both callbacks, 'opaque' is a value that you pass to the function
    qemu_chr_add_handlers(), whose signature is:

         void qemu_chr_add_handlers(CharDriverState *s,
                                    IOCanRWHandler  *fd_can_read,
                                    IOReadHandler   *fd_read,
                                    IOEventHandler  *fd_event,
                                    void            *opaque);

  - qemu_chr_open() is used to create a new CharDriverState object from a
    descriptive string. Its interface is:

         CharDriverState*  qemu_chr_open(const char*  filename);

    There are various formats for acceptable 'filename' values, and they correspond
    to the parameters of the '-serial' QEMU option described here:

       http://www.nongnu.org/qemu/qemu-doc.html#SEC10

    For example:

       "/dev/<file>" (Linux and OS X only):
            connect to a host character device file (e.g. /dev/ttyS0)

       "file:<filename>":
            Write output to a given file (write only)

       "stdio":
            Standard input/output

       "udp:[<remote_host>]:<remote_port>[@[<src_ip>]:<src_port>]":
            Connect to a UDP socket for both read/write.

       "tcp:[<host>]:<port>[,server][,nowait][,nodelay]"
            Connect to a TCP socket either as a client or a server.

            The 'nowait' option is used to avoid waiting for a client
            connection.

            The 'nodelay' option is used to disable the TCP Nagle algorithm to
            reduce latency.

    For Android, a few special names have been added to the internal
    implementation and redirect to program functions:

       "android-kmsg":
            A CharDriverState that is used to receive kernel log messages
            from the emulated /dev/ttyS0 serial port.

       "android-qemud":
            A CharDriverState that is used to exchange messages between the
            emulator program and the "qemud" multiplexing daemon that runs in
            the emulated system.

            The "qemud" daemon is used to allow one or more clients in the
            system to connect to various services running in the emulator
            program. This is mainly used to bypass the kernel in order to
            implement certain features with ease.

       "android-gsm":
            A CharDriverState that is used to connect the emulated system to
            a host modem device with the -radio <device> option. Otherwise,
            the system uses qemud to connect to the emulator's internal modem
            emulation.

        "android-gps":
            A CharDriverState that is used to connect the emulated system to a
            host GPS device with the -gps <device> option. Otherwise the
            system uses qemud to connect to the emulator's internal GPS
            emulation.

II. CharDriverState users:
--------------------------

As described above, a CharDriverState "user" is a piece of code that can write
to a CharDriverState (by calling qemu_chr_write() explicitly) and can also
read from it after registering can_read/read handlers for it through
qemu_chr_add_handlers().
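
The contract between a CS and its user can be modeled in a few lines of
Python. This is only a sketch of the semantics described in section I, not
QEMU's API: ToyCharDriver, ToyUser, 'room_per_write' and 'backend_has_data'
are hypothetical names, the event handler is left out, and the -1 error
return is not modeled.

```python
class ToyCharDriver:
    """Toy stand-in for a CharDriverState (not QEMU's API)."""

    def __init__(self, room_per_write):
        self.room_per_write = room_per_write  # hypothetical: bytes accepted per write
        self.sent = bytearray()
        self.handlers = None

    def write(self, data):
        """Models qemu_chr_write(): may accept fewer bytes than requested."""
        n = min(len(data), self.room_per_write)
        self.sent += data[:n]
        return n

    def add_handlers(self, can_read, read, opaque):
        """Models qemu_chr_add_handlers(), without the event handler."""
        self.handlers = (can_read, read, opaque)

    def backend_has_data(self, data):
        """Backend side: offer incoming bytes to the user, honoring can_read."""
        can_read, read, opaque = self.handlers
        n = min(len(data), can_read(opaque))
        if n > 0:
            read(opaque, data[:n])
        return n


def user_can_read(opaque):
    """How many more bytes the user is ready to accept."""
    return opaque.rx_size - len(opaque.rx)


def user_read(opaque, data):
    """Receive bytes; datalen never exceeds the last can_read result."""
    opaque.rx += data


class ToyUser:
    """A CS user that keeps unsent bytes and has a bounded receive buffer."""

    def __init__(self, cs, rx_size):
        self.cs = cs
        self.unsent = b""
        self.rx = bytearray()
        self.rx_size = rx_size
        cs.add_handlers(user_can_read, user_read, self)

    def send(self, data):
        data = self.unsent + data
        written = self.cs.write(data)
        self.unsent = data[written:]   # the caller must deal with short writes
        return written
```

The point of the model is that both directions are bounded: write() may take
only part of the data, and the backend never delivers more than can_read()
allows.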

Typical examples are the following:

  - The hardware serial port emulation (e.g. hw/goldfish_tty.c) will read data
    from the kernel then send it to a CS. It also uses a small buffer that is
    used to read data from the CS and send it back to the kernel.

  - The Android emulated modem also uses a CS to talk with its client,
    which will in most cases be an emulated serial port.

III. CharBuffer objects:
------------------------

The Android emulator provides an object called a CharBuffer which acts as
a CharDriverState object that implements a *write* buffer to send data to a
given CS object, called the endpoint. You can create one with:

    #include "charpipe.h"
    CharDriverState*  qemu_chr_open_buffer( CharDriverState*  endpoint );

This function returns a new CS object that will buffer in the heap any data
that is sent to it, but cannot be sent to the endpoint yet. On each event loop
iteration, the CharBuffer will try to send data to the endpoint until it
doesn't have any data left.

This can be useful to simplify certain CS users who don't want to maintain
their own emit buffer. Note that writing to a CharBuffer always succeeds.

Note also that calling qemu_chr_add_handlers() on the CharBuffer will do the
same on the endpoint. Any endpoint-initiated calls to can_read()/read()
callbacks are passed directly to your handler functions.
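
The write-side behavior of a CharBuffer can be sketched in Python as follows.
This is a model of the behavior described above, not the code in charpipe.c:
the class names are hypothetical, and the endpoint is simulated with a
'room' counter that says how many more bytes it can take right now.

```python
class ToyEndpoint:
    """Hypothetical endpoint CS that only has `room` bytes of space left."""

    def __init__(self, room):
        self.room = room
        self.received = bytearray()

    def write(self, data):
        n = min(len(data), self.room)
        self.received += data[:n]
        self.room -= n
        return n


class CharBufferModel:
    """Write buffer in front of an endpoint; writing to it always succeeds."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.pending = bytearray()   # heap buffer of not-yet-sent data

    def write(self, data):
        self.pending += data
        self.flush()                 # send what the endpoint takes right away
        return len(data)             # the caller never sees a short write

    def flush(self):
        """Called from write() and once per event loop iteration."""
        while self.pending:
            n = self.endpoint.write(bytes(self.pending))
            if n <= 0:
                break                # endpoint is full, retry on the next iteration
            del self.pending[:n]
```

Because write() always reports full success, the only cost of a slow
endpoint is a growing 'pending' buffer.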

IV. CharPipe objects:
---------------------

The Android emulator also provides a convenient abstraction called a "charpipe"
used to connect two CharDriverState users together. For example, this is used
to connect a serial port emulation (in hw/goldfish_tty.c) to the internal
GSM modem emulation (see telephony/modem_driver.c).

Essentially, a "charpipe" is a bidirectional communication pipe whose two
endpoints are both CS objects. You call "qemu_chr_open_pipe()" to create the
pipe, and this function will return the two endpoints to you:

    #include "charpipe.h"
    int  qemu_chr_open_pipe(CharDriverState* *pfirst,
                            CharDriverState* *psecond);

When you write to one end of the pipe (with qemu_chr_write()), the charpipe
will try to write as much data as possible to the other end. Any remaining data
is stored in a heap-allocated buffer.

The charpipe will try to re-send the buffered data on the next event loop
iteration by calling the can_read/read functions of the corresponding user,
if there is one.

Note that there is no limit on the amount of data buffered in a charpipe,
and writing to it is never blocking. This simplifies CharDriverState
users, which don't need to worry about buffering issues.
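
A charpipe can be modeled in Python as two linked endpoints, each holding
the data written to it until the user on the other side is ready. As before,
this is a sketch of the described behavior with hypothetical names (PipeEnd,
open_pipe, event_loop_iteration, ToySink), not the Android emulator's
implementation.

```python
class PipeEnd:
    """One endpoint of a toy charpipe."""

    def __init__(self):
        self.peer = None
        self.handlers = None          # (can_read, read, opaque) of this end's user
        self.pending = bytearray()    # written here, not yet delivered to the peer's user

    def add_handlers(self, can_read, read, opaque):
        self.handlers = (can_read, read, opaque)

    def write(self, data):
        self.pending += data
        self.flush()
        return len(data)              # never blocks, never refuses data

    def flush(self):
        """Deliver as much pending data as the peer's user will accept."""
        if self.peer.handlers is None:
            return                    # nobody is listening on the other end yet
        can_read, read, opaque = self.peer.handlers
        while self.pending:
            n = min(len(self.pending), can_read(opaque))
            if n <= 0:
                break
            read(opaque, bytes(self.pending[:n]))
            del self.pending[:n]


def open_pipe():
    """Create two endpoints connected back to back."""
    first, second = PipeEnd(), PipeEnd()
    first.peer, second.peer = second, first
    return first, second


def event_loop_iteration(first, second):
    """Retry delivery of buffered data in both directions."""
    first.flush()
    second.flush()


class ToySink:
    """Hypothetical CS user that accepts at most `room` more bytes."""

    def __init__(self, room):
        self.room = room
        self.data = bytearray()


def sink_can_read(opaque):
    return opaque.room


def sink_read(opaque, data):
    opaque.data += data
    opaque.room -= len(data)
```

Data written to one end is delivered through the handlers registered on the
other end, so each user only ever talks to its own CS object.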

[-- Attachment #4: CPU-EMULATION.TXT --]
[-- Type: text/plain, Size: 3949 bytes --]

HOW THE QEMU EXECUTION ENGINE WORKS:
====================================

Translating ARM to x86 machine code:
------------------------------------

QEMU starts by isolating code "fragments" from the emulated machine code.
Each "fragment" corresponds to a series of ARM instructions ending with a
branch (e.g. jumps, conditional branches, returns).

Each fragment is translated into a "translated block" (a.k.a. TB) of host
machine code (e.g. x86). All TBs are put in a cache and each time the
instruction pointer changes (i.e. at the end of TB execution), a hash
table lookup is performed to find the next TB to execute.

If none exists, a new one is generated. As a special exception, it is
sometimes possible to 'link' the end of a given TB to the start of
another one by tacking on an explicit jump instruction.

Note that due to differences in translations of memory-related operations
(described below in "MMU emulation"), there are actually two TB caches per
emulated CPU: one for translated kernel code, and one for translated
user-space code.

When a cache fills up, it is simply totally emptied and translation starts
again.
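
The lookup/translate/flush cycle can be sketched in Python as follows. This
is a toy model that follows the description above (one cache per mode,
emptied entirely when full); the class and function names are hypothetical,
a Python dict stands in for the hash table, and TB linking is not modeled.

```python
class TBCacheModel:
    """Toy translated-block cache keyed by the guest program counter."""

    def __init__(self, max_blocks, translate):
        self.max_blocks = max_blocks
        self.translate = translate   # hypothetical callback: pc -> translated block
        self.blocks = {}             # the hash table: pc -> TB
        self.translations = 0
        self.flushes = 0

    def find_or_translate(self, pc):
        tb = self.blocks.get(pc)
        if tb is None:
            if len(self.blocks) >= self.max_blocks:
                # Cache is full: throw everything away and start again.
                self.blocks.clear()
                self.flushes += 1
            tb = self.translate(pc)
            self.translations += 1
            self.blocks[pc] = tb
        return tb


def make_cpu_caches(max_blocks, translate):
    """One cache for kernel code and one for user-space code, per emulated CPU."""
    return {
        "kernel": TBCacheModel(max_blocks, translate),
        "user": TBCacheModel(max_blocks, translate),
    }
```

Flushing the whole cache is crude but avoids any bookkeeping about which
blocks are still referenced by linked jumps.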

CPU state is kept in a single global structure which the generated code
can access directly (with direct memory addressing).

The file target-arm/translate.c is in charge of translating the ARM or
Thumb instructions starting at the current instruction pointer position
into a TB. This is done by decomposing each instruction into a series of
micro-operations supported by the TCG code generator.

TCG stands for "Tiny Code Generator" and is specific to QEMU. It supports
several host machine code backends. See source files under tcg/ for details.

MMU Emulation:
--------------

The ARM Memory Management Unit is emulated in software, since it is so
different from the one on the host. Essentially, a single ARM memory load/store
instruction is translated into a series of host machine instructions that will
translate virtual addresses into physical ones by performing the following:

- first, look up the current page in a global 256-entry cache and see if
  a corresponding value is already stored there. If this is the case, use it
  directly.

- otherwise, call a special helper function that will implement the full
  translation according to the emulated system's state, and modify the
  cache accordingly.

The page cache is called the "TLB" in the QEMU sources.

Note that there are actually two TLBs: one is used for host machine
instructions that correspond to kernel code, and the other for instructions
translated from user-level code.

This means that a memory load in the kernel will not be translated into the
same instructions as the same load in user space.

Each TLB is also implemented as a global per-emulated-CPU hash-table.
The user-level TLB is flushed on each process context switch.
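
The fast-path/slow-path split can be illustrated with a small Python model.
It is a sketch under stated assumptions, not QEMU's data structure: the
4 KiB page size, the direct-mapped indexing (page number modulo 256) and all
names are chosen for illustration; only the 256-entry size and the
"cache first, helper on miss" logic come from the text.

```python
PAGE_BITS = 12    # assumption: 4 KiB pages
TLB_SIZE = 256    # "256-entry cache" from the text


class SoftTLBModel:
    """Toy software TLB: a small cache in front of a slow page-table walk."""

    def __init__(self, walk_page_table):
        self.walk = walk_page_table        # hypothetical slow helper: vpage -> ppage
        self.entries = [None] * TLB_SIZE   # each entry is (vpage, ppage) or None
        self.misses = 0

    def translate(self, vaddr):
        vpage = vaddr >> PAGE_BITS
        index = vpage % TLB_SIZE
        entry = self.entries[index]
        if entry is None or entry[0] != vpage:
            # Slow path: do the full translation and update the cache.
            self.misses += 1
            entry = (vpage, self.walk(vpage))
            self.entries[index] = entry
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        return (entry[1] << PAGE_BITS) | offset

    def flush(self):
        """Forget every cached translation (e.g. on a context switch)."""
        self.entries = [None] * TLB_SIZE
```

Following the description above, an emulated CPU would own two such objects,
one for kernel accesses and one for user-level accesses.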

When initializing the MMU emulation, one can define several zones of the
address space, with different access rights / type. This is how memory-mapped
I/O is implemented: the virtual->physical conversion helper function detects
that you're trying to read/write from an I/O memory region, and will then call
a callback function associated to it.

Hardware Emulation:
-------------------

Most hardware emulation code initializes itself by registering its own region
of I/O memory and providing read/write callbacks for it. Actions are then
based on which offset of the I/O memory is read from or written to, and
possibly on the value written.
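
The registration and dispatch pattern can be sketched in Python like this.
Everything in it is hypothetical: the PhysicalMemoryModel class, the
register_io() method and the two-register serial port layout are made up for
illustration and do not match goldfish_tty or QEMU's real functions.

```python
class ToySerialPort:
    """Hypothetical device with two registers (not the goldfish_tty layout)."""

    REG_DATA = 0x00     # write: output one character
    REG_STATUS = 0x04   # read: 1 means "ready"

    def __init__(self):
        self.output = []

    def io_read(self, offset):
        return 1 if offset == self.REG_STATUS else 0

    def io_write(self, offset, value):
        if offset == self.REG_DATA:
            self.output.append(chr(value & 0xFF))


class PhysicalMemoryModel:
    """Toy physical address space with RAM and callback-backed I/O regions."""

    def __init__(self):
        self.io_regions = []   # tuples of (base, size, read_cb, write_cb)
        self.ram = {}

    def register_io(self, base, size, read_cb, write_cb):
        self.io_regions.append((base, size, read_cb, write_cb))

    def _find_io(self, addr):
        for region in self.io_regions:
            base, size = region[0], region[1]
            if base <= addr < base + size:
                return region
        return None

    def load(self, addr):
        region = self._find_io(addr)
        if region is not None:
            return region[2](addr - region[0])   # callback gets the offset
        return self.ram.get(addr, 0)

    def store(self, addr, value):
        region = self._find_io(addr)
        if region is not None:
            region[3](addr - region[0], value)
        else:
            self.ram[addr] = value
```

The device never sees absolute addresses, only offsets inside its own
region, which is why the same device code can be mapped at any base address.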

You can have a look at hw/goldfish_tty.c that implements an emulated serial
port for the Goldfish platform.

"Goldfish" is simply the name of the virtual Linux platform used to build
the Android-emulator-specific kernel image. The corresponding sources are
located in the origin/android-goldfish-2.6.27 branch of
git://android.git.kernel.org/kernel/common.git. You can have a look at
arch/arm/mach-goldfish/ for the corresponding kernel driver sources.
