qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Modularizing QEMU RFC
@ 2015-07-31 15:45 Marc Marí
  2015-08-03  3:09 ` Fam Zheng
  2015-08-03  9:23 ` Daniel P. Berrange
  0 siblings, 2 replies; 18+ messages in thread
From: Marc Marí @ 2015-07-31 15:45 UTC (permalink / raw)
  To: qemu-devel

Hi everyone

I propose improving the current modular driver system for QEMU so it
can benefit everybody in speed and flexibility. I'm looking for other
ideas, comments, criticism, etc.

- Background -
In order to speed up QEMU, I'm looking at the high number of libraries
and dependencies that it loads. I've generated a QEMU image that needs
145 shared libraries to start, and there were still some options
disabled. This is a lot, and this means that it is slow.

So I've been looking at the current module system. Yes, QEMU does have
a module system, but it is disabled by default. The problem is that the
modules are always loaded during startup. This means that booting with
modules enabled is even slower, because loading at runtime is slower
than letting the linker do all the work at the start. At this point, I
doubt the benefits of the current module system.

But if disabling the current block modules (iscsi, curl, rbd, gluster,
ssh and dmg) gives 40 ms of speedup, I think it is worth the effort to
improve the modules and to modularize new parts.

- Current module flow -
The current module system is based on shared libraries. Each of these
libraries has a constructor that registers the BlockDriver structure to
the global bdrv_drivers list. This list is later searched by the type,
the protocol, etc.
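
As a rough illustration of that flow (a simplified sketch, not code
from the QEMU tree): every Demo*/demo_* name below is hypothetical and
only stands in for BlockDriver, bdrv_drivers and the registration
helper. The constructor attribute is the GCC/Clang mechanism that runs
a function when the object is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMU's BlockDriver. */
typedef struct DemoBlockDriver {
    const char *format_name;
    struct DemoBlockDriver *next;
} DemoBlockDriver;

/* Hypothetical global list, playing the role of bdrv_drivers. */
static DemoBlockDriver *demo_drivers;

/* Push a driver onto the front of the global list. */
static void demo_bdrv_register(DemoBlockDriver *drv)
{
    assert(drv != NULL && drv->format_name != NULL);
    drv->next = demo_drivers;
    demo_drivers = drv;
}

/* Search the list by name, as a lookup by type or protocol would. */
static DemoBlockDriver *demo_bdrv_find(const char *name)
{
    for (DemoBlockDriver *d = demo_drivers; d != NULL; d = d->next) {
        if (strcmp(d->format_name, name) == 0) {
            return d;
        }
    }
    return NULL;
}

static DemoBlockDriver demo_curl = { .format_name = "curl" };

/* Runs when the object is loaded: at program start, or on dlopen(). */
static void __attribute__((constructor)) demo_curl_init(void)
{
    demo_bdrv_register(&demo_curl);
}
```

Because registration happens in the constructor, a driver only becomes
visible to the lookup once its shared library has been loaded.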

- Proposals -
I have in mind two ideas, which are not mutually exclusive:

-- A --
Having a huge lookup table with the names of the drivers that are in
modules, and the names of the modules where they can be found. When
some type is not found in the driver lists, this table can be traversed
to find the library and load it.

This requires an effort by the driver developer to fill the tables. But
the refactoring work can stay localized in the drivers and the modules;
it is (possibly) not necessary to touch any other part of the QEMU
system.

I don't especially like this huge, manually edited table.
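
A minimal sketch of what such a table could look like, assuming a flat
array that is searched linearly. The type, the function and the module
file names are all made up for illustration; none of them are existing
QEMU names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: which module file provides which driver. */
typedef struct {
    const char *driver_name;
    const char *module_name;
} DemoModuleEntry;

/* Illustrative contents only; the module file names are assumptions. */
static const DemoModuleEntry demo_module_table[] = {
    { "iscsi",   "block-iscsi.so"   },
    { "http",    "block-curl.so"    },
    { "https",   "block-curl.so"    },
    { "rbd",     "block-rbd.so"     },
    { "gluster", "block-gluster.so" },
    { "ssh",     "block-ssh.so"     },
    { "dmg",     "block-dmg.so"     },
};

/* Return the module providing driver_name, or NULL if it is not modular. */
static const char *demo_module_for_driver(const char *driver_name)
{
    size_t n = sizeof(demo_module_table) / sizeof(demo_module_table[0]);

    assert(driver_name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_module_table[i].driver_name, driver_name) == 0) {
            return demo_module_table[i].module_name;
        }
    }
    return NULL;
}
```

The table would only be consulted after the normal driver list lookup
fails, so built-in drivers pay no extra cost.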

-- B --
The same --enable-X that is done at compile time, but given directly on
the command line when running QEMU, or even at runtime from the monitor.

This solution requires work on the monitor, the command line
processing, the modules and the drivers system. It is also less
transparent to the final user.

- My comments on my proposals -
Ideally, I'd like a mixed solution. The user can specify what they want
to load, but also, when something is not found, it is searched for
automatically.

In both options, the current module system has to be partly rewritten,
and some of the current drivers with module capability might need to be
modified to adapt to the new specifications.

And, apart from improving the current modular interface, there are a
lot of other devices that might benefit from it, not just the block
devices.

I still haven't looked at the memory footprint of QEMU, but I'm sure
that the QEMU binary will lose a lot of weight with this addition.

- Closing -
This is just a brief draft. These proposals can be improved a lot, or
there might be some other solutions that I haven't thought of.

So, I'd like to ask for ideas, comments, criticism, improvements, etc.
I'd also like to ask for contributors to this endeavour, because it
will be a huge amount of work.

Thanks for reading and have a nice weekend
Marc

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-07-31 15:45 [Qemu-devel] Modularizing QEMU RFC Marc Marí
@ 2015-08-03  3:09 ` Fam Zheng
  2015-08-03  7:51   ` Peter Maydell
                     ` (3 more replies)
  2015-08-03  9:23 ` Daniel P. Berrange
  1 sibling, 4 replies; 18+ messages in thread
From: Fam Zheng @ 2015-08-03  3:09 UTC (permalink / raw)
  To: Marc Marí; +Cc: qemu-devel

On Fri, 07/31 17:45, Marc Marí wrote:
> Hi everyone
> 
> I propose improving the current modular driver system for QEMU so it
> can benefit everybody in speed and flexibility. I'm looking for other
> ideas, comments, critics, etc.
> 
> - Background -
> In order to speed up QEMU, I'm looking at the high number of libraries
> and dependencies that it loads. I've generated a QEMU image that needs
> 145 shared libraries to start, and there were still some options
> disabled. This is a lot, and this means that it is slow.
> 
> So I've been looking at actual module system. Yes, QEMU does have a
> module system, but disabled by default. The problem is, the modules get
> loaded always during startup. This means, booting with modules enabled
> is even slower, because loading at runtime is slower that letting the
> linker do all the work at the start. At this point, I doubt of the
> benefits of the actual modular system.
> 
> But, if disabling the actual block modules (iscsi, curl, rbd, gluster,
> ssh and dmg) gives 40 ms of speedup, I think is worth an effort of
> improving modules, and modularizing new parts
> 
> - Current module flow -
> The current module system is based on shared libraries. Each of these
> libraries has a constructor that registers the BlockDriver structure to
> the global bdrv_drivers list. This list is later searched by the type,
> the protocol, etc.
> 
> - Proposals -
> I have in mind two ideas, which are not mutually exclusive:
> 
> -- A --
> Having a huge lookup table with the names of the drivers that are in
> modules, and the name of the modules where it can be found. When
> some type is not found in the driver lists, this table can be traversed
> to find the library and load it.
> 
> This requires an effort by the driver developer to fill the tables. But
> the refactoring work can stay localized in the drivers and the modules,
> it is (possibly) not necessary to touch any other part of the QEMU
> system.
> 
> I don't specially like this huge manual-edited table.

Yeah, the way to fill the tables is an (implementation) question, but there are
still more questions if we go this way...

bdrv_find_protocol is easy: the protocol name is extracted from the image URI
or blockdev options, so we can load the module by name.

bdrv_probe_all is harder. If we modularize a format driver, its .bdrv_probe
code will be in the module. If we want to do format detection, we need to
load all format drivers. This means that if the command line leaves the format
unspecified, we'll still need to load all drivers at startup. (I wish all
formats were probed according to magic bytes at offset 0, so we could simplify
the .bdrv_probe logic and do it with data matching in block.c like the protocol
case, but that's not true for VMDK :( )

I believe other subsystems have this kind of paradox as well.

> 
> -- B --
> The same --enable-X that is done at compile time, but directly on the
> command line when running QEMU or even in hot at the monitor.
> 
> This solution requires work on the monitor, the command line
> processing, the modules and the drivers system. It is also less
> transparent to the final user.

I'm afraid this breaks backward compatibility. :(

> 
> - My comments on my proposals -
> Ideally, I'd like a mixed solution. The user can specify what wants to
> load, but also, when something is not found, it is automatically
> searched.
> 
> In both options, the current module system has to be partly rewritten,
> and some of the current drivers with module capability might need to be
> modified to adapt to the new specifications.
> 
> And, a part from improving the current modular interface, there are a
> lot of other devices that might benefit from it, not just the block
> devices.
> 
> I still haven't looked at the memory footprint of QEMU, but I'm sure
> that the QEMU binary will lose a lot of weight with this addition.
> 
> - Closing -
> This is just a brief draft. These proposals can be improved a lot, or
> there might be some other solutions that I haven't thought of.
> 
> So, I'd like to ask for ideas, comments, critics, improvements, etc.
> And also ask for contributors to this endeavour, because it will be a
> huge amount of work.
> 
> Thanks for reading and have a nice weekend
> Marc
> 

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  3:09 ` Fam Zheng
@ 2015-08-03  7:51   ` Peter Maydell
  2015-08-03  7:52   ` Marc Marí
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2015-08-03  7:51 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Marc Marí, qemu-devel

On 3 August 2015 at 04:09, Fam Zheng <famz@redhat.com> wrote:
> bdrv_probe_all is harder. If we modularize a format driver, its .bdrv_probe
> code will be in the module. If we want to do the format detection, we need to
> load all format drivers. This means if the command line has an unspecified
> format, we'll still need to load all drivers at starting phase. (I wish all
> formats are probed according to magic bytes at offset 0, so we can simplify the
> .bdrv_probe logic and do it with data matching in block.c like the protocol
> case, but that's not true for VMDK :( )

If you wanted, you could have a might_be_format_foo() function which
gives its best guess at whether a file is of that format, erring on
the side of saying 'yes'. (So for VMDK you just always say 'yes', for
ones where you can conveniently match you do the match test, and so on.)
Then you can use that to decide whether to load the module for the format
or not. So you'd end up still loading modules for formats that aren't
trivially probed for, but you could avoid loading most of them.
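
A sketch of that idea, with hypothetical function names. The only
format-specific fact used is that qcow2 images begin with the bytes
'Q', 'F', 'I', 0xfb; the VMDK helper has no cheap test and therefore
always answers 'yes'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cheap pre-check: qcow2 images start with 'Q', 'F', 'I', 0xfb. */
static bool demo_might_be_qcow2(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 'Q', 'F', 'I', 0xfb };

    assert(buf != NULL || len == 0);
    return len >= sizeof(magic) && memcmp(buf, magic, sizeof(magic)) == 0;
}

/* No cheap test: err on the side of 'yes' so the real probe still runs. */
static bool demo_might_be_vmdk(const uint8_t *buf, size_t len)
{
    (void)buf;
    (void)len;
    return true;
}
```

These helpers would have to live in the main binary rather than in the
modules, since their whole purpose is to decide which modules to load.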

-- PMM

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  3:09 ` Fam Zheng
  2015-08-03  7:51   ` Peter Maydell
@ 2015-08-03  7:52   ` Marc Marí
  2015-08-03  8:22     ` Fam Zheng
  2015-08-03  9:20   ` Daniel P. Berrange
  2015-08-03  9:52   ` Paolo Bonzini
  3 siblings, 1 reply; 18+ messages in thread
From: Marc Marí @ 2015-08-03  7:52 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel

On Mon, 3 Aug 2015 11:09:06 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Fri, 07/31 17:45, Marc Marí wrote:
> > Hi everyone
> > 
> > I propose improving the current modular driver system for QEMU so it
> > can benefit everybody in speed and flexibility. I'm looking for
> > other ideas, comments, critics, etc.
> > 
> > - Background -
> > In order to speed up QEMU, I'm looking at the high number of
> > libraries and dependencies that it loads. I've generated a QEMU
> > image that needs 145 shared libraries to start, and there were
> > still some options disabled. This is a lot, and this means that it
> > is slow.
> > 
> > So I've been looking at actual module system. Yes, QEMU does have a
> > module system, but disabled by default. The problem is, the modules
> > get loaded always during startup. This means, booting with modules
> > enabled is even slower, because loading at runtime is slower that
> > letting the linker do all the work at the start. At this point, I
> > doubt of the benefits of the actual modular system.
> > 
> > But, if disabling the actual block modules (iscsi, curl, rbd,
> > gluster, ssh and dmg) gives 40 ms of speedup, I think is worth an
> > effort of improving modules, and modularizing new parts
> > 
> > - Current module flow -
> > The current module system is based on shared libraries. Each of
> > these libraries has a constructor that registers the BlockDriver
> > structure to the global bdrv_drivers list. This list is later
> > searched by the type, the protocol, etc.
> > 
> > - Proposals -
> > I have in mind two ideas, which are not mutually exclusive:
> > 
> > -- A --
> > Having a huge lookup table with the names of the drivers that are in
> > modules, and the name of the modules where it can be found. When
> > some type is not found in the driver lists, this table can be
> > traversed to find the library and load it.
> > 
> > This requires an effort by the driver developer to fill the tables.
> > But the refactoring work can stay localized in the drivers and the
> > modules, it is (possibly) not necessary to touch any other part of
> > the QEMU system.
> > 
> > I don't specially like this huge manual-edited table.
> 
> Yeah, the way to fill the tables is an (implementation) question, but
> there are still more questions if we go this way...
> 
> bdrv_find_protocol is easy, the protocol name is extracted from image
> uri or blockdev options, we can load the module by the name.
> 
> bdrv_probe_all is harder. If we modularize a format driver,
> its .bdrv_probe code will be in the module. If we want to do the
> format detection, we need to load all format drivers. This means if
> the command line has an unspecified format, we'll still need to load
> all drivers at starting phase. (I wish all formats are probed
> according to magic bytes at offset 0, so we can simplify
> the .bdrv_probe logic and do it with data matching in block.c like
> the protocol case, but that's not true for VMDK :( )
> 
> I believe other sub systems have this kind of paradox as well.

I managed to overlook that...

If the user doesn't specify the type of the block device, then all
block drivers will have to be tested. I see this is based on scores. If
there is a score that means "this is my type for sure" (I don't know
whether there is one), the probe could be stopped there.

But the user can also specify the type of the block device (if I
remember correctly). So, if they want more speed, they can just specify
the type of the block device.
 
> > 
> > -- B --
> > The same --enable-X that is done at compile time, but directly on
> > the command line when running QEMU or even in hot at the monitor.
> > 
> > This solution requires work on the monitor, the command line
> > processing, the modules and the drivers system. It is also less
> > transparent to the final user.
> 
> I'm afraid this breaks backward compatibility. :(

:(

Maybe my ideas are a bit too idealistic to work without rewriting half
of QEMU. I should have let them cool down and rest before writing them
down. So any other ideas to reduce the library overhead are appreciated.

Thanks
Marc

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  7:52   ` Marc Marí
@ 2015-08-03  8:22     ` Fam Zheng
  2015-08-03  9:01       ` Marc Marí
  0 siblings, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2015-08-03  8:22 UTC (permalink / raw)
  To: Marc Marí; +Cc: qemu-devel

On Mon, 08/03 09:52, Marc Marí wrote:
> So any other ideas to reduce the library overhead are appreciated.

It would be interesting to see your profiling on the library loading overhead.
For example, how much does it help to reduce the library size, and how much
does it help to reduce the # of libraries?

The protocol drivers are modularized for the sake of library dependencies, so
they should stay that way. However, we can "sensibly" combine all non-native
format drivers (VMDK, VHDX, ...) into a cold-formats.so (if it turns out that
loading one big .so is much faster than loading separate ones).  But we should
leave qcow2 as a separate one for obvious reasons, or make a hot-formats.so
with one or two other formats if that makes more sense.

With that, for the first step, we can lazy load the cold-formats.so whenever we
need to probe or a non-qcow2 format is involved. Then on top of that we can
implement what Peter has suggested.
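
A sketch of that first step, under the assumption that qcow2 is already
available (built in or in its own module) and that everything else
lives in a single cold-formats.so. The loader is passed in as a
callback so the policy can be shown without a real module; all names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader callback; a real one might wrap g_module_open(). */
typedef bool (*DemoLoadFn)(const char *module);

static bool demo_cold_loaded;   /* has cold-formats.so been loaded yet? */
static int demo_load_calls;     /* bookkeeping for the stand-in loader */

/* Stand-in loader for demonstration: counts calls and reports success. */
static bool demo_fake_load(const char *module)
{
    (void)module;
    demo_load_calls++;
    return true;
}

/*
 * Load the cold formats at most once, and only when probing is needed
 * (format == NULL) or a format other than qcow2 is requested.
 */
static bool demo_ensure_formats(const char *format, DemoLoadFn load)
{
    assert(load != NULL);
    if (format != NULL && strcmp(format, "qcow2") == 0) {
        return true;            /* hot path: nothing extra to load */
    }
    if (!demo_cold_loaded) {
        demo_cold_loaded = load("cold-formats.so");
    }
    return demo_cold_loaded;
}
```

With this policy a guest started with an explicit qcow2 format never
pays for the cold formats, while an unspecified format still triggers
one load.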

Fam

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  8:22     ` Fam Zheng
@ 2015-08-03  9:01       ` Marc Marí
  2015-08-03  9:24         ` Alex Bennée
  2015-08-03  9:24         ` Fam Zheng
  0 siblings, 2 replies; 18+ messages in thread
From: Marc Marí @ 2015-08-03  9:01 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel

On Mon, 3 Aug 2015 16:22:34 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Mon, 08/03 09:52, Marc Marí wrote:
> > So any other ideas to reduce the library overhead are appreciated.
> 
> It would be interesting to see your profiling on the library loading
> overhead. For example, how much does it help to reduce the library
> size, and how much does it help to reduce the # of libraries?
> 
> The protocol drivers are modularized for the sake of library
> dependencies, so they should stay that way. However, we can
> "sensibly" combine all non-native format drivers (VMDK, VHDX, ...)
> into a cold-formats.so (if it turns out that loading one big .so is
> much faster than loading separate ones).  But we should leave qcow2
> as a separate one for obvious reasons, or make a hot-formats.so with
> one or two other formats if that makes more sense.
> 
> With that, for the first step, we can lazy load the cold-formats.so
> whenever we need to probe or a non-qcow2 format is involved. Then on
> top of that we can implement what Peter has suggested.
> 

Some profiling:

A QEMU with this configuration:
./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
 --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
 --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
 --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
 --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
 --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
 --enable-smartcard-nss --enable-guest-agent --enable-libusb \
 --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
 --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
 --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
 --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu

Has dependencies on 142 libraries. It takes 60 ms between the run and
the jump to the main function, and 80 ms between the run and the
first kvm_entry.

A QEMU with the same configuration and --enable-modules has
dependencies on 125 libraries. It takes 20 ms between the run and the
jump to the main function, and 100 ms between the run and the first
kvm_entry.

The libraries that are not loaded are: libiscsi, libcurl, librbd,
librados, libgfapi, libglusterfs, libgfrpc, libgfxdr, libssh2, libcrypt,
libidn, libgssapi, liblber, libldap, libboost_thread, libboost_system
and libatomic_ops.

As I already explained, the current implementation of modules always
loads the modules at startup. That's why the QEMU setup takes longer,
even though it uses G_MODULE_BIND_LAZY. And that's why I was proposing
hotplugging.

I don't know if loading one big library is more efficient than loading
a lot of small ones, but it would make sense.

Marc

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  3:09 ` Fam Zheng
  2015-08-03  7:51   ` Peter Maydell
  2015-08-03  7:52   ` Marc Marí
@ 2015-08-03  9:20   ` Daniel P. Berrange
  2015-08-03  9:52   ` Paolo Bonzini
  3 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrange @ 2015-08-03  9:20 UTC (permalink / raw)
  To: Fam Zheng; +Cc: Marc Marí, qemu-devel

On Mon, Aug 03, 2015 at 11:09:06AM +0800, Fam Zheng wrote:
> On Fri, 07/31 17:45, Marc Marí wrote:
> > Hi everyone
> > 
> > I propose improving the current modular driver system for QEMU so it
> > can benefit everybody in speed and flexibility. I'm looking for other
> > ideas, comments, critics, etc.
> > 
> > - Background -
> > In order to speed up QEMU, I'm looking at the high number of libraries
> > and dependencies that it loads. I've generated a QEMU image that needs
> > 145 shared libraries to start, and there were still some options
> > disabled. This is a lot, and this means that it is slow.
> > 
> > So I've been looking at actual module system. Yes, QEMU does have a
> > module system, but disabled by default. The problem is, the modules get
> > loaded always during startup. This means, booting with modules enabled
> > is even slower, because loading at runtime is slower that letting the
> > linker do all the work at the start. At this point, I doubt of the
> > benefits of the actual modular system.
> > 
> > But, if disabling the actual block modules (iscsi, curl, rbd, gluster,
> > ssh and dmg) gives 40 ms of speedup, I think is worth an effort of
> > improving modules, and modularizing new parts
> > 
> > - Current module flow -
> > The current module system is based on shared libraries. Each of these
> > libraries has a constructor that registers the BlockDriver structure to
> > the global bdrv_drivers list. This list is later searched by the type,
> > the protocol, etc.
> > 
> > - Proposals -
> > I have in mind two ideas, which are not mutually exclusive:
> > 
> > -- A --
> > Having a huge lookup table with the names of the drivers that are in
> > modules, and the name of the modules where it can be found. When
> > some type is not found in the driver lists, this table can be traversed
> > to find the library and load it.
> > 
> > This requires an effort by the driver developer to fill the tables. But
> > the refactoring work can stay localized in the drivers and the modules,
> > it is (possibly) not necessary to touch any other part of the QEMU
> > system.
> > 
> > I don't specially like this huge manual-edited table.

You don't necessarily need to have a lookup table at all - just use a
1-1 mapping between driver name and the .so module file on disk, so you
can just directly load it when needed.
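
For illustration, such a mapping can be a pure string transformation.
The "block-" prefix, the directory argument and the function name are
assumptions made for this sketch, not an existing QEMU convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/block-<name>.so" into out.
 * Returns 0 on success, or -1 if the name contains a path separator
 * or the result does not fit into out.
 */
static int demo_module_path(char *out, size_t out_len,
                            const char *dir, const char *name)
{
    int n;

    assert(out != NULL && dir != NULL && name != NULL);
    if (strchr(name, '/') != NULL) {
        return -1;              /* never let a driver name escape dir */
    }
    n = snprintf(out, out_len, "%s/block-%s.so", dir, name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

The resulting path can then be handed to whatever loader is in use, and
a missing file simply means the driver is not installed.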

> 
> Yeah, the way to fill the tables is an (implementation) question, but there are
> still more questions if we go this way...
> 
> bdrv_find_protocol is easy, the protocol name is extracted from image uri or
> blockdev options, we can load the module by the name.
> 
> bdrv_probe_all is harder. If we modularize a format driver, its .bdrv_probe
> code will be in the module. If we want to do the format detection, we need to
> load all format drivers. This means if the command line has an unspecified
> format, we'll still need to load all drivers at starting phase. (I wish all
> formats are probed according to magic bytes at offset 0, so we can simplify the
> .bdrv_probe logic and do it with data matching in block.c like the protocol
> case, but that's not true for VMDK :( )

In general we recommend that apps not rely on probing of disk formats
because of the inherent security problems, which have led to multiple
CVEs for many apps over the years. As such I think you could make the
argument that probing is a problem that is not worth putting much
effort into solving.

As such I think it would be fine to say that if 'bdrv_probe' is invoked,
QEMU should just load all modules it can find. Or you could have an
/etc/qemu/block-probe.conf file which lists the modules that are
acceptable to auto-load for probing. Keep it simple on the basis that
most apps should not rely on this.

> > -- B --
> > The same --enable-X that is done at compile time, but directly on the
> > command line when running QEMU or even in hot at the monitor.
> > 
> > This solution requires work on the monitor, the command line
> > processing, the modules and the drivers system. It is also less
> > transparent to the final user.
> 
> I'm afraid this breaks backward compatibility. :(

Requiring an --enable-X command line arg also exposes external apps to
an implementation detail of QEMU's module system. I think that is a bad
idea in general, even ignoring backwards compat issues. Any loading of
modules should be more or less invisible from the POV of an application.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-07-31 15:45 [Qemu-devel] Modularizing QEMU RFC Marc Marí
  2015-08-03  3:09 ` Fam Zheng
@ 2015-08-03  9:23 ` Daniel P. Berrange
  2015-08-03  9:43   ` Marc Marí
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel P. Berrange @ 2015-08-03  9:23 UTC (permalink / raw)
  To: Marc Marí; +Cc: qemu-devel

On Fri, Jul 31, 2015 at 05:45:42PM +0200, Marc Marí wrote:
> Hi everyone
> 
> I propose improving the current modular driver system for QEMU so it
> can benefit everybody in speed and flexibility. I'm looking for other
> ideas, comments, critics, etc.
> 
> - Background -
> In order to speed up QEMU, I'm looking at the high number of libraries
> and dependencies that it loads. I've generated a QEMU image that needs
> 145 shared libraries to start, and there were still some options
> disabled. This is a lot, and this means that it is slow.
> 
> So I've been looking at actual module system. Yes, QEMU does have a
> module system, but disabled by default. The problem is, the modules get
> loaded always during startup. This means, booting with modules enabled
> is even slower, because loading at runtime is slower that letting the
> linker do all the work at the start. At this point, I doubt of the
> benefits of the actual modular system.
> 
> But, if disabling the actual block modules (iscsi, curl, rbd, gluster,
> ssh and dmg) gives 40 ms of speedup, I think is worth an effort of
> improving modules, and modularizing new parts
> 
> - Current module flow -
> The current module system is based on shared libraries. Each of these
> libraries has a constructor that registers the BlockDriver structure to
> the global bdrv_drivers list. This list is later searched by the type,
> the protocol, etc.
> 
> - Proposals -
> I have in mind two ideas, which are not mutually exclusive:

[snip]

> - My comments on my proposals -
> Ideally, I'd like a mixed solution. The user can specify what wants to
> load, but also, when something is not found, it is automatically
> searched.
> 
> In both options, the current module system has to be partly rewritten,
> and some of the current drivers with module capability might need to be
> modified to adapt to the new specifications.
> 
> And, a part from improving the current modular interface, there are a
> lot of other devices that might benefit from it, not just the block
> devices.
> 
> I still haven't looked at the memory footprint of QEMU, but I'm sure
> that the QEMU binary will lose a lot of weight with this addition.

One thing I don't see mentioned is how this impacts QEMU feature
detection by apps. For example, the recommended approach currently
is to launch QEMU with 'qemu-system-BLAH --machine none -qmp /some/sock'
and then query QMP for lists of devices supported, list of various
backends and other features.

If you're going to suggest a fully modular system, then when doing
QMP feature detection we still need to see the full list of features.
So either that implies all the metadata associated with the modules
remains built-in to QEMU, so QMP can answer this without loading the
modules, or the QMP feature detection must imply auto-loading of all
modules that exist.

Regards,
Daniel

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:01       ` Marc Marí
@ 2015-08-03  9:24         ` Alex Bennée
  2015-08-03  9:36           ` Marc Marí
  2015-08-03  9:38           ` Daniel P. Berrange
  2015-08-03  9:24         ` Fam Zheng
  1 sibling, 2 replies; 18+ messages in thread
From: Alex Bennée @ 2015-08-03  9:24 UTC (permalink / raw)
  To: Marc Marí; +Cc: Fam Zheng, qemu-devel


Marc Marí <markmb@redhat.com> writes:

> On Mon, 3 Aug 2015 16:22:34 +0800
> Fam Zheng <famz@redhat.com> wrote:
>
>> On Mon, 08/03 09:52, Marc Marí wrote:
>> > So any other ideas to reduce the library overhead are appreciated.
>> 
>> It would be interesting to see your profiling on the library loading
>> overhead. For example, how much does it help to reduce the library
>> size, and how much does it help to reduce the # of libraries?
<snip>
>
> Some profiling:
>
> A QEMU with this configuration:
> ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
>  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
>  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
>  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
>  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
>  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
>  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
>  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
>  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
>  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
>  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
>
> Has dependencies on 142 libraries. It takes 60 ms between the run and
> the jump to the main function, and 80 ms between the run and the
> first kvm_entry.
>
> A QEMU with the same configuration and --enable-modules has
> dependencies on 125 libraries. It takes 20 ms between the run and the
> jump to the main function, and 100 ms between the run and the first
> kvm_entry.
>
> The libraries that are not loaded are: libiscsi, libcurl, librbd,
> librados, ligfapi, libglusterfs, libgfrpc, libgfxdr, libssh2, libcrypt,
> libidin, libgssapi, liblber, libldap, libboost_thread, libbost_system
> and libatomic_ops.
>
> As I already explained, the current implementation of modules loads
> the modules at startup always. That's why the QEMU setup takes longer,
> even though it uses G_MODULE_BIND_LAZY. And that's why I was proposing
> hotplugging.
>
> I don't know if loading one big library is more efficent than a lot of
> small ones, but it would make sense.

What's the actual use-case here where start-up latency is so important?
If it is an ephemeral cloudy thing then you might just have a base QEMU
with VIRT drivers and one big .so called "the-rest.so"?

I don't wish to disparage the idea but certainly in emulation world the
difference of 100ms or so is neither here nor there.

-- 
Alex Bennée

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:01       ` Marc Marí
  2015-08-03  9:24         ` Alex Bennée
@ 2015-08-03  9:24         ` Fam Zheng
  2015-08-03 10:22           ` Marc Marí
  1 sibling, 1 reply; 18+ messages in thread
From: Fam Zheng @ 2015-08-03  9:24 UTC (permalink / raw)
  To: Marc Marí; +Cc: qemu-devel

On Mon, 08/03 11:01, Marc Marí wrote:
> Some profiling:
> 
> A QEMU with this configuration:
> ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
>  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
>  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
>  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
>  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
>  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
>  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
>  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
>  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
>  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
>  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
> 
> Has dependencies on 142 libraries. It takes 60 ms between the run and
> the jump to the main function, and 80 ms between the run and the
> first kvm_entry.
> 
> A QEMU with the same configuration and --enable-modules has
> dependencies on 125 libraries. It takes 20 ms between the run and the
> jump to the main function, and 100 ms between the run and the first
> kvm_entry.

Which means 40 ms is saved because we reduced the size and dependencies of the
QEMU executable, but 60 ms is the extra cost of dynamic loading. That's a net
loss of 20 ms.

In your --enable-modules configuration, could you try commenting out the
module_load body and compare again, so we know how much time is spent in
looking up and loading modules?

Fam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:24         ` Alex Bennée
@ 2015-08-03  9:36           ` Marc Marí
  2015-08-03  9:58             ` Alex Bennée
  2015-08-03  9:38           ` Daniel P. Berrange
  1 sibling, 1 reply; 18+ messages in thread
From: Marc Marí @ 2015-08-03  9:36 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Fam Zheng, qemu-devel

On Mon, 03 Aug 2015 10:24:56 +0100
Alex Bennée <alex.bennee@linaro.org> wrote:

> 
> Marc Marí <markmb@redhat.com> writes:
> 
> > On Mon, 3 Aug 2015 16:22:34 +0800
> > Fam Zheng <famz@redhat.com> wrote:
> >
> >> On Mon, 08/03 09:52, Marc Marí wrote:
> >> > So any other ideas to reduce the library overhead are
> >> > appreciated.
> >> 
> >> It would be interesting to see your profiling on the library
> >> loading overhead. For example, how much does it help to reduce the
> >> library size, and how much does it help to reduce the # of
> >> libraries?
> <snip>
> >
> > Some profiling:
> >
> > A QEMU with this configuration:
> > ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
> >  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
> >  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
> >  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
> >  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
> >  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
> >  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
> >  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
> >  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
> >  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
> >  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
> >
> > Has dependencies on 142 libraries. It takes 60 ms between the run
> > and the jump to the main function, and 80 ms between the run and the
> > first kvm_entry.
> >
> > A QEMU with the same configuration and --enable-modules has
> > dependencies on 125 libraries. It takes 20 ms between the run and
> > the jump to the main function, and 100 ms between the run and the
> > first kvm_entry.
> >
> > The libraries that are not loaded are: libiscsi, libcurl, librbd,
> > librados, libgfapi, libglusterfs, libgfrpc, libgfxdr, libssh2,
> > libcrypt, libidn, libgssapi, liblber, libldap, libboost_thread,
> > libboost_system and libatomic_ops.
> >
> > As I already explained, the current implementation of modules loads
> > the modules at startup always. That's why the QEMU setup takes
> > longer, even though it uses G_MODULE_BIND_LAZY. And that's why I
> > was proposing hotplugging.
> >
> > I don't know if loading one big library is more efficient than a lot
> > of small ones, but it would make sense.
> 
> What's the actual use-case here where start-up latency is so
> important? If it is an ephemeral cloudy thing then you might just
> have a base QEMU with VIRT drivers and one big .so called "the-rest.so"?
> 
> I don't wish to disparage the idea but certainly in the emulation world
> the difference of 100ms or so is neither here nor there.
> 

Clear Containers: https://lwn.net/Articles/644675/

We are looking at making QEMU more lightweight for the general use
case and also for the container use case. It is a lot better to have
the same tool for both cases, and not start a new one from scratch as
Intel has done.

This also benefits the general QEMU community, and that's why I'm
having this discussion here. If there's a point where QEMU is still too
slow for containers, but optimizing means breaking, then we will have
to take a step back and change the point of view.

And I think making QEMU modular is beneficial for everyone.

Marc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:24         ` Alex Bennée
  2015-08-03  9:36           ` Marc Marí
@ 2015-08-03  9:38           ` Daniel P. Berrange
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel P. Berrange @ 2015-08-03  9:38 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Marc Marí, Fam Zheng, qemu-devel

On Mon, Aug 03, 2015 at 10:24:56AM +0100, Alex Bennée wrote:
> 
> Marc Marí <markmb@redhat.com> writes:
> 
> > On Mon, 3 Aug 2015 16:22:34 +0800
> > Fam Zheng <famz@redhat.com> wrote:
> >
> >> On Mon, 08/03 09:52, Marc Marí wrote:
> >> > So any other ideas to reduce the library overhead are appreciated.
> >> 
> >> It would be interesting to see your profiling on the library loading
> >> overhead. For example, how much does it help to reduce the library
> >> size, and how much does it help to reduce the # of libraries?
> <snip>
> >
> > Some profiling:
> >
> > A QEMU with this configuration:
> > ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
> >  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
> >  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
> >  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
> >  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
> >  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
> >  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
> >  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
> >  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
> >  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
> >  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
> >
> > Has dependencies on 142 libraries. It takes 60 ms between the run and
> > the jump to the main function, and 80 ms between the run and the
> > first kvm_entry.
> >
> > A QEMU with the same configuration and --enable-modules has
> > dependencies on 125 libraries. It takes 20 ms between the run and the
> > jump to the main function, and 100 ms between the run and the first
> > kvm_entry.
> >
> > The libraries that are not loaded are: libiscsi, libcurl, librbd,
> > librados, libgfapi, libglusterfs, libgfrpc, libgfxdr, libssh2, libcrypt,
> > libidn, libgssapi, liblber, libldap, libboost_thread, libboost_system
> > and libatomic_ops.
> >
> > As I already explained, the current implementation of modules loads
> > the modules at startup always. That's why the QEMU setup takes longer,
> > even though it uses G_MODULE_BIND_LAZY. And that's why I was proposing
> > hotplugging.
> >
> > I don't know if loading one big library is more efficient than a lot of
> > small ones, but it would make sense.
> 
> What's the actual use-case here where start-up latency is so important?
> If it is an ephemeral cloudy thing then you might just have a base QEMU
> with VIRT drivers and one big .so called "the-rest.so"?
> 
> I don't wish to disparage the idea but certainly in the emulation world the
> difference of 100ms or so is neither here nor there.

If you are running a full OS install w/ TCG that 100ms may not be relevant,
but if you are using QEMU w/ KVM as the basis of a more secure environment
for application containers it can be important. E.g. if it takes 2 secs
from the point of exec'ing QEMU to running your app, then 100ms is 5% of
the total time, which is well worth optimizing. This is the
kind of scenario seen by libguestfs and libvirt-sandbox.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:23 ` Daniel P. Berrange
@ 2015-08-03  9:43   ` Marc Marí
  0 siblings, 0 replies; 18+ messages in thread
From: Marc Marí @ 2015-08-03  9:43 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel

On Mon, 3 Aug 2015 10:23:37 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Fri, Jul 31, 2015 at 05:45:42PM +0200, Marc Marí wrote:
> > Hi everyone
> > 
> > I propose improving the current modular driver system for QEMU so it
> > can benefit everybody in speed and flexibility. I'm looking for
> > other ideas, comments, criticism, etc.
> > 
> > - Background -
> > In order to speed up QEMU, I'm looking at the high number of
> > libraries and dependencies that it loads. I've generated a QEMU
> > image that needs 145 shared libraries to start, and there were
> > still some options disabled. This is a lot, and this means that it
> > is slow.
> > 
> > So I've been looking at the current module system. Yes, QEMU does have
> > a module system, but it is disabled by default. The problem is, the
> > modules always get loaded during startup. This means booting with
> > modules enabled is even slower, because loading at runtime is slower
> > than letting the linker do all the work at the start. At this point, I
> > doubt the benefits of the current modular system.
> > 
> > But, if disabling the current block modules (iscsi, curl, rbd,
> > gluster, ssh and dmg) gives 40 ms of speedup, I think it is worth the
> > effort of improving modules, and modularizing new parts.
> > 
> > - Current module flow -
> > The current module system is based on shared libraries. Each of
> > these libraries has a constructor that registers the BlockDriver
> > structure to the global bdrv_drivers list. This list is later
> > searched by the type, the protocol, etc.
> > 
> > - Proposals -
> > I have in mind two ideas, which are not mutually exclusive:
> 
> [snip]
> 
> > - My comments on my proposals -
> > Ideally, I'd like a mixed solution. The user can specify what they
> > want to load, but also, when something is not found, it is searched
> > for automatically.
> > 
> > In both options, the current module system has to be partly
> > rewritten, and some of the current drivers with module capability
> > might need to be modified to adapt to the new specifications.
> > 
> > And, apart from improving the current modular interface, there are
> > a lot of other devices that might benefit from it, not just the
> > block devices.
> > 
> > I still haven't looked at the memory footprint of QEMU, but I'm sure
> > that the QEMU binary will lose a lot of weight with this addition.
> 
> One thing I don't see mentioned is how this impacts QEMU feature
> detection by apps. For example, the recommended approach currently
> is to launch QEMU with 'qemu-system-BLAH --machine none
> -qmp /some/sock' and then query QMP for lists of devices supported,
> list of various backends and other features.
> 
> If you're going to suggest a fully modular system, then when doing
> QMP feature detection we still need to see the full list of features.
> So either that implies all the metadata associated with the modules
> remains built-in to QEMU, so QMP can answer this without loading the
> modules, or the QMP feature detection must imply auto-loading of all
> modules that exist.

Not everything can be trivially modularized.

But I think that if we are able to move the "very independent" drivers
into modules, it will already be a huge improvement. Then we can
consider whether the rest is worth the trouble.

Marc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  3:09 ` Fam Zheng
                     ` (2 preceding siblings ...)
  2015-08-03  9:20   ` Daniel P. Berrange
@ 2015-08-03  9:52   ` Paolo Bonzini
  3 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2015-08-03  9:52 UTC (permalink / raw)
  To: Fam Zheng, Marc Marí; +Cc: qemu-devel



On 03/08/2015 05:09, Fam Zheng wrote:
> bdrv_probe_all is harder. If we modularize a format driver, its .bdrv_probe
> code will be in the module. If we want to do the format detection, we need to
> load all format drivers. This means if the command line has an unspecified
> format, we'll still need to load all drivers at startup. (I wish all
> formats were probed according to magic bytes at offset 0, so we could simplify
> the .bdrv_probe logic and do it with data matching in block.c like the
> protocol case, but that's not true for VMDK :( )

I think it's okay to say that:

- .bdrv_probe_device is not supported in modules (you have to use
file.driver=foo manually)

- not specifying a format results in all modules being loaded
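
For illustration only, the magic-byte matching mentioned in the quoted
text could look roughly like the sketch below: a built-in table is
compared against the first bytes of the image, so only the matching
module would have to be loaded. The struct, the function name
probe_module_by_magic and the module file names are assumptions made up
for this example, not QEMU code; only the qcow2 ("QFI\xfb") and VHDX
("vhdxfile") signatures are real. A NULL result corresponds to the
fallback above, where every module is loaded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: the bytes expected at offset 0 and the
 * module that would implement the matching format. */
struct magic_entry {
    const char *magic;
    size_t len;
    const char *module;   /* illustrative name, not a real QEMU path */
};

static const struct magic_entry magic_table[] = {
    { "QFI\xfb",  4, "block-qcow2.so" },
    { "vhdxfile", 8, "block-vhdx.so"  },
};

/* Return the module whose magic matches the start of buf, or NULL if
 * nothing matches. On NULL the caller would fall back to loading every
 * module and running each .bdrv_probe. */
const char *probe_module_by_magic(const unsigned char *buf, size_t buf_len)
{
    size_t n = sizeof(magic_table) / sizeof(magic_table[0]);

    for (size_t i = 0; i < n; i++) {
        const struct magic_entry *e = &magic_table[i];

        if (buf_len >= e->len && memcmp(buf, e->magic, e->len) == 0) {
            return e->module;
        }
    }
    return NULL;
}
```

Formats that cannot be recognised from a fixed signature at offset 0
(VMDK is the example given above) would never match such a table, which
is why the load-everything fallback is still needed.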

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:36           ` Marc Marí
@ 2015-08-03  9:58             ` Alex Bennée
  2015-08-03 10:16               ` Daniel P. Berrange
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Bennée @ 2015-08-03  9:58 UTC (permalink / raw)
  To: Marc Marí; +Cc: Fam Zheng, qemu-devel


Marc Marí <markmb@redhat.com> writes:

> On Mon, 03 Aug 2015 10:24:56 +0100
> Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> 
>> Marc Marí <markmb@redhat.com> writes:
>> 
>> > On Mon, 3 Aug 2015 16:22:34 +0800
>> > Fam Zheng <famz@redhat.com> wrote:
>> >
>> >> On Mon, 08/03 09:52, Marc Marí wrote:
>> >> > So any other ideas to reduce the library overhead are
>> >> > appreciated.
>> >> 
>> >> It would be interesting to see your profiling on the library
>> >> loading overhead. For example, how much does it help to reduce the
>> >> library size, and how much does it help to reduce the # of
>> >> libraries?
>> <snip>
>> >
>> > Some profiling:
>> >
<snip>
>> >
>> > I don't know if loading one big library is more efficient than a lot
>> > of small ones, but it would make sense.
>> 
>> What's the actual use-case here where start-up latency is so
>> important? If it is an ephemeral cloudy thing then you might just
>> have a base QEMU with VIRT drivers and one big .so called "the-rest.so"?
>> 
>
> Clear Containers: https://lwn.net/Articles/644675/
>
> We are looking at making QEMU more lightweight for the general use
> case and also for the container use case. It is a lot better to have
> the same tool for both cases, and not start a new one from scratch as
> Intel has done.
>
> This also benefits the general QEMU community, and that's why I'm
> having this discussion here. If there's a point where QEMU is still too
> slow for containers, but optimizing means breaking, then we will have
> to take a step back and change the point of view.
>
> And I think making QEMU modular is beneficial for everyone.

Thanks for the link.

If all the less used parts of QEMU were wrapped up into a dynamically
linked library (rather than a dynamically loaded module) wouldn't you
get the best of both worlds? A fast loading executable which only
instantiated the rest if a function from the library was actually called?

>
> Marc

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:58             ` Alex Bennée
@ 2015-08-03 10:16               ` Daniel P. Berrange
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrange @ 2015-08-03 10:16 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Marc Marí, Fam Zheng, qemu-devel

On Mon, Aug 03, 2015 at 10:58:41AM +0100, Alex Bennée wrote:
> 
> Marc Marí <markmb@redhat.com> writes:
> 
> > On Mon, 03 Aug 2015 10:24:56 +0100
> > Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >> 
> >> Marc Marí <markmb@redhat.com> writes:
> >> 
> >> > On Mon, 3 Aug 2015 16:22:34 +0800
> >> > Fam Zheng <famz@redhat.com> wrote:
> >> >
> >> >> On Mon, 08/03 09:52, Marc Marí wrote:
> >> >> > So any other ideas to reduce the library overhead are
> >> >> > appreciated.
> >> >> 
> >> >> It would be interesting to see your profiling on the library
> >> >> loading overhead. For example, how much does it help to reduce the
> >> >> library size, and how much does it help to reduce the # of
> >> >> libraries?
> >> <snip>
> >> >
> >> > Some profiling:
> >> >
> <snip>
> >> >
> >> > I don't know if loading one big library is more efficient than a lot
> >> > of small ones, but it would make sense.
> >> 
> >> What's the actual use-case here where start-up latency is so
> >> important? If it is an ephemeral cloudy thing then you might just
> >> have a base QEMU with VIRT drivers and one big .so called "the-rest.so"?
> >> 
> >
> > Clear Containers: https://lwn.net/Articles/644675/
> >
> > We are looking at making QEMU more lightweight for the general use
> > case and also for the container use case. It is a lot better to have
> > the same tool for both cases, and not start a new one from scratch as
> > Intel has done.
> >
> > This also benefits the general QEMU community, and that's why I'm
> > having this discussion here. If there's a point where QEMU is still too
> > slow for containers, but optimizing means breaking, then we will have
> > to take a step back and change the point of view.
> >
> > And I think making QEMU modular is beneficial for everyone.
> 
> Thanks for the link.
> 
> If all the less used parts of QEMU were wrapped up into a dynamically
> linked library (rather than a dynamically loaded module) wouldn't you
> get the best of both worlds? A fast loading executable which only
> instantiated the rest if a function from the library was actually called?

The problem lies with defining what "the less used parts" actually
are. You'll end up building something which suits one case, at the
expense of another case, because everyone will have a different
perception of what the less used parts are. A large portion of the
QEMU userbase probably doesn't care about the RBD block driver whatsoever,
but another significant portion probably uses it for all their storage
needs and never uses any other block driver. A general purpose
module loading system avoids having to favour one particular usage
scenario at the expense of others.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03  9:24         ` Fam Zheng
@ 2015-08-03 10:22           ` Marc Marí
  2015-08-03 10:54             ` Fam Zheng
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Marí @ 2015-08-03 10:22 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-devel

On Mon, 3 Aug 2015 17:24:57 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Mon, 08/03 11:01, Marc Marí wrote:
> > Some profiling:
> > 
> > A QEMU with this configuration:
> > ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
> >  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
> >  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
> >  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
> >  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
> >  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
> >  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
> >  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
> >  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
> >  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
> >  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
> > 
> > Has dependencies on 142 libraries. It takes 60 ms between the run
> > and the jump to the main function, and 80 ms between the run and the
> > first kvm_entry.
> > 
> > A QEMU with the same configuration and --enable-modules has
> > dependencies on 125 libraries. It takes 20 ms between the run and
> > the jump to the main function, and 100 ms between the run and the
> > first kvm_entry.
> 
> Which means 40 ms is saved because we reduced the size and dependencies
> of the QEMU executable, but 60 ms is the extra cost of dynamic loading.
> That's a net loss.
> 
> In your --enable-modules configuration, could you try commenting out
> the module_load body and compare again, so we know how much time is
> spent in looking up and loading modules?
> 

With the module load disabled, it takes 20 ms from run to main, and 40 ms
from run to kvm_entry, which is what you would expect from the numbers above.
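
As a rough worked example, the three measurements in this thread can be
split into a pre-main phase (dynamic linking) and a post-main phase
(setup up to the first kvm_entry). The sketch below only does that
subtraction; the struct, variable and function names are made up for
illustration, and the numbers are the ones reported above.

```c
/* Startup timings in milliseconds, as reported in this thread. */
struct startup_ms {
    int to_main;       /* run -> jump to main() */
    int to_kvm_entry;  /* run -> first kvm_entry */
};

/* Everything linked in, no --enable-modules. */
const struct startup_ms builtin_all = { 60, 80 };
/* --enable-modules, modules loaded at startup. */
const struct startup_ms modules_on = { 20, 100 };
/* --enable-modules with the module_load body commented out. */
const struct startup_ms modules_noload = { 20, 40 };

/* Time spent between main() and the first kvm_entry. */
int post_main_ms(struct startup_ms t)
{
    return t.to_kvm_entry - t.to_main;
}

/* Cost attributable to module loading: the difference between the two
 * --enable-modules runs. */
int module_load_cost_ms(void)
{
    return modules_on.to_kvm_entry - modules_noload.to_kvm_entry;
}
```

With these inputs the post-main phase is 20 ms for the fully linked
build, 80 ms with modules loaded at startup, and 20 ms again once
loading is skipped. Module loading therefore accounts for 60 ms, and a
build that never loads an unneeded module would reach kvm_entry 40 ms
earlier than the fully linked one.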

Marc

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] Modularizing QEMU RFC
  2015-08-03 10:22           ` Marc Marí
@ 2015-08-03 10:54             ` Fam Zheng
  0 siblings, 0 replies; 18+ messages in thread
From: Fam Zheng @ 2015-08-03 10:54 UTC (permalink / raw)
  To: Marc Marí; +Cc: qemu-devel

On Mon, 08/03 12:22, Marc Marí wrote:
> On Mon, 3 Aug 2015 17:24:57 +0800
> Fam Zheng <famz@redhat.com> wrote:
> 
> > On Mon, 08/03 11:01, Marc Marí wrote:
> > > Some profiling:
> > > 
> > > A QEMU with this configuration:
> > > ./configure --enable-sparse --enable-sdl --enable-gtk --enable-vte \
> > >  --enable-curses --enable-vnc --enable-vnc-{jpeg,tls,sasl,png,ws} \
> > >  --enable-virtfs --enable-brlapi --enable-curl --enable-fdt \
> > >  --enable-bluez --enable-kvm --enable-rdma --enable-uuid --enable-vde \
> > >  --enable-linux-aio --enable-cap-ng --enable-attr --enable-vhost-net \
> > >  --enable-vhost-scsi --enable-spice --enable-rbd --enable-libiscsi \
> > >  --enable-smartcard-nss --enable-guest-agent --enable-libusb \
> > >  --enable-usb-redir --enable-lzo --enable-snappy --enable-bzip2 \
> > >  --enable-seccomp --enable-coroutine-pool --enable-glusterfs \
> > >  --enable-tpm --enable-libssh2 --enable-vhdx --enable-quorum \
> > >  --enable-numa --enable-tcmalloc --target-list=x86_64-softmmu
> > > 
> > > Has dependencies on 142 libraries. It takes 60 ms between the run
> > > and the jump to the main function, and 80 ms between the run and the
> > > first kvm_entry.
> > > 
> > > A QEMU with the same configuration and --enable-modules has
> > > dependencies on 125 libraries. It takes 20 ms between the run and
> > > the jump to the main function, and 100 ms between the run and the
> > > first kvm_entry.
> > 
> > Which means 40 ms is saved because we reduced the size and dependencies
> > of the QEMU executable, but 60 ms is the extra cost of dynamic loading.
> > That's a net loss.
> > 
> > In your --enable-modules configuration, could you try commenting out
> > the module_load body and compare again, so we know how much time is
> > spent in looking up and loading modules?
> > 
> 
> With the module load disabled, it takes 20 ms from run to main, and 40 ms
> from run to kvm_entry, which is what you would expect from the numbers above.
> 

Thanks, that shows the 40 ms speedup is already within reach. :)

Once you figure out a way to map protocol/format names to module names in
block.c, we can review all "QLIST_FOREACH(drv, &bdrv_drivers, list)" there, to
either inject loading of all modules, or change it to loading of a specific
module.
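
A minimal sketch of that name-to-module mapping, assuming a static table
and a loader callback: the driver list is searched first, and only on a
miss is the one module named in the table loaded before retrying. All
names here (struct driver, module_table, find_driver, demo_load_module
and the module names) are hypothetical stand-ins; the real code walks
bdrv_drivers with QLIST_FOREACH, and a real loader would open the shared
object so that its constructor registers the driver.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the registered BlockDriver list. */
struct driver {
    const char *name;
    struct driver *next;
};

static struct driver *drivers;

void register_driver(struct driver *drv)
{
    drv->next = drivers;
    drivers = drv;
}

/* Hypothetical table mapping a protocol/format name to its module. */
struct module_map {
    const char *driver_name;
    const char *module;
};

static const struct module_map module_table[] = {
    { "iscsi", "block-iscsi" },
    { "curl",  "block-curl"  },
    { "rbd",   "block-rbd"   },
};

static struct driver *find_registered(const char *name)
{
    for (struct driver *d = drivers; d; d = d->next) {
        if (strcmp(d->name, name) == 0) {
            return d;
        }
    }
    return NULL;
}

/* Look up a driver; on a miss, load only the module the table names and
 * search again. load_module returns 0 on success. */
struct driver *find_driver(const char *name,
                           int (*load_module)(const char *module))
{
    struct driver *drv = find_registered(name);
    size_t n = sizeof(module_table) / sizeof(module_table[0]);

    if (drv) {
        return drv;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(module_table[i].driver_name, name) == 0) {
            if (load_module(module_table[i].module) != 0) {
                return NULL;
            }
            return find_registered(name);
        }
    }
    return NULL;
}

/* Demo loader: pretends that only "block-curl" exists on disk and
 * registers its driver, as a module constructor would. */
static struct driver demo_curl = { "curl", NULL };
int demo_load_count;

int demo_load_module(const char *module)
{
    if (strcmp(module, "block-curl") != 0) {
        return -1;
    }
    register_driver(&demo_curl);
    demo_load_count++;
    return 0;
}
```

Calling find_driver("curl", demo_load_module) twice triggers a single
load, because the second call finds the driver already registered. Call
sites that need every driver (format probing, for example) would still
have to load all modules.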

Fam

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-08-03 10:54 UTC | newest]

Thread overview: 18+ messages
2015-07-31 15:45 [Qemu-devel] Modularizing QEMU RFC Marc Marí
2015-08-03  3:09 ` Fam Zheng
2015-08-03  7:51   ` Peter Maydell
2015-08-03  7:52   ` Marc Marí
2015-08-03  8:22     ` Fam Zheng
2015-08-03  9:01       ` Marc Marí
2015-08-03  9:24         ` Alex Bennée
2015-08-03  9:36           ` Marc Marí
2015-08-03  9:58             ` Alex Bennée
2015-08-03 10:16               ` Daniel P. Berrange
2015-08-03  9:38           ` Daniel P. Berrange
2015-08-03  9:24         ` Fam Zheng
2015-08-03 10:22           ` Marc Marí
2015-08-03 10:54             ` Fam Zheng
2015-08-03  9:20   ` Daniel P. Berrange
2015-08-03  9:52   ` Paolo Bonzini
2015-08-03  9:23 ` Daniel P. Berrange
2015-08-03  9:43   ` Marc Marí
