qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] KVM Forum 2012 Block BoF minutes
@ 2012-11-15 13:31 Markus Armbruster
  2012-11-18  5:04 ` Marcelo Tosatti
  0 siblings, 1 reply; 4+ messages in thread
From: Markus Armbruster @ 2012-11-15 13:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jeff Cody, Alexander Graf,
	Anthony Liguori, Paolo Bonzini, Jason Baron

Attendees:

Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi <stefanha@redhat.com>
Jeff Cody <jcody@redhat.com>
Markus Armbruster <armbru@redhat.com>

and a few people dropping in and out.

Minutes are basically a TODO list.  Could be a bit terse in places; if
anything's unclear, please ask.


Block layer data structure cleanup:

- Split block backend off BlockDriverState

- Turn block backend into a proper ADT (besides other good things, this
  makes it possible to replace its implementation by a test harness for
  unit testing of its customers)

- Get rid of BlockDriverState opaque (driver private state), inherit the
  common part into driver state

Convert open to QemuOpts (or perhaps QDict?)

Avoid double open for probing (update: patches posted)

Block jobs:

- Rate limiting is broken in general (it works for the single
  BlockDriverState case)

- Block jobs should be jobs: generally available instead of tied to a
  (single) BlockDriverState; get rid of the block job pointer in
  BlockDriverState

- Migration should probably be a job

Block migration needs to die after its replacement is ready

BlockDriverState member in_use and DriveInfo member refcount are a mess:

- in_use is used to avoid running certain things concurrently, but the
  rules are unclear, except they're clearly incomplete; possible rules
  could be about need for consistent views of image contents

- refcount is only used together with in_use, and appears to be for
  avoiding premature deletion

AHCI:

- Same IDE device models connect to different kinds of controllers just
  fine

- Use links to connect controller device model and drive device models;
  the link serves as generic address replacing bus, unit, index

- Make -drive if!=none true sugar for suitable -device (details depend
  on machine)

- Checking legacy bus, unit, index needs to be delayed until machine's
  limits are known

- Add "enough" -hdX to cover common usage (got 4 now, want 6 for q35)

- Some AHCI controllers can optionally mimick a legacy IDE controller,
  and expose some of their ports there (controlled by BIOS for real
  hardware, by controller qdev property for us)

- Convenience machine option to control that qdev property of an onboard
  controller would be nice (should not apply to additional controllers
  plugged in with -device)

- if=ahci is not necessary

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] KVM Forum 2012 Block BoF minutes
  2012-11-15 13:31 [Qemu-devel] KVM Forum 2012 Block BoF minutes Markus Armbruster
@ 2012-11-18  5:04 ` Marcelo Tosatti
  2012-11-19 17:03   ` [Qemu-devel] BlockDriverState member in_use (was: KVM Forum 2012 Block BoF minutes) Markus Armbruster
  0 siblings, 1 reply; 4+ messages in thread
From: Marcelo Tosatti @ 2012-11-18  5:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Stefan Hajnoczi, Jeff Cody, qemu-devel,
	Alexander Graf, Anthony Liguori, Paolo Bonzini, Jason Baron

On Thu, Nov 15, 2012 at 02:31:53PM +0100, Markus Armbruster wrote:
> Attendees:
> 
> Kevin Wolf <kwolf@redhat.com>
> Stefan Hajnoczi <stefanha@redhat.com>
> Jeff Cody <jcody@redhat.com>
> Markus Armbruster <armbru@redhat.com>
> 
> and a few people dropping in and out.
> 
> Minutes are basically a TODO list.  Could be a bit terse in places; if
> anything's unclear, please ask.
> 
> 
> Block layer data structure cleanup:
> 
> - Split block backend off BlockDriverState
> 
> - Turn block backend into a proper ADT (besides other good things, this
>   makes it possible to replace its implementation by a test harness for
>   unit testing of its customers)
> 
> - Get rid of BlockDriverState opaque (driver private state), inherit the
>   common part into driver state
> 
> Convert open to QemuOpts (or perhaps QDict?)
> 
> Avoid double open for probing (update: patches posted)
> 
> Block jobs:
> 
> - Rate limiting is broken in general (it works for the single
>   BlockDriverState case)
> 
> - Block jobs should be jobs: generally available instead of tied to a
>   (single) BlockDriverState; get rid of the block job pointer in
>   BlockDriverState
> 
> - Migration should probably be a job
> 
> Block migration needs to die after its replacement is ready
> 
> BlockDriverState member in_use and DriveInfo member refcount are a mess:
> 
> - in_use is used to avoid running certain things concurrently, but the
>   rules are unclear, except they're clearly incomplete; possible rules
>   could be about need for consistent views of image contents

    block: enable in_use flag

    Set block device in use during block migration, disallow drive_del and
    bdrv_truncate for in use devices.

1) drive_del is not allowed because memory object used by block
streaming/migration can disappear.
2) bdrv_truncate changes device size, which the block migration code was
unable to handle at the time.

These are the rules (accordingly to the changeset). IIRC new rules have
been added (new uses for bdrv_in_use).

"Consistent views of image contents" is not defined.

Question: why are the rules "clearly incomplete" ?

> - refcount is only used together with in_use, and appears to be for
>   avoiding premature deletion

Refcount is used to stop block auto deletion when device is unplugged
by the guest. If you can reach BlockDriverState and use its in_use
flag, then you can kill refcount.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Qemu-devel] BlockDriverState member in_use (was: KVM Forum 2012 Block BoF minutes)
  2012-11-18  5:04 ` Marcelo Tosatti
@ 2012-11-19 17:03   ` Markus Armbruster
  2012-11-19 20:13     ` Marcelo Tosatti
  0 siblings, 1 reply; 4+ messages in thread
From: Markus Armbruster @ 2012-11-19 17:03 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Kevin Wolf, Stefan Hajnoczi, Jeff Cody, qemu-devel,
	Alexander Graf, Anthony Liguori, Paolo Bonzini, Jason Baron

Marcelo Tosatti <mtosatti@redhat.com> writes:

> On Thu, Nov 15, 2012 at 02:31:53PM +0100, Markus Armbruster wrote:
[...]
>> BlockDriverState member in_use and DriveInfo member refcount are a mess:
>> 
>> - in_use is used to avoid running certain things concurrently, but the
>>   rules are unclear, except they're clearly incomplete; possible rules
>>   could be about need for consistent views of image contents
>
>     block: enable in_use flag
>
>     Set block device in use during block migration, disallow drive_del and
>     bdrv_truncate for in use devices.
>
> 1) drive_del is not allowed because memory object used by block
> streaming/migration can disappear.
> 2) bdrv_truncate changes device size, which the block migration code was
> unable to handle at the time.
>
> These are the rules (accordingly to the changeset).

Yup.  Your commits db593f25^..8591675f.

>                                                     IIRC new rules have
> been added (new uses for bdrv_in_use).

Stefan's commit 2d3735d3.

Let's have a look at the users of in_use (from notes I took back in
August, possibly stale now):

* bdrv_commit()

  Since commit 2d3735d3:

    block: check bdrv_in_use() before blockdev operations
    
    Long-running block operations like block migration and image streaming
    must have continual access to their block device.  It is not safe to
    perform operations like hotplug, eject, change, resize, commit, or
    external snapshot while a long-running operation is in progress.
    
    This patch adds the missing bdrv_in_use() checks so that block migration
    and image streaming never have the rug pulled out from underneath
    them.

  Users are monitor command commit and terminal escape 's', directly and
  via bdrv_commit_all()

  What exactly would go wrong if someone commited during block migration
  / image streaming?

  Strange: also tests whether backing_hd is in_use.  The other users
  don't, and I can't see how the test could ever succeed.

  Cock-up: bdrv_commit_all() stops on first error.

* bdrv_truncate()

  Since commit 8591675f:
  
    block: enable in_use flag
    
    Set block device in use during block migration, disallow drive_del and
    bdrv_truncate for in use devices.

  Users are monitor command block_resize and a bunch of block drivers.

  I don't think block drivers are affected by in_use.  They resize their
  internal BDSs such as bs->file, and I can't see how in_use could be
  set there.

  "No resize while block migration is running" is (was?) an artificial
  restriction; it should migrate the resize just like any other change.

  Image streaming, too, I guess.

* block_job_create()

  Since commit eeec61f2:

    block: add BlockJob interface for long-running operations

  Use is monitor command block-stream.

  Why does the block job infrastructure set in_use?  I suspect just
  because its only user "image streaming" needs it, but future users
  might not.  If that's correct, it's set in the wrong place.

* qmp_transaction()

  Since commit 2d3735d3 (see above).  Function was called
  qmp_blockdev_snapshot_sync() then.

  User is monitor command snapshot_blkdev.

  What exactly would go wrong if someone snaphotted during block
  migration / image streaming?

* eject_device()

  Since commit 2d3735d3 (see above).

  Users are monitor command eject, change.

  What would go wrong is obvious, for a change: we can't just kill the
  BDS while a long-running operation is using it.

  Could we just disassociate the BDS from the drive without killing it?
  So that when the job completes, the BDS's reference count drops to
  zero, and the BDS gets destroyed.

* drive_del

  Since commit 8591675f (see above).

  In theory, in_use is unnecessary here: reference counting should delay
  the delete just fine.  In practice, do_drive_del() *ignores* the
  reference count.  Even if that was fixed, the count only delays
  drive_uninit(), not bdrv_drain_all()..bdrv_close().  Another fine
  mess.

  Note how we treat the device model's reference to the drive: if it
  exists, we make the drive anonymous instead of destroying it.  It gets
  destroyed only when the device model drops the reference.  Similar to
  reference counting.

Tests of in_use are spread over block.c and monitor commands.  Smells
bad.  If it was only about excluding monitor commands during certain
long-running operations, the montor commands would be the appropriate
place.  Is it?

> "Consistent views of image contents" is not defined.

Imprecise language for "in_use seems to be about changes to the image,
while refcount is for keeping objects alive".  We were trying to make
sense of the mess.

> Question: why are the rules "clearly incomplete" ?

Are we sure we catch everything that could interfere with block
migration / image streaming?  Why only these two?  What about other
long-running jobs?  Do they need interference protection, too?

Shouldn't there be clear rules on what changes can happen while a job
runs, and which ones can be prevented, and how?

>> - refcount is only used together with in_use, and appears to be for
>>   avoiding premature deletion
>
> Refcount is used to stop block auto deletion when device is unplugged
> by the guest. If you can reach BlockDriverState and use its in_use
> flag, then you can kill refcount.

Except it'll come right back when we switch to QOM :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] BlockDriverState member in_use (was: KVM Forum 2012 Block BoF minutes)
  2012-11-19 17:03   ` [Qemu-devel] BlockDriverState member in_use (was: KVM Forum 2012 Block BoF minutes) Markus Armbruster
@ 2012-11-19 20:13     ` Marcelo Tosatti
  0 siblings, 0 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2012-11-19 20:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Stefan Hajnoczi, Jeff Cody, qemu-devel,
	Alexander Graf, Anthony Liguori, Paolo Bonzini, Jason Baron

On Mon, Nov 19, 2012 at 06:03:55PM +0100, Markus Armbruster wrote:
> Marcelo Tosatti <mtosatti@redhat.com> writes:
> 
> > On Thu, Nov 15, 2012 at 02:31:53PM +0100, Markus Armbruster wrote:
> [...]
> >> BlockDriverState member in_use and DriveInfo member refcount are a mess:
> >> 
> >> - in_use is used to avoid running certain things concurrently, but the
> >>   rules are unclear, except they're clearly incomplete; possible rules
> >>   could be about need for consistent views of image contents
> >
> >     block: enable in_use flag
> >
> >     Set block device in use during block migration, disallow drive_del and
> >     bdrv_truncate for in use devices.
> >
> > 1) drive_del is not allowed because memory object used by block
> > streaming/migration can disappear.
> > 2) bdrv_truncate changes device size, which the block migration code was
> > unable to handle at the time.
> >
> > These are the rules (accordingly to the changeset).
> 
> Yup.  Your commits db593f25^..8591675f.
> 
> >                                                     IIRC new rules have
> > been added (new uses for bdrv_in_use).
> 
> Stefan's commit 2d3735d3.
> 
> Let's have a look at the users of in_use (from notes I took back in
> August, possibly stale now):
> 
> * bdrv_commit()
> 
>   Since commit 2d3735d3:
> 
>     block: check bdrv_in_use() before blockdev operations
>     
>     Long-running block operations like block migration and image streaming
>     must have continual access to their block device.  It is not safe to
>     perform operations like hotplug, eject, change, resize, commit, or
>     external snapshot while a long-running operation is in progress.
>     
>     This patch adds the missing bdrv_in_use() checks so that block migration
>     and image streaming never have the rug pulled out from underneath
>     them.
> 
>   Users are monitor command commit and terminal escape 's', directly and
>   via bdrv_commit_all()
> 
>   What exactly would go wrong if someone commited during block migration
>   / image streaming?

    if (ro) {
        if (bdrv_reopen(bs->backing_hd, open_flags | BDRV_O_RDWR, NULL)) {
            return -EACCES;
        }
    }

>   Strange: also tests whether backing_hd is in_use.  The other users
>   don't, and I can't see how the test could ever succeed.
> 
>   Cock-up: bdrv_commit_all() stops on first error.
> 
> * bdrv_truncate()
> 
>   Since commit 8591675f:
>   
>     block: enable in_use flag
>     
>     Set block device in use during block migration, disallow drive_del and
>     bdrv_truncate for in use devices.
> 
>   Users are monitor command block_resize and a bunch of block drivers.
> 
>   I don't think block drivers are affected by in_use.  They resize their
>   internal BDSs such as bs->file, and I can't see how in_use could be
>   set there.
> 
>   "No resize while block migration is running" is (was?) an artificial
>   restriction; it should migrate the resize just like any other change.

>   Image streaming, too, I guess.

If its necessary to allow concurrent operation, then yes.

> * block_job_create()
> 
>   Since commit eeec61f2:
> 
>     block: add BlockJob interface for long-running operations
> 
>   Use is monitor command block-stream.
> 
>   Why does the block job infrastructure set in_use?  I suspect just
>   because its only user "image streaming" needs it, but future users
>   might not.  If that's correct, it's set in the wrong place.

Because block stream supposedly does not handle a BDS changing size (and
it was not necessary to handle this case at the time). Changing it
requires auditing the code.

> * qmp_transaction()
> 
>   Since commit 2d3735d3 (see above).  Function was called
>   qmp_blockdev_snapshot_sync() then.
> 
>   User is monitor command snapshot_blkdev.
> 
>   What exactly would go wrong if someone snaphotted during block
>   migration / image streaming?
> 
> * eject_device()
> 
>   Since commit 2d3735d3 (see above).
> 
>   Users are monitor command eject, change.
> 
>   What would go wrong is obvious, for a change: we can't just kill the
>   BDS while a long-running operation is using it.
> 
>   Could we just disassociate the BDS from the drive without killing it?
>   So that when the job completes, the BDS's reference count drops to
>   zero, and the BDS gets destroyed.
> 
> * drive_del
> 
>   Since commit 8591675f (see above).
> 
>   In theory, in_use is unnecessary here: reference counting should delay
>   the delete just fine.  In practice, do_drive_del() *ignores* the
>   reference count.  Even if that was fixed, the count only delays
>   drive_uninit(), not bdrv_drain_all()..bdrv_close().  Another fine
>   mess.
> 
>   Note how we treat the device model's reference to the drive: if it
>   exists, we make the drive anonymous instead of destroying it.  It gets
>   destroyed only when the device model drops the reference.  Similar to
>   reference counting.
> 
> Tests of in_use are spread over block.c and monitor commands.  Smells
> bad.  If it was only about excluding monitor commands during certain
> long-running operations, the montor commands would be the appropriate
> place.  Is it?
> 
> > "Consistent views of image contents" is not defined.
> 
> Imprecise language for "in_use seems to be about changes to the image,
> while refcount is for keeping objects alive".  We were trying to make
> sense of the mess.
> 
> > Question: why are the rules "clearly incomplete" ?
> 
> Are we sure we catch everything that could interfere with block
> migration / image streaming?  Why only these two?  What about other
> long-running jobs?  Do they need interference protection, too?

Good question. 

> Shouldn't there be clear rules on what changes can happen while a job
> runs, and which ones can be prevented, and how?

It depends on what operation can't fail to be performed while block
stream is executing. If such an operation exists, then you take the
energy to guarantee block streaming can perform with such an operation
in between.

There is no other basis for the rules.

Agree its a mess now (perhaps you'd want to work out the details of
every operation and corresponding effect on block streaming).

> >> - refcount is only used together with in_use, and appears to be for
> >>   avoiding premature deletion
> >
> > Refcount is used to stop block auto deletion when device is unplugged
> > by the guest. If you can reach BlockDriverState and use its in_use
> > flag, then you can kill refcount.
> 
> Except it'll come right back when we switch to QOM :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-11-19 21:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-15 13:31 [Qemu-devel] KVM Forum 2012 Block BoF minutes Markus Armbruster
2012-11-18  5:04 ` Marcelo Tosatti
2012-11-19 17:03   ` [Qemu-devel] BlockDriverState member in_use (was: KVM Forum 2012 Block BoF minutes) Markus Armbruster
2012-11-19 20:13     ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).