* [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
@ 2010-07-27 15:58 Ian Campbell
  2010-07-27 16:50 ` Ian Jackson
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Ian Campbell @ 2010-07-27 15:58 UTC (permalink / raw)
  To: xen-devel

Currently the configuration syntax available in a domain configuration
has several ways of specifying devices, some of which have slightly
unexpected semantics wrt whether or not an emulated device is created,
what the major number in xenstore is etc. Some also expose details of
the guest OS's choice of major number (or rather expose Linux's choice
to all guests AFAICT).

In an attempt to clean this up, or at least make the strange behaviour
more explicit, I'd like to propose some extensions to the dXpY syntax
supported by libxl such that the other existing ways of specifying
devices become syntactic sugar for specific well defined configurations
in the new syntax, whilst preserving backwards compatibility.

I hope that the following will also form the basis for a future document
(gasp!) describing the available syntax, which combinations are valid
etc (unless someone can point me to an existing document I can update).

Virtual Disk Configuration
--------------------------

A virtual disk is defined in the guest configuration file as d<X>p<Y>
where <X> is the disk number and <Y> is the partition number. In
addition a number of options can be specified.

p0 indicates the entire disk.

Device number encoding in xenstore
----------------------------------

Given a disk specified as dXpY the device encoding used in xenstore has
two potential formats, legacy and extended. Both of these are already
defined and implemented in guest frontend drivers.

The extended encoding is generally preferred but for backwards
compatibility the legacy format must still be supported.

The legacy encoding is (major and minor 8 bits each):
        (major << 8) | minor

The extended encoding is (disk == 20 bits, partition == 8 bits):
        (1 << 28) | (disk << 8) | partition

Note that the extended encoding for d0p0..d0p255 overlaps in the minor
number space with the legacy encodings of d0p0..d15p15 and therefore
these must not be used simultaneously.
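As a rough illustration of the two layouts and of the overlap just described, here is a minimal C sketch. It assumes the xenstore value is handled as a `uint32_t`; `vdev_legacy`, `vdev_extended` and `vdev_low_minor` are hypothetical helper names, not existing libxl functions.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy encoding: 8-bit major, 8-bit minor. */
static uint32_t vdev_legacy(uint32_t major, uint32_t minor)
{
    assert(major < 256 && minor < 256);
    return (major << 8) | minor;
}

/* Extended encoding: bit 28 set, 20-bit disk, 8-bit partition. */
static uint32_t vdev_extended(uint32_t disk, uint32_t partition)
{
    assert(disk < (1u << 20) && partition < 256);
    return (1u << 28) | (disk << 8) | partition;
}

/* Bottom 8 bits: all that a guest working purely in the
 * (legacy) minor number space looks at. */
static uint32_t vdev_low_minor(uint32_t vdev)
{
    return vdev & 0xffu;
}
```

For example, extended d0p17 and legacy 202:17 (d1p1) both end in 17, which is exactly the clash a guest that only looks at the minor number would hit.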

Configuration Options
---------------------

Each disk dXpY can optionally be followed by one or more of the
following key value pairs (precise syntax TBD, but comma separated is
common in similar situations).

Option keys and values with a _ prefix are for internal use only and are
used only to provide legacy semantics for syntactic sugar and must not
otherwise be used.
        
        pv = true | false
        
                Should a PV backend/frontend pair be created in xenstore
                to correspond to this device.
                
                Default: true for HVM guests, ignored for PV guests
                (treated as true)
        
        extended = true | false
        
                Request use of extended device encoding in xenstore.
                
                extended = false is only valid for d0..d15 (as d16+
                cannot be represented in the legacy encoding)
                
                When extended = false and in the absence of a specific
                _vdevice configuration option (see below) the encoding
                will use major==202 and minor=="(disk << 4) |
                partition".
                
                Default: false for d0p0..d0p255, false if _vdevice
                option present (see below), otherwise true.
        
        emul = none | ide[01].[01] | _ide[01].[01] | ...
        
                none = No emulated device to be created.
                
                ide[01].[01] = Emulate IDE device. First [01] =>
                primary, secondary. Second [01] => master, slave
                
                _ide[01].[01] = As per ide[01].[01] however emulation is
                enabled iff no other disk is explicitly configured with
                emulation.
                
                In the future sata<X>.<Y> or similar might be added
                here.
                
                Default: none for HVM guests, ignored for PV guests (treated
                as none)
                
        _vdevice = <N>:<M> | <Q>
        
                Enforce use of legacy device encoding in xenstore with
                the given major:minor or explicit value.
                
                Default: unset, encoding determined by "extended" option
                (see above)
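The defaulting rules for `extended` and `_vdevice` could be sketched in C as follows. This is only an illustration under stated assumptions: the `disk_cfg` structure and `resolve_vdev` are hypothetical, `_vdevice` is assumed to be already parsed into a single integer, and configurations that the resulting encoding cannot represent (legacy with a disk or partition above 15) are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum opt_bool { OPT_UNSET, OPT_FALSE, OPT_TRUE };

/* Hypothetical parsed form of "dXpY[,extended=...][,_vdevice=...]". */
struct disk_cfg {
    uint32_t disk;           /* X in dXpY */
    uint32_t partition;      /* Y in dXpY */
    enum opt_bool extended;  /* OPT_UNSET if the option is absent */
    bool has_vdevice;        /* _vdevice= given? */
    uint32_t vdevice;        /* (N << 8) | M, or the explicit <Q> */
};

/* Apply the defaults above; returns false if the configuration
 * cannot be represented in the encoding it ends up with. */
static bool resolve_vdev(const struct disk_cfg *c, uint32_t *out)
{
    bool extended;

    assert(c && out);
    if (c->has_vdevice) {            /* _vdevice forces legacy as given */
        *out = c->vdevice;
        return true;
    }
    if (c->extended != OPT_UNSET)
        extended = (c->extended == OPT_TRUE);
    else
        extended = (c->disk != 0);   /* only d0 defaults to legacy */

    if (extended) {
        if (c->disk >= (1u << 20) || c->partition > 255)
            return false;
        *out = (1u << 28) | (c->disk << 8) | c->partition;
        return true;
    }
    /* legacy without _vdevice: 202:((disk << 4) | partition) */
    if (c->disk > 15 || c->partition > 15)
        return false;
    *out = (202u << 8) | (c->disk << 4) | c->partition;
    return true;
}
```

Letting `_vdevice` win over an explicit `extended=true` is a choice made for the sketch; the proposal only says `_vdevice` enforces the legacy encoding.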

Backward compatible disk configuration
--------------------------------------

Given the above configuration options, several shorthands are defined
for backwards compatibility with existing configuration files and
guests.

These will be implemented by a straight textual substitution before
parsing the configuration.

        hda => d0p0,pv=true,emul=ide0.0,_vdevice=3:0
        hdb => d1p0,pv=true,emul=ide0.1,_vdevice=3:64
        hdc => d2p0,pv=true,emul=ide1.0,_vdevice=22:0
        hdd => d3p0,pv=true,emul=ide1.1,_vdevice=22:64

        xvda => d0p0,pv=true,emul=_ide0.0,_vdevice=202:0
        xvdb => d1p0,pv=true,emul=_ide0.1,_vdevice=202:16
        xvdc => d2p0,pv=true,emul=_ide1.0,_vdevice=202:32
        xvdd => d3p0,pv=true,emul=_ide1.1,_vdevice=202:48
        xvde => d4p0,pv=true,emul=none,_vdevice=202:64
        ...
        xvdp => d15p0,pv=true,emul=none,_vdevice=202:240
        xvdq => d16p0,pv=true,emul=none
        ...
        xvdz => d25p0,pv=true,emul=none
        
        xvda[1..15] =>
        d0p[1..15],pv=true,emul=_ide0.0,_vdevice=202:[1..15]
        xvdb[1..15] => etc
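The xvd* rows above follow a regular pattern, so the textual substitution could be a small function rather than a lookup table. A hedged C sketch, assuming single-letter names only (xvda..xvdz, optional partition 1..15) and the legacy value 202:((disk << 4) | partition) for d0..d15; `expand_xvd` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Expand xvda..xvdz with optional partition 1..15 into the explicit
 * syntax.  Returns false for any other name or if buf is too small. */
static bool expand_xvd(const char *name, char *buf, size_t len)
{
    static const char *const implicit_ide[] = {
        "_ide0.0", "_ide0.1", "_ide1.0", "_ide1.1"
    };
    unsigned disk, part = 0;
    const char *p;
    int n;

    assert(name && buf);
    if (strncmp(name, "xvd", 3) != 0 || name[3] < 'a' || name[3] > 'z')
        return false;
    disk = (unsigned)(name[3] - 'a');
    p = name + 4;
    if (*p != '\0') {
        if (*p < '1' || *p > '9')        /* no "0", no leading zeros */
            return false;
        for (; *p != '\0'; p++) {
            if (*p < '0' || *p > '9')
                return false;
            part = part * 10 + (unsigned)(*p - '0');
            if (part > 15)
                return false;
        }
    }
    if (disk < 16)        /* representable as 202:(disk << 4 | part) */
        n = snprintf(buf, len, "d%up%u,pv=true,emul=%s,_vdevice=202:%u",
                     disk, part, disk < 4 ? implicit_ide[disk] : "none",
                     (disk << 4) | part);
    else                  /* xvdq onwards: no _vdevice, extended default */
        n = snprintf(buf, len, "d%up%u,pv=true,emul=none", disk, part);
    return n >= 0 && (size_t)n < len;
}
```

For instance, `expand_xvd("xvdb3", ...)` would produce `d1p3,pv=true,emul=_ide0.1,_vdevice=202:19`.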

Note that all the above are Linux (guest) specific.

The sd* syntax is not covered. It's unclear if this is used in the wild
or what the existing semantics of emul= are for SCSI devices. If someone
cares to investigate the existing behaviour then it can be added.

Otherwise it is expected that additions will not be made to this set of
shorthands and that new functionality (e.g. emulation types) will be
available only via the explicit syntax.

(is there any non-Linux specific syntax used by other guest OSes which
needs to be supported?)

Implementation notes
--------------------

The behaviour specified by the emul=_ide[01].[01] syntax is currently
implemented by qemu (effectively as a workaround for users forgetting to
specify any emulated disks). I propose that as part of implementing this
new syntax we push responsibility for these semantics up into libxl.
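If libxl takes over the `_ide` semantics, the rule reduces to a two-pass check over the configured disks. A minimal C sketch, assuming only `ide` and `_ide` exist as emulation kinds (a future sata<X>.<Y> would presumably also count as explicit); the types and `resolve_implicit_ide` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum emul_kind {
    EMUL_NONE,          /* emul=none */
    EMUL_IDE,           /* emul=ide[01].[01]: explicit */
    EMUL_IDE_IMPLICIT   /* emul=_ide[01].[01]: only if nothing explicit */
};

struct disk_emul {
    enum emul_kind kind;
    bool enabled;       /* output: create an emulated device? */
};

static void resolve_implicit_ide(struct disk_emul *d, size_t n)
{
    bool any_explicit = false;

    assert(d || n == 0);
    for (size_t i = 0; i < n; i++)     /* pass 1: any explicit emul? */
        if (d[i].kind == EMUL_IDE)
            any_explicit = true;
    for (size_t i = 0; i < n; i++)     /* pass 2: decide per disk */
        d[i].enabled = d[i].kind == EMUL_IDE ||
                       (d[i].kind == EMUL_IDE_IMPLICIT && !any_explicit);
}
```

With this rule, xvda (emul=_ide0.0) on its own gets an emulated disk, but xvda together with hdb (emul=ide0.1) leaves only hdb emulated.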

libxl currently uses the legacy encoding for devices specified as xvd or
dXpY iff the particular configuration can be represented using the
legacy format (e.g. for d0p0..d15p15 or xvda..xvdp) in order to (1)
avoid the clash between the extended representation of d0p0 and the
legacy representations of d1..d15 and (2) to provide compatibility with
guests which do not support the extended device encoding.

The proposal above suggests instead that d1+ should be encoded using the
extended format unless overridden using the extended=false option or one
of the shorthands which use the _vdevice option. Only d0 would default
to legacy encoding.

This (1) avoids the clash in minor numbers since d0 is the only disk
which can clash with legacy encodings and (2) provides compatibility
with old guests through their use of the xvd* syntax.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-27 15:58 [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc Ian Campbell
@ 2010-07-27 16:50 ` Ian Jackson
  2010-07-27 20:41 ` Pasi Kärkkäinen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2010-07-27 16:50 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Ian Campbell writes ("[Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> In an attempt to clean this up, or at least make the strange behaviour
> more explicit, I'd like to propose some extensions to the dXpY syntax
> supported by libxl such that the other existing ways of specifying
> devices become syntactic sugar for specific well defined configurations
> in the new syntax, whilst preserving backwards compatibility.

Urgh.  I don't like this at all.  I have a completely different
conceptual model.  I guess I'll have to write it up.

Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-27 15:58 [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc Ian Campbell
  2010-07-27 16:50 ` Ian Jackson
@ 2010-07-27 20:41 ` Pasi Kärkkäinen
  2010-07-28  9:45   ` Ian Campbell
  2010-07-28 12:31 ` Paolo Bonzini
  2010-07-28 16:05 ` Ian Jackson
  3 siblings, 1 reply; 17+ messages in thread
From: Pasi Kärkkäinen @ 2010-07-27 20:41 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

On Tue, Jul 27, 2010 at 04:58:10PM +0100, Ian Campbell wrote:
> The sd* syntax is not covered. It's unclear if this is used in the wild
> or what the existing semantics of emul= are for SCSI devices. If someone
> cares to investigate the existing behaviour then it can be added.
>

sd* devices are still often used for Xen PV domUs..
(yeah, people should use xvd*, but many people still have sd*).

-- Pasi

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-27 20:41 ` Pasi Kärkkäinen
@ 2010-07-28  9:45   ` Ian Campbell
  0 siblings, 0 replies; 17+ messages in thread
From: Ian Campbell @ 2010-07-28  9:45 UTC (permalink / raw)
  To: Pasi Kärkkäinen; +Cc: xen-devel

On Tue, 2010-07-27 at 21:41 +0100, Pasi Kärkkäinen wrote:
> > Note that all the above are Linux (guest) specific.
> > 
> > The sd* syntax is not covered. It's unclear if this is used in the 
> > wild or what the existing semantics of emul= are for SCSI devices.
> > If someone cares to investigate the existing behaviour then it can 
> > be added.
> >
> 
> sd* devices are still often used for Xen PV domUs..
> (yeah, people should use xvd*, but many people still have sd*). 

Thanks. We can therefore add to the shorthands (emul=none unless anyone
knows better):

        sda => d0p0,pv=true,emul=none,_vdevice=8:0
        sdb => d1p0,pv=true,emul=none,_vdevice=8:16
        sdc => d2p0,pv=true,emul=none,_vdevice=8:32
        sdd => d3p0,pv=true,emul=none,_vdevice=8:48
        ...
        sdp => d15p0,pv=true,emul=none,_vdevice=8:240
        sdq => d16p0,pv=true,emul=none,_vdevice=65:0
        sdr => d17p0,pv=true,emul=none,_vdevice=65:16
        ... etc through:
        ... ... major 65 (sdq ->sdaf)
        ... ... major 66 (sdag->sdav)
        ... ... major 67 (sdaw->sdbl)
        ... ... major 68 (sdbm->sdcb)
        ... ... major 69 (sdcc->sdcr)
        ... ... major 70 (sdcs->sddh)
        ... ... major 71 (sddi->sddx)
        sddx => d127p0,pv=true,emul=none,_vdevice=71:240

(perhaps supporting all these is overkill ;-))
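The major/minor pattern in the list above is regular enough to compute. A hedged C sketch for whole disks only (partition 0), assuming the Linux SCSI naming sda..sdz, sdaa.., 16 minors per disk, major 8 for the first 16 disks and majors 65..71 for the next 112; `sd_to_vdevice` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Map a whole-disk Linux SCSI name (sda..sddx) to its disk number
 * and the major:minor used for _vdevice in the list above. */
static bool sd_to_vdevice(const char *name, unsigned *disk,
                          unsigned *major, unsigned *minor)
{
    size_t len;
    unsigned d;

    assert(name && disk && major && minor);
    len = strlen(name);
    if (len < 3 || len > 4 || strncmp(name, "sd", 2) != 0)
        return false;
    for (size_t i = 2; i < len; i++)
        if (name[i] < 'a' || name[i] > 'z')
            return false;
    if (len == 3)                          /* sda..sdz: 0..25 */
        d = (unsigned)(name[2] - 'a');
    else                                   /* sdaa..: 26 onwards */
        d = (unsigned)(name[2] - 'a' + 1) * 26 + (unsigned)(name[3] - 'a');
    if (d > 127)                           /* majors 8, 65..71 only */
        return false;
    *disk = d;
    *major = d < 16 ? 8 : 65 + (d - 16) / 16;
    *minor = (d % 16) << 4;                /* 16 minors per disk, p0 */
    return true;
}
```

This reproduces, for example, sdr => 65:16, sdbm => 68:0 and sddx => 71:240.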

Ian. 

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-27 15:58 [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc Ian Campbell
  2010-07-27 16:50 ` Ian Jackson
  2010-07-27 20:41 ` Pasi Kärkkäinen
@ 2010-07-28 12:31 ` Paolo Bonzini
  2010-07-28 16:05 ` Ian Jackson
  3 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2010-07-28 12:31 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

On 07/27/2010 05:58 PM, Ian Campbell wrote:
> The sd* syntax is not covered. It's unclear if this is used in the wild
> or what the existing semantics of emul= are for SCSI devices. If someone
> cares to investigate the existing behaviour then it can be added.

I don't know what semantics xl uses for SCSI devices, but I know that 
we've seen bugs about SCSI emulation so it is sometimes used, and this 
is the semantics that it should use given your IDE example:

         sda => d0p0,pv=true,emul=scsi0.0,_vdevice=8:0
         sdb => d1p0,pv=true,emul=scsi0.1,_vdevice=8:16

where the first number is the bus and the second is the unit as passed 
to -drive.  The second number goes from 0 to 7 (that's what QEMU does at 
least).

Paolo

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-27 15:58 [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc Ian Campbell
                   ` (2 preceding siblings ...)
  2010-07-28 12:31 ` Paolo Bonzini
@ 2010-07-28 16:05 ` Ian Jackson
  2010-07-28 16:45   ` Jeremy Fitzhardinge
  3 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2010-07-28 16:05 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Ian Campbell writes ("[Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> Virtual Disk Configuration

I don't agree with this interpretation.  In February I posted a
draft spec which provided a different interpretation of events:
  http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00183.html

Below is a version of that which has been enhanced to answer the
questions raised by this conversation.


Xen guest interface
-------------------

A Xen guest can be provided with block devices.  These are always
provided as Xen VBDs; for HVM guests they may also be provided as
emulated IDE or SCSI disks.

The abstract interface involves specifying, for each block device:

 * Nominal disk type: Xen virtual disk (aka xvd*, the default); SCSI
   (sd*); IDE (hd*).

   For HVM guests, each whole-disk hd* and sd* device is made
   available _both_ via the emulated IDE or SCSI controller
   respectively, _and_ as a Xen VBD.  The HVM guest is entitled to
   assume that the disks available via the emulated IDE or SCSI
   controller target the same underlying devices as the corresponding
   Xen VBD (ie, multipath).

   For PV guests every device is made available to the guest only as a
   Xen VBD.  For these domains the type is advisory, for use by the
   guest's device naming scheme.

   The Xen interface does not specify what name a device should have
   in the guest (nor what major/minor device number it should have in
   the guest, if the guest has such a concept).

 * Disk number, which is a nonnegative integer,
   conventionally starting at 0 for the first disk.

 * Partition number, which is a nonnegative integer where, by
   convention, partition 0 indicates the "whole disk".

   Normally for any disk _either_ partition 0 should be supplied in
   which case the guest is expected to treat it as it would a native
   whole disk (for example by putting or expecting a partition table
   or disk label on it);

   _Or_ only non-0 partitions should be supplied in which case the
   guest should expect storage management to be done by the host and
   treat each vbd as it would a partition or slice or LVM volume (for
   example by putting or expecting a filesystem on it).

   Non-whole disk devices cannot be passed through to HVM guests via
   the emulated IDE or SCSI controllers.


Configuration file syntax
-------------------------

The config file syntaxes are, for example

       d0 d0p0  xvda     Xen virtual disk 0 partition 0 (whole disk)
       d1p2     xvdb2    Xen virtual disk 1 partition 2
       d536p37  xvdtq37  Xen virtual disk 536 partition 37
       sdb3              SCSI disk 1 partition 3
       hdc2              IDE disk 2 partition 2
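Multi-letter xvd names such as xvdtq37 use bijective base-26 letters (a=0 .. z=25, aa=26, ...). A hedged C sketch of the parsing, assuming partitions up to 255 and disks up to (1<<20)-1 as in the 1<<28 encoding described below; `parse_xvd` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Parse "xvd<letters>[<partition>]".  Letters are bijective base 26,
 * so xvdtq37 is disk 536, partition 37; no digits means partition 0. */
static bool parse_xvd(const char *name, unsigned long *disk,
                      unsigned long *part)
{
    const char *p = name;
    unsigned long d = 0, n = 0;

    assert(name && disk && part);
    if (p[0] != 'x' || p[1] != 'v' || p[2] != 'd')
        return false;
    p += 3;
    if (*p < 'a' || *p > 'z')
        return false;
    for (; *p >= 'a' && *p <= 'z'; p++) {
        d = d * 26 + (unsigned long)(*p - 'a' + 1);
        if (d > (1ul << 20))       /* beyond the 1<<28 format's range */
            return false;
    }
    for (; *p >= '0' && *p <= '9'; p++) {
        n = n * 10 + (unsigned long)(*p - '0');
        if (n > 255)
            return false;
    }
    if (*p != '\0')
        return false;
    *disk = d - 1;
    *part = n;
    return true;
}
```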

The d*p* syntax is not supported by xm/xend.

To cope with guests which predate this scheme we therefore preserve
the existing facility to specify the xenstore numerical value directly
by putting a single number (hex, decimal or octal) in the domain
config file instead of the disk identifier.


Concrete encoding in the VBD interface (in xenstore)
----------------------------------------------------

The information above is encoded in the concrete interface as an
integer (in a canonical decimal format in xenstore), whose value
encodes the information above as follows:

    1 << 28 | disk << 8 | partition      xvd, disks or partitions 16 onwards
   202 << 8 | disk << 4 | partition      xvd, disks and partitions up to 15
     8 << 8 | disk << 4 | partition      sd, disks and partitions up to 15
     3 << 8 | disk << 6 | partition      hd, disks 0..1, partitions 0..63
    22 << 8 | (disk-2) << 6 | partition  hd, disks 2..3, partitions 0..63
    2 << 28 onwards                      reserved for future use
   other values less than 1 << 28        deprecated / reserved

The 1<<28 format handles disks up to (1<<20)-1 and partitions up to
255.  It will be used only where the 202<<8 format does not have
enough bits.
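The table can be read as an encoder from (type, disk, partition) to the xenstore integer. A minimal C sketch, with `vbd_type` and `vbd_encode` as hypothetical names; it prefers the 202<<8 form for xvd, falls back to 1<<28 only when the numbers do not fit, and refuses the deprecated high sd/hd numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum vbd_type { VBD_XVD, VBD_SD, VBD_HD };

static bool vbd_encode(enum vbd_type t, uint32_t disk, uint32_t part,
                       uint32_t *out)
{
    assert(out);
    switch (t) {
    case VBD_XVD:
        if (disk <= 15 && part <= 15) {          /* 202<<8 preferred */
            *out = (202u << 8) | (disk << 4) | part;
            return true;
        }
        if (disk < (1u << 20) && part <= 255) {  /* 1<<28 otherwise */
            *out = (1u << 28) | (disk << 8) | part;
            return true;
        }
        return false;
    case VBD_SD:
        if (disk > 15 || part > 15)              /* deprecated beyond */
            return false;
        *out = (8u << 8) | (disk << 4) | part;
        return true;
    case VBD_HD:
        if (part > 63)
            return false;
        if (disk <= 1) {
            *out = (3u << 8) | (disk << 6) | part;
            return true;
        }
        if (disk <= 3) {
            *out = (22u << 8) | ((disk - 2) << 6) | part;
            return true;
        }
        return false;
    }
    return false;
}
```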

Guests MAY support any subset of the formats above except that if they
support 1<<28 they MUST also support 202<<8.  PV-on-HVM drivers MUST
support at least one of 3<<8 or 8<<8; 3<<8 is recommended.

Some software has provided essentially Linux-specific encodings for
SCSI disks beyond disk 15 partition 15, and IDE disks beyond disk 3
partition 63.  These vbds, and the corresponding encoded integers, are
deprecated.

Guests SHOULD ignore numbers that they do not understand or
recognise.  They SHOULD check supplied numbers for validity.
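On the guest side, a frontend that supports only the two xvd formats might decode and validate values roughly like this. A hedged C sketch; `vbd_decode_xvd` is hypothetical, and anything else (sd/hd forms, 2<<28 onwards, deprecated values) is reported as not understood so the caller can ignore it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool vbd_decode_xvd(uint32_t v, uint32_t *disk, uint32_t *part)
{
    assert(disk && part);
    if ((v >> 28) == 1) {              /* 1 << 28 | disk << 8 | part */
        *disk = (v >> 8) & 0xfffffu;
        *part = v & 0xffu;
        return true;
    }
    if ((v >> 8) == 202) {             /* 202 << 8 | disk << 4 | part */
        *disk = (v >> 4) & 0xfu;
        *part = v & 0xfu;
        return true;
    }
    return false;                      /* not understood: ignore it */
}
```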


Notes on Linux as a guest
-------------------------

Very old Linux guests (PV and PV-on-HVM) are able to "steal" the
device numbers and names normally used by the IDE and SCSI
controllers, so that writing "hda1" in the config file results in
/dev/hda1 in the guest.  These systems interpret the xenstore integer
as
       major << 8 | minor
where major and minor are the Linux-specific device numbers.  Some old
configurations may depend on deprecated high-numbered SCSI and IDE
disks.  This does not work in recent versions of Linux.

So for Linux PV guests, users are recommended to supply xvd* devices
only.  Modern PV drivers will map these to identically-named devices
in the guest.

For Linux HVM guests using PV-on-HVM drivers, users are recommended to
supply as few hd* devices as possible and use pure xvd* devices for
the rest.  Modern PV-on-HVM drivers will map the hd* devices to
/dev/xvdHDa etc.

Some Linux HVM guests with broken PV-on-HVM drivers do not cope
properly if both hda and hdc are supplied, nor with both hda and xvda,
because they map the bottom 8 bits of the xenstore integer
directly to the Linux guest's device number and throw away the rest;
they can crash due to minor number clashes.
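The failure mode is easy to check mechanically: hda (3<<8), hdc (22<<8) and xvda (202<<8) all have 0 in their bottom 8 bits. A hedged C sketch of that check; `low_byte_clash` is a hypothetical helper, not part of any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if any two values share their bottom 8 bits, i.e. would collide
 * in a guest that keeps only those bits as its device number. */
static bool low_byte_clash(const uint32_t *vdevs, size_t n)
{
    assert(vdevs || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if ((vdevs[i] & 0xffu) == (vdevs[j] & 0xffu))
                return true;
    return false;
}
```

By contrast hda and hdb ((3<<8) | 64) differ in the low byte and do not collide.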


Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-28 16:05 ` Ian Jackson
@ 2010-07-28 16:45   ` Jeremy Fitzhardinge
  2010-07-29 14:50     ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-28 16:45 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell

  On 07/28/2010 09:05 AM, Ian Jackson wrote:
> For Linux HVM guests using PV-on-HVM drivers, users are recommended to
> supply as few hd* devices as possible and use pure xvd* devices for
> the rest.  Modern PV-on-HVM drivers will map the hd* devices to
> /dev/xvdHDa etc.

I think we've decided to make blkfront register pv versions of emulated 
devices as hdX/sdX rather than using xvdHD.  We don't do this in pv domains.

     J

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-28 16:45   ` Jeremy Fitzhardinge
@ 2010-07-29 14:50     ` Ian Jackson
  2010-07-29 15:07       ` Stefano Stabellini
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2010-07-29 14:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Ian Campbell

Jeremy Fitzhardinge writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
>   On 07/28/2010 09:05 AM, Ian Jackson wrote:
> > For Linux HVM guests using PV-on-HVM drivers, users are recommended to
> > supply as few hd* devices as possible and use pure xvd* devices for
> > the rest.  Modern PV-on-HVM drivers will map the hd* devices to
> > /dev/xvdHDa etc.
> 
> I think we've decided to make blkfront register pv versions of emulated 
> devices as hdX/sdX rather than using xvdHD.  We don't do this in pv domains.

Stealing the major number from the ide and scsi drivers, or just the
name ?

What if the domain has real sd* devices too ?  (pvscsi, pvusb + usb
mass storage, passthrough, ...)

Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 14:50     ` Ian Jackson
@ 2010-07-29 15:07       ` Stefano Stabellini
  2010-07-29 15:45         ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Stabellini @ 2010-07-29 15:07 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel

On Thu, 29 Jul 2010, Ian Jackson wrote:
> Jeremy Fitzhardinge writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> >   On 07/28/2010 09:05 AM, Ian Jackson wrote:
> > > For Linux HVM guests using PV-on-HVM drivers, users are recommended to
> > > supply as few hd* devices as possible and use pure xvd* devices for
> > > the rest.  Modern PV-on-HVM drivers will map the hd* devices to
> > > /dev/xvdHDa etc.
> > 
> > I think we've decided to make blkfront register pv versions of emulated 
> > devices as hdX/sdX rather than using xvdHD.  We don't do this in pv domains.
> 
> Stealing the major number from the ide and scsi drivers, or just the
> name ?
> 

Both

> What if the domain has real sd* devices too ?  (pvscsi, pvusb + usb
> mass storage, passthrough, ...)
> 

Clashes are theoretically possible but very hard to produce in practice.
We are "stealing" device names only for emulated IDE and SCSI disks, and
emulated SCSI disks don't even work at the moment. So you would need to
passthrough an IDE controller whose disks are configured as hd* (most
distros use sd* for IDE disks).
I think we are doing exactly what the user asked us to: setting up an
hdX device; in these very unlikely scenarios the user knows what he is
doing and can change the configuration.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 15:07       ` Stefano Stabellini
@ 2010-07-29 15:45         ` Ian Jackson
  2010-07-29 15:59           ` Stefano Stabellini
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2010-07-29 15:45 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel

Stefano Stabellini writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> On Thu, 29 Jul 2010, Ian Jackson wrote:
> > What if the domain has real sd* devices too ?  (pvscsi, pvusb + usb
> > mass storage, passthrough, ...)
> 
> Clashes are theoretically possible but very hard to produce in practice.
> We are "stealing" device names only for emulated IDE and SCSI disks, and
> emulated SCSI disks don't even work at the moment. So you would need to
> passthrough an IDE controller whose disks are configured as hd* (most
> distros use sd* for IDE disks).

There are definitely people who are using emulated scsi disks; perhaps
they just haven't updated yet.

> I think we are doing exactly what the user asked us to: setting up an
> hdX device; in these very unlikely scenarios the user knows what he is
> doing and can change the configuration.

Well, no, they can't, because their bootloader probably doesn't
understand anything besides what they're actually using.

Certainly stealing the major number for scsi disks seems quite
dangerous.  pv-usb is hardly that unlikely a scenario.

Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 15:45         ` Ian Jackson
@ 2010-07-29 15:59           ` Stefano Stabellini
  2010-07-29 16:09             ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Stabellini @ 2010-07-29 15:59 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, Stefano Stabellini

On Thu, 29 Jul 2010, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> > On Thu, 29 Jul 2010, Ian Jackson wrote:
> > > What if the domain has real sd* devices too ?  (pvscsi, pvusb + usb
> > > mass storage, passthrough, ...)
> > 
> > Clashes are theoretically possible but very hard to produce in practice.
> > We are "stealing" device names only for emulated IDE and SCSI disks, and
> > emulated SCSI disks don't even work at the moment. So you would need to
> > passthrough an IDE controller whose disks are configured as hd* (most
> > distros use sd* for IDE disks).
> 
> There are definitely people who are using emulated scsi disks; perhaps
> they just haven't updated yet.

I am not so sure about that.

> 
> > I think we are doing exactly what the user asked us to: setting up an
> > hdX device; in these very unlikely scenarios the user knows what he is
> > doing and can change the configuration.
> 
> Well, no, they can't, because their bootloader probably doesn't
> understand anything besides what they're actually using.
> 

they only have to change the device name, not the device class

> Certainly stealing the major number for scsi disks seems quite
> dangerous.  pv-usb is hardly that unlikely a scenario.
 
we are not doing that for pvusb

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 15:59           ` Stefano Stabellini
@ 2010-07-29 16:09             ` Ian Jackson
  2010-07-29 16:14               ` Stefano Stabellini
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2010-07-29 16:09 UTC
  To: Stefano Stabellini; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel

Stefano Stabellini writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> On Thu, 29 Jul 2010, Ian Jackson wrote:
> > Well, no, they can't, because their bootloader probably doesn't
> > understand anything besides what they're actually using.
> 
> they only have to change the device name, not the device class

Surely you can't steal only one minor number ?

> > Certainly stealing the major number for scsi disks seems quite
> > dangerous.  pv-usb is hardly that unlikely a scenario.
>  
> we are not doing that for pvusb

pv-usb => usb mass storage => scsi disks

Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 16:09             ` Ian Jackson
@ 2010-07-29 16:14               ` Stefano Stabellini
  2010-07-29 16:29                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Stefano Stabellini @ 2010-07-29 16:14 UTC
  To: Ian Jackson
  Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, Stefano Stabellini

On Thu, 29 Jul 2010, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> > On Thu, 29 Jul 2010, Ian Jackson wrote:
> > > Well, no, they can't, because their bootloader probably doesn't
> > > understand anything besides what they're actually using.
> > 
> > they only have to change the device name, not the device class
> 
> Surely you can't steal only one minor number ?

yes, that's what we do.

> 
> > > Certainly stealing the major number for scsi disks seems quite
> > > dangerous.  pv-usb is hardly that unlikely a scenario.
> >  
> > we are not doing that for pvusb
> 
> pv-usb => usb mass storage => scsi disks
 
I mean there is no such thing as pv-usb.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 16:14               ` Stefano Stabellini
@ 2010-07-29 16:29                 ` Jeremy Fitzhardinge
  2010-07-29 16:34                   ` Ian Jackson
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-29 16:29 UTC
  To: Stefano Stabellini; +Cc: Ian Campbell, xen-devel, Ian Jackson

  On 07/29/2010 09:14 AM, Stefano Stabellini wrote:
> On Thu, 29 Jul 2010, Ian Jackson wrote:
>> Stefano Stabellini writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
>>> On Thu, 29 Jul 2010, Ian Jackson wrote:
>>>> Well, no, they can't, because their bootloader probably doesn't
>>>> understand anything besides what they're actually using.
>>> they only have to change the device name, not the device class
>> Surely you can't steal only one minor number ?
> yes, that's what we do.

More than one minor, surely?  One for each device.
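
Jeremy's correction follows from the numbering arithmetic. This is a minimal sketch with hypothetical function names, and it rests on several assumptions:

- It uses the legacy xenstore encoding described at the start of the thread: an 8-bit major and an 8-bit minor.
- sd disks sit at major 8 with 16 minors per disk.
- The extended encoding is assumed to be `1 << 28 | disk << 8 | partition`, the layout Linux blkfront decodes.

For a disk dXpY, X is `disk` and Y is `partition`.

```python
EXTENDED = 1 << 28          # flag bit marking the extended encoding (assumed)
SD_MAJOR = 8                # Linux SCSI disk major
SD_MINORS_PER_DISK = 16     # whole disk + 15 partitions


def legacy_sd(disk: int, partition: int) -> int:
    """Legacy sd encoding: major in bits 8-15, minor in bits 0-7."""
    if not (0 <= disk < 16 and 0 <= partition < SD_MINORS_PER_DISK):
        raise ValueError("legacy sd encoding covers disks 0-15, partitions 0-15")
    minor = disk * SD_MINORS_PER_DISK + partition
    return (SD_MAJOR << 8) | minor


def extended(disk: int, partition: int) -> int:
    """Extended encoding: flag bit, disk in bits 8-27, partition in bits 0-7."""
    if not (0 <= disk < (1 << 20) and 0 <= partition < 256):
        raise ValueError("disk or partition out of range")
    return EXTENDED | (disk << 8) | partition


def minors_claimed(disk: int) -> range:
    """Every sd minor that one emulated disk reserves, not just one."""
    first = disk * SD_MINORS_PER_DISK
    return range(first, first + SD_MINORS_PER_DISK)


if __name__ == "__main__":
    print(hex(legacy_sd(1, 0)))      # 0x810 (sdb, i.e. 8:16)
    print(hex(extended(1, 0)))       # 0x10000100
    print(list(minors_claimed(1)))   # [16, 17, ..., 31]
```

Under these assumptions, emulating sdb reserves minors 16 to 31 of major 8, not a single minor.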

>>>> Certainly stealing the major number for scsi disks seems quite
>>>> dangerous.  pv-usb is hardly that unlikely a scenario.
>>>
>>> we are not doing that for pvusb
>> pv-usb =>  usb mass storage =>  scsi disks
>
> I mean there is no such thing as pv-usb.

Well, it hasn't been ported to pvops yet.  I've been getting promises of 
patches any month now for a couple of years.

I wonder if blkfront could register itself with the scsi subsystem 
rather than directly as a block device?

     J

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 16:29                 ` Jeremy Fitzhardinge
@ 2010-07-29 16:34                   ` Ian Jackson
  2010-07-29 16:37                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Jackson @ 2010-07-29 16:34 UTC
  To: Jeremy Fitzhardinge; +Cc: Ian Campbell, xen-devel, Stefano Stabellini

Jeremy Fitzhardinge writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> I wonder if blkfront could register itself with the scsi subsystem 
> rather than directly as a block device?

I bet that would mean it would have to deal with SCSI command blocks
and stuff, so I doubt it.
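
"Dealing with SCSI command blocks" can be illustrated with a hypothetical dispatcher for a blkfront registered with the SCSI midlayer. Each incoming CDB is either answered locally or turned into a block read or write.

- Standard SCSI: the opcode values and the INQUIRY and READ CAPACITY(10) response layouts.
- Invented for illustration: the function name, the return tuples and the vendor/product strings. None of these are the blkif protocol.

Real code would also need REQUEST SENSE, MODE SENSE, proper sense data, allocation lengths and more.

```python
import struct

# Standard SCSI opcodes (SPC/SBC).
TEST_UNIT_READY = 0x00
INQUIRY = 0x12
READ_CAPACITY_10 = 0x25
READ_10 = 0x28
WRITE_10 = 0x2A


def handle_cdb(cdb: bytes, num_blocks: int, block_size: int = 512):
    """Hypothetical dispatcher. Returns ('local', data) for commands answered
    without touching the backend, ('io', op, lba, count) for commands that
    become block requests, or ('check_condition', reason) otherwise."""
    opcode = cdb[0]
    if opcode == TEST_UNIT_READY:
        return ("local", b"")
    if opcode == INQUIRY:
        # Minimal 36-byte standard INQUIRY data: direct-access device,
        # not removable, SPC-3, response format 2, 31 more bytes follow.
        data = bytes([0x00, 0x00, 0x05, 0x02, 31, 0, 0, 0])
        # Vendor (8), product (16) and revision (4) strings are hypothetical.
        data += b"XEN     " + b"VIRTUAL DISK    " + b"0001"
        return ("local", data)
    if opcode == READ_CAPACITY_10:
        # Response: last LBA and block length, both 32-bit big-endian.
        return ("local", struct.pack(">II", num_blocks - 1, block_size))
    if opcode in (READ_10, WRITE_10):
        lba, = struct.unpack_from(">I", cdb, 2)     # bytes 2-5
        count, = struct.unpack_from(">H", cdb, 7)   # bytes 7-8
        if lba + count > num_blocks:
            return ("check_condition", "LBA out of range")
        op = "read" if opcode == READ_10 else "write"
        return ("io", op, lba, count)
    # Everything else would need handling too; reject it here.
    return ("check_condition", "invalid opcode")
```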

Ian.

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 16:34                   ` Ian Jackson
@ 2010-07-29 16:37                     ` Jeremy Fitzhardinge
  2010-07-29 17:54                       ` Alan Cox
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-07-29 16:37 UTC
  To: Ian Jackson; +Cc: Ian Campbell, xen-devel, Stefano Stabellini

  On 07/29/2010 09:34 AM, Ian Jackson wrote:
> Jeremy Fitzhardinge writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
>> I wonder if blkfront could register itself with the scsi subsystem
>> rather than directly as a block device?
> I bet that would mean it would have to deal with SCSI command blocks
> and stuff, so I doubt it.

Well, random ide/ata devices are now part of scsi via libata, and they 
presumably can not deal with raw scsi commands.

     J

* Re: [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc
  2010-07-29 16:37                     ` Jeremy Fitzhardinge
@ 2010-07-29 17:54                       ` Alan Cox
  0 siblings, 0 replies; 17+ messages in thread
From: Alan Cox @ 2010-07-29 17:54 UTC
  To: Jeremy Fitzhardinge
  Cc: Ian Campbell, xen-devel, Ian Jackson, Stefano Stabellini

On Thu, 29 Jul 2010 09:37:22 -0700
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>   On 07/29/2010 09:34 AM, Ian Jackson wrote:
> > Jeremy Fitzhardinge writes ("Re: [Xen-devel] [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc"):
> >> I wonder if blkfront could register itself with the scsi subsystem
> >> rather than directly as a block device?
> > I bet that would mean it would have to deal with SCSI command blocks
> > and stuff, so I doubt it.
> 
> Well, random ide/ata devices are now part of scsi via libata, and they 
> presumably can not deal with raw scsi commands.

ATAPI devices speak SCSI (or a sort of pidgin SCSI, anyway); libata
translates SCSI<->ATA for disks, so in the Linux world you can throw
arbitrary *valid* SCSI at them and should get valid and correct
behaviour for a SCSI disk. If not, it's a bug.
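
The kind of translation Alan describes can be shown with a simplified sketch. This is not libata's implementation, only the field mapping:

- A SCSI READ(10) becomes a 48-bit ATA READ DMA EXT (0x25).
- A SCSI WRITE(10) becomes a 48-bit ATA WRITE DMA EXT (0x35).
- The taskfile is modelled as a plain dict, with the LBA split into two 24-bit halves.

It also shows one semantic mismatch a translator must handle. In SCSI, a transfer length of 0 means no blocks are transferred. In ATA 48-bit commands, a sector count of 0 means 65536 sectors.

```python
import struct

ATA_READ_DMA_EXT = 0x25
ATA_WRITE_DMA_EXT = 0x35
SCSI_READ_10 = 0x28
SCSI_WRITE_10 = 0x2A


def translate_rw10(cdb: bytes):
    """Translate a READ(10)/WRITE(10) CDB into a simplified ATA taskfile dict.

    Returns None when the command must complete without touching the device
    (transfer length 0), and raises ValueError for any other opcode.
    """
    opcode = cdb[0]
    if opcode == SCSI_READ_10:
        command = ATA_READ_DMA_EXT
    elif opcode == SCSI_WRITE_10:
        command = ATA_WRITE_DMA_EXT
    else:
        raise ValueError(f"not a READ(10)/WRITE(10) CDB: 0x{opcode:02x}")
    lba, = struct.unpack_from(">I", cdb, 2)     # bytes 2-5, big-endian
    count, = struct.unpack_from(">H", cdb, 7)   # bytes 7-8, big-endian
    if count == 0:
        # SCSI: zero blocks is a valid no-op; ATA 48-bit: 0 would mean 65536.
        return None
    return {
        "command": command,
        "lba_low": lba & 0xFFFFFF,            # LBA bits 0-23
        "lba_high": (lba >> 24) & 0xFFFFFF,   # LBA bits 24-47
        "sector_count": count,                # 1..65535 here
        "device": 0x40,                       # LBA addressing mode bit
    }
```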

Alan

Thread overview: 17+ messages
2010-07-27 15:58 [RFC] Virtual disk configuration, PV vs. emulated, backward compatibility etc Ian Campbell
2010-07-27 16:50 ` Ian Jackson
2010-07-27 20:41 ` Pasi Kärkkäinen
2010-07-28  9:45   ` Ian Campbell
2010-07-28 12:31 ` Paolo Bonzini
2010-07-28 16:05 ` Ian Jackson
2010-07-28 16:45   ` Jeremy Fitzhardinge
2010-07-29 14:50     ` Ian Jackson
2010-07-29 15:07       ` Stefano Stabellini
2010-07-29 15:45         ` Ian Jackson
2010-07-29 15:59           ` Stefano Stabellini
2010-07-29 16:09             ` Ian Jackson
2010-07-29 16:14               ` Stefano Stabellini
2010-07-29 16:29                 ` Jeremy Fitzhardinge
2010-07-29 16:34                   ` Ian Jackson
2010-07-29 16:37                     ` Jeremy Fitzhardinge
2010-07-29 17:54                       ` Alan Cox
