From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: HELP required with some ideas
Date: Mon, 30 Aug 2010 10:27:53 -0400
Message-ID: <20100830142753.GA5652@phenom.dumpdata.com>
References: <AANLkTik91OUG3gOt=2brh9knA9MKj1Pdn37zniBN8Fx4@mail.gmail.com>
	<AANLkTimLxNbeWT=mD2ZNcjy0SUESb4n4q3CQmD0p1QJo@mail.gmail.com>
	<AANLkTi=h=QAO8YO411_Rx6HcaVMuQA6TKFbrW+BYJtaD@mail.gmail.com>
	<20100829140544.GT2804@reaktio.net>
	<AANLkTimzTdEcmu6iwktC93RJFUPV-CRvwRV2mJY55XXW@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
Content-Disposition: inline
In-Reply-To: <AANLkTimzTdEcmu6iwktC93RJFUPV-CRvwRV2mJY55XXW@mail.gmail.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: grapgroup grapgroup <we.are.grap@gmail.com>
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Mon, Aug 30, 2010 at 02:15:26AM +0530, grapgroup grapgroup wrote:
> On Sun, Aug 29, 2010 at 7:35 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
>
> > On Sun, Aug 29, 2010 at 06:17:06PM +0530, grapgroup grapgroup wrote:
> > >    Hi,
> > >     We are a group of four students studying in an undergraduate college.
> > >     We are new to XEN and we would like to contribute to the development of
> > >    XEN through our college final year project.
> > >     We have gone through a few research papers and have shortlisted a few
> > >    ideas out of which we are going to finalize the project.
> > >     As we are beginners we would be very grateful if you could guide us in
> > >    any of the following ways :
> > >
> >
> > Hello!
> >
> > Could you send the links to the papers you mention?
> > Some comments below..
> >
> > >    1)  telling us if the idea is already implemented in XEN.       OR
> > >    2)  if the idea is implemented then suggesting any modifications which can
> > >    be done in it.       OR
> > >    3)  telling the feasibility of the idea.
> > >
> > >    We would be very thankful if you could guide us in any way.
> > >    We would also like to think on any ideas suggested by you.
> > >
> > >    Regards,
> > >      Rohan Malpani
> > >      Ammar Ekbote
> > >      Paresh Nakhe
> > >      Gaurav Jain
> > >
> > >    *******************************IDEAS*****************************
> > >    1) Disk I/O scheduling on virtual machines
> > >
> > >        Scheduling algorithms for native OS are designed keeping in mind the
> > >    latency characteristics of the disk. In virtual environment, a
> > >    VM will have a virtual disk which is physical space on the physical disk.
> > >    Therefore, the same algorithms do not work well on virtual
> > >    machines. There is a need of new scheduling algorithms for VMs which will
> > >    take into account the type of workload and perform scheduling in
> > >    such a way so as to increase the performance. The paper we referred
> > >    suggested using two level scheduling, one at the VM level and other at
> > >    the hypervisor level.
> > >
> >
> > Have you guys looked at projects like dm-ioband ?
> >
> >
> > >    2) Network Interface Virtualization
> > >
> > >        There is a particular mechanism in XEN called 'Page grant mechanism'
> > >    to achieve network interface virtualization. In this
> > >    mechanism there is considerable s/w overhead as for each I/O, access to
> > >    certain guest pages (I/O buffer) is granted to driver domain and is
> > >    immediately revoked as soon as the i/o is complete. Current mechanism is
> > >    said to be giving  a performance 2.9 Gb/s on 10 Gb/s line. The paper
> > >    we referred suggested a mechanism where this s/w overhead can be reduced
> > >    to a great extent.
> > >    First  is implementation of multi-queue NIC support for the driver domain
> > >    model in Xen and other is grant reuse mechanism based on
> > >    software I/O address translation table. In this, once the access to guest
> > >    pages is granted it is reused for multiple i/o transactions.
> > >
> >
> > Some of this stuff is done in the xen 'netchannel2' development.
> >
> > I think there are multiple presentations about possible xen network
> > improvements available from XenSummit slides.
> >
> > >    3) Asymmetry aware hypervisor
> > >
> > >        Experiments show that asymmetric multi-core processors are more
> > >    efficient than the SMP. Idea is to deliver better performance
> > >    per watt and per area. The paper suggests that each VM running on the
> > >    hypervisor has some number of fast vCPUs and some number of slow
> > >    vCPUs. Each task is identified for its type and accordingly sent to fast
> > >    or slow vCPU. CPU intensive applications are scheduled on fast
> > >    vCPUs and memory intensive applications are scheduled on slow vCPUs. These
> > >    vCPUs are mapped to the corresponding type of physical
> > >    core. Hypervisor needs to be modified to become asymmetry aware. The goals of
> > >    such a hypervisor are
> > >
> > >    1.fair sharing of fast cores among all vCPUs in the system;
> > >    2.support for "asymmetry aware" guests;
> > >    3.a mechanism for controlling priority of VMs in using fast cores;
> > >    4.a mechanism ensuring that fast cores never go idle before slow cores
> >
> > Hmm.. do you mean NUMA aware hypervisor/VMs, or something else?
> >
> > -- Pasi
> >
> >
> Hello,
>  First of all we would like to thank you for sparing your time and looking
> at the suggested ideas.
>
>   We have mentioned below the links for the papers regarding the ideas.
>   We have also gone through the topics which you mentioned and we have
> summarized below what we found.
>   We have elaborated two ideas a bit further which would give their clear
> picture.
>
>    Our concern regarding all of these ideas is whether they are
> feasible as a 7-8 month project and whether they are already implemented
> elsewhere.
>    Any suggestions extending or modifying these ideas would prove to be of
> great help.
>
> Links to papers:
>
> 1) On Disk I/O Scheduling in Virtual Machines :
> http://sysrun.haifa.il.ibm.com/hrl/wiov2010/papers/kesavan.pdf
> 2) Network Interface Virtualization :
> http://www.cs.rice.edu/CS/Architecture/docs/ram-vee09.pdf
> 3) Asymmetric aware hypervisors :
> http://www.cs.sfu.ca/~fedorova/papers/vee04-kazempour.pdf
>
> Regards,
>     Rohan Malpani
>     Ammar Ekbote
>     Paresh Nakhe
>     Gaurav Jain
>
>
> ***************************************************************************************************************************************************************************
>
> *1) dm-ioband * (in context of the first idea : On Disk I/O Scheduling in
> Virtual Machines)
>      dm-ioband is an I/O bandwidth controller implemented as a device-mapper
> driver and can control bandwidth on per partition, per user, per process,
> per virtual machine (such as KVM or Xen) basis. Our suggested idea does not
> revolve around I/O scheduling between VMs but is related to disk I/O
> scheduling carried at different levels in virtualized environments.
>
>
> *Further elaboration of the first idea*  (On Disk I/O Scheduling in Virtual
> Machines)
>
>      The suggested idea intends to introduce a disk I/O scheduling algorithm
> in the hypervisor which would take into consideration the disk I/O
> scheduling in the guest VM.
>
>
>  The scenario is as follows :
>
> To read or write, the disk head must be positioned at the desired track and
> at the beginning of the desired sector and in doing so we encounter seek
> time and rotational delay.
> For a single disk there will be a number of I/O requests.
> If requests are selected randomly, we will get poor performance.

That is not entirely true. Think SSDs, where random writes are not a
problem anymore. Also NCQ or SWCQ address this by the SATA interface
deciding in which order the sectors are written and telling the controller
(AHCI for example) which of those sectors have been written. In other
words, the elevator logic has been moved down to the hard drive.

> So we have to "reorder" (using various algorithms) these requests to
> minimize the seek time by making the head move in an optimized way.
>
> The various algorithms used for these purposes:
>
> 1) First-in, first-out (FIFO) : Process request sequentially
> 2) Shortest Service Time First : Select the disk I/O request that requires
> the least movement of the disk arm from its current position
> 3) SCAN : Arm moves in one direction only, satisfying all outstanding
> requests until it reaches the last track in that direction and then
> Direction is reversed
> 4) C-SCAN : Restricts scanning to one direction only. When the last track has
> been visited in one direction, the arm is returned to the opposite end of
> the disk and the scan begins again.
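
To get a feel for how much the ordering matters on a rotating disk, here
is a minimal Python sketch of the four policies listed above. The track
numbers and the starting head position are made-up illustration values,
not measurements. Note also that the SCAN and C-SCAN variants below turn
around at the last pending request rather than at the last physical track
(strictly speaking that is LOOK / C-LOOK), so treat the numbers as
indicative only.

```python
def fifo(requests, head):
    """Serve requests in arrival order."""
    return list(requests)


def sstf(requests, head):
    """Shortest Service Time First: always pick the closest pending track."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda track: abs(track - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def scan(requests, head):
    """Sweep upward from the head, then reverse (LOOK-style simplification)."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def c_scan(requests, head):
    """Sweep upward only, then jump back to the lowest pending track."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped


def head_movement(order, head):
    """Total number of tracks the head travels to serve `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


if __name__ == "__main__":
    # Hypothetical request queue and head position, for illustration only.
    requests = [98, 183, 37, 122, 14, 124, 65, 67]
    start = 53
    for name, policy in [("FIFO", fifo), ("SSTF", sstf),
                         ("SCAN", scan), ("C-SCAN", c_scan)]:
        order = policy(requests, start)
        print(name, order, head_movement(order, start))
```

On an SSD the "head movement" metric is meaningless, which is the point
made earlier in this reply: the gap between these policies only shows up
where seeks are expensive.
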
>
>
> Now the problem in virtualized environments is that there are two levels of
> disk I/O scheduling;
> 1) at the guest VM level (domU)
> 2) at the hypervisor level (dom0)
>
> Now due to this if the scheduling is carried in hypervisor (dom0) then the
> scheduling carried at the guest VM level (domU) will be of no use.
> e.g if guest uses FIFO and dom0 uses Shortest-service time first then
> ultimately the  Shortest-service first will be considered and the FIFO
> scheduling will be wasted.
>
> Currently in XEN we have the following pattern:
>
> 1) *at guest VM: The NOOP scheduler* : It is the most basic scheduler that
> inserts all incoming I/O requests into a simple, unordered FIFO queue and
> implements request merging.
>
> 2) *at dom0 : uses cfq scheduler*

Take a look at Vivek Goyal's talk at the recent LSF/MM mini-summit:

http://lwn.net/Articles/400589/

(unfortunately it doesn't have the slides, maybe you can email him for
more details).
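
If you want to check which elevator is actually active on each side
rather than take it on faith, Linux exposes it in sysfs under
/sys/block/<device>/queue/scheduler, with the active one shown in square
brackets. Below is a small Python sketch that parses that file. The
device names in the usage note are assumptions (your guest and dom0 may
name their disks differently), and the helper names are made up for this
example.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse the contents of a sysfs 'scheduler' file.

    Example input: "noop deadline [cfq]"
    Returns (active, available); active is None if nothing is bracketed.
    """
    names = text.split()
    active = next(
        (n[1:-1] for n in names if n.startswith("[") and n.endswith("]")),
        None,
    )
    available = [n.strip("[]") for n in names]
    return active, available


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for a block device, e.g. 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


if __name__ == "__main__":
    # Sample string only; on a real system call read_scheduler("<device>").
    print(parse_scheduler("noop deadline [cfq]"))
```

Run read_scheduler() with something like "xvda" in the guest and "sda" in
dom0 (both hypothetical names) and compare the two results.
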
>
>
> *The Idea:*
>
> So effectively no scheduling is carried at the guest level in XEN.
> However the paper suggests that it is better if the guest VM schedules the
> tasks according to its need ( some VMs may run applications requiring
> sequential disk I/O thus needing shortest-seek time while others may
> be running applications running randomized data  and may be need SCAN ).

OK, but that won't be a problem with SSDs where both values are about
the same (ok, sequential disk I/O will be higher, but not that much).

>  Thus it is more useful if the guest VMs carry out the scheduling at their
> level. In such a case it becomes necessary to make the hypervisor aware that
> scheduling has already been carried out in the guests and not to carry any
> scheduling in the dom0 level.

So re-prioritize the I/O in the guest. I was under the impression that
'ionice' would be doing exactly that - prioritizing I/Os from specific
applications? Which means re-prioritizing the I/O queue with more
important I/O, which is then fed into the NOOP I/O scheduler?

>  Thus we want to make modifications to make XEN aware if scheduling has been
> carried at the guest VM level or not and then accordingly apply its disk I/O
> scheduling policy thereby not disturbing the scheduling carried at guest VM
> level.
>=20
> ///////////////////////////////////////////////////////////////////////=
/////////////////////////////////////////////////////////////////////////=
/////////////////////////////////////////////////////////////////
>=20
> 2) about netchannel2 (in context of the second idea : Network Interface
> Virtualization)
>
> We went through netchannel2 and found that what you said was right. A
> considerable part of the idea was already implemented in netchannel2.
> We would go through netchannel2 and find if anything more could be done in
> it.
> Also any suggestions regarding what more could be done would be of great
> help.

Well, NetChannel2 is dead. The code hasn't been ported to PV-OPS, so
unless somebody looks at it, it won't be in PV-OPS kernels. But that is a
separate discussion.

>=20
> ///////////////////////////////////////////////////////////////////////=
/////////////////////////////////////////////////////////////////////////=
/////////////////////////////////////////////////////////////////
>=20
> 3) about NUMA-aware hypervisors. (in context of 3rd idea : Asymmetry aware
> hypervisor)
>
> We went through NUMA-aware hypervisors and found that our suggested idea
> does not address NUMA aware hypervisors. However, we might consider that
> idea if it is not yet done.
>
> *Further elaboration of the idea (Asymmetry aware hypervisor):*
> To ensure that asymmetric hardware is well utilized, the system must match
> each  thread  with the right type of core:
> e.g., memory-intensive threads with slow cores and compute-intensive
> threads with fast cores.
>
> This can be accomplished by an asymmetry-aware thread scheduler in the guest
> operating system, where properties of individual threads can be monitored
> more easily than at the hypervisor level.  However, if the hypervisor is not
> asymmetry-aware it can thwart the efforts of the asymmetry-aware
> guest OS scheduler, for instance if it consistently maps the virtual CPU
> (vCPU)  that the guest believes to be fast to a physical core that is
> actually slow.
>
> This paper focuses on enabling asymmetric core support in hypervisors.
> Asymmetric cores (dissimilar cores) are seen to give better efficiency than
> Multi-core processors having similar cores. In such a case particular kind
> of processes can be handled by one group of asymmetric cores while other
> kind of processes can be handled by other group of cores.
>
> Operating systems have schedulers which can multiplex the threads to the
> different cores. However currently hypervisors do not have scheduling based
> on the asymmetric nature of the cores.
>
> *THE IDEA:*
>
> The paper proposes to make the hypervisor aware of this asymmetric nature.
> The paper proposes to map the vCPU's (present in the VMs) to physical cores
> of the same kind. i.e. fast vCPU's will be mapped to fast physical cores and
> slow vCPU's will be mapped to slow physical cores.
> Thus due to this appropriate mapping the threads in the VMs would be
> serviced by the appropriate set of cores.
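
As a rough illustration of the mapping described above (and only that:
this is not how the paper or the Xen scheduler implements it), the
following Python sketch assigns each vCPU to a physical core of the same
kind, round-robin within each kind. The 'fast'/'slow' labels, the IDs and
the function name are all hypothetical.

```python
from itertools import cycle


def map_vcpus(vcpus, cores):
    """Map each vCPU to a physical core of the same kind.

    vcpus: dict of vcpu_id -> 'fast' or 'slow'
    cores: dict of core_id -> 'fast' or 'slow'
    Returns a dict of vcpu_id -> core_id, round-robin within each kind.
    """
    # Group physical cores by kind, in a stable order.
    by_kind = {}
    for core, kind in sorted(cores.items()):
        by_kind.setdefault(kind, []).append(core)

    # One endless round-robin iterator per kind of core.
    pickers = {kind: cycle(ids) for kind, ids in by_kind.items()}

    mapping = {}
    for vcpu, kind in sorted(vcpus.items()):
        if kind not in pickers:
            raise ValueError(f"no physical core of kind {kind!r}")
        mapping[vcpu] = next(pickers[kind])
    return mapping


if __name__ == "__main__":
    # Made-up topology: two fast cores, two slow cores.
    cores = {0: "fast", 1: "fast", 2: "slow", 3: "slow"}
    vcpus = {0: "fast", 1: "slow", 2: "fast", 3: "slow", 4: "fast"}
    print(map_vcpus(vcpus, cores))
```

A static mapping like this could presumably be tried out by hand with
vCPU pinning before touching the scheduler. It does nothing for the
fairness and "fast cores never idle" goals listed earlier in the thread,
which would need real scheduler work.
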
>=20
> ///////////////////////////////////////////////////////////////////////=
/////////////////////////////////////////////////////////////////////////=
///////////////////////////////////////////////////////////////////////
