xend leaks/bugs/etc

* xend leaks/bugs/etc
@ 2005-04-16 18:08 Allen Short
  0 siblings, 0 replies; 20+ messages in thread
From: Allen Short @ 2005-04-16 18:08 UTC (permalink / raw)
  To: xen-devel

Hi. I'm a Twisted developer interested in improving xend. First, I'm
going to ensure that it runs on Twisted 2.0. I'd also like to see if I
can reproduce the memory leaks I've seen reported here, and find out
what can be done about them. Eventually I'd like to refactor most of
xend to use Twisted's service architecture, which handles things like
startup and shutdown in a more modular fashion. (I'm working on a
project that could benefit from tighter integration with xend, but the
current codebase is not very friendly to that.) Is there a list of
existing problems with xend that I could refer to? I attempted to
reproduce the "xm list" memory leak, but was not able to do so. (I am
using Python 2.4.1 and yesterday's xeno-unstable.) 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: xend leaks/bugs/etc
@ 2005-04-17 15:42 Ian Pratt
  2005-04-18  6:00 ` Allen Short
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Pratt @ 2005-04-17 15:42 UTC (permalink / raw)
  To: Allen Short, xen-devel

> Hi. I'm a Twisted developer interested in improving xend. 
> First, I'm going to ensure that it runs on Twisted 2.0. I'd 
> also like to see if I can reproduce the memory leaks I've 
> seen reported here, and find out what can be done about them. 
> Eventually I'd like to refactor most of xend to use Twisted's 
> service architecture, which handles things like startup and 
> shutdown in a more modular fashion. (I'm working on a project 
> that could benefit from tighter integration with xend, but 
> the current codebase is not very friendly to that.) Is there 
> a list of existing problems with xend that I could refer to? 
> I attempted to reproduce the "xm list" memory leak, but was 
> not able to do so. (I am using Python 2.4.1 and yesterday's 
> xeno-unstable.) 

Allen, I think we've come to the conclusion that Twisted was rather
overkill for our needs, and led to some rather confusing code that has
proved hard to maintain. I've no doubt that someone more experienced
with using Twisted could have done a better job, but do you really think
it's the best route forward?  Xend is a 'control plane' daemon and
doesn't have to handle a high rate of invocations. It needs some ability
to handle asynchronous or out-of-order events, but this could be handled
by simple language-level threads (we don't need concurrency). 

The other downside of using Twisted is that it's not available in some
distros, and we've had a few issues with version mismatches. It also has
quite an impact on the RSS memory footprint, which is not ideal.

What do you think?

Thanks,
Ian

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: xend leaks/bugs/etc
  2005-04-17 15:42 xend leaks/bugs/etc Ian Pratt
@ 2005-04-18  6:00 ` Allen Short
  2005-04-18 14:32   ` Harry Butterworth
  0 siblings, 1 reply; 20+ messages in thread
From: Allen Short @ 2005-04-18  6:00 UTC (permalink / raw)
  To: xen-devel

On Sun, 2005-04-17 at 16:42 +0100, Ian Pratt wrote: 

> Allen, I think we've come to the conclusion that Twisted was rather
> overkill for our needs, and led to some rather confusing code that has
> proved hard to maintain.

With all due respect, I work on 5 projects that use Twisted and,
overall, they're the easiest codebases to extend that I've dealt with.
xend's code is by far some of the worst Python code I've worked on.

>  I've no doubt that someone more experienced
> with using Twisted could have done a better job, but do you really think
> it's the best route forward?  Xend is a 'control plane' daemon and
> doesn't have to handle a high rate of invocations.

This is a point in favor of Twisted, I'd think; if you needed very high
performance in that area, Python might not be appropriate.

> It needs some ability to handle asynchronous or out-of-order events, but 
> this could be handled by simple language-level threads (we don't need 
> concurrency). 

Given the current architecture (a daemon that accepts connections from a
commandline tool or from a web interface), it would seem that you do
need concurrency; personally, I'd find it inconvenient if this was
handled differently. Plus, the languages that I'm familiar with that
provide language-level threads require at least as much
infrastructure/resource usage as Python.

> 
> The other downside of using Twisted is that it's not available in some
> distros, and we've had a few issues with version mismatches.

Other projects (such as Chandler) have dealt with this by shipping
Twisted in their release tarballs. I believe this strategy would be
reasonable for Xen, especially now that Twisted has split some of its
less-used subprojects into separate packages. 

>  It also has quite an impact on the RSS memory footprint, which is not ideal.

I believe that the memory footprint can be significantly reduced; the
current codebase seems to have a good deal of unnecessary complexity.

> What do you think?

I think that there are probably some Xen deployments that would benefit
from a minimal-functionality, minimal-resource-usage control daemon, but
that they are not the only use case. The project that led to my interest
in Xen is a good example: I want to do dynamic auction-based resource
allocation to domains, a la Miller and Drexler's "Incentive Engineering
for Computational Resource Management". 
(http://www.agorics.com/Library/agoricpapers.html ) This would be most
easily achieved by putting my auction/resource-allocation code in the
same process as xend. Unfortunately its current implementation makes
that prohibitively difficult -- my current prototype uses the HTTP
interface, with some difficulty. 

Given these concerns -- greater flexibility, lower memory usage, more
comprehensible code -- I believe my best choice is to reimplement xend,
using the existing lowlevel xc and xu modules. I need it for the things
I want to write anyway, but hopefully enough configuration/UI
compatibility can be maintained for it to be useful to the community.

Allen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: xend leaks/bugs/etc
  2005-04-18  6:00 ` Allen Short
@ 2005-04-18 14:32   ` Harry Butterworth
  2005-04-18 15:15     ` Anthony Liguori
  0 siblings, 1 reply; 20+ messages in thread
From: Harry Butterworth @ 2005-04-18 14:32 UTC (permalink / raw)
  To: Allen Short; +Cc: xen-devel

On Mon, 2005-04-18 at 01:00 -0500, Allen Short wrote:
> On Sun, 2005-04-17 at 16:42 +0100, Ian Pratt wrote: 
> 
> > Allen, I think we've come to the conclusion that Twisted was rather
> > overkill for our needs, and led to some rather confusing code that has
> > proved hard to maintain.
> 
> With all due respect, I work on 5 projects that use Twisted and,
> overall, they're the easiest codebases to extend that I've dealt with.
> xend's code is by far some of the worst Python code I've worked on.

Working on xend has been my first experience of using Python.  Glad to
hear it's atypical :-)

> 
> >  I've no doubt that someone more experienced
> > with using Twisted could have done a better job, but do you really think
> > it's the best route forward?  Xend is a 'control plane' daemon and
> > doesn't have to handle a high rate of invocations.
> 
> This is a point in favor of Twisted, I'd think; if you needed very high
> performance in that area, Python might not be appropriate.
> 
> > It needs some ability to handle asynchronous or out-of-order events, but 
> > this could be handled by simple language-level threads (we don't need 
> > concurrency). 
> 
> Given the current architecture (a daemon that accepts connections from a
> commandline tool or from a web interface), it would seem that you do
> need concurrency; personally, I'd find it inconvenient if this was
> handled differently. Plus, the languages that I'm familiar with that
> provide language-level threads require at least as much
> infrastructure/resource usage as Python.

I've written a lot of similar code in C, using a facility similar to task
queues to handle chains of asynchronous events, and I think the language
is not a significant factor.  While rewriting parts of the xend USB code
I found I could use deferreds to structure the code in the way I was used
to from my C experience.  The Twisted framework has a well-defined,
well-documented API, and I estimate the overhead of learning enough
Python and Twisted to pick up those aspects of the xend interface at
about 3 days for a competent programmer.

I think the initial confusion in xend, at least from the point of view
of extending it for new device types, lies in the use of inheritance in
the controller object model and the fact that all of the objects seem to
be called "controller".

Once you've understood the controller object model and inheritance
hierarchy, you then hit the fact that setting up the inter-domain
communication channels between front-end and back-end drivers is
overly-complex and must be reimplemented for each new device type.

The problem here is that the inter-domain communication primitives are
very low level and separated into a notification channel, a message
channel and a facility for bulk-data transfer, which are all provided
independently.  The client of these interfaces must use them together to
make a communications channel but, because they are provided separately
with no constraints on correct relative sequencing of the three
interfaces, the complexity from the client's perspective is cubed.

After getting over the hurdle of the inter-domain communication
mechanism, you come to the facts that the requirements for coping with
the domain lifecycle are unspecified, that the existing code doesn't
allow for loadable modules and that the controller model creates
back-end controller instances on demand during front-end creation which
makes it impossible to track the state of a loadable back-end driver
module correctly.

If you can guess what the domain lifecycle intention was, fix the bugs
in controller.py that prevent correct shutdown of driver domains (I'll
submit a patch) and work-around the above issues by constraining the
sequence of allowed driver module loads/unloads then you hit the final
hurdle of the fact that the requirements for error handling are again
unspecified and appear to be largely unmet by the existing code.

Finally, the xend code seems to trust input it receives from domains
which is incompatible with the architectural goal of VM isolation.

Even after dealing with the above issues, you'd still be left with the
problem that xend is very much a single node system when the
architectural direction for the tools is to be used to control Xen
clusters which would need to be highly-available for serious use in
enterprise environments.

So, to address the issues, I think the following steps are required:

1) Define a cluster architecture.  If the tools are going to be cluster
aware, we need to know what the definition of a cluster is and what the
cluster programming model is.  If HA is a requirement then the cluster
architecture should be HA from the start or the mechanism for making the
transition to HA should be precisely defined up-front since HA
architecture is a discriminating characteristic of any system which
makes it easier to start again than retrofit if you actually want to get
it right.

2) Define a high level inter-domain communication API.  This should be
consistent with the cluster model, should define the domain lifecycle
and contain sufficient guarantees for general purpose use. In particular
the API should deal with domain connection/disconnection notification
and elimination of stale communications. The inter-domain communication
API must be compatible with a MAC security implementation.

3) Define a dynamic resource discovery mechanism for use, for example,
by FE and BE driver domains.  This mechanism probably ought to be a
service accessible over the inter-domain communication API.

4) Define a configuration mechanism framework.  The last tools document
I read coupled the configuration aspects to the resource discovery
aspects.  I think they are distinct: the resource discovery mechanism
deals with dynamic changes which are not necessarily under user control
(loss of availability for example) whereas the configuration mechanism
is used by the user or higher level management tools to specify the
desired system configuration.

So, the language issues are insignificant compared to the architectural,
design and implementation issues of the current code.

Having said this, if you are going to get the architecture, design and
implementation right, it would be nice to also end up with minimalist
code with a small footprint with the minimum learning curve for people
joining the project.

Not sure whether the way to do that is to use C so as to have a single
language pre-req for the whole of Xen and get static-type checking or to
use Python for the tools to take advantage of its compact, expressive
qualities or to provide bindings for the core interfaces in a number of
languages so that people can extend the system however they choose.

Anyway, the main points I'm trying to make are that 1) there is a big
discussion that needs to happen on the list to define the architecture
for the tools 2) reimplementing xend better won't address the core
architectural issues 3) choosing a language to implement the tools in is
a second order concern.

Harry

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 14:32   ` Harry Butterworth
@ 2005-04-18 15:15     ` Anthony Liguori
  2005-04-18 15:27       ` Hollis Blanchard
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Anthony Liguori @ 2005-04-18 15:15 UTC (permalink / raw)
  To: Harry Butterworth; +Cc: xen-devel, Allen Short

Harry Butterworth wrote:

>On Mon, 2005-04-18 at 01:00 -0500, Allen Short wrote:
>  
>
>>With all due respect, I work on 5 projects that use Twisted and,
>>overall, they're the easiest codebases to extend that I've dealt with.
>>xend's code is by far some of the worst Python code I've worked on.
>>    
>>
>
>Working on xend has been my first experience of using Python.  Glad to
>hear it's atypical :-)
>  
>
In all fairness, Xend has a complex job that has evolved over time.  I 
don't think it's poorly written; rather, it hasn't aged well.

>The problem here is that the inter-domain communication primitives are
>very low level and separated into a notification channel, a message
>channel and a facility for bulk-data transfer which are all provided
>independently.  The client of these interfaces must use them together to
>make a communications channel but, because they are provided separately
>with no constraints on correct relative sequencing of the three
>interfaces the complexity from the client's perspective is cubed.
>  
>
From the perspective of the driver, the IDC primitives shouldn't 
matter.  All a discovery-driver should have to do is maintain a 
discovery state, examine every message of a certain type, 
determine how to change its state, and generate additional messages when 
necessary.

The discovery-drivers should be oblivious to how the messages are 
actually delivered.

>After getting over the hurdle of the inter-domain communication
>mechanism, you come to the facts that the requirements for coping with
>the domain lifecycle are unspecified, that the existing code doesn't
>  
>
The domain lifecycle is a fundamental problem, it seems.  VIRQs are of 
limited use because they do not tell you what the event was (whether it 
was a crash or shutdown) or which domain it came from.

Not to mention the fact that you can only ever know about one domain at 
any point in time.  Since you can only look up one domain with 
getdomaininfo, there's no guarantee that there hasn't been a change in 
between multiple calls.

This is further complicated by the fact that getdomaininfo(domid=4) may 
return the info for domid=6 forcing you to enumerate domains in sequence 
order.
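
The enumeration pattern described above could be sketched roughly as
follows. This is an illustration only, not xend's code and not the real
xc binding: get_domain_info is a hypothetical stand-in for the low-level
lookup, assumed to return the info for the first existing domain whose ID
is greater than or equal to the one requested, or None when there is no
such domain. make_fake_lookup exists purely so the sketch can be run
(modern Python 3 assumed).

```python
def list_domains(get_domain_info):
    """Enumerate all domains using a lookup that returns the first
    domain with ID >= the requested ID (hypothetical semantics)."""
    domains = []
    next_id = 0
    while True:
        info = get_domain_info(next_id)
        if info is None:
            break
        domains.append(info)
        # Resume just past the domain actually returned, which may be
        # higher than the ID that was asked for.
        next_id = info["domid"] + 1
    return domains


def make_fake_lookup(existing_ids):
    """Build a fake lookup over a fixed set of domain IDs (demo only)."""
    ordered = sorted(existing_ids)

    def lookup(first_domid):
        for domid in ordered:
            if domid >= first_domid:
                return {"domid": domid}
        return None

    return lookup
```

Even with this loop the result is not an atomic snapshot: domains can be
created or destroyed between two calls, which is the limitation described
above.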

>Finally, the xend code seems to trust input it receives from domains
>which is incompatible with the architectural goal of VM isolation.
>  
>
This is a very big problem.  One very difficult issue to address is how 
to deal with very hostile domains that may attempt DoS attacks by 
flooding their own console.

>2) Define a high level inter-domain communication API.  This should be
>consistent with the cluster model, should define the domain lifecycle
>and contain sufficient guarantees for general purpose use. In particular
>the API should deal with domain connection/disconnection notification
>and elimination of stale communications. The inter-domain communication
>API must be compatible with a MAC security implementation.
>  
>
I'm not sure this is necessary.  The registry should all but implement 
the tools' interaction with IDC.  The real use will be for console data, 
and there's been talk for a while about moving the consoles out of the 
control channel.  This would simplify things even further.

>3) Define a dynamic resource discovery mechanism for use, for example,
>by FE and BE driver domains.  This mechanism probably ought to be a
>service accessible over the inter-domain communication API.
>  
>
I believe this is the purpose of xenbus.

>4) Define a configuration mechanism framework.  The last tools document
>I read coupled the configuration aspects to the resource discovery
>aspects.  I think they are distinct: the resource discovery mechanism
>deals with dynamic changes which are not necessarily under user control
>(loss of availability for example) whereas the configuration mechanism
>is used by the user or higher level management tools to specify the
>desired system configuration.
>  
>
I'm wary of standardizing configuration although I'm curious to hear 
thoughts on it.

>Anyway, the main points I'm trying to make are that 1) there is a big
>discussion that needs to happen on the list to define the architecture
>for the tools 2) reimplementing xend better won't address the core
>architectural issues 3) choosing a language to implement the tools in is
>a second order concern.
>  
>
I agree with all three points.  What I would like to see, and what I am 
working on now for VM-Tools, is the function of Xend broken up.

I think you need one daemon to multiplex the control channels and do 
device discovery, and then everything else can be independent tools.  
With the addition of the registry, the daemon is drastically simplified.

I think the goal should be to have the least amount of code (regardless 
of language) in whatever is running as a daemon.

Regards,

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 15:15     ` Anthony Liguori
@ 2005-04-18 15:27       ` Hollis Blanchard
  2005-04-18 15:45         ` Anthony Liguori
  2005-04-18 15:58       ` Harry Butterworth
  2005-04-18 21:33       ` Mike D. Day
  2 siblings, 1 reply; 20+ messages in thread
From: Hollis Blanchard @ 2005-04-18 15:27 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Xen-devel

On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
> >Finally, the xend code seems to trust input it receives from domains
> >which is incompatible with the architectural goal of VM isolation.
> >  
> This is a very big problem.  One very difficult issue to address is
> how to deal with very hostile domains that may attempt DoS attacks by 
> flooding their own console.

This isn't really a xend issue. I'm not sure this *can* be addressed,
and I believe other hypervisors have this problem as well.

At some point, you have to acknowledge there will be *some* resource
sharing among otherwise isolated domains. Switching domains on a single
CPU will increase cache misses; domains doing lots of (valid and
allowed) IO will reduce shared bus bandwidth for other domains; etc...

-- 
Hollis Blanchard
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 15:27       ` Hollis Blanchard
@ 2005-04-18 15:45         ` Anthony Liguori
  2005-04-18 16:16           ` Hollis Blanchard
  0 siblings, 1 reply; 20+ messages in thread
From: Anthony Liguori @ 2005-04-18 15:45 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Xen-devel

Hollis Blanchard wrote:

>On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
>  
>
>>>Finally, the xend code seems to trust input it receives from domains
>>>which is incompatible with the architectural goal of VM isolation.
>>> 
>>>      
>>>
>>This is a very big problem.  One very difficult issue to address is
>>how to deal with very hostile domains that may attempt DoS attacks by 
>>flooding their own console.
>>    
>>
>
>This isn't really a xend issue. I'm not sure this *can* be addressed,
>and I believe other hypervisors have this problem as well.
>  
>
I'm not sure I agree.  Since Xen only provides shared-memory and event 
channels, the tools control how frequently they look at shared-memory 
(so a tool can throttle itself).  The only possible DoS venue should be 
the event channels.  The tools should simply be able to unbind from 
event channels that are considered hostile.

>At some point, you have to acknowledge there will be *some* resource
>sharing among otherwise isolated domains. Switching domains on a single
>CPU will increase cache misses; domains doing lots of (valid and
>allowed) IO will reduce shared bus bandwidth for other domains; etc...
>  
>
There are certainly going to be things that you cannot prevent but that 
does not mean we shouldn't try to prevent everything we can prevent.

Regards,
Anthony Liguori

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 15:15     ` Anthony Liguori
  2005-04-18 15:27       ` Hollis Blanchard
@ 2005-04-18 15:58       ` Harry Butterworth
  2005-04-18 21:33       ` Mike D. Day
  2 siblings, 0 replies; 20+ messages in thread
From: Harry Butterworth @ 2005-04-18 15:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: xen-devel, Allen Short

On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
> Harry Butterworth wrote:
> >The problem here is that the inter-domain communication primitives are
> >very low level and separated into a notification channel, a message
> >channel and a facility for bulk-data transfer which are all provided
> >independently.  The client of these interfaces must use them together to
> >make a communications channel but, because they are provided separately
> >with no constraints on correct relative sequencing of the three
> >interfaces the complexity from the client's perspective is cubed.
> >  
> >
> From the perspective of the driver, the IDC primitives shouldn't 
> matter.  All a discovery-driver should have to do is maintain a 
> discovery state, examine every message of a certain type, 
> determine how to change its state, and generate additional messages when 
> necessary.
> 
> The discovery-drivers should be oblivious to how the messages are 
> actually delivered.

Exactly.  Contrast that with the current implementation where each new
driver reimplements and is explicitly coupled to a specific delivery
mechanism.

> >2) Define a high level inter-domain communication API.  This should be
> >consistent with the cluster model, should define the domain lifecycle
> >and contain sufficient guarantees for general purpose use. In particular
> >the API should deal with domain connection/disconnection notification
> >and elimination of stale communications. The inter-domain communication
> >API must be compatible with a MAC security implementation.
> >  
> >
> I'm not sure this is necessary.  The registry should all but implement 
> the tools' interaction with IDC.  The real use will be for console data, 
> and there's been talk for a while about moving the consoles out of the 
> control channel.  This would simplify things even further.

As well as the inter-domain communication for tools interaction, all the
FE and BE driver comms require inter-domain communication to implement
their device-specific protocols.  There ought to be a single general
purpose underlying API which is minimal and sufficient from the client's
perspective. The existing API (notification/shared memory/grant tables)
is sufficient but not minimal from the client's perspective because of
the complexity of the three independent mechanisms and the interaction
with the sketchy domain lifecycle model.
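
One possible shape for such a single client-facing API is sketched below.
It is a toy, in-process loopback model under stated assumptions: the
class and method names (Channel, send, deliver_pending, disconnect) are
invented for illustration and do not correspond to any existing Xen
interface, and the three underlying mechanisms are replaced by a plain
queue and callbacks. The point is only that the client sees ordered
delivery, one disconnect notification, and no stale messages.

```python
from collections import deque


class ChannelClosed(Exception):
    """Raised when sending on a channel whose peer has disconnected."""


class Channel:
    """Toy loopback model of a high-level inter-domain channel."""

    def __init__(self, on_message, on_disconnect):
        self._on_message = on_message        # called once per delivered message
        self._on_disconnect = on_disconnect  # called once when the peer goes away
        self._queue = deque()
        self._connected = True

    def send(self, payload):
        """Queue a payload; the low-level sequencing is hidden from the client."""
        if not self._connected:
            raise ChannelClosed("peer disconnected")
        self._queue.append(bytes(payload))

    def deliver_pending(self):
        """Deliver queued messages in order (stands in for a notification)."""
        while self._connected and self._queue:
            self._on_message(self._queue.popleft())

    def disconnect(self):
        """Discard stale messages and notify the client exactly once."""
        if self._connected:
            self._connected = False
            self._queue.clear()
            self._on_disconnect()
```

A device-specific protocol would then be written against send and the two
callbacks only, rather than against each delivery mechanism separately.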

> 
> >3) Define a dynamic resource discovery mechanism for use, for example,
> >by FE and BE driver domains.  This mechanism probably ought to be a
> >service accessible over the inter-domain communication API.
> >  
> >
> I believe this is the purpose of xenbus.

What is this xenbus of which you speak?  Any public discussion/docs
around?  I heard one mention of xenbus at the summit.  I have to admit
I've not been following checkins to unstable recently but I have been
keeping an eye on the devel-list and haven't noticed anything about it.

> 
> >4) Define a configuration mechanism framework.  The last tools document
> >I read coupled the configuration aspects to the resource discovery
> >aspects.  I think they are distinct: the resource discovery mechanism
> >deals with dynamic changes which are not necessarily under user control
> >(loss of availability for example) whereas the configuration mechanism
> >is used by the user or higher level management tools to specify the
> >desired system configuration.
> >  
> >
> I'm wary of standardizing configuration although I'm curious to hear 
> thoughts on it.

You can't standardize configuration itself, because all the different
aspects are necessarily different in detail, but you can provide a
standard, extensible framework consistent with the cluster architecture
that solves the aspects common to all configuration activity.  For
example, making configuration activity fault-tolerant can have a common
solution if that is a requirement.

Harry

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 15:45         ` Anthony Liguori
@ 2005-04-18 16:16           ` Hollis Blanchard
  2005-04-18 16:49             ` Harry Butterworth
  2005-04-18 18:01             ` Anthony Liguori
  0 siblings, 2 replies; 20+ messages in thread
From: Hollis Blanchard @ 2005-04-18 16:16 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Xen-devel

On Mon, 2005-04-18 at 10:45 -0500, Anthony Liguori wrote:
> Hollis Blanchard wrote:
> 
> >On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
> >  
> >>This is a very big problem.  One very difficult issue to address is
> >>how to deal with very hostile domains that may attempt DoS attacks by 
> >>flooding their own console.
> >
> >This isn't really a xend issue. I'm not sure this *can* be addressed,
> >and I believe other hypervisors have this problem as well.
> >  
> I'm not sure I agree.  Since Xen only provides shared-memory and event 
> channels, the tools control how frequently they look at shared-memory 
> (so a tool can throttle itself).  The only possible DoS venue should be 
> the event channels.  The tools should simply be able to unbind from 
> event channels that are considered hostile.

And how exactly would you distinguish between a hostile domain and a
mission-critical-yet-chatty domain? Or would you indiscriminately drop
console data from all overly talkative domains?

-- 
Hollis Blanchard
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 16:16           ` Hollis Blanchard
@ 2005-04-18 16:49             ` Harry Butterworth
  2005-04-18 18:01             ` Anthony Liguori
  1 sibling, 0 replies; 20+ messages in thread
From: Harry Butterworth @ 2005-04-18 16:49 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Xen-devel

On Mon, 2005-04-18 at 11:16 -0500, Hollis Blanchard wrote:
> On Mon, 2005-04-18 at 10:45 -0500, Anthony Liguori wrote:
> > Hollis Blanchard wrote:
> > 
> > >On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
> > >  
> > >>This is a very big problem.  One very difficult issue to address is
> > >>how to deal with very hostile domains that may attempt DoS attacks by 
> > >>flooding their own console.
> > >
> > >This isn't really a xend issue. I'm not sure this *can* be addressed,
> > >and I believe other hypervisors have this problem as well.
> > >  
> > I'm not sure I agree.  Since Xen only provides shared-memory and event 
> > channels, the tools control how frequently they look at shared-memory 
> > (so a tool can throttle itself).  The only possible DoS venue should be 
> > the event channels.  The tools should simply be able to unbind from 
> > event channels that are considered hostile.
> 
> And how exactly would you distinguish between a hostile domain and a
> mission-critical-yet-chatty domain? Or would you indiscriminately drop
> console data from all overly talkative domains?
> 

The above are just quota issues. It ought to be possible to throttle
inter-domain notification to meet a quota. The quotas can be
configurable. A mission critical yet chatty domain must be configured
with a high quota and it gets to starve other less critical domains when
it wants.
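
A per-domain quota of this kind could be sketched as a token bucket. This
is a minimal sketch under stated assumptions, not anything present in
xend: the class name, the rate_per_sec and burst parameters, and the idea
of passing the current time in explicitly are all illustrative choices.

```python
class NotificationQuota:
    """Token bucket limiting how many notifications a domain may raise."""

    def __init__(self, rate_per_sec, burst):
        self.rate = float(rate_per_sec)  # tokens added per second
        self.burst = float(burst)        # maximum tokens that can accumulate
        self.tokens = float(burst)       # start with a full bucket
        self.last = 0.0                  # time of the most recent refill

    def allow(self, now):
        """Return True if a notification at time `now` is within quota."""
        elapsed = max(0.0, now - self.last)
        self.last = max(self.last, now)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A mission-critical yet chatty domain would simply be configured with a
large rate and burst; a notification that falls outside the quota can be
deferred or ignored by the tools rather than serviced immediately.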

The problems with the existing xend code are less subtle: they are on the
order of failing to check parameters passed from domains or failing to
cope with domains that issue protocol requests out of sequence.
Basically, as far as I can tell, the current xend code just assumes that
the communication it is handling will follow the expected good path, and
the behaviour of xend if things do not go to plan is substantially
undefined.

I guess it's possible that this has all been carefully thought through,
but it certainly isn't obvious from reading the code: the state machines
for handling the device channel set-up protocol are coded implicitly in
the chaining of message handling functions, so it's very hard to say what
the behaviour is under receipt of erroneous or malicious sequences of
messages from front or back end domains.
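
By contrast, an explicit state machine makes the behaviour on unexpected
input easy to read off. The sketch below is illustrative only: the state
names and message names are hypothetical and are not taken from the real
device channel set-up protocol.

```python
# Hypothetical states and message names, chosen for illustration only.
TRANSITIONS = {
    ("closed", "interface_create"): "created",
    ("created", "interface_connect"): "connected",
    ("connected", "interface_disconnect"): "created",
    ("created", "interface_destroy"): "closed",
}


class ProtocolError(Exception):
    """Raised when a message is not valid in the current state."""


class DeviceChannelSetup:
    """Table-driven state machine for one device channel."""

    def __init__(self):
        self.state = "closed"

    def handle(self, message):
        """Advance the state machine, rejecting out-of-sequence messages."""
        key = (self.state, message)
        if key not in TRANSITIONS:
            # The state is left unchanged, so a bad message cannot push
            # the channel into an undefined condition.
            raise ProtocolError(
                "unexpected %r in state %r" % (message, self.state))
        self.state = TRANSITIONS[key]
        return self.state
```

With the transitions in one table, every (state, message) pair that is
not listed is rejected by construction, which is the property that is
hard to verify when the machine is spread across chained handlers.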

Harry

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: xend leaks/bugs/etc
  2005-04-18 18:01             ` Anthony Liguori
@ 2005-04-18 17:53               ` Hollis Blanchard
  2005-04-20  7:10                 ` Jacob Gorm Hansen
  0 siblings, 1 reply; 20+ messages in thread
From: Hollis Blanchard @ 2005-04-18 17:53 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Xen-devel

On Mon, 2005-04-18 at 13:01 -0500, Anthony Liguori wrote:
> On Mon, 2005-04-18 at 11:16, Hollis Blanchard wrote:
> > On Mon, 2005-04-18 at 10:45 -0500, Anthony Liguori wrote:
> 
> > And how exactly would you distinguish between a hostile domain and a
> > mission-critical-yet-chatty domain? Or would you indiscriminately drop
> > console data from all overly talkative domains?
> 
> Ideally with a text-console you would use a shared ring-queue.  You
> could read from the queue whenever you felt it was appropriate.  The
> frequency of reading (and the size of the ring-queue) could be
> configured by the user.

When the ring queue overflows, console data will be dropped, which is
significant. I notice you avoided the subject of a shared-memory video
protocol: in that case, dropped data could leave parts of the screen
obsolete.

Just some things to think about; dropping data should be taken very
seriously. For example, testing the worst-case DoS attack to see what
impact it actually has on the overall system would be a good first step.
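
To make the overflow behaviour concrete, here is a small sketch of a fixed-size console ring that discards the oldest bytes when full and counts what it dropped, so the reader can at least report the loss. It is an assumption-laden illustration (the class name and the drop-oldest policy are choices made here, not anything in Xen):

```python
from collections import deque


class ConsoleRing:
    """Fixed-capacity byte ring; hypothetical, for illustration only."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.dropped = 0

    def write(self, data):
        """Producer side: append bytes, evicting the oldest on overflow."""
        for byte in data:
            if len(self.buf) == self.buf.maxlen:
                self.dropped += 1  # the oldest byte is about to be evicted
            self.buf.append(byte)

    def read(self):
        """Consumer side: drain the ring and report how much was lost."""
        data = bytes(self.buf)
        self.buf.clear()
        dropped, self.dropped = self.dropped, 0
        return data, dropped
```

Writing six bytes into a four-byte ring before the consumer reads leaves the last four bytes and a drop count of two. Dropping the newest data instead is equally possible; either way the loss is visible rather than silent.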

-- 
Hollis Blanchard
IBM Linux Technology Center


* Re: xend leaks/bugs/etc
  2005-04-18 16:16           ` Hollis Blanchard
  2005-04-18 16:49             ` Harry Butterworth
@ 2005-04-18 18:01             ` Anthony Liguori
  2005-04-18 17:53               ` Hollis Blanchard
  1 sibling, 1 reply; 20+ messages in thread
From: Anthony Liguori @ 2005-04-18 18:01 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Xen-devel

On Mon, 2005-04-18 at 11:16, Hollis Blanchard wrote:
> On Mon, 2005-04-18 at 10:45 -0500, Anthony Liguori wrote:

> And how exactly would you distinguish between a hostile domain and a
> mission-critical-yet-chatty domain? Or would you indiscriminately drop
> console data from all overly talkative domains?

Ideally with a text-console you would use a shared ring-queue.  You
could read from the queue whenever you felt it was appropriate.  The
frequency of reading (and the size of the ring-queue) could be
configured by the user.

Regards,

-- 
Anthony Liguori
Linux Technology Center (LTC) - IBM Austin
E-mail: aliguori@us.ibm.com
Phone: (512) 838-1208


* Re: xend leaks/bugs/etc
  2005-04-18 15:15     ` Anthony Liguori
  2005-04-18 15:27       ` Hollis Blanchard
  2005-04-18 15:58       ` Harry Butterworth
@ 2005-04-18 21:33       ` Mike D. Day
  2 siblings, 0 replies; 20+ messages in thread
From: Mike D. Day @ 2005-04-18 21:33 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Harry Butterworth, xen-devel, Allen Short

On Mon, 2005-04-18 at 10:15 -0500, Anthony Liguori wrote:
> >3) Define a dynamic resource discovery mechanism for use, for example,
> >by FE and BE driver domains.  This mechanism probably ought to be a
> >service accessible over the inter-domain communication API.
> >  
> >
> I believe this is the purpose of xenbus.

Is there a design proposal for xenbus interaction with userland or
should we assume it is modeled on the linux driver core/sysfs
and /sbin/hotplug?

> I agree with all three points.  What I would like to see, and what I am 
> working on now for VM-Tools, is the function of Xend broken up.

I agree. Persistent state for domains should be kept in a file system
backed by persistent media, not in the memory of the daemon. With a
repository factored out of the daemon, the only required functions of
the daemon are to maintain control channels and to dispatch state change
notifications to the repository. Everything else can be done using
single purpose tools. 

> I think the goal should be to have the least amount of code (regardless 
> of language) in whatever is running as a daemon.

Exactly - the least amount of code that meets functional requirements. 

Mike
-- 
Mike D. Day
STSM and Architect, Open Virtualization
IBM Linux Technology Center
3039 Cornwallis Road
Research Triangle Park, NC  27709
Phone: (919) 543-4283
ncmike@us.ibm.com


* RE: xend leaks/bugs/etc
@ 2005-04-18 23:12 Ian Pratt
  2005-04-20  7:54 ` Jacob Gorm Hansen
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Pratt @ 2005-04-18 23:12 UTC (permalink / raw)
  To: ncmike, Anthony Liguori; +Cc: Harry Butterworth, xen-devel, Allen Short

> > I believe this is the purpose of xenbus.
> 
> Is there a design proposal for xenbus interaction with 
> userland or should we assume it is modeled on the linux 
> driver core/sysfs and /sbin/hotplug?

Sysfs and hotplug are the model. 

> > I agree with all three points.  What I would like to see, 
> and what I 
> > am working on now for VM-Tools, is the function of Xend broken up.
> 
> I agree. Persistent state for domains should be kept in a 
> file system backed by persistent media, not in the memory of 
> the daemon. With a repository factored out of the daemon, the 
> only required functions of the daemon are to maintain control 
> channels and to dispatch state change notifications to the 
> repository. Everything else can be done using single purpose tools. 

To be fair to xend, this is what it does already: all its internal state
is stored in the file system, hence it can be killed and restarted
(modulo bugs in the current unstable).

The key step that needs to happen with the rewrite is to factor xend
into a number of pieces that communicate via the persistent store.
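
As a rough sketch of what communicating via a persistent store might mean in practice, the following keeps one file per domain and replaces it atomically, so any tool (or a restarted daemon) sees either the old state or the new one. The class, file layout, and JSON encoding are assumptions for illustration and say nothing about how xend actually stores its state:

```python
import json
import os
import tempfile


class DomainStore:
    """File-backed per-domain state; hypothetical illustration."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, domain_id):
        return os.path.join(self.root, "%d.json" % domain_id)

    def save(self, domain_id, state):
        # Write to a temporary file in the same directory, then rename it
        # over the target so readers never observe a half-written file.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._path(domain_id))

    def load(self, domain_id):
        with open(self._path(domain_id)) as f:
            return json.load(f)

    def domains(self):
        """Return the ids of all domains that have stored state."""
        return sorted(int(name[:-len(".json")])
                      for name in os.listdir(self.root)
                      if name.endswith(".json"))
```

Because nothing lives only in memory, a second `DomainStore` opened on the same directory by a different process reads back exactly what the first one saved.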

> > I think the goal should be to have the least amount of code 
> > (regardless of language) in whatever is running as a daemon.
> 
> Exactly - the least amount of code that meets functional 
> requirements. 

It's hard to beat python for this sort of thing...

Ian


* Re: xend leaks/bugs/etc
  2005-04-18 17:53               ` Hollis Blanchard
@ 2005-04-20  7:10                 ` Jacob Gorm Hansen
  2005-04-20 14:16                   ` Anthony Liguori
  0 siblings, 1 reply; 20+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-20  7:10 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Xen-devel

Hollis Blanchard wrote:

> When the ring queue overflows, console data will be dropped, which is
> significant. I notice you avoided the subject of a shared-memory video
> protocol: in that case, dropped data could leave parts of the screen
> obsolete.

The same thing happens on a VGA console today; let's not make things 
harder in Xen than they are in real life. If you want to be sure that 
you see everything (human operator reading bandwidth is also fairly 
limited, btw), set up a system which logs stuff to a file inside the VM.

Jacob


* Re: xend leaks/bugs/etc
  2005-04-18 23:12 Ian Pratt
@ 2005-04-20  7:54 ` Jacob Gorm Hansen
  2005-04-20 14:34   ` Hollis Blanchard
  0 siblings, 1 reply; 20+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-20  7:54 UTC (permalink / raw)
  To: Ian Pratt; +Cc: ncmike, Harry Butterworth, xen-devel, Allen Short

Ian Pratt wrote:

>>>I think the goal should be to have the least amount of code 
>>>(regardless of language) in whatever is running as a daemon.
>>
>>Exactly - the least amount of code that meets functional 
>>requirements. 
> 
> 
> It's hard to beat python for this sort of thing...

Not if you factor in the footprint of the python compiler and runtime 
and its large set of standard modules, and its unpredictable runtime 
performance (memory leaks and lack of static checking). It seems to me 
the 'python for xend' experiment has failed, and that this is further 
evidenced by the fact that a) Anthony/IBM has decided to write a 
competing implementation from scratch b) Xend developers blame Twisted, 
and now c) Twisted developers blame Xend. Short of name-calling, I do 
not see how we can proceed from here.

No other serious OS has vital components written in interpreted, 
dynamically typed languages; I do not see why Xen needs to be the only 
one. Perhaps it is true that development is a little faster in Python (I 
have extensive Python experience, yet I have felt much more comfortable 
reading and modifying code in the C-implemented parts of the system than 
in Xend which remains a complete blackbox to me), but still we are 
making users pay for our (perceived) increased productivity with their 
memory, their system stability, and their runtime performance. I am not 
an engineer, but to me this seems like poor engineering.

The Intermezzo project tried something similar some years back, having a 
kernel component in C and a user-level file server in Perl. While there 
was great progress in the beginning, the project more or less died when 
the limitations of Perl were reached. A rewrite in C was attempted, but 
at that point the project had run out of steam.

Jacob


* Re: xend leaks/bugs/etc
  2005-04-20  7:10                 ` Jacob Gorm Hansen
@ 2005-04-20 14:16                   ` Anthony Liguori
  2005-04-21 22:58                     ` Jacob Gorm Hansen
  0 siblings, 1 reply; 20+ messages in thread
From: Anthony Liguori @ 2005-04-20 14:16 UTC (permalink / raw)
  To: Jacob Gorm Hansen; +Cc: Xen-devel

Jacob Gorm Hansen wrote:

> Hollis Blanchard wrote:
>
>> When the ring queue overflows, console data will be dropped, which is
>> significant. I notice you avoided the subject of a shared-memory video
>> protocol: in that case, dropped data could leave parts of the screen
>> obsolete.
>
>
> The same thing happens on a VGA console today; let's not make things 
> harder in Xen than they are in real life. If you want to be sure that 
> you see everything (human operator reading bandwidth is also fairly 
> limited, btw), set up a system which logs stuff to a file inside the VM.

It's a bit more complex if we ever support 3D acceleration.  We can't 
really throttle that without losing a portion of whatever's drawn.  For a 
normal, non-accelerated 2D shared-memory video driver, I imagine it 
will look a lot like RFB, so it will be very easy to control how much 
work dom0 has to do without losing any quality.
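
A small sketch of why an RFB-like design makes this easy: in RFB the viewer asks for updates, so damage that occurs between requests can be merged and sent once. The tracker below merges everything into a single bounding box, which is a simplification chosen for brevity; the class and method names are hypothetical:

```python
class DirtyTracker:
    """Coalesces screen damage until the viewer asks for an update."""

    def __init__(self):
        # Bounding box (x0, y0, x1, y1) of all damage, or None if clean.
        self.dirty = None

    def damage(self, x, y, w, h):
        """Record that the guest drew into the given rectangle."""
        box = (x, y, x + w, y + h)
        if self.dirty is None:
            self.dirty = box
        else:
            d = self.dirty
            self.dirty = (min(d[0], box[0]), min(d[1], box[1]),
                          max(d[2], box[2]), max(d[3], box[3]))

    def take_update(self):
        """Called when the viewer requests an update; resets the tracker."""
        box, self.dirty = self.dirty, None
        return box
```

However often the guest draws, dom0 does one copy per viewer request, and the viewer always ends up with the latest contents of the damaged region, so the polling rate bounds the work without leaving stale pixels.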

> Jacob
>


* Re: xend leaks/bugs/etc
  2005-04-20  7:54 ` Jacob Gorm Hansen
@ 2005-04-20 14:34   ` Hollis Blanchard
  0 siblings, 0 replies; 20+ messages in thread
From: Hollis Blanchard @ 2005-04-20 14:34 UTC (permalink / raw)
  To: Jacob Gorm Hansen
  Cc: Ian Pratt, Harry Butterworth, xen-devel, Allen Short, ncmike

Jacob Gorm Hansen wrote:
> 
> Not if you factor in the footprint of the python compiler and runtime
> and its large set of standard modules, and its unpredictable runtime
> performance (memory leaks and lack of static checking). It seems to me
> the 'python for xend' experiment has failed, and that this is further
> evidenced by the fact that a) Anthony/IBM has decided to write a
> competing implementation from scratch b) Xend developers blame Twisted,
> and now c) Twisted developers blame Xend. Short of name-calling, I do
> not see how we can proceed from here.

Bad code or difficult-to-use code can be written in any language (or
with any framework), statically typed or not, interpreted or not. There
is no shortage of terrible tools written in C. :)

-- 
Hollis Blanchard
IBM Linux Technology Center


* Re: xend leaks/bugs/etc
  2005-04-20 14:16                   ` Anthony Liguori
@ 2005-04-21 22:58                     ` Jacob Gorm Hansen
  2005-04-22  0:21                       ` Anthony Liguori
  0 siblings, 1 reply; 20+ messages in thread
From: Jacob Gorm Hansen @ 2005-04-21 22:58 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Xen-devel

Anthony Liguori wrote:

> It's a bit more complex if we ever support 3D acceleration.  We can't 
> really throttle that without losing a portion of whatever's drawn.  For a 
> normal, non-accelerated 2D shared-memory video driver, I imagine it 
> will look a lot like RFB, so it will be very easy to control how much 
> work dom0 has to do without losing any quality.
> 

Do you have any design notes for this '3D-console'?

Jacob


* Re: xend leaks/bugs/etc
  2005-04-21 22:58                     ` Jacob Gorm Hansen
@ 2005-04-22  0:21                       ` Anthony Liguori
  0 siblings, 0 replies; 20+ messages in thread
From: Anthony Liguori @ 2005-04-22  0:21 UTC (permalink / raw)
  To: Jacob Gorm Hansen; +Cc: Xen-devel

Jacob Gorm Hansen wrote:

> Anthony Liguori wrote:
>
>> It's a bit more complex if we ever support 3D acceleration.  We 
>> can't really throttle that without losing a portion of whatever's 
>> drawn.  For a normal, non-accelerated 2D shared-memory video driver, 
>> I imagine it will look a lot like RFB, so it will be very easy to 
>> control how much work dom0 has to do without losing any quality.
>>
>
> Do you have any design notes for this '3D-console'?

I was just thinking out loud :-)

Regards,
Anthony Liguori

> Jacob
>


Thread overview: 20+ messages
2005-04-17 15:42 xend leaks/bugs/etc Ian Pratt
2005-04-18  6:00 ` Allen Short
2005-04-18 14:32   ` Harry Butterworth
2005-04-18 15:15     ` Anthony Liguori
2005-04-18 15:27       ` Hollis Blanchard
2005-04-18 15:45         ` Anthony Liguori
2005-04-18 16:16           ` Hollis Blanchard
2005-04-18 16:49             ` Harry Butterworth
2005-04-18 18:01             ` Anthony Liguori
2005-04-18 17:53               ` Hollis Blanchard
2005-04-20  7:10                 ` Jacob Gorm Hansen
2005-04-20 14:16                   ` Anthony Liguori
2005-04-21 22:58                     ` Jacob Gorm Hansen
2005-04-22  0:21                       ` Anthony Liguori
2005-04-18 15:58       ` Harry Butterworth
2005-04-18 21:33       ` Mike D. Day
  -- strict thread matches above, loose matches on Subject: below --
2005-04-18 23:12 Ian Pratt
2005-04-20  7:54 ` Jacob Gorm Hansen
2005-04-20 14:34   ` Hollis Blanchard
2005-04-16 18:08 Allen Short
