* [Hackathon Minutes] Xen 4.4 Planning
@ 2013-06-13 14:00 Lars Kurth
From: Lars Kurth @ 2013-06-13 14:00 UTC
  To: xen-devel@lists.xen.org, George.Dunlap@citrix.com

This took me a while to post, but given that we are not starting 4.4 
just yet, this may be appropriate now. I may have misrepresented some 
stuff as it has been 4 weeks since I wrote these.
Cheers
Lars

= Purpose of Roadmap =
* Set a vision for interesting features
* Track items
* Help consumers of Xen with their planning

= Release Models that work well =
There was a brief discussion on two different release models
* Train leaves the station (Linux)
* Release when ready (Debian)

== Stefano's Proposal ==
We should aim to reduce the release cycle to 4 months (or maybe 6 months
as an intermediate step) from the current 9 months. A 4-month release
cycle should help accelerate development and lead to fewer patches being
queued up. The implication is that we would have to operate a 2-3 week
merge window.

To do this, we would need to resolve a number of issues
* It is likely that code reviews for such a short merge window would
become a bottleneck. We are not sure whether this would be a major
issue: the review bandwidth would depend on the patches submitted (and
their complexity)
* [I can't remember who raised this] The concern was raised that we
would not have enough x86 Xen reviewers for a 2-3 week merge window
* [Konrad] Stated that in PVOPS for Linux contributions we don't have a
review bottleneck, but we should make sure that the Xen and PVOPS/Linux
merge windows don't overlap (as this would require the same set of
people to review patches in two projects)
* The rest of the time (approx 3 months) would be used for stabilizing 
the release
* If we had a more extensive testing framework, and thus better testing, 
we could tighten the RC period (making this happen is also being 
discussed by the Advisory Board)
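
As a rough illustration of the arithmetic above, the Python sketch below
splits a cycle into a merge window and a stabilisation period. The start
date, the 17-week approximation of 4 months, and the name plan_cycle are
assumptions made for this example only, not anything agreed at the
hackathon.

```python
from datetime import date, timedelta


def plan_cycle(start, cycle_weeks, merge_weeks):
    """Split a release cycle into a merge window and a stabilisation period.

    All inputs are illustrative; nothing here reflects an agreed schedule.
    """
    return {
        "merge_window_closes": start + timedelta(weeks=merge_weeks),
        "release": start + timedelta(weeks=cycle_weeks),
        "stabilisation_weeks": cycle_weeks - merge_weeks,
    }


# Hypothetical start date; 4 months approximated as 17 weeks.
plan = plan_cycle(date(2013, 7, 1), cycle_weeks=17, merge_weeks=3)
for key, value in plan.items():
    print(key, value)
```

With a 3-week merge window this leaves 14 weeks, i.e. roughly the 3
months of stabilisation mentioned above.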

Additional concerns raised:
* [Matt Wilson]: if we had shorter merge windows, there is a risk that
we would end up with unused code (uncompleted features) in mainline,
which we'd rather not have
* [I can't remember who raised this] But we already have a buffer via 
staging : we could make more use of this

[Conclusion] We all agreed that a release cycle of less than 9 months
is desirable. Maybe we can go to 6 months for Xen 4.4 (before being more
aggressive).

= 4.3 Release cycle : what worked well / didn't work well =
* The 4.3 release updates and criteria went well
* BUT : 50% of what was supposed to be in 4.3 didn't make it
** In some cases, we simply underestimated the effort that is needed. 
Concrete example : QEMU/stubdomain was a combination of under-estimating 
the size and over-estimating the development bandwidth that was available
** Some of the high-impact features (e.g. PVH) came in too late in the 
dev cycle. Mitigation : break contributions into smaller parts and 
submit earlier in the merge window. The same applies to changes to 
generic code.
* BUT : Some patches were lost (i.e. when there are spikes of activity 
it becomes hard for some maintainers/committers to keep on top of their 
queue).
** [Ian Campbell] said that we should rely on the submitter to resend
the patch: the assumption is that if the patch is important, the
submitter will badger and resend.
** [Lars] raised the point that this can alienate contributors and get 
them to look at other projects instead.
** [Can't remember who said this] Maybe use patchwork 
(http://jk.ozlabs.org/projects/patchwork/) to track patches
** [Ian Campbell]: patchwork looks like a good idea, but may not work 
well in practice
[Note] We should probably have a discussion or some sort of trial. It 
may also be possible to use the http://bugs.xenproject.org prototype (or 
add a "deferred patch" attribute)
[Note] We typically start opening the dev branch at RC5/6 (not sure I 
quite got this)
[Note] We don't actually have a list of patches that got lost in the 4.3 
release cycle )-:

[ACTION]: George to write up a proposal for the beginning of the 4.4
release cycle

= 4.4 Content =
George volunteers to be the Release co-ordinator for Xen 4.4 (to apply 
what he learned)

* Big features that did not make it into 4.3
* PVH? What will make it into 4.4?
* Missed patches (don't have a list)
* "User" features that look interesting
** Network effects
** We shouldn't have broken features
** What about XenClient / VirtualComputer being able to help out in
adding more support for PCI / VGA cards?
** Other features: Sharing dom0 keyboard / mouse

* GPU passthrough
** We have an issue with graphics card support
** A lot of users care about GPU passthrough (but not many vendors)
** Maybe we could mentor somebody about GPU passthrough?
** Maybe we could get XenClient, VirtualComputer or Qubes to pick this
up (or partly do so)?

* Re: [Hackathon Minutes] Xen 4.4 Planning
@ 2013-06-14 11:46 Alex Bligh
From: Alex Bligh @ 2013-06-14 11:46 UTC
  To: Jan Beulich; +Cc: lars.kurth, George.Dunlap, Alex Bligh, xen-devel

Jan,

--On 14 June 2013 09:15:35 +0100 Jan Beulich <JBeulich@suse.com> wrote:

>>>> On 13.06.13 at 23:03, Alex Bligh <alex@alex.org.uk> wrote:
>> The thing we like the second least about Xen is how long it seems to take
>> to get what we count as serious bugs fixed, even in stable releases.
>
> I think applicable bug fixes get applied to the stable branches in
> quite timely a manner, at least on the hypervisor side. Hence I
> can only assume that you're unhappy with the rate of stable
> releases. Yet I don't think we're going to get anywhere near of
> the almost weekly stable releases that get done for Linux. As
> you also didn't say what expectations you really have, it's hard
> to take out anything useful from that complaint.

I obviously miswrote some of this, as it wasn't intended as a complaint,
rather as observation. As with all open source code, the answer to 'it
doesn't work' is (a) try to find out why, and (b) send code. Which
we have on occasions tried to do.

I'm not unhappy with the rate of stable releases, I am/was unhappy about
the quality of stable releases (particularly 4.2). All our testing to
date on 4.3 indicates it's a lot higher quality release, at least
for what we want it for. It's possible that our expectations were
incorrect, because we were using 4.2 to support qemu-upstream dm,
and this was (IIRC) marked as a 'preview' feature; however we were
surprised things like live migrate were missing, and we had several
'killer' bugs. We've not found that on 4.3. Saying that, we've tested
4.3 quite a lot on its own, but not with our agent code, because we've
been trying to get 4.2 working properly first.

>> We like it even less if we have to find them and fix them.
>
> But isn't that how open source projects work - everyone contributes
> and fixes bugs. If you don't want to help fixing bugs, I'm afraid
> there's also no good reason for you to complain they don't get fixed.

I should have explicitly added the words 'in stable releases'. We're
perfectly happy to test, find bugs, etc. and indeed contribute fixes
as we have done. However, we shouldn't (in my opinion) be finding these
in stable releases.

>> By 'serious' I mean basic functionality not working, crashes dom0, etc.
>
> When did we last break Dom0, and not fix it in a timely manner?

Well, the 'fatal crash on Xen4.2 HVM' thread that started on 14 Dec 2012
had the last fix committed on 5 Apr 2013, and I think came out in 4.2.2
on 23 April. Between those points, as far as I'm concerned anything
running with network backed VMs was likely to crash dom0. That's about
half a 9 month release cycle.
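
For reference, the "about half" estimate can be checked with a few lines
of Python, using the dates quoted above and approximating 9 months as
9/12 of a 365-day year (an assumption made only for this sketch):

```python
from datetime import date

thread_start = date(2012, 12, 14)   # start of the 'fatal crash' thread
last_fix = date(2013, 4, 5)         # last fix committed
point_release = date(2013, 4, 23)   # 4.2.2 date as recalled above

days_to_fix = (last_fix - thread_start).days
days_to_release = (point_release - thread_start).days

cycle_days = 9 * 365 / 12           # assumed length of a 9-month cycle
fraction = days_to_release / cycle_days

print(days_to_fix, days_to_release, round(fraction, 2))
```

That is 112 days to the last fix and 130 days to the point release, a
little under half of a 9 month cycle.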

>> The result of this is two-fold. Firstly, we've never (yet) been able to
>> run a production version of xen which is a standard xen release. We've
>> always had to maintain our own patches even on 'stable' releases.
>> Frankly, this is a pain.
>
> But if you don't contribute back your patches, how do you expect
> them to get accepted/merged?

We contributed back ALL of our patches. Every one. And we work pretty
hard to get them merged too. The only one that hasn't been merged or
superseded by a better patch is the minideb patch; I fully understand
why it wasn't merged, and it is just packaging.

I'm not complaining you drop my patches on the floor. In fact
I'm not really complaining at all. What I'm pointing out is that
if a 'stable' is released with a nasty bug in, and it takes a while
to find the right solution, it also takes a while to get a point
release out that fixes it. The solution here is not to change the
rate of acceptance of patches into a stable tree, but to ensure
the bugs are caught before they make their way into a stable
release.

> Or are you saying that it's _far_
> more than occasional that patches from you get entirely lost?

Nope.

-- 
Alex Bligh

* Re: [Hackathon Minutes] Xen 4.4 Planning
@ 2013-06-14 18:55 Alex Bligh
From: Alex Bligh @ 2013-06-14 18:55 UTC
  To: Konrad Rzeszutek Wilk, Ian Murray; +Cc: Alex Bligh, xen-devel



--On 14 June 2013 13:25:38 -0400 Konrad Rzeszutek Wilk 
<konrad.wilk@oracle.com> wrote:

> What we had issues was that when compiling against a new hypervisor
> API it all compiled, but launching guests was broken. What we found was
> that some structures had changed and their corresponding
> XEN_VERSION_SOMETHING changed as well. But it was not clear _which_ of
> the structures changed so it took a while to figure out that the cpumap
> had been changed (I think that is what it was). There was also some HVM
> parameter field that had to be added otherwise the guest would not boot
> (can't remember the details).

That's precisely it.

Actually the one that REALLY bit us was far more subtle. We have a
multithreaded, forking agent. In Xen 4.2 (but not 4.1) you have to
call some post-fork function (whose name escapes me) in order to
tell xen about a fork. Signal handlers have to be in a particular
state (in particular SIGCHLD being set to ignore causes subtle
problems). And to get various xl calls working (mainly server
creation), it turns out you need to do a fork (assuming your
client is multithreaded and other threads call other xen things
even with different contexts); again this wasn't necessary in 4.1.
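
One general POSIX behaviour that may be behind the SIGCHLD remark: when
SIGCHLD is set to ignore, terminated children are reaped automatically,
so a later waitpid() on them fails with ECHILD. The sketch below shows
this with plain Python on a POSIX system. It does not use libxl, the
helper name wait_for_child is made up for the example, and whether this
is the exact failure seen with Xen 4.2 is an assumption.

```python
import os
import signal


def wait_for_child(ignore_sigchld):
    """Fork a child that exits at once; report whether waitpid() could reap it."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: exit immediately, skipping cleanup
        try:
            os.waitpid(pid, 0)   # parent: try to collect the exit status
            return "reaped"
        except ChildProcessError:
            # errno ECHILD: the child was already reaped automatically
            return "ECHILD"
    finally:
        # Restore whatever disposition the caller had installed
        signal.signal(signal.SIGCHLD, previous)


print(wait_for_child(ignore_sigchld=False))
print(wait_for_child(ignore_sigchld=True))
```

Any code that forks a helper and then waits for its exit status loses
that status in the second case, which is the kind of subtle problem a
library that forks internally can run into.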

We have this working reliably now, but it was a voyage of
discovery particularly as there are no multithreaded client examples,
and whilst this is (sort of) explained in the header files, it's a
bit opaque. Seeing xl launch stuff, and our agent fail when it was
passing byte for byte the same values was frustrating. If I felt I
really understood how this worked (rather than discovered from trial
and error and a lot of strace), I'd have sent a documentation patch.

So this wasn't a change in the API in the sense of structs changing
or parameters changing (we caught that early at compile time), but
in how the thing gets used in practice.

I fully understand why the API was changed 4.2 -> 4.3, and really
appreciate the fact it's now stable (or at least advertised as
such). Initial indications (i.e. it built first time without
warnings) are good.

-- 
Alex Bligh
