From: Julien Grall <julien.grall@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: roger.pau@citrix.com,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Subject: Re: [RFC] Support of non-indirect grant backend on 64KB guest
Date: Wed, 19 Aug 2015 08:25:54 -0700
Message-ID: <55D4A002.2050203@citrix.com>
In-Reply-To: <55D46165020000780009BDD8@prv-mh.provo.novell.com>

On 19/08/2015 01:58, Jan Beulich wrote:
>>>> On 18.08.15 at 20:45, <julien.grall@citrix.com> wrote:
>> Hi Roger,
>>
>> On 18/08/2015 00:09, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> On 18/08/15 at 8.29, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> Firstly, this patch is not ready at all and is mostly here to collect
>>>> comments about the way to do it. It's not clean, so no need to complain
>>>> about the coding style.
>>>>
>>>> The qdisk backend in QEMU does not support indirect grants, which means
>>>> that a request can only carry 11 * 4KB = 44KB.
>>>>
>>>> When using 64KB pages, a Linux block request (struct request *) may
>>>> contain up to 64KB of data. This is because the block segment size must
>>>> be at least the size of a Linux page.
>>>>
>>>> So when indirect is not supported by the backend, we are not able to fit
>>>> all the data in a single request. We therefore need to create a second
>>>> request to copy the rest of the data.
>>>>
>>>> I wrote a patch last week which makes a 64KB guest boot with qdisk.
>>>> However, I'm not sure this is the right way to do it. I would appreciate
>>>> it if one of the block maintainers could give me some insight about it.
>>>
>>> Maybe I'm missing some key data, but I see two ways to solve this, the
>>> first one is the one you describe above, and consists in allowing
>>> blkfront to split a request into multiple ring slots. The other solution
>>> would be to add indirect descriptor support to Qdisk. Has this been
>>> looked into?
>>>
>>> AFAICT it looks more interesting, and x86 can also benefit from it.
>>> Since I would like to avoid adding more cruft to blkfront, I would
>>> rather have 64KB guests require indirect descriptors in order to run.
>>
>> Actually, supporting indirect in Qdisk was one of our ideas. While I agree
>> this is a good improvement in general, we put this idea aside for various
>> reasons.
>>
>> The first one is that OpenStack uses the Qdisk backend by default, so a
>> Linux 64KB guest wouldn't be able to boot on the current version of Xen.
>> This is the only blocker to using 64KB guests; everything else is working.
>> Having indirect grant support in QEMU for Xen 4.6 is not realistic:
>> there is only a month left and we are already in feature freeze.
>>
>> That would mean that any new distribution using Linux 64KB would not
>> work out-of-box on Xen.
>>
>> Furthermore, not supporting non-indirect grants in the frontend means
>> that no userspace backend will be supported for Linux 64KB guests.
>>
>> Overall, I think we have to support non-indirect with Linux 64KB guests.
>> Many (but not all) distributions will only support 64KB pages, so we
>> can't wait until Xen 4.7 to get something running. Not that I rule out
>> the requirement for the user to upgrade the QEMU version in order to run
>> 64KB guests.
>
> To be honest, none of this really reads like a good reason for not
> following Roger's suggestion. All it points out is that there is a new
> feature that's not fully cooked yet. Distros wanting to support such
> guests should be willing to either backport the necessary qemu
> patch(es) or update qemu. Uglifying blkfront (via other than an
> experimental, out of tree patch) doesn't look like a good idea to me.

With your suggested approach, modifying QEMU, you would have to push a
patch into DOM0 distributions (which may differ from the guest's) in
order to boot such guests. Patching Linux, on the other hand, would let
a guest boot out of the box, without cloud providers on aarch64
platforms having to fix their DOM0 in order to boot such a guest.

> And then, if blkfront was to be made capable, it would seem to me
> that the better route would be to have it use its native page size for
> the blkif instantiation, and extend the blkif protocol so the frontend
> can communicate its page size to the backend. The current
> backend-page-size == frontend-page-size restriction would then
> become <= (or maybe could even go away altogether).

And then we could also improve memory performance right now rather than
getting a first version upstream... If we take this approach, it will
take another year to upstream 64KB guest support in Linux.
Fully working, high-performing 64KB guest support is a huge task and
we can't do everything at once.

Supporting 64KB grants is next on my plan, but it requires changes in
both Linux and Xen. It can't be done on a short timeline, though, and we
need a first approach in place today.

Major distributions will ship with 64KB page support only
(including the one from your company, SUSE); they are targeting Linux
4.2/4.3 and are willing to take patches into their kernels.

Now, if we decide not to support non-indirect grants for 64KB guests, we
also have to reach out to distributions using 4KB pages to get them to
update their QEMU.
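
As a rough illustration of the arithmetic from the original RFC (11
segments of 4KB per non-indirect request), here is a minimal Python
sketch of how a 64KB block request would be split across ring requests.
The names SEGMENT_SIZE, MAX_SEGMENTS_PER_REQUEST and split_request are
made up for this example and do not come from blkfront; the actual patch
works on grant references and ring slots, not on plain byte counts.

```python
# Illustrative only: these names are hypothetical, not blkfront identifiers.
SEGMENT_SIZE = 4 * 1024          # one grant covers a 4KB Xen page
MAX_SEGMENTS_PER_REQUEST = 11    # limit for a request without indirect grants


def split_request(nbytes):
    """Return the number of bytes carried by each ring request."""
    max_bytes = SEGMENT_SIZE * MAX_SEGMENTS_PER_REQUEST  # 44KB
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = min(remaining, max_bytes)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# A 64KB Linux block request needs two ring requests: 44KB + 20KB.
print(split_request(64 * 1024))  # [45056, 20480]
```

With a backend that supports indirect grants, the per-request segment
limit is higher, so the same 64KB request would presumably fit in a
single ring slot.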

Regards,

-- 
Julien Grall
