From mboxrd@z Thu Jan 1 00:00:00 1970 From: Scott Wood Subject: Re: [RFC PATCH 1/6] kvm: add device control API Date: Thu, 21 Feb 2013 20:17:54 -0600 Message-ID: <1361499474.1574.9@snotra> References: <20130221082209.GX3600@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; delsp=Yes format=Flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Alexander Graf , , , Christoffer Dall , Paul Mackerras To: Gleb Natapov Return-path: In-Reply-To: <20130221082209.GX3600@redhat.com> (from gleb@redhat.com on Thu Feb 21 02:22:09 2013) Content-Disposition: inline Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On 02/21/2013 02:22:09 AM, Gleb Natapov wrote: > On Wed, Feb 20, 2013 at 08:05:12PM -0600, Scott Wood wrote: > > On 02/20/2013 07:09:49 AM, Gleb Natapov wrote: > > >On Tue, Feb 19, 2013 at 03:16:37PM -0600, Scott Wood wrote: > > >> On 02/19/2013 06:24:18 AM, Gleb Natapov wrote: > > >> >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > > >> >> The ability to set/get attributes is needed. Sorry, but "get > > >or set > > >> >> one blob of data, up to 512 bytes, for the entire irqchip" is > > >just > > >> >> not good enough -- assuming you don't want us to start =20 > sticking > > >> >> pointers and commands in *that* data. :-) > > >> >> > > >> >Proposed interface sticks pointers into ioctl data, so why doin= g > > >> >the same > > >> >for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP makes you smile. > > >> > > >> There's a difference between putting a pointer in an ioctl =20 > control > > >> structure that is specifically documented as being that way (as = =20 > in > > >> ONE_REG), versus taking an ioctl that claims to be =20 > setting/getting a > > >> blob of state and embedding pointers in it. It would be like > > >> sticking a pointer in the attribute payload of this API, which I > > >> think is something to be discouraged. > > >If documentation is what differentiate for you between silly and =20 > smart > > >then write documentation instead of new interfaces. > > > > You mean like what we did with SREGS, that got deprecated and > > replaced with ONE_REG? > > > SREGS is implemented by ppc. I see ONE_REG as addition to REGS > interface. You can access all register at once or you can access them > one by one. If there is a need we can add MULTIPLE_REGS that will get > list of requested REGS. http://www.spinics.net/lists/kvm-ppc/msg04876.html http://www.spinics.net/lists/kvm-ppc/msg05842.html > The interface is not over generic i.e it does > not try to replace KVM_RUN by writing special register. Sigh. > > And we > > don't necessarily want to set *everything*. > What are those cases? You do need to on reset/migration. Why do we want to set all the registers on reset, rather than tell the = =20 in-kernel device to reset? The default state came from the kernel in =20 the first place on irqchip creation... > > >> It'd also be using > > >> KVM_SET_IRQCHIP to read data, which is the sort of thing you =20 > object > > >> to later on regarding KVM_IRQ_LINE_STATUS. > > >> > > >Do not see why. > > > > It's either that, or have the data direction of the "chip" field in > > KVM_GET_IRQCHIP not be entirely in the "get" direction. > > > Still do not follow. Example? struct kvm_irqchip has "chip_id", "pad", and "chip". "pad" is =20 insufficient to communicate attribute type plus a pointer. So if we =20 want to provide a pointer for the kernel to write the attribute into, =20 it has to read from memory that the ioctl definition suggests should =20 only be written to. > > >Setting the address is setting an attribute. Sending MSI is a =20 > command. > > >Things you set/get during init/migration are attributes. Things =20 > you do > > >to cause side-effects are commands. > > > > What if I set the address at a time that isn't init/migration (the > > hardware allows moving it, like a PCI BAR)? Suddenly it becomes no= t > > an attribute due to how the caller uses it? > > > What's the interface for guest to move it? Some non-MPIC registers called CCSRBARH, CCSRBARL, and CCSRBAR. > Why it goes via userspace? Because the mechanism in question doesn't just move MPIC. It moves a =20 big block of a bunch of different devices all at once. > You can move APIC base too, but this does not involve userspace. But > even if you do go via userspace, it is just a guest asking to change = =20 > device > configuration, so using SET_ATTR to set new configuration is fine. >=20 > > >> What is the benefit of KVM_IRQ_LINE over what MPIC does? What =20 > real > > >> (non-glue/wrapper) code can become common? > > >> > > >No new ioctl with exactly same result (well actually even faster =20 > since > > >less copying is done). > > > > Which ioctl would go away? > > > Those that you propose in your new interface. No, they wouldn't. At most one MPIC attribute group would go away =20 (though as I've noted it would still be useful to be able to "get" =20 those attributes for debugging). > > >> And I really hope you don't want us to do MSIs the x86 way. > > >> > > >What is wrong with KVM_SIGNAL_MSI? Except that you'll need code to > > >glue it > > >to mpic. > > > > We'll have to write extra code for it compared to the current way > > where it works with *zero* code beyond what is wanted for other > > purposes such as debug and (eventually) migration. At least it's > > more direct than having to establish a GSI route... > If just writing a register cause MSI to be send how do you distinguis= h > between write that should send MSI and write that is done on migratio= n > to transfer current value? It is a write-only command register. The registers that contain the =20 state are elsewhere. Again, we do not currently support migration on MPIC. It is a very low= =20 priority for embedded. We do not wish to rule it out entirely, but it = =20 most likely would require adding more state accesors. > We had that problem with MSRs on x86. We had > to, eventually, add a flag that tells us the reason of MSR access. The equivalent to that flag would be using the right kind of accessor =20 for what you want to do (simulated guest access versus backdoor state =20 access). > > >> In the XICS thread, Paul brought up the possibliity of cascaded > > >> MPICs. It's not relevant to the systems we're trying to model, = =20 > but > > >> if one did want to use the in-kernel irqchip interface for that,= =20 > it > > >> would be really nice to be able to operate on a specific MPIC fo= r > > >> injection rather than have to come up with some sort of global > > >> identifier (above and beyond the minor flattening we'd need to =20 > do to > > >> represent a single MPIC's interrupts in a flat numberspace). > > >> > > >ARM encodes information in irq field of KVM_IRQ_LINE like that: > > > =A0bits: | 31 ... 24 | 23 ... 16 | 15 ... 0 | > > > field: | irq_type | vcpu_index | irq_number | > > >Why will similar approach not work? > > > > Well, it conflicts with the GSI routing stuff, and I don't see an > > irq chip ID field... > It does :( Can't say I am happy about it, but I skipped the discussio= n > about the interface back then and it is too late to complain now. =20 > Since, > as you notices, irqfd interfaces with irq routing I wonder what's ARM > plan about it. But if you choose to go ARM way the format is ARM =20 > specific, > so you can use your own encoding and put irq chip information there. Well, we do want to support irqfd, so I don't think we'll be following = =20 ARM here. > > But otherwise (or assuming you mean to use such an encoding when > > setting up a GSI route), I didn't say this part couldn't be made to > > work. It will require new kernel code for managing a GSI table in = a > > non-APIC way, and a new callback into the device code, but as I've > > said elsewhere I think we need it for irqfd anyway. If I use > > KVM_IRQ_LINE for injecting interrupts, do you still object to the > > rest of it? > The rest of what, proposed interface? There are two separate =20 > discussions > happening here interleaved. First is "do we need to introduce new =20 > generic > interface for device creation when existing one, albeit not ideal, =20 > can be > used" and I am OK with that as long as ARM moves to it for 3.10, =20 > although > I would prefer to have some example of what this interface will be =20 > used > for besides irq chips otherwise it will be just another way to create > irqchip. We need a new way to create irqchips anyway, even if it's just what the= =20 XICS patchset adds (KVM_CREATE_IRQCHIP_ARGS, which is similar to =20 KVM_CREATE_DEVICE except it doesn't return an identifier for operating = =20 on a specific device). And of course we want to sort this out before =20 either patchset gets merged, so we don't end up adding both methods. I= =20 suspect the XICS patchset flew under your radar because it has "PPC:" =20 in front of it and the subject line doesn't mention the ioctl, but I'm = =20 not the only one that felt the need for a few new ioctls. As for other types of devices, x86 has i8254.c emulated in-kernel -- I = =20 know that's not going to switch to the new interface, but it could have= =20 if it existed back then. I can also see creating an in-kernel =20 emulation device for doing MMIO filtering on some piece of embedded =20 hardware that guests need to access with reasonable performance, but =20 the hardware desginers screwed up the protection slightly (e.g. put =20 other things in the same 4K page). We've done such filtering before in= =20 our standalone hypervisor; the question is whether it happens to =20 anything with enough performance requirements to be done in the kernel. > Second one is "how the interface should look like". And here I > think that strong distinction is needed between setting the attribute= s > and sending commands with side effects for reasons explained all over > this ml thread. OK, so let's just call them "commands". I like the split into "read" =20 and "write" commands, especially when most of the commands naturally =20 come in such pairs, but if you don't like that part it can be reduced =20 to a single read/write command (and then we'd define separate set/get =20 commands where appropriate). Note that the XICS patchset also involves device commands. It does it = =20 by passing any unknown vm ioctl to the irqchip (XICS implements =20 KVM_IRQCHIP_GET_SOURCES and KVM_IRQCHIP_SET_SOURCES in addition to =20 KVM_IRQ_LINE). Obviously both ways can work; I've given my reasons =20 elsewhere in the thread for preferring something that doesn't require a= =20 new ioctl for every device command. > > It's not about "silliness" as that this new thing I added for other > > reasons did the job just as well (again, except when it comes to > > irqfd), and avoided the need for a GSI table, etc. IRQ injection > > was not the main point of the new interface. > Having generic interface for device creation and then make some =20 > devices > special by allowing them to be used with KVM_IRQ_LINE makes little > sense, Well, we'd want to document which devices tie into which generic =20 interfaces, and which device is selected if multiple such devices are =20 created (if that is allowed for a particular device class). =46or KVM_IRQ_LINE, we could perhaps use the device id as the irqchip i= d =20 in kvm_irq_routing_irqchip (or, have an attribute/command that =20 retrieves the id to be used there). Unfortunately there is no irqchip = =20 field in kvm_irq_routing_msi, though since it's basically a command to = =20 write to an arbitrary MMIO address, maybe it could just be implemented = =20 that way? > > >> I see no need for a separate ioctl in terms of the underlying > > >> infrastructure for distinguishing "attribute" from "write-only > > >> command". I'm open to improvements on what the ioctl is called. > > >> It's basically like setting a register on a device, except I was > > >> concerned that if we actually called it a "register" that people > > >> would take it too literally and think it's only for the =20 > architected > > >> register state of the emulated device. > > >I agree "attribute" is better name than "register", but injecting > > >interrupt is not setting an attribute. > > > > It's a dynamic attribute -- the state of the input line. Better > > names are welcome. I don't see this difference as enough to warran= t > > separate ioctls. > As long as you use the same attribute for migration and interrupt =20 > injection > purpose I do. If you use separate attributes for migration and =20 > interrupt > injection then not having separate ioctl is just a hack. Why is it a hack? Is it also a hack to not use a separate ioctl to =20 reset the device, to move its address map, etc? -Scott