From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59947) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZUiWy-0004LE-37 for qemu-devel@nongnu.org; Wed, 26 Aug 2015 17:46:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZUiWt-0005WL-G3 for qemu-devel@nongnu.org; Wed, 26 Aug 2015 17:46:52 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40529) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZUiWt-0005W0-6Q for qemu-devel@nongnu.org; Wed, 26 Aug 2015 17:46:47 -0400 References: <76f0a328e53daa4a2303fe17d3bbe157d7b40d03.1437699224.git.alistair.francis@xilinx.com> <55D26157.8050502@suse.de> From: John Snow Message-ID: <55DE33C4.1010107@redhat.com> Date: Wed, 26 Aug 2015 17:46:44 -0400 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v1 2/3] object.c: object_class_dynamic_cast return NULL if the class has no type List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Crosthwaite , Alistair Francis Cc: Edgar Iglesias , Peter Maydell , mst@redhat.com, "qemu-devel@nongnu.org Developers" , Sai Pavan Boddu , =?UTF-8?Q?Andreas_F=c3=a4rber?= On 08/26/2015 05:02 PM, Peter Crosthwaite wrote: > On Wed, Aug 26, 2015 at 1:36 PM, Alistair Francis > wrote: >> On Tue, Aug 25, 2015 at 12:43 AM, Peter Crosthwaite >> wrote: >>> On Mon, Aug 24, 2015 at 4:36 PM, Alistair Francis >>> wrote: >>>> On Mon, Aug 17, 2015 at 4:37 PM, Peter Crosthwaite >>>> wrote: >>>>> On Mon, Aug 17, 2015 at 3:33 PM, Andreas F=C3=A4rber wrote: >>>>>> Am 18.08.2015 um 00:24 schrieb Alistair Francis: >>>>>>> On Sat, Aug 15, 2015 at 2:22 PM, Peter Crosthwaite >>>>>>> wrote: >>>>>>>> On Mon, Jul 27, 2015 at 11:37 AM, Alistair Francis >>>>>>>> wrote: >>>>>>>>> If the ObjectClass has no type return NULL instead of trying to= compare >>>>>>>>> the type name. >>>>>>>>> >>>>>>>> >>>>>>>> What was the issue? >>>>>>> >>>>>>> There is a seg fault in object_class_dynamic_cast() because there= is >>>>>>> no type in the ObjectClass struct. >>>>>> >>>>>> That should never happen, ever since TYPE_OBJECT is no longer NULL= . >>>>>> >>>>>>> It happens when it is trying to cast the "pci-device", which is c= alled >>>>>>> from the ahci_irq_lower() function. The function is testing if th= e >>>>>>> device is a pci device, so it should return NULL if it isn't vali= d. >>>>> >>>>> Yes so I vaguely remember this now. It is about MSI interrupts whic= h >>>>> have nothing to do with sysbus implementation. My solution was to r= ip >>>>> that PCI specific stuff out of AHCI completely in my branch. Should >>>>> sysbus and PCI AHCI classes install their own separate logic for th= is >>>>> part via a virtualised hook? >>>>> >>>>> On the topic though, I notice many PCI devices have this MSI specif= ic >>>>> logic in them. Is it possible for devs to just treat interrupts as >>>>> pins and the PCI layers do the MSI vs non-MSI logic switch in core >>>>> layers? >>>>> >>>>> If Andreas' idea don't work this is still a core QOM bug though. I >>>>> think object_dynamic_cast should not have this segfault when passed= a >>>>> non implementing object. >>>>> >>>>> Regards, >>>>> Peter >>>>> >>>>>> >>>>>> It rather sounds as if some build-time dependency is wrong, which = we >>>>>> used to run into for the Container type before Paolo macrofied thi= s. >>>>>> >>>>>> Please try again with a clean build - if it still occurs, we'll ne= ed a >>>>>> reproducible test case to investigate what is going on rather than >>>>>> papering over a latent bug. >>>> >>>> Hey, >>>> >>>> Sorry abut the delay, but I didn't get a chance to look at this last >>>> week. I tried with a clean setup and still see the seg fault. >>>> >>>> I will try to look into it more this week, but if anyone is interest= ed >>>> here are the steps to reproduce: >>>> >>>> On the latest mainline QEMU, with my 2nd and 3rd patches applied >>>> $ ./configure --target-list=3D"aarch64-softmmu,microblazeel-softmmu" >>>> --disable-pie --disable-sdl --disable-werror # This is what is >>>> required at work >>>> $ ./aarch64-softmmu/qemu-system-aarch64 -M xlnx-ep108 -display none >>>> -kernel ./u-boot.elf -m 8000000 -nographic -serial mon:stdio # Boot >>>> u-boot on QEMU >>>> >>>> The image I'm using is available at: http://1drv.ms/1NxDXLo >>>> >>> >>> So it's not a core bug. That container_of in ahci_lower_irq is >>> incorrectly assuming that the passed AHCIState * is always for a PCI, >>> which it is not in the sysbus case. So it's incorrectly getting the >>> offset of QOM the object and the QOM cast is treating some invalid >>> offset into the (or past) object as a QOM object base address. >>> >>> The simplest solution is a back pointer in AHCIState to the >>> encapsulating device (would be a DeviceState *). The container_of is >>> replaced with a nav of this pointer and then the conditional PCI cast >>> can work. >> >> This seems to fix the problem. >=20 > I assume you have the appropriate setters for the new variable > elsewhere in the code as well? >=20 >> It seems hacky though, I can't find a >> better way to check the validity of the PCIDevice. Any ideas? >> >=20 > So there a few problems in the way of a correct solution. The caller > for ahci_lower_irq does not have access to the QOM object pointer, > it's been abstracted away by AHCIState (which is not a QOM object). So > you would need to replumb the call path to ahci_lower_irq to pass the > QOM object. This would let you drop the container_of completely. >=20 > The next step would be to virtualise ahci_lower_irq, as this is > implementation dependent (assume specific devices really do need to > control the use of PCI MSI?), one implementation for sysbus, one for > PCI. This is blocked by the re-plumbing described above as the > virtualised called itself will need a ptr to the QOM object. >=20 > But I think the back ptr is an acceptable solution for the meantime, > this is a clear bug in Sysbus AHCI and should probably even go to > qemu-stable. >=20 I'm not intricately familiar with how the QOM plumbing works, but I can definitely see how assuming all AHCIState pointers come from AHCIPCIState is a problem... For the uninitiated, how does MSI work with Sysbus? What does a Sysbus AHCI device look like to a guest, and what happens if it tries to utilize the functionality? (Well, segfault, I guess.) If someone wants to clue in the device model newbie and send a patch my way, I'll take it. --js >> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c >> index 02d85fa..77e58a9 100644 >> --- a/hw/ide/ahci.c >> +++ b/hw/ide/ahci.c >> @@ -137,8 +137,11 @@ static void ahci_irq_raise(AHCIState *s, AHCIDevi= ce *dev) >> static void ahci_irq_lower(AHCIState *s, AHCIDevice *dev) >> { >> AHCIPCIState *d =3D container_of(s, AHCIPCIState, ahci); >> - PCIDevice *pci_dev =3D >> - (PCIDevice *)object_dynamic_cast(OBJECT(d), TYPE_PCI_DEVICE); >> + PCIDevice *pci_dev =3D NULL; >> + >> + if (s->parent_obj) { >=20 > I would make the parent obj compulsory for all AHCIState > implementations and drop the NULL guard. >=20 >> + pci_dev =3D PCI_DEVICE(d); >> + } >> >> DPRINTF(0, "lower irq\n"); >> >> diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h >> index c055d6b..ac7d2de 100644 >> --- a/hw/ide/ahci.h >> +++ b/hw/ide/ahci.h >> @@ -287,6 +287,8 @@ struct AHCIDevice { >> }; >> >> typedef struct AHCIState { >> + DeviceState *parent_obj; >=20 > This name is really for QOM inline parents. We decided a while back to > use "parent" for the QOM parents and "container" for non-parental > containers. Memory regions use the .container field for a similar > purpose. >=20 > Regards, > Peter >=20 >> + >> AHCIDevice *dev; >> AHCIControlRegs control_regs; >> MemoryRegion mem; >> >> Thanks, >> >> Alistair >> >>> >>> Regards, >>> Peter >>> >>>> Thanks, >>>> >>>> Alistair >>>> >>>>>> >>>>>> Thanks, >>>>>> Andreas >>>>>> >>>>>> -- >>>>>> SUSE Linux GmbH, Maxfeldstr. 5, 90409 N=C3=BCrnberg, Germany >>>>>> GF: Felix Imend=C3=B6rffer, Jane Smithard, Graham Norton; HRB 2128= 4 (AG N=C3=BCrnberg) >>>>> >>>