Date: Thu, 14 Sep 2017 13:30:22 +1000
From: David Gibson
To: Mark Cave-Ayland
Cc: Alexey Kardashevskiy, lvivier@redhat.com, qemu-devel@nongnu.org,
 "Dr. David Alan Gilbert", Greg Kurz, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription
Message-ID: <20170914033022.GI3972@umbus.fritz.box>
In-Reply-To: <760c3889-8561-9bce-e565-ccbcb9eb8cf5@ilande.co.uk>
References: <1505054255-2990-3-git-send-email-mark.cave-ayland@ilande.co.uk>
 <20170911095059.101e0cfc@bahia.lan>
 <20170911093032.GA2857@work-vm>
 <20170911104854.GB2784@umbus.fritz.box>
 <20170912162100.GD2225@work-vm>
 <1bb6a27d-89a5-2e70-6976-74ab24430ec4@ilande.co.uk>
 <4f0cb74a-ae00-5921-291c-49c9acdb3e02@ilande.co.uk>
 <20170913060203.GG7550@umbus.fritz.box>
 <760c3889-8561-9bce-e565-ccbcb9eb8cf5@ilande.co.uk>

On Wed, Sep 13, 2017 at 05:44:54PM +0100, Mark Cave-Ayland wrote:
> On 13/09/17 07:02, David Gibson wrote:
>
> >>> Alexey - do you recall from your analysis why these fields were no
> >>> longer deemed necessary, and how your TCG tests were configured?
> >>
> >> I most certainly did not do analysis (my bad.
> >> sorry) - I took the patch from David as he left the team, fixed to
> >> compile and pushed away. I am also very suspicious we did not try
> >> migrating TCG or anything but pseries. My guess is that things did not
> >> break (if they did not which I am not sure about, for the TCG case)
> >> because the interrupt controller (XICS) or the pseries-guest took care
> >> of resending an interrupt which does not seem to be the case for mac99.
> >
> > Right, that's probably true. The main point, though, is that these
> > fields were dropped a *long* time ago, when migration was barely
> > working to begin with. In particular I'm pretty sure most of the
> > non-pseries platforms were already pretty broken for migration
> > (amongst other things).
> >
> > Polishing the mac platforms up to working again, including migration,
> > is a reasonable goal. But it can't be at the expense of pseries,
> > which is already working, used in production, and much better tested
> > than mac99 or g3beige ever were.
>
> Oh I completely agree since I'm well aware pseries likely has more users
> than the Mac machines - my question was directed more about why we
> support backwards migration.
>
> I spent several hours yesterday poking my Darwin test case with trying
> the different combinations of pending_interrupts, irq_input_state and
> access_type and could easily provoke migration failures unless all 3 of
> the fields were present so a practical test shows they are still
> required for TCG migration. I think ppc_set_irq()'s use of the interrupt
> fields in hw/ppc/ppc.c and the subsequent reference to pending
> interrupts in target/ppc may explain why I see freezes/hangs until a key
> is pressed in many cases.

Ok, I think we need to consider (pending_interrupts and
irq_input_state) separately from access_type. The first two are
pretty closely related to each other, and I've got at least a rough
idea of what the problems there might be.
access_type I'm pretty sure has to be an unrelated problem, and I've
got much less of a handle on it.

I suspect we could work around the problems with pending_interrupts
and irq_input_state by having a post_load hook in the board level
interrupt controller to reassert its output irq line based on its
current state. I believe the relevant irq inputs to the cpu are
effectively level triggered, so I think that will be enough.

access_type I don't have any good ideas for yet. We really need to
work out what the exact race is here that's causing its state to be
lost harmfully.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
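
For illustration only, the post_load workaround suggested in the message can be pictured with a toy model. This is not QEMU code: ToyCpu, ToyIntc, migrate and the run_post_load switch are hypothetical names invented for this sketch, and it assumes (as the message does) that the CPU's irq input is level triggered and that the CPU-side line state is not part of the migration stream.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Hypothetical CPU stand-in: holds the level of one external irq input.

    In this model the level is NOT migrated, mirroring the dropped fields.
    """
    irq_level: bool = False

    def set_irq(self, level: bool) -> None:
        self.irq_level = level


@dataclass
class ToyIntc:
    """Hypothetical board-level interrupt controller."""
    cpu: ToyCpu
    pending: int = 0  # bitmask of asserted sources (migrated)
    mask: int = 0     # bitmask of enabled sources (migrated)

    def update_output(self) -> None:
        # Level triggered: the output line simply follows current state.
        self.cpu.set_irq(bool(self.pending & self.mask))

    def raise_source(self, n: int) -> None:
        self.pending |= 1 << n
        self.update_output()

    def save(self) -> dict:
        # Only the controller's own state goes into the "stream".
        return {"pending": self.pending, "mask": self.mask}

    def load(self, state: dict, run_post_load: bool = True) -> None:
        self.pending = state["pending"]
        self.mask = state["mask"]
        if run_post_load:
            self.post_load()

    def post_load(self) -> None:
        # Reassert the output line based on the restored state.
        self.update_output()


def migrate(src: ToyIntc, run_post_load: bool) -> ToyIntc:
    """Copy controller state to a fresh destination with a fresh CPU."""
    dst = ToyIntc(cpu=ToyCpu())
    dst.load(src.save(), run_post_load=run_post_load)
    return dst
```

In this model, a source raised before migration leaves the destination CPU's input low when the hook is skipped, and high again once the hook re-derives the line from the restored pending and mask bits. Whether the same holds for the real mac99 controller depends on the level-triggered assumption stated in the message.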