From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Xen 4 serial hangs during boot
Date: Thu, 26 Jul 2012 09:50:01 -0400
Message-ID: <20120726135001.GA28024@phenom.dumpdata.com>
References: <500DB9CD.5060900@theshore.net>
 <500E95D30200007800090218@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <500E95D30200007800090218@nat28.tlf.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Andrew Cooper, xen devel, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On Tue, Jul 24, 2012 at 11:32:19AM +0100, Jan Beulich wrote:
> >>> On 23.07.12 at 22:53, "Christopher S. Aker" wrote:
> > On 7/20/12 3:59 PM, Keir Fraser wrote:
> >> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> >> seen anything like this reported before. Not sure what to suggest really...
> >> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> >> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> >> and dom0 boot logs... something might become apparent.
> >
> > We hit this again today, and I grabbed boot and debug-keys output:
> >
> > http://theshore.net/~caker/xen/BUGS/serial/log.txt
>
> This isn't even 8k that make it over, whereas the transmit buffer
> is 16k, and dropping of characters would only start when it first
> got full.
>
> The part of the data that didn't make it out isn't big enough to
> overflow the buffer - to check whether that would actually
> happen, could you increase the log level of both hypervisor and
> Dom0 kernel? To me this all (particularly the fact that you can
> make the data appear combined with the amount of data not
> being big enough to fill the buffer) looks as if there was some
> buffering happening outside of the control of Xen. Did you check
> whether this is possibly a problem with the remote end?

This got me thinking - I've one particular AMD machine (prototype) that
seems to hang often - but if I use 'sync_console' it works fine. This
issue started oooh, I can't remember when, but I do have some logs that
could shed some light on the approximate date. I guess I was too quick
to blame the prototype for being at fault here :-(

Then recently (yesterday?) the upstream kernel started doing something
wonky on this card:

01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)

Under Xen, when it boots it hits right here:

[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002

and then stops [note: I haven't really done any investigation to see if
the machine is dead or if it continues on, but with the serial port
just wedged hard].

On bare metal it can actually read the IO BARs:

[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
[ 1.252734] pci 0000:01:05.0: reg 14: [io 0xe040-0xe047]
[ 1.258394] pci 0000:01:05.0: reg 18: [io 0xe030-0xe037]
[ 1.264054] pci 0000:01:05.0: reg 1c: [io 0xe020-0xe027]
[ 1.269713] pci 0000:01:05.0: reg 20: [io 0xe010-0xe017]
[ 1.275372] pci 0000:01:05.0: reg 24: [io 0xe000-0xe00f]

so I am wondering if the back-ports in Xen 4.1 for dealing with PCI
have something to do with this?

>
> Does this also happen with "sync_console"? Did you check
> whether disabling the use of the associated IRQ makes any
> difference, as suggested by Konrad (I think)?
>
> Does the port work flawlessly on native Linux?
>
> Jan
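
A rough way to compare the two boots quoted in this message is to pull the "reg NN:" BAR lines out of each log and list which ones show up on bare metal but not under Xen. The sketch below is only an illustration: the regular expression is an assumption based on the log lines quoted here, the helper names (parse_bars, missing_bars) are made up for this example, and the Xen log is taken to end at the "type 00 class" line as described. It cannot tell a wedged serial port from a hung machine; it only shows where the output stops.

```python
import re

# Assumed line shape, taken from the log excerpt quoted in this message:
#   [ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
BAR_LINE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"reg (?P<reg>[0-9a-f]+): \[(?P<kind>io|mem)\s+"
    r"(?P<start>0x[0-9a-f]+)-(?P<end>0x[0-9a-f]+)"
)


def parse_bars(log_text):
    """Return {device: {config offset: (kind, start, end)}} for each BAR line."""
    bars = {}
    for m in BAR_LINE.finditer(log_text):
        entry = (m.group("kind"), int(m.group("start"), 16), int(m.group("end"), 16))
        bars.setdefault(m.group("dev"), {})[int(m.group("reg"), 16)] = entry
    return bars


def missing_bars(baremetal_log, xen_log):
    """List (device, config offset) pairs logged on bare metal but not under Xen."""
    native = parse_bars(baremetal_log)
    xen = parse_bars(xen_log)
    return sorted(
        (dev, reg)
        for dev, regs in native.items()
        for reg in regs
        if reg not in xen.get(dev, {})
    )


BAREMETAL = """\
[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[ 1.247075] pci 0000:01:05.0: reg 10: [io 0xe050-0xe057]
[ 1.252734] pci 0000:01:05.0: reg 14: [io 0xe040-0xe047]
[ 1.258394] pci 0000:01:05.0: reg 18: [io 0xe030-0xe037]
[ 1.264054] pci 0000:01:05.0: reg 1c: [io 0xe020-0xe027]
[ 1.269713] pci 0000:01:05.0: reg 20: [io 0xe010-0xe017]
[ 1.275372] pci 0000:01:05.0: reg 24: [io 0xe000-0xe00f]
"""

# Under Xen the console output stops after this line.
XEN = "[ 1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002\n"

if __name__ == "__main__":
    for dev, reg in missing_bars(BAREMETAL, XEN):
        print(f"{dev}: BAR at config offset 0x{reg:02x} not seen under Xen")
```

Run against complete boot logs from both environments, the same comparison would show whether only this card's BAR lines are absent or whether everything after that timestamp is missing.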