From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claus Rosenberger
Subject: Re: Freeze with 2.6.32.19 and xen-4.0.1rc5
Date: Sun, 22 Aug 2010 00:08:53 +0200
Message-ID: <4C704E75.2050200@rocnet.de>
References: <4C6FD90D.9080907@rocnet.de> <20100821140234.GX2804@reaktio.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20100821140234.GX2804@reaktio.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 21.08.2010 16:02, Pasi Kärkkäinen wrote:
> On Sat, Aug 21, 2010 at 03:47:57PM +0200, Claus Rosenberger wrote:
>> Hi,
>>
>> i have big trouble with a Debian Lenny dom0 and latest kernel 2.6.32.19
>> with xen-4.0.1rc5. Due some reason the system freezes from time to time.
>> I used kernel 2.6.31.9 with xen-3.4.2 before. The machine doesn't write
>> anything to serial console so there are no errors or something like that.
>>
>> Perhaps there is something to see from the logs ...
>>
> Hello,
>
> A couple of questions:
>
> - Do you use PCI passthru?

I tried it, but I have disabled it for now to avoid mixing up too many
issues.

> - Is there something special happening when it freezes?

Last time it happened while creating filesystems, so perhaps it is related
to disk usage. At the end of this mail I describe the disk problems in more
detail.

> - Does it freeze at regular intervals, at the same time/uptime, or randomly?

It happens randomly.

> - By freezing you mean it doesn't respond to anything? Or does it reboot?

When it freezes I cannot do anything; I can only connect with iAMT and
reboot, nothing else.

> - Can you try using the old 2.6.31.9 kernel with the new xen hypervisor?

Sure.

> -- Pasi
>
>
>> Configuration Grub
>>
>> title Xen 4.0-amd64 / Debian GNU/Linux, kernel 2.6.32.19
>> root (hd0,0)
>> kernel /boot/xen-4.0-amd64.gz dom0_mem=524288 cpufreq=xen
>> cpuidle console=com1 com1=115200,8n1,0xf1c0,0 sync_console
> Try adding "loglvl=all guest_loglvl=all" for xen.gz.

Sure.

>> module /boot/vmlinuz-2.6.32.19 root=/dev/md0 ro console=tty0
>> console=hvc0
>> module /boot/initrd.img-2.6.32.19
>>
> And try adding "nomodeset" for dom0 kernel (vmlinuz).

What is that parameter for?

I switched the disk because there was an error on the last one; now there
is a brand new disk on sata2, and I can see the following in my console
log. I cannot believe it is a disk problem; perhaps it is a disk controller
problem instead, or there is something wrong with the kernel.

I will add the parameters and power-cycle the machine to restart from
scratch.

Claus

[17392.097849] sd 1:0:0:0: [sdb] Unhandled error code
[17392.100047] BUG: soft lockup - CPU#0 stuck for 66s! [swapper:0]
[17392.100049] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core processor button acpi_processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.100088] CPU 0:
[17392.100089] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core processor button acpi_processor ext3 jbd
mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.100120] Pid: 0, comm: swapper Not tainted 2.6.32.19 #2
[17392.100122] RIP: e030:[] [] hypercall_page+0x28a/0x1001
[17392.100129] RSP: e02b:ffff880002f38df8 EFLAGS: 00000a07
[17392.100130] RAX: 0000000000000000 RBX: ffffc900081d2060 RCX: ffffffff8100928a
[17392.100132] RDX: 0000000000000001 RSI: ffffc900081d51c0 RDI: 0000000000000001
[17392.100134] RBP: ffffc900081d2198 R08: 0000000000000000 R09: 0000000000000000
[17392.100135] R10: 0000000000015640 R11: 0000000000000a07 R12: 0000000000000003
[17392.100137] R13: 0000000000004620 R14: 0000000000000021 R15: 6db6db6db6db6db7
[17392.100142] FS: 00007f3b93f6a6e0(0000) GS:ffff880002f35000(0000) knlGS:0000000000000000
[17392.100144] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17392.100145] CR2: 00007f3b93f69000 CR3: 000000001efba000 CR4: 0000000000002660
[17392.100147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17392.100149] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17392.100151] Call Trace:
[17392.100153] [] ? net_tx_action+0x294/0x9be
[17392.100160] [] ? xen_restore_fl_direct_end+0x0/0x1
[17392.100164] [] ? check_for_new_grace_period+0x9e/0xa8
[17392.100167] [] ? tasklet_action+0x77/0xd3
[17392.100170] [] ? __do_softirq+0xe0/0x1a2
[17392.100173] [] ? __xen_evtchn_do_upcall+0x12a/0x16c
[17392.100176] [] ? call_softirq+0x1c/0x30
[17392.100179] [] ? do_softirq+0x3f/0x7c
[17392.100181] [] ? irq_exit+0x36/0x79
[17392.100184] [] ? xen_evtchn_do_upcall+0x35/0x42
[17392.100186] [] ? xen_do_hypervisor_callback+0x1e/0x30
[17392.100187] [] ? hypercall_page+0x3aa/0x1001
[17392.100191] [] ? hypercall_page+0x3aa/0x1001
[17392.100194] [] ? xen_safe_halt+0xc/0x15
[17392.100196] [] ? xen_idle+0x35/0x40
[17392.100199] [] ?
cpu_idle+0xa3/0xdd
[17392.100203] [] ? start_kernel+0x3da/0x3e5
[17392.100205] [] ? xen_start_kernel+0x5e6/0x5ea
[17392.103027] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[17392.103031] ata1.00: failed command: READ DMA EXT
[17392.103035] ata1.00: cmd 25/00:00:5d:88:39/00:04:3d:00:00/e0 tag 0 dma 524288 in
[17392.103036]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[17392.103037] ata1.00: status: { DRDY }
[17392.103052] ata1.00: hard resetting link
[17392.372433] sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[17392.376416] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 3d 39 7f dd 00 04 00 00
[17392.380089] end_request: I/O error, dev sdb, sector 1027178461
[17392.384569] raid1: Disk failure on sdb3, disabling device.
[17392.384569] raid1: Operation continuing on 1 devices.
[17392.389710] BUG: soft lockup - CPU#1 stuck for 66s! [scsi_eh_1:538]
[17392.389710] Modules linked in:
[17392.393352] md: md2: resync done.
[17392.393348] nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core processor button acpi_processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.420099] CPU 1:
[17392.420099] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables xen_evtchn xenfs 8021q garp bridge stp coretemp lm85 hwmon_vid loop evdev video output tpm_tis tpm snd_pcsp tpm_bios psmouse snd_pcm serio_raw snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core processor button acpi_processor ext3 jbd mbcache dm_mirror dm_region_hash
dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif ata_piix ehci_hcd uhci_hcd ata_generic libata usbcore nls_base scsi_mod e1000e thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[17392.448018] Pid: 538, comm: scsi_eh_1 Not tainted 2.6.32.19 #2
[17392.452071] RIP: e030:[] [] hypercall_page+0x22a/0x1001
[17392.456082] RSP: e02b:ffff88000205bbc8 EFLAGS: 00000246
[17392.460069] RAX: 0000000000040000 RBX: ffff880002353000 RCX: ffffffff8100922a
[17392.464074] RDX: 000000000000d729 RSI: 0000000000000000 RDI: 0000000000000000
[17392.464074] RBP: ffff880002312000 R08: 0000000000000001 R09: 00000000000000fa
[17392.468023] R10: ffff88000206d170 R11: 0000000000000246 R12: ffff880002338000
[17392.468023] R13: ffff880002353048 R14: ffff88001e7f0900 R15: ffff880002698000
[17392.468023] FS: 00007f3b93f6a6e0(0000) GS:ffff880002f52000(0000) knlGS:0000000000000000
[17392.472075] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17392.472075] CR2: 00007f3b9356d1a4 CR3: 000000001f657000 CR4: 0000000000002660
[17392.472075] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17392.476070] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17392.476070] Call Trace:
[17392.476070] [] ? xen_force_evtchn_callback+0x9/0xa
[17392.476070] [] ? check_events+0x12/0x20
[17392.480079] ata1.01: hard resetting link
[17392.480099] [] ? xen_irq_enable_direct_end+0x0/0x7
[17392.480099] [] ? scsi_request_fn+0x3b9/0x4da [scsi_mod]
[17392.480099] [] ? __blk_run_queue+0x35/0x66
[17392.484070] [] ? blk_run_queue+0x20/0x32
[17392.484070] [] ? scsi_run_queue+0x2da/0x370 [scsi_mod]
[17392.488084] [] ? kmem_cache_free+0x71/0xa4
[17392.488084] [] ? scsi_next_command+0x2d/0x39 [scsi_mod]
[17392.488084] [] ? scsi_io_completion+0x1ed/0x416 [scsi_mod]
[17392.488084] [] ? scsi_eh_flush_done_q+0xec/0x10d [scsi_mod]
[17392.488084] [] ? ata_scsi_error+0x5e9/0x681 [libata]
[17392.488084] [] ? scsi_error_handler+0xec/0x5a9 [scsi_mod]
[17392.496345] [] ?
scsi_error_handler+0x0/0x5a9 [scsi_mod]
[17392.496345] [] ? kthread+0x75/0x7d
[17392.496345] [] ? child_rip+0xa/0x20
[17392.496345] [] ? int_ret_from_sys_call+0x7/0x1b
[17392.500072] [] ? retint_restore_args+0x5/0x6
[17392.500072] [] ? child_rip+0x0/0x20
[17392.956361] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[17392.957486] ata1.01: SATA link down (SStatus 0 SControl 300)
[17392.972769] ata1.00: configured for UDMA/133
[17392.973795] ata1.00: device reported invalid CHS sector 0
[17392.974743] ata1: EH complete
[17393.149020] md: checkpointing resync of md2.
[17393.482545] RAID1 conf printout:
[17393.483122]  --- wd:1 rd:2
[17393.483585]  disk 0, wo:0, o:1, dev:sda3
[17393.484259]  disk 1, wo:1, o:0, dev:sdb3
[17393.492056] RAID1 conf printout:
[17393.492628]  --- wd:1 rd:2
[17393.493108]  disk 0, wo:0, o:1, dev:sda3
[17393.494841] md: resync of RAID array md2
[17393.495559] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[17393.496573] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[17393.498235] md: using 128k window, over a total of 958020096 blocks.
[17393.498466] md: resuming resync of md2 from checkpoint.
[17393.498466] md: md2: resync done.
[17393.824165] RAID1 conf printout:
[17393.825572]  --- wd:1 rd:2
[17393.826761]  disk 0, wo:0, o:1, dev:sda3
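
As a cross-check of the log, the CDB bytes printed in the "CDB: Write(10)" line can be decoded by hand. The following is a minimal Python sketch, assuming the standard SCSI 10-byte read/write layout (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name decode_rw10 is hypothetical and only used for illustration.

```python
def decode_rw10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex bytes.

    Assumes the standard layout: byte 0 = opcode, bytes 2-5 = LBA
    (big-endian), bytes 7-8 = transfer length in blocks (big-endian).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")

    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if cdb[0] not in opcodes:
        raise ValueError("not a READ(10)/WRITE(10) opcode: 0x%02x" % cdb[0])

    return {
        "op": opcodes[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


if __name__ == "__main__":
    # CDB bytes copied from the "[sdb] CDB: Write(10)" log line.
    print(decode_rw10("2a 00 3d 39 7f dd 00 04 00 00"))
```

The decoded LBA is 1027178461, the same sector that the "end_request: I/O error" line reports, and the transfer length is 1024 blocks. Assuming 512-byte sectors, that is 524288 bytes, the same size as the "dma 524288 in" of the failed READ DMA EXT.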