From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jonathan Tripathy
Subject: Re: Block Size for Windows
Date: Tue, 28 Aug 2012 21:15:22 +0100
Message-ID: <503D26DA.8060205@abpni.co.uk>
References: <503BC71A.6050207@abpni.co.uk> <503BC82F.1030107@abpni.co.uk> <503BC897.3020606@abpni.co.uk> <503BCDC6.9010807@abpni.co.uk> <503BD1E8.7060402@abpni.co.uk> <503BD37C.9010403@abpni.co.uk> <503BD55D.2060408@abpni.co.uk> <503BEE7F.4020202@abpni.co.uk> <6035A0D088A63A46850C3988ED045A4B29A75808@BITCOM1.int.sbss.com.au> <503D2411.70105@abpni.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <503D2411.70105-Nf8S+5hNwl710XsdtD+oqA@public.gmane.org>
Sender: linux-bcache-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: James Harper
Cc: Joseph Glanville, Kent Overstreet, "linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
List-Id: linux-bcache@vger.kernel.org

On 28/08/2012 21:03, Jonathan Tripathy wrote:
> On 28/08/2012 01:59, James Harper wrote:
>>> On 27/08/2012 21:18, Joseph Glanville wrote:
>>>> On 28 August 2012 06:15, Jonathan Tripathy wrote:
>>>>> On 27/08/2012 21:07, Jonathan Tripathy wrote:
>>>>>> On 27/08/2012 21:00, Jonathan Tripathy wrote:
>>>>>>>> 2) The windows setup didn't complain that it couldn't install on
>>>>>>>> the LV, but once I clicked 'next', the Dom0 crashed and the server
>>>>>>>> rebooted. A lot of output was displayed on screen but quickly
>>>>>>>> vanished as the system rebooted. I'm trying to see if the output
>>>>>>>> was saved anywhere. Any ideas why this could have happened and/or
>>>>>>>> where the output might be saved?
>>>>>>>>
>>>>>>> I'd also like to add that after the server came back up, the md
>>>>>>> raid array started rebuilding. I'm wondering if that's just a
>>>>>>> coincidence (due to the forced reboot), or a sign of something
>>>>>>> wrong with the md integration with bcache?
>>>>>>>
>>>>>>> I'm going to see if Windows installs natively on the md array (it's
>>>>>>> RAID 10 btw) and post back here.
>>>>>> Ok, so trying to install Windows directly onto the spindles causes
>>>>>> the same thing to happen. I'm going to try and boot up into the
>>>>>> non-bcache kernel (The default ubuntu one) and see if it works
>>>>>> there. If it fails there, then this is clearly a xen and/or
>>>>>> mdraid issue...
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>> Ok, so booting into the default Ubuntu kernel, the windows
>>>>> installation seems to progress just fine.
>>>>>
>>>>> Does this mean there is something wrong with the mdraid code in the
>>>>> bcache kernel?
>>>>>
>>>>> Actually, I'm not telling the whole story. The kernel I'm using is
>>>>> the bcache-3.2 tree (from evilpriate.org) with changes merged in from
>>>>> kernel.org's 3.2.27 tree. There were no merge conflicts when I did
>>>>> the git merge.
>>>>>
>>>>> What do you think I should do?
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-bcache" in the body of a message to majordomo-u79uwXL29TbrhsbdSgBK9A@public.gmane.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> I would recommend booting with the raw bcache-3.2 branch before
>>>> applying the stable patches (even though they should be fine) and
>>>> trying to catch the panic.
>>>> This is easiest done with a serial port and setting it to the kernel
>>>> console on the kernel command line in grub.
>>>>
>>>> Joseph.
>>> Hi There,
>>>
>>> I can confirm that the problem occurs even when using the raw bcache-3.2
>>> branch from evilpirate.org. Just to clarify, I am trying to install Windows
>>> Server 2008 in a Xen HVM DomU, onto an LV which is on top of a MDRAID 10
>>> array.
>>> Using the bcache-3.2 kernel, the system reboots (after panicking) as
>>> soon as I click 'next' after selecting the drive to install windows
>>> onto. Using the standard Ubuntu kernel everything works as normal. This
>>> leads me to believe that there is an issue with the mdraid code inside the
>>> bcache-3.2 tree. I'd like to stress that I wasn't doing any bcaching during
>>> this test.
>>>
>> FWIW, I'm using the 3.2 patches applied to a Debian kernel with lvm
>> on raid1 (not raid10) on bcache and it's all working fine since I
>> changed to a 512 byte block size. I haven't done an install of 2008,
>> just 2003, but there doesn't seem to be any problems.
>>
>>> What should my next step be? Try and find a serial cable to capture the
>>> debug output?
>>>
>> Before tinkering with a serial cable, see if the system is alive
>> enough to use netconsole - it can be a bit of a timesaver.
>>
>> James
> Hi Everyone,
>
> Here is the trace as captured by netconsole:
>
> [ 130.844069] ------------[ cut here ]------------
> [ 130.844165] kernel BUG at fs/bio.c:420!
> [ 130.844232] invalid opcode: 0000 [#1] SMP
> [ 130.844404] CPU 4
> [ 130.844448] Modules linked in: xen_netback xen_blkback 8021q garp
> xt_physdev bridge stp ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter ip_tables x_tables xen_gntdev xen_evtchn
> xenfs nls_iso8859_1 nls_cp437 vfat fat netconsole configfs psmouse lp
> joydev parport video mac_hid serio_raw usb_storage uas raid456
> async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid
> raid10 e1000e raid6_pq async_tx raid1 raid0 multipath linear
> [ 130.846688]
> [ 130.846746] Pid: 0, comm: swapper/4 Not tainted 3.2.0+ #1 Supermicro X9SCI/X9SCA/X9SCI/X9SCA
> [ 130.846956] RIP: e030:[] [] bio_put+0x27/0x30
> [ 130.847089] RSP: e02b:ffff88005ff3cb80 EFLAGS: 00010246
> [ 130.847155] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 00000000000003a6
> [ 130.847224] RDX: 00000000000003a5 RSI: 0000000000016c00 RDI: ffff880039b58918
> [ 130.847293] RBP: ffff88005ff3cb80 R08: ffffffff81115e67 R09: 0000000000000100
> [ 130.847362] R10: ffff88001a16eea0 R11: 0000000000000000 R12: ffff880017fd4018
> [ 130.847431] R13: ffff880039b58918 R14: ffff880017220028 R15: ffff88001a6dd400
> [ 130.847504] FS: 00007f507e004700(0000) GS:ffff88005ff39000(0000) knlGS:0000000000000000
> [ 130.847590] CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> [ 130.847656] CR2: 00007f507d6910b0 CR3: 000000003989a000 CR4: 0000000000002660
> [ 130.847725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 130.847794] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 130.847863] Process swapper/4 (pid: 0, threadinfo ffff88003d646000, task ffff88003d648000)
> [ 130.847952] Stack:
> [ 130.848012] ffff88005ff3cbc0 ffffffff8150238e ffff88003d6e40b0 ffff880039b58900
> [ 130.848258] ffff880039b58920 ffff88001a278100 0000000000000018 0000000000000001
> [ 130.848505] ffff88005ff3cbd0 ffffffff811a5f5d ffff88005ff3cbf0 ffffffff811a6f3e
> [ 130.848749] Call Trace:
> [ 130.848810]
> [ 130.848914] [] clone_endio+0x8e/0xd0
> [ 130.848979] [] bio_endio+0x1d/0x40
> [ 130.849047] [] bio_pair_release+0x3e/0x50
> [ 130.849113] [] bio_pair_end+0x1f/0x30
> [ 130.849180] [] bio_endio+0x1d/0x40
> [ 130.849248] [] raid_end_bio_io+0xf2/0x100 [raid10]
> [ 130.849319] [] one_write_done+0x38/0x50 [raid10]
> [ 130.849390] [] raid10_end_write_request+0xc4/0x130 [raid10]
> [ 130.849476] [] bio_endio+0x1d/0x40
> [ 130.849543] [] req_bio_endio.isra.49+0xa3/0xe0
> [ 130.849614] [] blk_update_request+0xfd/0x480
> [ 130.849681] [] blk_update_bidi_request+0x31/0x90
> [ 130.849751] [] blk_end_bidi_request+0x2c/0x80
> [ 130.849819] [] blk_end_request+0x10/0x20
> [ 130.849888] [] scsi_io_completion+0xaf/0x630
> [ 130.849960] [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> [ 130.850031] [] scsi_finish_command+0xc1/0x120
> [ 130.850098] [] scsi_softirq_done+0x13e/0x150
> [ 130.850167] [] blk_done_softirq+0x83/0xa0
> [ 130.850237] [] __do_softirq+0xa8/0x210
> [ 130.850304] [] ? __xen_evtchn_do_upcall+0x207/0x250
> [ 130.850373] [] call_softirq+0x1c/0x30
> [ 130.850442] [] do_softirq+0x65/0xa0
> [ 130.850507] [] irq_exit+0x8e/0xb0
> [ 130.850574] [] xen_evtchn_do_upcall+0x35/0x50
> [ 130.850643] [] xen_do_hypervisor_callback+0x1e/0x30
> [ 130.850711]
> [ 130.850810] [] ? hypercall_page+0x3aa/0x1000
> [ 130.850881] [] ? hypercall_page+0x3aa/0x1000
> [ 130.850950] [] ? xen_safe_halt+0x10/0x20
> [ 130.851017] [] ? default_idle+0x53/0x1d0
> [ 130.851085] [] ? cpu_idle+0xd6/0x120
> [ 130.851153] [] ? xen_irq_enable_direct_reloc+0x4/0x4
> [ 130.851223] [] ? cpu_bringup_and_idle+0xe/0x10
> [ 130.851291] Code: 00 00 00 00 55 48 89 e5 66 66 66 66 90 8b 47 40 85 c0 74 17 f0 ff 4f 40 0f 94 c0 84 c0 75 05 5d c3 0f 1f 00 e8 2b ff ff ff 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66
> [ 130.854057] RIP [] bio_put+0x27/0x30
> [ 130.854163] RSP
> [ 130.854226] ---[ end trace fa3fbcc21926358a ]---
> [ 130.855127] Kernel panic - not syncing: Fatal exception in interrupt
> [ 130.855198] Pid: 0, comm: swapper/4 Tainted: G D 3.2.0+ #1
> [ 130.855267] Call Trace:
> [ 130.855327] [] panic+0x91/0x1a7
> [ 130.855438] [] oops_end+0xea/0xf0
> [ 130.855505] [] die+0x58/0x90
> [ 130.855571] [] do_trap+0xc4/0x170
> [ 130.855636] [] do_invalid_op+0x95/0xb0
> [ 130.855702] [] ? bio_put+0x27/0x30
> [ 130.855767] [] ? xen_force_evtchn_callback+0xd/0x10
> [ 130.855838] [] ? check_events+0x12/0x20
> [ 130.855906] [] invalid_op+0x1b/0x20
> [ 130.855974] [] ? mempool_free_slab+0x17/0x20
> [ 130.856041] [] ? bio_put+0x27/0x30
> [ 130.856109] [] clone_endio+0x8e/0xd0
> [ 130.856175] [] bio_endio+0x1d/0x40
> [ 130.856241] [] bio_pair_release+0x3e/0x50
> [ 130.856308] [] bio_pair_end+0x1f/0x30
> [ 130.856374] [] bio_endio+0x1d/0x40
> [ 130.856443] [] raid_end_bio_io+0xf2/0x100 [raid10]
> [ 130.856513] [] one_write_done+0x38/0x50 [raid10]
> [ 130.856584] [] raid10_end_write_request+0xc4/0x130 [raid10]
> [ 130.856671] [] bio_endio+0x1d/0x40
> [ 130.856738] [] req_bio_endio.isra.49+0xa3/0xe0
> [ 130.856807] [] blk_update_request+0xfd/0x480
> [ 130.856877] [] blk_update_bidi_request+0x31/0x90
> [ 130.856946] [] blk_end_bidi_request+0x2c/0x80
> [ 130.857014] [] blk_end_request+0x10/0x20
> [ 130.857081] [] scsi_io_completion+0xaf/0x630
> [ 130.857150] [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> [ 130.857219] [] scsi_finish_command+0xc1/0x120
> [ 130.857287] [] scsi_softirq_done+0x13e/0x150
> [ 130.857356] [] blk_done_softirq+0x83/0xa0
> [ 130.857423] [] __do_softirq+0xa8/0x210
> [ 130.857491] [] ? __xen_evtchn_do_upcall+0x207/0x250
> [ 130.857562] [] call_softirq+0x1c/0x30
> [ 130.857631] [] do_softirq+0x65/0xa0
> [ 130.857697] [] irq_exit+0x8e/0xb0
> [ 130.857761] [] xen_evtchn_do_upcall+0x35/0x50
> [ 130.857829] [] xen_do_hypervisor_callback+0x1e/0x30
> [ 130.857897] [] ? hypercall_page+0x3aa/0x1000
> [ 130.858010] [] ? hypercall_page+0x3aa/0x1000
> [ 130.858080] [] ? xen_safe_halt+0x10/0x20
> [ 130.858146] [] ? default_idle+0x53/0x1d0
> [ 130.858214] [] ? cpu_idle+0xd6/0x120
> [ 130.858282] [] ? xen_irq_enable_direct_reloc+0x4/0x4
> [ 130.858351] [] ? cpu_bringup_and_idle+0xe/0x10
>
>
> I hope this helps. Let me know if you need me to do any further
> testing, or if you have any other questions regarding my environment.
>
> Thanks
>

Also, I'd like to add that this issue does not occur when using
MD-RAID1. It only occurs when using MD-RAID10.

Thanks