* Software raid 5 with XFS causing strange lockup problems
@ 2006-10-11  6:07 Ian Williamson
  2006-10-11 13:53 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11  6:07 UTC (permalink / raw)
To: xfs

I am running XFS on a software raid 5. I am doing this with a PCI
controller with 4 SATA drives attached to it.

When I play my music over the network through Samba from the raid
volume my audio client will often lose the connection. This isn't
remedied until I restart the machine or wait for an unknown amount of
time; even then, the problem sometimes persists.

Initially I thought that this was Samba's fault, but I think it may be
xfs related due to what was in /var/log/messages:

Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in: serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod ide_generic processor
Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP: 0060:[<f8a1353e>] Not tainted VLI
Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246 (2.6.18 #1)
Oct  9 22:37:33 ionlinux kernel: [105657.984000] [<f89b290c>] xfs_bmap_search_extents+0xdc/0x100 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984083] [<f89baa22>] xfs_bmapi+0x302/0x2840 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984163] [ide_dma_exec_cmd+48/64] ide_dma_exec_cmd+0x30/0x40
Oct  9 22:37:33 ionlinux kernel: [105657.984228] [ide_dma_start+51/80] ide_dma_start+0x33/0x50
Oct  9 22:37:33 ionlinux kernel: [105657.984294] [mempool_alloc+58/224] mempool_alloc+0x3a/0xe0
Oct  9 22:37:33 ionlinux kernel: [105657.984360] [cfq_set_request+447/752] cfq_set_request+0x1bf/0x2f0
Oct  9 22:37:33 ionlinux kernel: [105657.984425] [do_timer+1117/3040] do_timer+0x45d/0xbe0
Oct  9 22:37:33 ionlinux kernel: [105657.984489] [scheduler_tick+275/800] scheduler_tick+0x113/0x320
Oct  9 22:37:33 ionlinux kernel: [105657.984556] [<f89fed13>] xfs_inactive_free_eofblocks+0x123/0x340 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984643] [<f8a0584a>] xfs_inactive+0xfa/0xcb0 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984723] [kmem_freepages+118/160] kmem_freepages+0x76/0xa0
Oct  9 22:37:33 ionlinux kernel: [105657.984784] [slab_destroy+60/208] slab_destroy+0x3c/0xd0
Oct  9 22:37:33 ionlinux kernel: [105657.984845] [__pagevec_release_nonlru+49/144] __pagevec_release_nonlru+0x31/0x90
Oct  9 22:37:33 ionlinux kernel: [105657.984910] [memmove+80/112] memmove+0x50/0x70
Oct  9 22:37:33 ionlinux kernel: [105657.984973] [<f89ddba0>] xfs_iextract+0x90/0x150 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985053] [<f89ddd4e>] xfs_ilock+0x8e/0xc0 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985132] [<f89e21c5>] xfs_idestroy+0x65/0x90 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985212] [<f8a00c1f>] xfs_finish_reclaim+0x10f/0x150 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985294] [<f8a00d78>] xfs_reclaim+0x118/0x120 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985375] [<f8a117f2>] xfs_fs_clear_inode+0x42/0x80 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985456] [dquot_drop+0/128] dquot_drop+0x0/0x80
Oct  9 22:37:33 ionlinux kernel: [105657.985519] [clear_inode+153/304] clear_inode+0x99/0x130
Oct  9 22:37:33 ionlinux kernel: [105657.985582] [dispose_list+30/192] dispose_list+0x1e/0xc0
Oct  9 22:37:33 ionlinux kernel: [105657.985642] [__activate_task+41/64] __activate_task+0x29/0x40
Oct  9 22:37:33 ionlinux kernel: [105657.985704] [shrink_icache_memory+462/528] shrink_icache_memory+0x1ce/0x210
Oct  9 22:37:33 ionlinux kernel: [105657.985767] [shrink_slab+297/400] shrink_slab+0x129/0x190
Oct  9 22:37:33 ionlinux kernel: [105657.985830] [kswapd+718/1120] kswapd+0x2ce/0x460
Oct  9 22:37:33 ionlinux kernel: [105657.985893] [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Oct  9 22:37:33 ionlinux kernel: [105657.985960] [kswapd+0/1120] kswapd+0x0/0x460
Oct  9 22:37:33 ionlinux kernel: [105657.986019] [kthread+253/272] kthread+0xfd/0x110
Oct  9 22:37:33 ionlinux kernel: [105657.986079] [kthread+0/272] kthread+0x0/0x110
Oct  9 22:37:33 ionlinux kernel: [105657.986139] [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10

When I SSH into the machine and perform a "ps aux" command, I notice
that there are a lot of /usr/bin/smbd -U processes running.

In addition to this, when I am SSHed into the machine and I attempt to
copy files or directories to or from the volume, the SSH window stops
responding and I am forced to close the connection and open a new one.
Performing massive reads and writes has also caused the local console
to freeze when I am using the machine directly (i.e. not over SSH).

Does anyone have any idea what might be causing this? The raid
controller card or my XFS setup?

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11  6:07 Software raid 5 with XFS causing strange lockup problems Ian Williamson
@ 2006-10-11 13:53 ` Eric Sandeen
  2006-10-11 16:21   ` Ian Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2006-10-11 13:53 UTC (permalink / raw)
To: Ian Williamson; +Cc: xfs

Ian Williamson wrote:
> I am running XFS on a software raid 5. I am doing this with a PCI
> controller with 4 SATA drives attached to it.
>
> When I play my music over the network through Samba from the raid
> volume my audio client will often lose the connection. This isn't
> remedied until I restart the machine or wait for an unknown amount of
> time; even then, the problem sometimes persists.
>
> Initially I thought that this was Samba's fault, but I think it may be
> xfs related due to what was in /var/log/messages:
>
> Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> ide_generic processor
> Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> 0060:[<f8a1353e>] Not tainted VLI
> Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> (2.6.18 #1)

It looks like you've edited this a bit too much; what came before this
in the logs?

Are you running on 4k stacks, out of curiosity?

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 13:53 ` Eric Sandeen
@ 2006-10-11 16:21   ` Ian Williamson
  2006-10-11 16:36     ` Eric Sandeen
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
  0 siblings, 2 replies; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 16:21 UTC (permalink / raw)
To: Eric Sandeen, xfs

Eric,
That's all I have for the event in /var/log/messages.

For the raid configuration I have the following:

ian@ionlinux:~$ sudo mdadm --detail /dev/md0
Password:
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Sep 13 22:04:11 2006
     Raid Level : raid5
     Array Size : 732587712 (698.65 GiB 750.17 GB)
    Device Size : 244195904 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Oct  9 00:02:30 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 86770f56:8e4f51e5:fd754630:f1c65359
         Events : 0.54082

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

I really have no idea what could be causing this. Sometimes after a
restart it still won't work through Samba, and I can never perform
massive local reads and writes, i.e. a recursive copy off of the raid.

On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> Ian Williamson wrote:
> > I am running XFS on a software raid 5. I am doing this with a PCI
> > controller with 4 SATA drives attached to it.
> >
> > When I play my music over the network through Samba from the raid
> > volume my audio client will often lose the connection. This isn't
> > remedied until I restart the machine or wait for an unknown amount of
> > time; even then, the problem sometimes persists.
> >
> > Initially I thought that this was Samba's fault, but I think it may be
> > xfs related due to what was in /var/log/messages:
> >
> > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > ide_generic processor
> > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > 0060:[<f8a1353e>] Not tainted VLI
> > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > (2.6.18 #1)
>
> It looks like you've edited this a bit too much; what came before this
> in the logs?
>
> Are you running on 4k stacks, out of curiosity?
>
> -Eric
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 16:21   ` Ian Williamson
@ 2006-10-11 16:36     ` Eric Sandeen
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Sandeen @ 2006-10-11 16:36 UTC (permalink / raw)
To: Ian Williamson; +Cc: xfs

Ian Williamson wrote:
> Eric,
> That's all I have for the event in /var/log/messages.

Weird. I don't even know if that's an oops or, if so, why; it's just a
backtrace.

Does your kernel have 4k stacks?

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 16:21   ` Ian Williamson
  2006-10-11 16:36     ` Eric Sandeen
@ 2006-10-11 17:19     ` Justin Piszcz
  2006-10-11 18:41       ` Ian Williamson
  1 sibling, 1 reply; 9+ messages in thread
From: Justin Piszcz @ 2006-10-11 17:19 UTC (permalink / raw)
To: Ian Williamson; +Cc: Eric Sandeen, xfs

Also, a quick question: what kind of speed do you get with 4 drives
connected to 1 card? For comparison, I have 8 drives connected to 3-4
cards.

What write/read speeds do you see?

Justin.

On Wed, 11 Oct 2006, Ian Williamson wrote:

> Eric,
> That's all I have for the event in /var/log/messages.
>
> For the raid configuration I have the following:
>
> ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> Password:
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Wed Sep 13 22:04:11 2006
>      Raid Level : raid5
>      Array Size : 732587712 (698.65 GiB 750.17 GB)
>     Device Size : 244195904 (232.88 GiB 250.06 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Oct  9 00:02:30 2006
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
>          Events : 0.54082
>
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8       33        2      active sync   /dev/sdc1
>        3       8       49        3      active sync   /dev/sdd1
>
> I really have no idea what could be causing this. Sometimes after a
> restart it still won't work through Samba, and I can never perform
> massive local reads and writes, i.e. a recursive copy off of the raid.
>
> On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > Ian Williamson wrote:
> > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > controller with 4 SATA drives attached to it.
> > >
> > > When I play my music over the network through Samba from the raid
> > > volume my audio client will often lose the connection. This isn't
> > > remedied until I restart the machine or wait for an unknown amount of
> > > time; even then, the problem sometimes persists.
> > >
> > > Initially I thought that this was Samba's fault, but I think it may be
> > > xfs related due to what was in /var/log/messages:
> > >
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > ide_generic processor
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > 0060:[<f8a1353e>] Not tainted VLI
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > (2.6.18 #1)
> >
> > It looks like you've edited this a bit too much; what came before this
> > in the logs?
> >
> > Are you running on 4k stacks, out of curiosity?
> >
> > -Eric
> >
>
> --
> Ian Williamson
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
@ 2006-10-11 18:41       ` Ian Williamson
  2006-10-11 18:42         ` Justin Piszcz
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 18:41 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Eric Sandeen, xfs

Justin,
How would I go about benchmarking that?

Eric,
Sorry, but I'm not quite an expert on the internals of Linux. What are
4k stacks, and how do I know if I have them? If it helps, I am using
Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
occurred on the default Ubuntu server kernel...)

Also, if that trace from /var/log/messages isn't of any use, do you
know where I can look to find more information on this? Is it possible
that this is being caused by the cheap PCI SATA controller card that I
am using? (It's the Rosewill RC-209)

- Ian

On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Also, a quick question: what kind of speed do you get with 4 drives
> connected to 1 card? For comparison, I have 8 drives connected to 3-4
> cards.
>
> What write/read speeds do you see?
>
> Justin.
>
> On Wed, 11 Oct 2006, Ian Williamson wrote:
>
> > Eric,
> > That's all I have for the event in /var/log/messages.
> >
> > For the raid configuration I have the following:
> >
> > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > Password:
> > /dev/md0:
> >         Version : 00.90.03
> >   Creation Time : Wed Sep 13 22:04:11 2006
> >      Raid Level : raid5
> >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> >    Raid Devices : 4
> >   Total Devices : 4
> > Preferred Minor : 0
> >     Persistence : Superblock is persistent
> >
> >     Update Time : Mon Oct  9 00:02:30 2006
> >           State : clean
> >  Active Devices : 4
> > Working Devices : 4
> >  Failed Devices : 0
> >   Spare Devices : 0
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> >          Events : 0.54082
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8        1        0      active sync   /dev/sda1
> >        1       8       17        1      active sync   /dev/sdb1
> >        2       8       33        2      active sync   /dev/sdc1
> >        3       8       49        3      active sync   /dev/sdd1
> >
> > I really have no idea what could be causing this. Sometimes after a
> > restart it still won't work through Samba, and I can never perform
> > massive local reads and writes, i.e. a recursive copy off of the raid.
> >
> > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > Ian Williamson wrote:
> > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > controller with 4 SATA drives attached to it.
> > > >
> > > > When I play my music over the network through Samba from the raid
> > > > volume my audio client will often lose the connection. This isn't
> > > > remedied until I restart the machine or wait for an unknown amount of
> > > > time; even then, the problem sometimes persists.
> > > >
> > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > xfs related due to what was in /var/log/messages:
> > > >
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > ide_generic processor
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > (2.6.18 #1)
> > >
> > > It looks like you've edited this a bit too much; what came before this
> > > in the logs?
> > >
> > > Are you running on 4k stacks, out of curiosity?
> > >
> > > -Eric
> > >
> >
> > --
> > Ian Williamson
> >
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
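[Editor's note: Eric's question about 4k stacks refers to the 32-bit x86 build option CONFIG_4KSTACKS, which gives each kernel thread a 4 KB stack instead of 8 KB; deep XFS call chains were a known source of overflows on such kernels. A minimal sketch for checking it, assuming the distro ships the kernel config in /boot or exposes /proc/config.gz:]

```shell
# Check whether the running kernel was built with CONFIG_4KSTACKS.
# The config location varies by distro; try the two common places.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep CONFIG_4KSTACKS "$cfg" || echo "CONFIG_4KSTACKS not present in $cfg"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep CONFIG_4KSTACKS || echo "CONFIG_4KSTACKS not present"
else
    echo "kernel config not found; check the .config used to build the kernel"
fi
```

`CONFIG_4KSTACKS=y` means 4k stacks are enabled; no match at all usually means the option does not exist for that kernel or architecture.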
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 18:41       ` Ian Williamson
@ 2006-10-11 18:42         ` Justin Piszcz
  2006-10-11 19:10           ` Ian Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Piszcz @ 2006-10-11 18:42 UTC (permalink / raw)
To: Ian Williamson; +Cc: Eric Sandeen, xfs

A simple "hdparm -t /dev/md0" gives the read speed, but I'd be more
interested in the write speed:

dd if=/dev/zero | pipebench > /path/on/raid.dat

Then report the write speed in MB/s.

I assume this is on a regular PCI card, which is why I am interested in
the speeds.

On Wed, 11 Oct 2006, Ian Williamson wrote:

> Justin,
> How would I go about benchmarking that?
>
> Eric,
> Sorry, but I'm not quite an expert on the internals of Linux. What are
> 4k stacks, and how do I know if I have them? If it helps, I am using
> Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
> occurred on the default Ubuntu server kernel...)
>
> Also, if that trace from /var/log/messages isn't of any use, do you
> know where I can look to find more information on this? Is it possible
> that this is being caused by the cheap PCI SATA controller card that I
> am using? (It's the Rosewill RC-209)
>
> - Ian
>
> On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Also, a quick question: what kind of speed do you get with 4 drives
> > connected to 1 card? For comparison, I have 8 drives connected to 3-4
> > cards.
> >
> > What write/read speeds do you see?
> >
> > Justin.
> >
> > On Wed, 11 Oct 2006, Ian Williamson wrote:
> >
> > > Eric,
> > > That's all I have for the event in /var/log/messages.
> > >
> > > For the raid configuration I have the following:
> > >
> > > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > > Password:
> > > /dev/md0:
> > >         Version : 00.90.03
> > >   Creation Time : Wed Sep 13 22:04:11 2006
> > >      Raid Level : raid5
> > >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> > >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> > >    Raid Devices : 4
> > >   Total Devices : 4
> > > Preferred Minor : 0
> > >     Persistence : Superblock is persistent
> > >
> > >     Update Time : Mon Oct  9 00:02:30 2006
> > >           State : clean
> > >  Active Devices : 4
> > > Working Devices : 4
> > >  Failed Devices : 0
> > >   Spare Devices : 0
> > >
> > >          Layout : left-symmetric
> > >      Chunk Size : 64K
> > >
> > >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> > >          Events : 0.54082
> > >
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8        1        0      active sync   /dev/sda1
> > >        1       8       17        1      active sync   /dev/sdb1
> > >        2       8       33        2      active sync   /dev/sdc1
> > >        3       8       49        3      active sync   /dev/sdd1
> > >
> > > I really have no idea what could be causing this. Sometimes after a
> > > restart it still won't work through Samba, and I can never perform
> > > massive local reads and writes, i.e. a recursive copy off of the raid.
> > >
> > > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > > Ian Williamson wrote:
> > > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > > controller with 4 SATA drives attached to it.
> > > > >
> > > > > When I play my music over the network through Samba from the raid
> > > > > volume my audio client will often lose the connection. This isn't
> > > > > remedied until I restart the machine or wait for an unknown amount of
> > > > > time; even then, the problem sometimes persists.
> > > > >
> > > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > > xfs related due to what was in /var/log/messages:
> > > > >
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > > ide_generic processor
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > > (2.6.18 #1)
> > > >
> > > > It looks like you've edited this a bit too much; what came before this
> > > > in the logs?
> > > >
> > > > Are you running on 4k stacks, out of curiosity?
> > > >
> > > > -Eric
> > > >
> > >
> > > --
> > > Ian Williamson
> > >
> >
>
> --
> Ian Williamson
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
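[Editor's note: for the write side, pipebench isn't actually required; GNU dd can report throughput on its own when it writes the file directly. A sketch, with the target path a stand-in for a file on the RAID mount:]

```shell
# Sequential write benchmark with no extra tools: dd prints the elapsed
# time and throughput itself when it finishes. conv=fdatasync makes dd
# flush data to disk before reporting, so the page cache does not
# inflate the figure. Point target at the raid mount in practice
# (e.g. /path/on/raid.dat); /tmp is used here only as a placeholder.
target="${TMPDIR:-/tmp}/raid-write-test.dat"
dd if=/dev/zero of="$target" bs=1M count=64 conv=fdatasync
rm -f "$target"
```

With 64 MB the run is quick; for a steadier number, raise `count` so the file is larger than RAM.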
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 18:42         ` Justin Piszcz
@ 2006-10-11 19:10           ` Ian Williamson
  2006-10-12  1:12             ` Timothy Shimmin
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 19:10 UTC (permalink / raw)
To: Eric Sandeen, xfs

/dev/md0:
 Timing buffered disk reads:  286 MB in  3.01 seconds = 94.97 MB/sec

For write speed, I don't have pipebench installed, and this machine
isn't internet-facing at the moment, so I can't install it.

I just ran an xfs_repair on /dev/md0 and it did this:
-------------------------------------------------------------------------
ian@ionlinux:~$ sudo xfs_repair /dev/md0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad attribute format 0 in inode 260, resetting value
        - agno = 1
inode 135921976 - bad extent starting block number 955543538733351, offset 2405220210012692
bad data fork in inode 135921976
cleared inode 135921976
zero length extent (off = 0, fsbno = 0) in ino 136766006
bad data fork in inode 136766006
cleared inode 136766006
        - agno = 2
inode 268439335 - bad extent starting block number 4389451776, offset 8989827926016
bad data fork in inode 268439335
cleared inode 268439335
        - agno = 3
inode 402653478 - bad extent starting block number 6493419520, offset 123364807018496
bad data fork in inode 402653478
cleared inode 402653478
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
inode 939524376 - bad extent starting block number 384617748308622, offset 13946791523993872
bad data fork in inode 939524376
cleared inode 939524376
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
inode 2550140476 - bad extent starting block number 3836083423429920, offset 1232124454554406
bad data fork in inode 2550140476
cleared inode 2550140476
        - agno = 20
        - agno = 21
inode 2818586148 - bad extent starting block number 2465278532745658, offset 9727159296556827
bad data fork in inode 2818586148
cleared inode 2818586148
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "07 - Film Score Pt. II.mp3" at block 0 offset 312 in directory inode 135921969 references free inode 135921976
clearing inode number in entry at offset 312...
entry "Torrent downloaded from Demonoid.com.txt" in shortform directory 136766004 references free inode 136766006
junking entry "Torrent downloaded from Demonoid.com.txt" in directory inode 136766004
        - agno = 2
entry "robot_worldlight.png" at block 3 offset 2608 in directory inode 268436754 references free inode 268439335
clearing inode number in entry at offset 2608...
        - agno = 3
entry "automail.php" at block 0 offset 104 in directory inode 402653475 references free inode 402653478
clearing inode number in entry at offset 104...
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
entry "core.write_compiled_include.php" at block 0 offset 808 in directory inode 939524356 references free inode 939524376
clearing inode number in entry at offset 808...
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
entry "auth.php" at block 0 offset 48 in directory inode 2550140475 references free inode 2550140476
clearing inode number in entry at offset 48...
        - agno = 20
        - agno = 21
entry "IMG_0245.jpg" at block 0 offset 1944 in directory inode 2818581782 references free inode 2818586148
clearing inode number in entry at offset 1944...
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 135921969
rebuilding directory inode 2818581782
rebuilding directory inode 2550140475
rebuilding directory inode 268436754
rebuilding directory inode 402653475
rebuilding directory inode 939524356
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 3221929786, moving to lost+found
Phase 7 - verify and correct link counts...
done
-------------------------------------------------------------------------
Right now I am copying a 20-gig directory off of the raid onto another
drive with no problems. Does an xfs filesystem need to be repaired on
a regular basis? Any ideas on what might be "corrupting" it?

On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> A simple "hdparm -t /dev/md0" gives the read speed, but I'd be more
> interested in the write speed:
>
> dd if=/dev/zero | pipebench > /path/on/raid.dat
>
> Then report the write speed in MB/s.
>
> I assume this is on a regular PCI card, which is why I am interested in
> the speeds.
>
> On Wed, 11 Oct 2006, Ian Williamson wrote:
>
> > Justin,
> > How would I go about benchmarking that?
> >
> > Eric,
> > Sorry, but I'm not quite an expert on the internals of Linux. What are
> > 4k stacks, and how do I know if I have them? If it helps, I am using
> > Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
> > occurred on the default Ubuntu server kernel...)
> >
> > Also, if that trace from /var/log/messages isn't of any use, do you
> > know where I can look to find more information on this? Is it possible
> > that this is being caused by the cheap PCI SATA controller card that I
> > am using? (It's the Rosewill RC-209)
> >
> > - Ian
> >
> > On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > > Also, a quick question: what kind of speed do you get with 4 drives
> > > connected to 1 card? For comparison, I have 8 drives connected to 3-4
> > > cards.
> > >
> > > What write/read speeds do you see?
> > >
> > > Justin.
> > >
> > > On Wed, 11 Oct 2006, Ian Williamson wrote:
> > >
> > > > Eric,
> > > > That's all I have for the event in /var/log/messages.
> > > >
> > > > For the raid configuration I have the following:
> > > >
> > > > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > > > Password:
> > > > /dev/md0:
> > > >         Version : 00.90.03
> > > >   Creation Time : Wed Sep 13 22:04:11 2006
> > > >      Raid Level : raid5
> > > >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> > > >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> > > >    Raid Devices : 4
> > > >   Total Devices : 4
> > > > Preferred Minor : 0
> > > >     Persistence : Superblock is persistent
> > > >
> > > >     Update Time : Mon Oct  9 00:02:30 2006
> > > >           State : clean
> > > >  Active Devices : 4
> > > > Working Devices : 4
> > > >  Failed Devices : 0
> > > >   Spare Devices : 0
> > > >
> > > >          Layout : left-symmetric
> > > >      Chunk Size : 64K
> > > >
> > > >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> > > >          Events : 0.54082
> > > >
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8        1        0      active sync   /dev/sda1
> > > >        1       8       17        1      active sync   /dev/sdb1
> > > >        2       8       33        2      active sync   /dev/sdc1
> > > >        3       8       49        3      active sync   /dev/sdd1
> > > >
> > > > I really have no idea what could be causing this. Sometimes after a
> > > > restart it still won't work through Samba, and I can never perform
> > > > massive local reads and writes, i.e. a recursive copy off of the raid.
> > > >
> > > > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > > > Ian Williamson wrote:
> > > > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > > > controller with 4 SATA drives attached to it.
> > > > > >
> > > > > > When I play my music over the network through Samba from the raid
> > > > > > volume my audio client will often lose the connection. This isn't
> > > > > > remedied until I restart the machine or wait for an unknown amount of
> > > > > > time; even then, the problem sometimes persists.
> > > > > >
> > > > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > > > xfs related due to what was in /var/log/messages:
> > > > > >
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > > > ide_generic processor
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > > > (2.6.18 #1)
> > > > >
> > > > > It looks like you've edited this a bit too much; what came before this
> > > > > in the logs?
> > > > >
> > > > > Are you running on 4k stacks, out of curiosity?
> > > > >
> > > > > -Eric
> > > > >
> > > >
> > > > --
> > > > Ian Williamson
> > > >
> > >
> >
> > --
> > Ian Williamson
> >
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems 2006-10-11 19:10 ` Ian Williamson @ 2006-10-12 1:12 ` Timothy Shimmin 0 siblings, 0 replies; 9+ messages in thread From: Timothy Shimmin @ 2006-10-12 1:12 UTC (permalink / raw) To: Ian Williamson, xfs Hi Ian, --On 11 October 2006 2:10:28 PM -0500 Ian Williamson <notian@gmail.com> wrote: > /dev/md0: > Timing buffered disk reads: 286 MB in 3.01 seconds = 94.97 MB/sec > > For write I don't have pipebench installed, and this isn't internet > facing at the moment, so I can't install it. > > I just ran an xfs_repair on /dev/md0 and it did this: > ------------------------------------------------------------------------- > ian@ionlinux:~$ sudo xfs_repair /dev/md0 > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > - found root inode chunk > Phase 3 - for each AG... > - scan and clear agi unlinked lists... > - process known inodes and perform inode discovery... 
> - agno = 0
> bad attribute format 0 in inode 260, resetting value
> - agno = 1
> inode 135921976 - bad extent starting block number 955543538733351,
> offset 2405220210012692
> bad data fork in inode 135921976
> cleared inode 135921976
> zero length extent (off = 0, fsbno = 0) in ino 136766006
> bad data fork in inode 136766006
> cleared inode 136766006
> - agno = 2
> inode 268439335 - bad extent starting block number 4389451776, offset
> 8989827926016
> bad data fork in inode 268439335
> cleared inode 268439335
> - agno = 3
> inode 402653478 - bad extent starting block number 6493419520, offset
> 123364807018496
> bad data fork in inode 402653478
> cleared inode 402653478
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> inode 939524376 - bad extent starting block number 384617748308622,
> offset 13946791523993872
> bad data fork in inode 939524376
> cleared inode 939524376
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> inode 2550140476 - bad extent starting block number 3836083423429920,
> offset 1232124454554406
> bad data fork in inode 2550140476
> cleared inode 2550140476
> - agno = 20
> - agno = 21
> inode 2818586148 - bad extent starting block number 2465278532745658,
> offset 9727159296556827
> bad data fork in inode 2818586148
> cleared inode 2818586148
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - clear lost+found (if it exists) ...
> - clearing existing "lost+found" inode
> - deleting existing "lost+found" entry
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> entry "07 - Film Score Pt. II.mp3" at block 0 offset 312 in directory
> inode 135921969 references free inode 135921976
> clearing inode number in entry at offset 312...
> entry "Torrent downloaded from Demonoid.com.txt" in shortform
> directory 136766004 references free inode 136766006
> junking entry "Torrent downloaded from Demonoid.com.txt" in directory
> inode 136766004
> - agno = 2
> entry "robot_worldlight.png" at block 3 offset 2608 in directory inode
> 268436754 references free inode 268439335
> clearing inode number in entry at offset 2608...
> - agno = 3
> entry "automail.php" at block 0 offset 104 in directory inode
> 402653475 references free inode 402653478
> clearing inode number in entry at offset 104...
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> entry "core.write_compiled_include.php" at block 0 offset 808 in
> directory inode 939524356 references free inode 939524376
> clearing inode number in entry at offset 808...
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> entry "auth.php" at block 0 offset 48 in directory inode 2550140475
> references free inode 2550140476
> clearing inode number in entry at offset 48...
> - agno = 20
> - agno = 21
> entry "IMG_0245.jpg" at block 0 offset 1944 in directory inode
> 2818581782 references free inode 2818586148
> clearing inode number in entry at offset 1944...
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - ensuring existence of lost+found directory
> - traversing filesystem starting at / ...
> rebuilding directory inode 135921969
> rebuilding directory inode 2818581782
> rebuilding directory inode 2550140475
> rebuilding directory inode 268436754
> rebuilding directory inode 402653475
> rebuilding directory inode 939524356
> - traversal finished ...
> - traversing all unattached subtrees ...
> - traversals finished ...
> - moving disconnected inodes to lost+found ...
> disconnected dir inode 3221929786, moving to lost+found
> Phase 7 - verify and correct link counts...
> done
> -------------------------------------------------------------------------
> Right now I am copying a 20Gig directory off of the raid onto another
> drive with no problems. Does an xfs filesystem need to be repaired on
> a regular basis?

Ideally, no :-) We don't expect corruption on a regular basis :)

> Any ideas on what might be "corrupting" it?

No, sorry. Some random thoughts:
Has the filesystem had any unclean mounts, e.g. due to power loss?
Do you have a "Disabling barriers" message in your logs for xfs?
What were your mkfs and mount parameters, and which version of Linux?

Before repairing the filesystem, you can run "xfs_repair -n" to find the
errors, and then get a better printout of the inodes using
"xfs_db -r -c 'inode xxxx' -c 'p' device".

--Tim

^ permalink raw reply	[flat|nested] 9+ messages in thread
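[Editor's note: Tim's non-destructive diagnostic workflow can be sketched as shell commands. This is a hedged illustration, not part of the original mail: the device name /dev/md0 comes from the thread, the inode number is one of those flagged in the xfs_repair output above, and the filesystem must be unmounted before running either command.]

```shell
# Dry run: report problems without modifying the filesystem
# (assumes the array is /dev/md0, as in the thread, and is unmounted).
xfs_repair -n /dev/md0

# Open the device read-only (-r) and print the on-disk fields of a
# suspect inode, e.g. inode 135921976 flagged by xfs_repair above.
xfs_db -r -c 'inode 135921976' -c 'p' /dev/md0
```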
end of thread, other threads:[~2006-10-12  1:13 UTC | newest]

Thread overview: 9+ messages:
2006-10-11  6:07 Software raid 5 with XFS causing strange lockup problems Ian Williamson
2006-10-11 13:53 ` Eric Sandeen
2006-10-11 16:21   ` Ian Williamson
2006-10-11 16:36     ` Eric Sandeen
2006-10-11 17:19       ` [UNSURE] " Justin Piszcz
2006-10-11 18:41         ` Ian Williamson
2006-10-11 18:42           ` Justin Piszcz
2006-10-11 19:10             ` Ian Williamson
2006-10-12  1:12               ` Timothy Shimmin