* Software raid 5 with XFS causing strange lockup problems
@ 2006-10-11  6:07 Ian Williamson
  2006-10-11 13:53 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11  6:07 UTC (permalink / raw)
To: xfs

I am running XFS on a software raid 5. I am doing this with a PCI
controller with 4 SATA drives attached to it.

When I play my music over the network through Samba from the raid
volume my audio client will often lose the connection. This isn't
remedied until I restart the machine or wait for an unknown amount of
time; even then, the problem sometimes persists.

Initially I thought that this was Samba's fault, but I think it may be
xfs related due to what was in /var/log/messages:

Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in: serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod ide_generic processor
Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP: 0060:[<f8a1353e>] Not tainted VLI
Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246 (2.6.18 #1)
Oct  9 22:37:33 ionlinux kernel: [105657.984000] [<f89b290c>] xfs_bmap_search_extents+0xdc/0x100 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984083] [<f89baa22>] xfs_bmapi+0x302/0x2840 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984163] [ide_dma_exec_cmd+48/64] ide_dma_exec_cmd+0x30/0x40
Oct  9 22:37:33 ionlinux kernel: [105657.984228] [ide_dma_start+51/80] ide_dma_start+0x33/0x50
Oct  9 22:37:33 ionlinux kernel: [105657.984294] [mempool_alloc+58/224] mempool_alloc+0x3a/0xe0
Oct  9 22:37:33 ionlinux kernel: [105657.984360] [cfq_set_request+447/752] cfq_set_request+0x1bf/0x2f0
Oct  9 22:37:33 ionlinux kernel: [105657.984425] [do_timer+1117/3040] do_timer+0x45d/0xbe0
Oct  9 22:37:33 ionlinux kernel: [105657.984489] [scheduler_tick+275/800] scheduler_tick+0x113/0x320
Oct  9 22:37:33 ionlinux kernel: [105657.984556] [<f89fed13>] xfs_inactive_free_eofblocks+0x123/0x340 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984643] [<f8a0584a>] xfs_inactive+0xfa/0xcb0 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.984723] [kmem_freepages+118/160] kmem_freepages+0x76/0xa0
Oct  9 22:37:33 ionlinux kernel: [105657.984784] [slab_destroy+60/208] slab_destroy+0x3c/0xd0
Oct  9 22:37:33 ionlinux kernel: [105657.984845] [__pagevec_release_nonlru+49/144] __pagevec_release_nonlru+0x31/0x90
Oct  9 22:37:33 ionlinux kernel: [105657.984910] [memmove+80/112] memmove+0x50/0x70
Oct  9 22:37:33 ionlinux kernel: [105657.984973] [<f89ddba0>] xfs_iextract+0x90/0x150 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985053] [<f89ddd4e>] xfs_ilock+0x8e/0xc0 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985132] [<f89e21c5>] xfs_idestroy+0x65/0x90 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985212] [<f8a00c1f>] xfs_finish_reclaim+0x10f/0x150 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985294] [<f8a00d78>] xfs_reclaim+0x118/0x120 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985375] [<f8a117f2>] xfs_fs_clear_inode+0x42/0x80 [xfs]
Oct  9 22:37:33 ionlinux kernel: [105657.985456] [dquot_drop+0/128] dquot_drop+0x0/0x80
Oct  9 22:37:33 ionlinux kernel: [105657.985519] [clear_inode+153/304] clear_inode+0x99/0x130
Oct  9 22:37:33 ionlinux kernel: [105657.985582] [dispose_list+30/192] dispose_list+0x1e/0xc0
Oct  9 22:37:33 ionlinux kernel: [105657.985642] [__activate_task+41/64] __activate_task+0x29/0x40
Oct  9 22:37:33 ionlinux kernel: [105657.985704] [shrink_icache_memory+462/528] shrink_icache_memory+0x1ce/0x210
Oct  9 22:37:33 ionlinux kernel: [105657.985767] [shrink_slab+297/400] shrink_slab+0x129/0x190
Oct  9 22:37:33 ionlinux kernel: [105657.985830] [kswapd+718/1120] kswapd+0x2ce/0x460
Oct  9 22:37:33 ionlinux kernel: [105657.985893] [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Oct  9 22:37:33 ionlinux kernel: [105657.985960] [kswapd+0/1120] kswapd+0x0/0x460
Oct  9 22:37:33 ionlinux kernel: [105657.986019] [kthread+253/272] kthread+0xfd/0x110
Oct  9 22:37:33 ionlinux kernel: [105657.986079] [kthread+0/272] kthread+0x0/0x110
Oct  9 22:37:33 ionlinux kernel: [105657.986139] [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10

When I SSH into the machine and perform a "ps aux" command, I notice
that there are a lot of /usr/bin/smbd -U processes running.

In addition to this, when I am SSHed into the machine and I attempt to
copy files or directories to or from the volume, the SSH window stops
responding and I am forced to close the connection and open a new one.
Performing massive reads and writes has also caused the local console
to freeze when I am using the machine directly (i.e. not over SSH).

Does anyone have any idea what might be causing this? The raid
controller card or my XFS setup?

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11  6:07 Software raid 5 with XFS causing strange lockup problems Ian Williamson
@ 2006-10-11 13:53 ` Eric Sandeen
  2006-10-11 16:21   ` Ian Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2006-10-11 13:53 UTC (permalink / raw)
To: Ian Williamson; +Cc: xfs

Ian Williamson wrote:
> I am running XFS on a software raid 5. I am doing this with a PCI
> controller with 4 SATA drives attached to it.
>
> When I play my music over the network through Samba from the raid
> volume my audio client will often lose the connection. This isn't
> remedied until I restart the machine or wait for an unknown amount of
> time; even then, the problem sometimes persists.
>
> Initially I thought that this was Samba's fault, but I think it may be
> xfs related due to what was in /var/log/messages:
>
> Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> ide_generic processor
> Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> 0060:[<f8a1353e>] Not tainted VLI
> Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> (2.6.18 #1)

It looks like you've edited this a bit too much; what came before this
in the logs?

Are you running on 4k stacks, out of curiosity?

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 13:53 ` Eric Sandeen
@ 2006-10-11 16:21   ` Ian Williamson
  2006-10-11 16:36     ` Eric Sandeen
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
  0 siblings, 2 replies; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 16:21 UTC (permalink / raw)
To: Eric Sandeen, xfs

Eric,
That's all I have for the event in /var/log/messages.

For the raid configuration I have the following:

ian@ionlinux:~$ sudo mdadm --detail /dev/md0
Password:
/dev/md0:
        Version : 00.90.03
  Creation Time : Wed Sep 13 22:04:11 2006
     Raid Level : raid5
     Array Size : 732587712 (698.65 GiB 750.17 GB)
    Device Size : 244195904 (232.88 GiB 250.06 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Oct  9 00:02:30 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 86770f56:8e4f51e5:fd754630:f1c65359
         Events : 0.54082

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

I really have no idea what could be causing this. Sometimes after a
restart it still won't work through Samba, and I can never perform
massive local reads and writes, i.e. a recursive copy off of the raid.

On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> Ian Williamson wrote:
> > I am running XFS on a software raid 5. I am doing this with a PCI
> > controller with 4 SATA drives attached to it.
> >
> > When I play my music over the network through Samba from the raid
> > volume my audio client will often lose the connection. This isn't
> > remedied until I restart the machine or wait for an unknown amount of
> > time; even then, the problem sometimes persists.
> >
> > Initially I thought that this was Samba's fault, but I think it may be
> > xfs related due to what was in /var/log/messages:
> >
> > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > ide_generic processor
> > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > 0060:[<f8a1353e>] Not tainted VLI
> > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > (2.6.18 #1)
>
> It looks like you've edited this a bit too much; what came before this
> in the logs?
>
> Are you running on 4k stacks, out of curiosity?
>
> -Eric
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 16:21   ` Ian Williamson
@ 2006-10-11 16:36     ` Eric Sandeen
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Sandeen @ 2006-10-11 16:36 UTC (permalink / raw)
To: Ian Williamson; +Cc: xfs

Ian Williamson wrote:
> Eric,
> That's all I have for the event in /var/log/messages.

Weird. I don't even know if that's an oops or, if so, why; it's just a
backtrace.

Does your kernel have 4k stacks?

-Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 16:21   ` Ian Williamson
  2006-10-11 16:36     ` Eric Sandeen
@ 2006-10-11 17:19     ` Justin Piszcz
  2006-10-11 18:41       ` Ian Williamson
  1 sibling, 1 reply; 9+ messages in thread
From: Justin Piszcz @ 2006-10-11 17:19 UTC (permalink / raw)
To: Ian Williamson; +Cc: Eric Sandeen, xfs

Also, a quick question: what kind of speed do you get with 4 drives
connected to 1 card? For comparison, I have 8 drives connected to 3-4
cards.

What write/read speeds do you see?

Justin.

On Wed, 11 Oct 2006, Ian Williamson wrote:

> Eric,
> That's all I have for the event in /var/log/messages.
>
> For the raid configuration I have the following:
>
> ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> Password:
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Wed Sep 13 22:04:11 2006
>      Raid Level : raid5
>      Array Size : 732587712 (698.65 GiB 750.17 GB)
>     Device Size : 244195904 (232.88 GiB 250.06 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Oct  9 00:02:30 2006
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
>          Events : 0.54082
>
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8       33        2      active sync   /dev/sdc1
>        3       8       49        3      active sync   /dev/sdd1
>
> I really have no idea what could be causing this. Sometimes after a
> restart it still won't work through Samba, and I can never perform
> massive local reads and writes, i.e. a recursive copy off of the raid.
>
> On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > Ian Williamson wrote:
> > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > controller with 4 SATA drives attached to it.
> > >
> > > When I play my music over the network through Samba from the raid
> > > volume my audio client will often lose the connection. This isn't
> > > remedied until I restart the machine or wait for an unknown amount of
> > > time; even then, the problem sometimes persists.
> > >
> > > Initially I thought that this was Samba's fault, but I think it may be
> > > xfs related due to what was in /var/log/messages:
> > >
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > ide_generic processor
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > 0060:[<f8a1353e>] Not tainted VLI
> > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > (2.6.18 #1)
> >
> > It looks like you've edited this a bit too much; what came before this
> > in the logs?
> >
> > Are you running on 4k stacks, out of curiosity?
> >
> > -Eric
> >
>
> --
> Ian Williamson
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 17:19     ` [UNSURE] " Justin Piszcz
@ 2006-10-11 18:41       ` Ian Williamson
  2006-10-11 18:42         ` Justin Piszcz
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 18:41 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Eric Sandeen, xfs

Justin,
How would I go about benchmarking that?

Eric,
Sorry, but I'm not quite an expert on the internals of Linux. What are
4k stacks, and how do I know if I have them? If it helps, I am using
Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
occurred on the default Ubuntu server kernel...)

Also, if that trace from /var/log/messages isn't of any use, do you
know where I can look to find more information on this? Is it possible
that this is being caused by the cheap PCI SATA controller card that I
am using? (It's the Rosewill RC-209)

- Ian

On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Also, a quick question: what kind of speed do you get with 4 drives
> connected to 1 card? For comparison, I have 8 drives connected to 3-4
> cards.
>
> What write/read speeds do you see?
>
> Justin.
>
> On Wed, 11 Oct 2006, Ian Williamson wrote:
>
> > Eric,
> > That's all I have for the event in /var/log/messages.
> >
> > For the raid configuration I have the following:
> >
> > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > Password:
> > /dev/md0:
> >         Version : 00.90.03
> >   Creation Time : Wed Sep 13 22:04:11 2006
> >      Raid Level : raid5
> >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> >    Raid Devices : 4
> >   Total Devices : 4
> > Preferred Minor : 0
> >     Persistence : Superblock is persistent
> >
> >     Update Time : Mon Oct  9 00:02:30 2006
> >           State : clean
> >  Active Devices : 4
> > Working Devices : 4
> >  Failed Devices : 0
> >   Spare Devices : 0
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> >          Events : 0.54082
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8        1        0      active sync   /dev/sda1
> >        1       8       17        1      active sync   /dev/sdb1
> >        2       8       33        2      active sync   /dev/sdc1
> >        3       8       49        3      active sync   /dev/sdd1
> >
> > I really have no idea what could be causing this. Sometimes after a
> > restart it still won't work through Samba, and I can never perform
> > massive local reads and writes, i.e. a recursive copy off of the raid.
> >
> > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > Ian Williamson wrote:
> > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > controller with 4 SATA drives attached to it.
> > > >
> > > > When I play my music over the network through Samba from the raid
> > > > volume my audio client will often lose the connection. This isn't
> > > > remedied until I restart the machine or wait for an unknown amount of
> > > > time; even then, the problem sometimes persists.
> > > >
> > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > xfs related due to what was in /var/log/messages:
> > > >
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > ide_generic processor
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > (2.6.18 #1)
> > >
> > > It looks like you've edited this a bit too much; what came before this
> > > in the logs?
> > >
> > > Are you running on 4k stacks, out of curiosity?
> > >
> > > -Eric
> > >
> >
> > --
> > Ian Williamson
> >
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
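[Editor's note: Eric's question about 4k stacks refers to the 32-bit x86 build option CONFIG_4KSTACKS, which gives each kernel thread a 4 KB stack instead of 8 KB; deep XFS call chains were a known source of overflows on such kernels. A minimal sketch for checking it, assuming the distro ships the kernel config in /boot or exposes /proc/config.gz:]

```shell
# Check whether the running kernel was built with CONFIG_4KSTACKS.
# The config location varies by distro; try the two common places.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    grep CONFIG_4KSTACKS "$cfg" || echo "CONFIG_4KSTACKS not present in $cfg"
elif [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | grep CONFIG_4KSTACKS || echo "CONFIG_4KSTACKS not present"
else
    echo "kernel config not found; check the .config used to build the kernel"
fi
```

`CONFIG_4KSTACKS=y` means 4k stacks are enabled; no match at all usually means the option does not exist for that kernel or architecture.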
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 18:41       ` Ian Williamson
@ 2006-10-11 18:42         ` Justin Piszcz
  2006-10-11 19:10           ` Ian Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Justin Piszcz @ 2006-10-11 18:42 UTC (permalink / raw)
To: Ian Williamson; +Cc: Eric Sandeen, xfs

A simple "hdparm -t /dev/md0" gives the read speed, but I'd be more
interested in the write speed:

dd if=/dev/zero | pipebench > /path/on/raid.dat

Then report the write speed in MB/s.

I assume this is on a regular PCI card, which is why I am interested in
the speeds.

On Wed, 11 Oct 2006, Ian Williamson wrote:

> Justin,
> How would I go about benchmarking that?
>
> Eric,
> Sorry, but I'm not quite an expert on the internals of Linux. What are
> 4k stacks, and how do I know if I have them? If it helps, I am using
> Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
> occurred on the default Ubuntu server kernel...)
>
> Also, if that trace from /var/log/messages isn't of any use, do you
> know where I can look to find more information on this? Is it possible
> that this is being caused by the cheap PCI SATA controller card that I
> am using? (It's the Rosewill RC-209)
>
> - Ian
>
> On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Also, a quick question: what kind of speed do you get with 4 drives
> > connected to 1 card? For comparison, I have 8 drives connected to 3-4
> > cards.
> >
> > What write/read speeds do you see?
> >
> > Justin.
> >
> > On Wed, 11 Oct 2006, Ian Williamson wrote:
> >
> > > Eric,
> > > That's all I have for the event in /var/log/messages.
> > >
> > > For the raid configuration I have the following:
> > >
> > > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > > Password:
> > > /dev/md0:
> > >         Version : 00.90.03
> > >   Creation Time : Wed Sep 13 22:04:11 2006
> > >      Raid Level : raid5
> > >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> > >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> > >    Raid Devices : 4
> > >   Total Devices : 4
> > > Preferred Minor : 0
> > >     Persistence : Superblock is persistent
> > >
> > >     Update Time : Mon Oct  9 00:02:30 2006
> > >           State : clean
> > >  Active Devices : 4
> > > Working Devices : 4
> > >  Failed Devices : 0
> > >   Spare Devices : 0
> > >
> > >          Layout : left-symmetric
> > >      Chunk Size : 64K
> > >
> > >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> > >          Events : 0.54082
> > >
> > >     Number   Major   Minor   RaidDevice State
> > >        0       8        1        0      active sync   /dev/sda1
> > >        1       8       17        1      active sync   /dev/sdb1
> > >        2       8       33        2      active sync   /dev/sdc1
> > >        3       8       49        3      active sync   /dev/sdd1
> > >
> > > I really have no idea what could be causing this. Sometimes after a
> > > restart it still won't work through Samba, and I can never perform
> > > massive local reads and writes, i.e. a recursive copy off of the raid.
> > >
> > > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > > Ian Williamson wrote:
> > > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > > controller with 4 SATA drives attached to it.
> > > > >
> > > > > When I play my music over the network through Samba from the raid
> > > > > volume my audio client will often lose the connection. This isn't
> > > > > remedied until I restart the machine or wait for an unknown amount of
> > > > > time; even then, the problem sometimes persists.
> > > > >
> > > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > > xfs related due to what was in /var/log/messages:
> > > > >
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > > ide_generic processor
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > > (2.6.18 #1)
> > > >
> > > > It looks like you've edited this a bit too much; what came before this
> > > > in the logs?
> > > >
> > > > Are you running on 4k stacks, out of curiosity?
> > > >
> > > > -Eric
> > > >
> > >
> > > --
> > > Ian Williamson
> > >
> >
>
> --
> Ian Williamson
>

^ permalink raw reply	[flat|nested] 9+ messages in thread
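[Editor's note: for the write side, pipebench isn't actually required; GNU dd can report throughput on its own when it writes the file directly. A sketch, with the target path a stand-in for a file on the RAID mount:]

```shell
# Sequential write benchmark with no extra tools: dd prints the elapsed
# time and throughput itself when it finishes. conv=fdatasync makes dd
# flush data to disk before reporting, so the page cache does not
# inflate the figure. Point target at the raid mount in practice
# (e.g. /path/on/raid.dat); /tmp is used here only as a placeholder.
target="${TMPDIR:-/tmp}/raid-write-test.dat"
dd if=/dev/zero of="$target" bs=1M count=64 conv=fdatasync
rm -f "$target"
```

With 64 MB the run is quick; for a steadier number, raise `count` so the file is larger than RAM.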
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems
  2006-10-11 18:42         ` Justin Piszcz
@ 2006-10-11 19:10           ` Ian Williamson
  2006-10-12  1:12             ` Timothy Shimmin
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Williamson @ 2006-10-11 19:10 UTC (permalink / raw)
To: Eric Sandeen, xfs

/dev/md0:
 Timing buffered disk reads:  286 MB in  3.01 seconds = 94.97 MB/sec

For write speed, I don't have pipebench installed, and this machine
isn't internet-facing at the moment, so I can't install it.

I just ran an xfs_repair on /dev/md0 and it did this:
-------------------------------------------------------------------------
ian@ionlinux:~$ sudo xfs_repair /dev/md0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
bad attribute format 0 in inode 260, resetting value
        - agno = 1
inode 135921976 - bad extent starting block number 955543538733351, offset 2405220210012692
bad data fork in inode 135921976
cleared inode 135921976
zero length extent (off = 0, fsbno = 0) in ino 136766006
bad data fork in inode 136766006
cleared inode 136766006
        - agno = 2
inode 268439335 - bad extent starting block number 4389451776, offset 8989827926016
bad data fork in inode 268439335
cleared inode 268439335
        - agno = 3
inode 402653478 - bad extent starting block number 6493419520, offset 123364807018496
bad data fork in inode 402653478
cleared inode 402653478
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
inode 939524376 - bad extent starting block number 384617748308622, offset 13946791523993872
bad data fork in inode 939524376
cleared inode 939524376
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
inode 2550140476 - bad extent starting block number 3836083423429920, offset 1232124454554406
bad data fork in inode 2550140476
cleared inode 2550140476
        - agno = 20
        - agno = 21
inode 2818586148 - bad extent starting block number 2465278532745658, offset 9727159296556827
bad data fork in inode 2818586148
cleared inode 2818586148
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "07 - Film Score Pt. II.mp3" at block 0 offset 312 in directory inode 135921969 references free inode 135921976
clearing inode number in entry at offset 312...
entry "Torrent downloaded from Demonoid.com.txt" in shortform directory 136766004 references free inode 136766006
junking entry "Torrent downloaded from Demonoid.com.txt" in directory inode 136766004
        - agno = 2
entry "robot_worldlight.png" at block 3 offset 2608 in directory inode 268436754 references free inode 268439335
clearing inode number in entry at offset 2608...
        - agno = 3
entry "automail.php" at block 0 offset 104 in directory inode 402653475 references free inode 402653478
clearing inode number in entry at offset 104...
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
entry "core.write_compiled_include.php" at block 0 offset 808 in directory inode 939524356 references free inode 939524376
clearing inode number in entry at offset 808...
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
entry "auth.php" at block 0 offset 48 in directory inode 2550140475 references free inode 2550140476
clearing inode number in entry at offset 48...
        - agno = 20
        - agno = 21
entry "IMG_0245.jpg" at block 0 offset 1944 in directory inode 2818581782 references free inode 2818586148
clearing inode number in entry at offset 1944...
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
rebuilding directory inode 135921969
rebuilding directory inode 2818581782
rebuilding directory inode 2550140475
rebuilding directory inode 268436754
rebuilding directory inode 402653475
rebuilding directory inode 939524356
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected dir inode 3221929786, moving to lost+found
Phase 7 - verify and correct link counts...
done
-------------------------------------------------------------------------
Right now I am copying a 20-gig directory off of the raid onto another
drive with no problems. Does an xfs filesystem need to be repaired on
a regular basis? Any ideas on what might be "corrupting" it?

On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> A simple "hdparm -t /dev/md0" gives the read speed, but I'd be more
> interested in the write speed:
>
> dd if=/dev/zero | pipebench > /path/on/raid.dat
>
> Then report the write speed in MB/s.
>
> I assume this is on a regular PCI card, which is why I am interested in
> the speeds.
>
> On Wed, 11 Oct 2006, Ian Williamson wrote:
>
> > Justin,
> > How would I go about benchmarking that?
> >
> > Eric,
> > Sorry, but I'm not quite an expert on the internals of Linux. What are
> > 4k stacks, and how do I know if I have them? If it helps, I am using
> > Ubuntu with a custom-compiled Linux kernel. (This xfs/raid problem also
> > occurred on the default Ubuntu server kernel...)
> >
> > Also, if that trace from /var/log/messages isn't of any use, do you
> > know where I can look to find more information on this? Is it possible
> > that this is being caused by the cheap PCI SATA controller card that I
> > am using? (It's the Rosewill RC-209)
> >
> > - Ian
> >
> > On 10/11/06, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > > Also, a quick question: what kind of speed do you get with 4 drives
> > > connected to 1 card? For comparison, I have 8 drives connected to 3-4
> > > cards.
> > >
> > > What write/read speeds do you see?
> > >
> > > Justin.
> > >
> > > On Wed, 11 Oct 2006, Ian Williamson wrote:
> > >
> > > > Eric,
> > > > That's all I have for the event in /var/log/messages.
> > > >
> > > > For the raid configuration I have the following:
> > > >
> > > > ian@ionlinux:~$ sudo mdadm --detail /dev/md0
> > > > Password:
> > > > /dev/md0:
> > > >         Version : 00.90.03
> > > >   Creation Time : Wed Sep 13 22:04:11 2006
> > > >      Raid Level : raid5
> > > >      Array Size : 732587712 (698.65 GiB 750.17 GB)
> > > >     Device Size : 244195904 (232.88 GiB 250.06 GB)
> > > >    Raid Devices : 4
> > > >   Total Devices : 4
> > > > Preferred Minor : 0
> > > >     Persistence : Superblock is persistent
> > > >
> > > >     Update Time : Mon Oct  9 00:02:30 2006
> > > >           State : clean
> > > >  Active Devices : 4
> > > > Working Devices : 4
> > > >  Failed Devices : 0
> > > >   Spare Devices : 0
> > > >
> > > >          Layout : left-symmetric
> > > >      Chunk Size : 64K
> > > >
> > > >            UUID : 86770f56:8e4f51e5:fd754630:f1c65359
> > > >          Events : 0.54082
> > > >
> > > >     Number   Major   Minor   RaidDevice State
> > > >        0       8        1        0      active sync   /dev/sda1
> > > >        1       8       17        1      active sync   /dev/sdb1
> > > >        2       8       33        2      active sync   /dev/sdc1
> > > >        3       8       49        3      active sync   /dev/sdd1
> > > >
> > > > I really have no idea what could be causing this. Sometimes after a
> > > > restart it still won't work through Samba, and I can never perform
> > > > massive local reads and writes, i.e. a recursive copy off of the raid.
> > > >
> > > > On 10/11/06, Eric Sandeen <sandeen@sandeen.net> wrote:
> > > > > Ian Williamson wrote:
> > > > > > I am running XFS on a software raid 5. I am doing this with a PCI
> > > > > > controller with 4 SATA drives attached to it.
> > > > > >
> > > > > > When I play my music over the network through Samba from the raid
> > > > > > volume my audio client will often lose the connection. This isn't
> > > > > > remedied until I restart the machine or wait for an unknown amount of
> > > > > > time; even then, the problem sometimes persists.
> > > > > >
> > > > > > Initially I thought that this was Samba's fault, but I think it may be
> > > > > > xfs related due to what was in /var/log/messages:
> > > > > >
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982701] Modules linked in:
> > > > > > serio_raw i2c_nforce2 pcspkr forcedeth r8169 nvidia_agp agpgart
> > > > > > i2c_core psmouse sg evdev xfs dm_mod sd_mod generic sata_nv ide_disk
> > > > > > ehci_hcd ide_cd cdrom sata_sil ohci_hcd usbcore libata scsi_mod
> > > > > > ide_generic processor
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982985] EIP:
> > > > > > 0060:[<f8a1353e>] Not tainted VLI
> > > > > > Oct  9 22:37:33 ionlinux kernel: [105657.982986] EFLAGS: 00010246
> > > > > > (2.6.18 #1)
> > > > >
> > > > > It looks like you've edited this a bit too much; what came before this
> > > > > in the logs?
> > > > >
> > > > > Are you running on 4k stacks, out of curiosity?
> > > > >
> > > > > -Eric
> > > > >
> > > >
> > > > --
> > > > Ian Williamson
> > > >
> > >
> >
> > --
> > Ian Williamson
> >
>

--
Ian Williamson

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [UNSURE] Re: Software raid 5 with XFS causing strange lockup problems 2006-10-11 19:10 ` Ian Williamson @ 2006-10-12 1:12 ` Timothy Shimmin 0 siblings, 0 replies; 9+ messages in thread From: Timothy Shimmin @ 2006-10-12 1:12 UTC (permalink / raw) To: Ian Williamson, xfs Hi Ian, --On 11 October 2006 2:10:28 PM -0500 Ian Williamson <notian@gmail.com> wrote: > /dev/md0: > Timing buffered disk reads: 286 MB in 3.01 seconds = 94.97 MB/sec > > For write I don't have pipebench installed, and this isn't internet > facing at the moment, so I can't install it. > > I just ran an xfs_repair on /dev/md0 and it did this: > ------------------------------------------------------------------------- > ian@ionlinux:~$ sudo xfs_repair /dev/md0 > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > - found root inode chunk > Phase 3 - for each AG... > - scan and clear agi unlinked lists... > - process known inodes and perform inode discovery... 
> - agno = 0
> bad attribute format 0 in inode 260, resetting value
> - agno = 1
> inode 135921976 - bad extent starting block number 955543538733351,
> offset 2405220210012692
> bad data fork in inode 135921976
> cleared inode 135921976
> zero length extent (off = 0, fsbno = 0) in ino 136766006
> bad data fork in inode 136766006
> cleared inode 136766006
> - agno = 2
> inode 268439335 - bad extent starting block number 4389451776, offset
> 8989827926016
> bad data fork in inode 268439335
> cleared inode 268439335
> - agno = 3
> inode 402653478 - bad extent starting block number 6493419520, offset
> 123364807018496
> bad data fork in inode 402653478
> cleared inode 402653478
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> inode 939524376 - bad extent starting block number 384617748308622,
> offset 13946791523993872
> bad data fork in inode 939524376
> cleared inode 939524376
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> inode 2550140476 - bad extent starting block number 3836083423429920,
> offset 1232124454554406
> bad data fork in inode 2550140476
> cleared inode 2550140476
> - agno = 20
> - agno = 21
> inode 2818586148 - bad extent starting block number 2465278532745658,
> offset 9727159296556827
> bad data fork in inode 2818586148
> cleared inode 2818586148
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - clear lost+found (if it exists) ...
> - clearing existing "lost+found" inode
> - deleting existing "lost+found" entry
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> entry "07 - Film Score Pt. II.mp3" at block 0 offset 312 in directory
> inode 135921969 references free inode 135921976
> clearing inode number in entry at offset 312...
> entry "Torrent downloaded from Demonoid.com.txt" in shortform
> directory 136766004 references free inode 136766006
> junking entry "Torrent downloaded from Demonoid.com.txt" in directory
> inode 136766004
> - agno = 2
> entry "robot_worldlight.png" at block 3 offset 2608 in directory inode
> 268436754 references free inode 268439335
> clearing inode number in entry at offset 2608...
> - agno = 3
> entry "automail.php" at block 0 offset 104 in directory inode
> 402653475 references free inode 402653478
> clearing inode number in entry at offset 104...
> - agno = 4
> - agno = 5
> - agno = 6
> - agno = 7
> entry "core.write_compiled_include.php" at block 0 offset 808 in
> directory inode 939524356 references free inode 939524376
> clearing inode number in entry at offset 808...
> - agno = 8
> - agno = 9
> - agno = 10
> - agno = 11
> - agno = 12
> - agno = 13
> - agno = 14
> - agno = 15
> - agno = 16
> - agno = 17
> - agno = 18
> - agno = 19
> entry "auth.php" at block 0 offset 48 in directory inode 2550140475
> references free inode 2550140476
> clearing inode number in entry at offset 48...
> - agno = 20
> - agno = 21
> entry "IMG_0245.jpg" at block 0 offset 1944 in directory inode
> 2818581782 references free inode 2818586148
> clearing inode number in entry at offset 1944...
> - agno = 22
> - agno = 23
> - agno = 24
> - agno = 25
> - agno = 26
> - agno = 27
> - agno = 28
> - agno = 29
> - agno = 30
> - agno = 31
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - ensuring existence of lost+found directory
> - traversing filesystem starting at / ...
> rebuilding directory inode 135921969
> rebuilding directory inode 2818581782
> rebuilding directory inode 2550140475
> rebuilding directory inode 268436754
> rebuilding directory inode 402653475
> rebuilding directory inode 939524356
> - traversal finished ...
> - traversing all unattached subtrees ...
> - traversals finished ...
> - moving disconnected inodes to lost+found ...
> disconnected dir inode 3221929786, moving to lost+found
> Phase 7 - verify and correct link counts...
> done
> -------------------------------------------------------------------------
> Right now I am copying a 20Gig directory off of the raid onto another
> drive with no problems. Does an xfs filesystem need to be repaired on
> a regular basis?

Ideally, no :-) We don't expect corruption on a regular basis :)

> Any ideas on what might be "corrupting" it?

No, sorry. Some random thoughts:
Has the filesystem had any unclean mounts, e.g. due to power loss?
Do you have a "Disabling barriers" message in your logs for xfs?
What were your mkfs and mount parameters, and which version of Linux?

Before repairing the filesystem, you can run "xfs_repair -n" to find the
errors, and then get a better printout of the inodes using
"xfs_db -r -c 'inode xxxx' -c 'p' device".

--Tim

^ permalink raw reply	[flat|nested] 9+ messages in thread
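[Editor's note: Tim's non-destructive diagnostic workflow can be sketched as shell commands. This is a hedged illustration, not part of the original mail: the device name /dev/md0 comes from the thread, the inode number is one of those flagged in the xfs_repair output above, and the filesystem must be unmounted before running either command.]

```shell
# Dry run: report problems without modifying the filesystem
# (assumes the array is /dev/md0, as in the thread, and is unmounted).
xfs_repair -n /dev/md0

# Open the device read-only (-r) and print the on-disk fields of a
# suspect inode, e.g. inode 135921976 flagged by xfs_repair above.
xfs_db -r -c 'inode 135921976' -c 'p' /dev/md0
```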
end of thread, other threads:[~2006-10-12  1:13 UTC | newest]

Thread overview: 9+ messages:
2006-10-11  6:07 Software raid 5 with XFS causing strange lockup problems Ian Williamson
2006-10-11 13:53 ` Eric Sandeen
2006-10-11 16:21   ` Ian Williamson
2006-10-11 16:36     ` Eric Sandeen
2006-10-11 17:19       ` [UNSURE] " Justin Piszcz
2006-10-11 18:41         ` Ian Williamson
2006-10-11 18:42           ` Justin Piszcz
2006-10-11 19:10             ` Ian Williamson
2006-10-12  1:12               ` Timothy Shimmin