* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-04-23  0:22 UTC
  To: ocfs2-devel

On 04/22/2011 03:24 PM, Josep Guerrero wrote:
>> How long did the debugfs output take?
> I think about 30 minutes. No more than 50 for sure (just by looking at the
> times of the mails).
>
>> Did fsck eventually finish?
> No. I had to cancel it after it stayed 24 hours in the same state, showing the
> same message. It never moved beyond "Pass 0a", and always was using 100% CPU
> in one core. I don't know if it would have finished on its own.
>
>> BTW, you said one of the cores was at 100%. What does top show?
>> Is fsck the main contributor or is some other process spinning?
> It was fsck (I kept a top opened the whole time, and fsck always was around
> 99% CPU usage).
>
>> I have a theory as to why it is slow. But I would like some confirmation.
>> My theory had fsck have high wait%. I seem to be missing something.
> I didn't look at the wait%, but I checked the physical disk load with iotop
> and it was very low, so it didn't look like fsck was being slow because of the
> disk. In the filesystem I successfully "fscked" before (the 3 TB one that took
> less than 60 minutes), it started doing something similar (very high CPU
> usage, low disk load) but after several minutes (when the rest of the messages
> after "Pass 0a" appeared), it did just the opposite: low CPU use, high disk
> load. Both filesystems are physically on the same set of disks (the 16TB
> logical volume is a striped LVM volume that fills about 75% of the 21 physical
> disks and the 3TB is another striped LVM volume filling the remaining space of
> the same disks) so I don't think it's a problem with the physical devices (of
> course, I could be wrong).

File a bz. This will need some investigation.

BTW, how much memory does your box have?
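
A minimal sketch for checking the wait% question above, assuming a Linux
/proc/stat. It compares two samples of a "cpu" line and reports the share of
time spent in each state. The name cpu_shares and the sample numbers are
illustrative only and are not part of any ocfs2 tool. High iowait would point
at the disks; high user time with low iowait would point at a CPU-bound loop.

```python
# Field order of a "cpu" line in /proc/stat (older kernels expose fewer fields).
STATES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def _parse_cpu_line(line):
    """Split a /proc/stat cpu line into a list of integer tick counters."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("not a cpu line from /proc/stat")
    return [int(value) for value in fields[1:1 + len(STATES)]]


def cpu_shares(before, after):
    """Return the percentage of time per state between two samples."""
    deltas = [b - a for a, b in zip(_parse_cpu_line(before), _parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in zip(STATES, deltas)}


def read_cpu_line(label="cpu"):
    """Read one cpu line (aggregate "cpu" or per-core "cpu3") from /proc/stat."""
    with open("/proc/stat") as stat_file:
        for line in stat_file:
            if line.split()[0] == label:
                return line
    raise KeyError(label)
```

To use it, call read_cpu_line() twice a few seconds apart and pass both lines
to cpu_shares(). With one core pinned at 100%, a per-core label such as "cpu3"
is more telling than the aggregate line, which averages over all cores.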


* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Tao Ma @ 2011-04-23 14:56 UTC
  To: ocfs2-devel

Hi Josep,
	Sorry, I have not been subscribed to ocfs2-users since I left Oracle.
On 04/23/2011 08:22 AM, Sunil Mushran wrote:
> On 04/22/2011 03:24 PM, Josep Guerrero wrote:
>>> How long did the debugfs output take?
>> I think about 30 minutes. No more than 50 for sure (just by looking at the
>> times of the mails).
>>
>>> Did fsck eventually finish?
>> No. I had to cancel it after it stayed 24 hours in the same state, showing the
>> same message. It never moved beyond "Pass 0a", and always was using 100% CPU
>> in one core. I don't know if it would have finished on its own.
>>
>>> BTW, you said one of the cores was at 100%. What does top show?
>>> Is fsck the main contributor or is some other process spinning?
>> It was fsck (I kept a top opened the whole time, and fsck always was around
>> 99% CPU usage).
>>
>>> I have a theory as to why it is slow. But I would like some confirmation.
>>> My theory had fsck have high wait%. I seem to be missing something.
>> I didn't look at the wait%, but I checked the physical disk load with iotop
>> and it was very low, so it didn't look like fsck was being slow because of the
>> disk. In the filesystem I successfully "fscked" before (the 3 TB one that took
>> less than 60 minutes), it started doing something similar (very high CPU
>> usage, low disk load) but after several minutes (when the rest of the messages
>> after "Pass 0a" appeared), it did just the opposite: low CPU use, high disk
>> load. Both filesystems are physically on the same set of disks (the 16TB
>> logical volume is a striped LVM volume that fills about 75% of the 21 physical
>> disks and the 3TB is another striped LVM volume filling the remaining space of
>> the same disks) so I don't think it's a problem with the physical devices (of
>> course, I could be wrong).
> 
> File a bz. This will need some investigation.
> 
> BTW, how much memory does your box have?
So what is your version of fsck? I have run into an issue like that,
where fsck allocates a large amount of memory and gets stuck for
quite a long time because of swapping.

Regards,
Tao
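
A minimal sketch for checking the swapping theory, assuming a Linux kernel
that reports a VmSwap line in /proc/<pid>/status. The helper names below are
illustrative only. A large and growing value for the fsck process would
support the swapping explanation; a value near zero would argue against it.

```python
def swapped_kib(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # The line looks like "VmSwap:      1234 kB".
            return int(line.split()[1])
    return None  # Field absent (kernel thread or older kernel).


def swapped_kib_for_pid(pid):
    """Read /proc/<pid>/status and return how much of the process is swapped."""
    with open("/proc/%d/status" % pid) as status_file:
        return swapped_kib(status_file.read())
```

Calling swapped_kib_for_pid() with the PID of the running fsck (taken from top,
for example) every minute or so would show whether swap use is climbing.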


* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-04-23 15:57 UTC
  To: ocfs2-devel

On 04/23/2011 07:56 AM, Tao Ma wrote:
>
> So what is your version of fsck? I have run into an issue like that,
> where fsck allocates a large amount of memory and gets stuck for
> quite a long time because of swapping.

It is not that issue. It is in pass0. I assumed there was a problem
in the cluster allocation chains. But debugfs managed to scan the
chain. No loops. Looks ok. So I am unsure where it could be spinning.

Note it is a 16T,  4k/4k fs.
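
To illustrate what a loop in a chain means here: a chain is a singly linked
list of groups, and a loop would make any walker spin forever at 100% CPU. The
sketch below is a generic Floyd cycle check over a hypothetical mapping from a
block number to the next block number. It is not the code used by fsck.ocfs2
or debugfs.ocfs2, and the block numbers are made up.

```python
def chain_has_loop(next_block, head):
    """Return True if following next_block from head ever revisits a block.

    next_block maps a block number to its successor; a missing key or a
    None value marks the end of the chain.
    """
    slow = fast = head
    while fast is not None:
        # Advance the fast pointer two links, the slow pointer one link.
        fast = next_block.get(fast)
        if fast is None:
            return False  # Reached the end of the chain: no loop.
        fast = next_block.get(fast)
        slow = next_block.get(slow)
        if fast is not None and fast == slow:
            return True  # The pointers met, so the chain cycles.
    return False


# A healthy three-group chain, and one whose last group points back.
healthy = {1000: 2000, 2000: 3000, 3000: None}
looping = {1000: 2000, 2000: 3000, 3000: 2000}
```

The check uses constant extra memory, which matters when a chain is long; a
set of visited blocks would also work but grows with the chain.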


* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Goldwyn Rodrigues @ 2011-05-11 18:14 UTC
  To: ocfs2-devel

Hi,

On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran
<sunil.mushran@oracle.com> wrote:
> On 04/23/2011 07:56 AM, Tao Ma wrote:
>>
>> So what is your version of fsck? I have run into an issue like that,
>> where fsck allocates a large amount of memory and gets stuck for
>> quite a long time because of swapping.
>
> It is not that issue. It is in pass0. I assumed there was a problem
> in the cluster allocation chains. But debugfs managed to scan the
> chain. No loops. Looks ok. So I am unsure where it could be spinning.
>
> Note it is a 16T, 4k/4k fs.



We had a similar problem which was fixed by
commit 2d741da9367b33f559802dfabe62d96f6adc7777

Version number would be helpful.

Regards,

-- 
Goldwyn


* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-05-11 18:21 UTC
  To: ocfs2-devel

On 05/11/2011 11:14 AM, Goldwyn Rodrigues wrote:
> Hi,
>
> On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran
> <sunil.mushran@oracle.com>  wrote:
>> On 04/23/2011 07:56 AM, Tao Ma wrote:
>>> So what is your version of fsck? I have run into an issue like that,
>>> where fsck allocates a large amount of memory and gets stuck for
>>> quite a long time because of swapping.
>> It is not that issue. It is in pass0. I assumed there was a problem
>> in the cluster allocation chains. But debugfs managed to scan the
>> chain. No loops. Looks ok. So I am unsure where it could be spinning.
>>
>> Note it is a 16T,  4k/4k fs.
>
>
> We had a similar problem which was fixed by
> commit 2d741da9367b33f559802dfabe62d96f6adc7777
>
> Version number would be helpful.

Thanks for that. Josep was on 1.4.4.

Fixed in 1.6.4.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1323
