* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-04-23 0:22 UTC
To: ocfs2-devel

On 04/22/2011 03:24 PM, Josep Guerrero wrote:
>> How long did the debugfs output take?
> I think about 30 minutes. No more than 50 for sure (just by looking at the
> times of the mails).
>
>> Did fsck eventually finish?
> No. I had to cancel it after it stayed 24 hours in the same state, showing the
> same message. It never moved beyond "Pass 0a", and always was using 100% CPU
> in one core. I don't know if it would have finished on its own.
>
>> BTW, you said one of the cores was at 100%. What does top show?
>> Is fsck the main contributor or is some other process spinning?
> It was fsck (I kept top open the whole time, and fsck always was around
> 99% CPU usage).
>
>> I have a theory as to why it is slow. But I would like some confirmation.
>> My theory had fsck have high wait%. I seem to be missing something.
> I didn't look at the wait%, but I checked the physical disk load with iotop
> and it was very low, so it didn't look like fsck was being slow because of the
> disk. In the filesystem I successfully "fscked" before (the 3 TB one that took
> less than 60 minutes), it started doing something similar (very high CPU
> usage, low disk load) but after several minutes (when the rest of the messages
> after "Pass 0a" appeared), it did just the opposite: low CPU use, high disk
> load. Both filesystems are physically on the same set of disks (the 16TB
> logical volume is a striped LVM volume that fills about 75% of the 21 physical
> disks and the 3TB is another striped LVM volume filling the remaining space of
> the same disks) so I don't think it's a problem with the physical devices (of
> course, I could be wrong).

File a bz. This will need some investigation.

BTW, how much memory does your box have?

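The question in this message is whether fsck is burning CPU or waiting on the disk. As a rough sketch only, the CPU side of that can be measured on Linux by sampling utime and stime from /proc/<pid>/stat twice. The function names below are hypothetical and are not part of ocfs2-tools; the field positions follow the proc(5) layout.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 (utime) and 15 (stime)
    # sit at indexes 11 and 12.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU use of one process over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample(pid: int, interval_s: float = 5.0) -> float:
    """Sample a live process (Linux only) and return its CPU percentage."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

A value that stays near 100 while iotop shows little disk traffic, as reported above, suggests a CPU-bound loop rather than a process stalled on I/O.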
* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Tao Ma @ 2011-04-23 14:56 UTC
To: ocfs2-devel

Hi Josep,

Sorry, I have not been subscribed to ocfs2-users since I left Oracle.

On 04/23/2011 08:22 AM, Sunil Mushran wrote:
> On 04/22/2011 03:24 PM, Josep Guerrero wrote:
>>> How long did the debugfs output take?
>> I think about 30 minutes. No more than 50 for sure (just by looking at the
>> times of the mails).
>>
>>> Did fsck eventually finish?
>> No. I had to cancel it after it stayed 24 hours in the same state, showing the
>> same message. It never moved beyond "Pass 0a", and always was using 100% CPU
>> in one core. I don't know if it would have finished on its own.
>>
>>> BTW, you said one of the cores was at 100%. What does top show?
>>> Is fsck the main contributor or is some other process spinning?
>> It was fsck (I kept top open the whole time, and fsck always was around
>> 99% CPU usage).
>>
>>> I have a theory as to why it is slow. But I would like some confirmation.
>>> My theory had fsck have high wait%. I seem to be missing something.
>> I didn't look at the wait%, but I checked the physical disk load with iotop
>> and it was very low, so it didn't look like fsck was being slow because of the
>> disk. In the filesystem I successfully "fscked" before (the 3 TB one that took
>> less than 60 minutes), it started doing something similar (very high CPU
>> usage, low disk load) but after several minutes (when the rest of the messages
>> after "Pass 0a" appeared), it did just the opposite: low CPU use, high disk
>> load. Both filesystems are physically on the same set of disks (the 16TB
>> logical volume is a striped LVM volume that fills about 75% of the 21 physical
>> disks and the 3TB is another striped LVM volume filling the remaining space of
>> the same disks) so I don't think it's a problem with the physical devices (of
>> course, I could be wrong).
>
> File a bz. This will need some investigation.
>
> BTW, how much memory does your box have?

So what is your version of fsck? I have run into an issue like that, where
fsck allocates a large amount of memory and gets stuck for quite a long time
because of swapping.

Regards,
Tao

* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-04-23 15:57 UTC
To: ocfs2-devel

On 04/23/2011 07:56 AM, Tao Ma wrote:
>
> So what is your version of fsck? I have run into an issue like that,
> where fsck allocates a large amount of memory and gets stuck for
> quite a long time because of swapping.

It is not that issue. It is in pass0. I assumed there was a problem
in the cluster allocation chains. But debugfs managed to scan the
chain. No loops. Looks ok. So I am unsure where it could be spinning.

Note it is a 16T, 4k/4k fs.

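For readers unfamiliar with why a loop in an allocation chain would matter: a scanner that follows "next group" links never terminates if the links form a cycle. The sketch below illustrates a generic loop check (Floyd's tortoise and hare) over a hypothetical mapping from block number to next block number. It is not how debugfs.ocfs2 or fsck.ocfs2 are implemented, and the names and the use of None as the end-of-chain marker are assumptions made for illustration.

```python
from typing import Dict, Optional


def chain_has_loop(next_group: Dict[int, Optional[int]],
                   first: Optional[int]) -> bool:
    """Return True if following next_group links from `first` ever cycles."""

    def step(blk: Optional[int]) -> Optional[int]:
        # None marks the end of the chain (an assumption of this sketch);
        # a block missing from the mapping is treated the same way.
        return next_group.get(blk) if blk is not None else None

    slow = fast = first
    while fast is not None:
        slow = step(slow)        # advance one link
        fast = step(step(fast))  # advance two links
        if fast is not None and slow == fast:
            return True          # the two walkers met inside a cycle
    return False                 # the fast walker fell off the end
```

For example, chain_has_loop({1: 2, 2: 3, 3: None}, 1) is False, while chain_has_loop({1: 2, 2: 3, 3: 2}, 1) is True.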
* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Goldwyn Rodrigues @ 2011-05-11 18:14 UTC
To: ocfs2-devel

Hi,

On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran <sunil.mushran@oracle.com> wrote:
> On 04/23/2011 07:56 AM, Tao Ma wrote:
>>
>> So what is your version of fsck? I have run into an issue like that,
>> where fsck allocates a large amount of memory and gets stuck for
>> quite a long time because of swapping.
>
> It is not that issue. It is in pass0. I assumed there was a problem
> in the cluster allocation chains. But debugfs managed to scan the
> chain. No loops. Looks ok. So I am unsure where it could be spinning.
>
> Note it is a 16T, 4k/4k fs.

We had a similar problem which was fixed by
commit 2d741da9367b33f559802dfabe62d96f6adc7777

Version number would be helpful.

Regards,

--
Goldwyn

* [Ocfs2-devel] [Ocfs2-users] How long for an fsck?
From: Sunil Mushran @ 2011-05-11 18:21 UTC
To: ocfs2-devel

On 05/11/2011 11:14 AM, Goldwyn Rodrigues wrote:
> Hi,
>
> On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran
> <sunil.mushran@oracle.com> wrote:
>> On 04/23/2011 07:56 AM, Tao Ma wrote:
>>> So what is your version of fsck? I have run into an issue like that,
>>> where fsck allocates a large amount of memory and gets stuck for
>>> quite a long time because of swapping.
>> It is not that issue. It is in pass0. I assumed there was a problem
>> in the cluster allocation chains. But debugfs managed to scan the
>> chain. No loops. Looks ok. So I am unsure where it could be spinning.
>>
>> Note it is a 16T, 4k/4k fs.
>
>
> We had a similar problem which was fixed by
> commit 2d741da9367b33f559802dfabe62d96f6adc7777
>
> Version number would be helpful.

Thanks for that. Josep was on 1.4.4. Fixed in 1.6.4.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1323

Thread overview: 5+ messages
[not found] <201104211543.29963.guerrero@ice.cat>
[not found] ` <201104211946.32493.guerrero@ice.cat>
[not found]   ` <4DB1F431.5070003@oracle.com>
[not found]     ` <201104230024.35576.guerrero@ice.cat>
2011-04-23  0:22       ` [Ocfs2-devel] [Ocfs2-users] How long for an fsck? Sunil Mushran
2011-04-23 14:56         ` Tao Ma
2011-04-23 15:57           ` Sunil Mushran
2011-05-11 18:14             ` Goldwyn Rodrigues
2011-05-11 18:21               ` Sunil Mushran