public inbox for linux-xfs@vger.kernel.org
* very long log recovery at mount
@ 2015-10-21  9:27 Arkadiusz Miśkiewicz
  2015-10-21 18:39 ` Michael L. Semon
  2015-10-23  7:05 ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Arkadiusz Miśkiewicz @ 2015-10-21  9:27 UTC (permalink / raw)
  To: xfs


Hi.

I ran into the following situation: fresh boot, 4.1.10 kernel, init scripts start mounting 
filesystems. One fs wasn't very lucky:

[   15.979538] XFS (md3): Mounting V4 Filesystem
[   16.256316] XFS (md3): Ending clean mount
[   28.343346] XFS (md4): Mounting V4 Filesystem
[   28.629918] XFS (md4): Ending clean mount
[   28.662125] XFS (md5): Mounting V4 Filesystem
[   28.980142] XFS (md5): Ending clean mount
[   29.049421] XFS (md6): Mounting V4 Filesystem
[   29.447725] XFS (md6): Starting recovery (logdev: internal)
[ 4517.327332] XFS (md6): Ending recovery (logdev: internal)

It took over 1h to mount md6 filesystem.

Questions:
- is it possible to log how much data needs to be recovered from the log? Some 
data that would give a hint on how big this is (and thus a rough estimate of 
how long it will take). Not sure if that's known at the time when this message 
is printed.

XFS (md6): Starting recovery (logdev: internal, to recover: xyzMB (?))

- such a long mount time is almost insane, so I wonder what the reason could 
be. Is the process multithreaded or single-threaded? The CPUs were idle.

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very long log recovery at mount
  2015-10-21  9:27 very long log recovery at mount Arkadiusz Miśkiewicz
@ 2015-10-21 18:39 ` Michael L. Semon
  2015-10-23  7:05 ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Michael L. Semon @ 2015-10-21 18:39 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz, xfs

On 10/21/15 05:27, Arkadiusz Miśkiewicz wrote:
>
> Hi.
>
> I ran into the following situation: fresh boot, 4.1.10 kernel, init scripts start mounting
> filesystems. One fs wasn't very lucky:
>
> [   15.979538] XFS (md3): Mounting V4 Filesystem
> [   16.256316] XFS (md3): Ending clean mount
> [   28.343346] XFS (md4): Mounting V4 Filesystem
> [   28.629918] XFS (md4): Ending clean mount
> [   28.662125] XFS (md5): Mounting V4 Filesystem
> [   28.980142] XFS (md5): Ending clean mount
> [   29.049421] XFS (md6): Mounting V4 Filesystem
> [   29.447725] XFS (md6): Starting recovery (logdev: internal)
> [ 4517.327332] XFS (md6): Ending recovery (logdev: internal)
>
> It took over 1h to mount md6 filesystem.
>
> Questions:
> - is it possible to log how much data needs to be recovered from the log? Some
> data that would give a hint on how big this is (and thus a rough estimate of
> how long it will take). Not sure if that's known at the time when this message
> is printed.
>
> XFS (md6): Starting recovery (logdev: internal, to recover: xyzMB (?))
>
> - such a long mount time is almost insane, so I wonder what the reason could
> be. Is the process multithreaded or single-threaded? The CPUs were idle.
>

My old PC used to suffer from intermittently long mount times, and I wrote a
one-character patch to fix that issue:

http://oss.sgi.com/archives/xfs/2015-01/msg00183.html

I'm putting it out there only to ask the list if wrapped-log handling could be
the issue here.  Don't try the patch without good reason:  I've tested it only
as late as kernel 3.18.22, on 32-bit Linux with MBR and GPT partitions, which 
is not much of a test matrix.

Thanks!

Michael


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very long log recovery at mount
  2015-10-21  9:27 very long log recovery at mount Arkadiusz Miśkiewicz
  2015-10-21 18:39 ` Michael L. Semon
@ 2015-10-23  7:05 ` Dave Chinner
  2015-10-23  7:22   ` Arkadiusz Miśkiewicz
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2015-10-23  7:05 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: xfs

On Wed, Oct 21, 2015 at 11:27:52AM +0200, Arkadiusz Miśkiewicz wrote:
> 
> Hi.
> 
> I ran into the following situation: fresh boot, 4.1.10 kernel, init scripts start mounting 
> filesystems. One fs wasn't very lucky:
> 
> [   15.979538] XFS (md3): Mounting V4 Filesystem
> [   16.256316] XFS (md3): Ending clean mount
> [   28.343346] XFS (md4): Mounting V4 Filesystem
> [   28.629918] XFS (md4): Ending clean mount
> [   28.662125] XFS (md5): Mounting V4 Filesystem
> [   28.980142] XFS (md5): Ending clean mount
> [   29.049421] XFS (md6): Mounting V4 Filesystem
> [   29.447725] XFS (md6): Starting recovery (logdev: internal)
> [ 4517.327332] XFS (md6): Ending recovery (logdev: internal)
> 
> It took over 1h to mount md6 filesystem.
> 
> Questions:
> - is it possible to log how much data needs to be recovered
> from the log?

Yes.

> Some data that would give a hint on how big this is (and thus a
> rough estimate of how long it will take). Not sure if that's known
> at the time when this message is printed.

It's not known then, and can't be known until recovery has parsed
the log and read all the objects it needs to recover from disk.

> - such a long mount time is almost insane, so I wonder what the
> reason could be. Is the process multithreaded or single-threaded?
> The CPUs were idle.

What kernel? We now have readahead which minimises the IO latency of
pulling objects into the kernel for recovery, but if you are
recovering a couple of million individual inode changes (e.g. from a
'chproj -R /path/with/millions/of/files') then it takes a long time
to read in all the inodes and write them all back out. A single
inode in the log like this only consumes about 200 bytes of log
space, so there can easily be 5000 inodes to recover per megabyte
of log space you have. And if you have a 2GB log, then that could
contain 10 million inode core changes that need to be recovered....
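As a back-of-the-envelope sketch of that arithmetic (the ~200 bytes per logged
inode core is the figure above; the real density varies with what else is in
the log):

```python
# Rough upper bound on inode core changes a full XFS log can hold,
# assuming ~200 bytes of log space per logged inode core (approximate
# figure from the discussion above, not an exact on-disk constant).

BYTES_PER_INODE_CORE = 200

def inodes_per_log(log_size_bytes, per_inode=BYTES_PER_INODE_CORE):
    """Upper bound on inode core changes a log of this size can contain."""
    return log_size_bytes // per_inode

MB = 1024 * 1024
GB = 1024 * MB

print(inodes_per_log(1 * MB))   # 5242 -> "easily 5000 inodes per megabyte"
print(inodes_per_log(2 * GB))   # 10737418 -> ~10 million for a 2GB log
```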

And if you have a slow SATA RAID, the small random writes for inode
writeback might only retire a hundred inodes a second, and log
recovery will block until all those inodes are completely written
back (i.e. the same problem that can lead to unmount taking hours when
there are hundreds of thousands of dirty inodes in memory).
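To see why writeback dominates, here is a sketch of the resulting wall-clock
time under these assumed numbers (a hundred inodes per second is the slow
SATA RAID ballpark above, not a measurement):

```python
# Rough wall-clock estimate for inode writeback during log recovery:
# number of inodes to write back divided by the retire rate of the
# storage. 100 inodes/sec is the slow-SATA-RAID ballpark above.

def recovery_seconds(dirty_inodes, inodes_per_sec=100):
    """Seconds until recovery can finish writing back dirty inodes."""
    return dirty_inodes / inodes_per_sec

# ~450k recovered inodes at 100/sec is about 75 minutes -- the same
# order of magnitude as the md6 mount in the dmesg output above.
print(recovery_seconds(450_000) / 60)        # 75.0 minutes
print(recovery_seconds(10_000_000) / 3600)   # ~27.8 hours for a full 2GB log
```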

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very long log recovery at mount
  2015-10-23  7:05 ` Dave Chinner
@ 2015-10-23  7:22   ` Arkadiusz Miśkiewicz
  2015-10-23 21:30     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Arkadiusz Miśkiewicz @ 2015-10-23  7:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Friday 23 of October 2015, Dave Chinner wrote:
> On Wed, Oct 21, 2015 at 11:27:52AM +0200, Arkadiusz Miśkiewicz wrote:
> > Hi.
> > 
> > I ran into the following situation: fresh boot, 4.1.10 kernel, init
> > scripts start mounting filesystems. One fs wasn't very lucky:
> > 
> > [   15.979538] XFS (md3): Mounting V4 Filesystem
> > [   16.256316] XFS (md3): Ending clean mount
> > [   28.343346] XFS (md4): Mounting V4 Filesystem
> > [   28.629918] XFS (md4): Ending clean mount
> > [   28.662125] XFS (md5): Mounting V4 Filesystem
> > [   28.980142] XFS (md5): Ending clean mount
> > [   29.049421] XFS (md6): Mounting V4 Filesystem
> > [   29.447725] XFS (md6): Starting recovery (logdev: internal)
> > [ 4517.327332] XFS (md6): Ending recovery (logdev: internal)
> > 
> > It took over 1h to mount md6 filesystem.
> > 
> > Questions:
> > - is it possible to log how much data needs to be recovered
> > from the log?
> 
> Yes.
> 
> > Some data that would give a hint on how big this is (and thus a
> > rough estimate of how long it will take). Not sure if that's known
> > at the time when this message is printed.
> 
> It's not known then, and can't be known until recovery has parsed
> the log and read all the objects it needs to recover from disk.

So I assume it's not available early enough to be usable.

> > - such a long mount time is almost insane, so I wonder what the
> > reason could be. Is the process multithreaded or single-threaded?
> > The CPUs were idle.
> 
> What kernel? 

"> > I got such situation, fresh boot, 4.1.10 kernel"

> We now have readahead which minimises the IO latency of
> pulling objects into the kernel for recovery, but if you are
> recovering a couple of million individual inode changes (e.g. from a
> 'chproj -R /path/with/millions/of/files') then it takes a long time
> to read in all the inodes and write them all back out.

It was something like 10 rsnapshot runs there (so tons of files 
copied/hardlinked, then rsynced, etc.).

> A single
> inode in the log like this only consumes about 200 bytes of log
> space, so there can easily be 5000 inodes to recover per megabyte
> of log space you have. And if you have a 2GB log, then that could
> contain 10 million inode core changes that need to be recovered....

Ok, so what I'm looking for is any kind of indication (in dmesg, probably) 
that it will take a long time (hence my question about log recovery size). 
Right now it's hard to estimate how long the downtime will be, and the 
ability to estimate such things is important.

> And if you have a slow SATA RAID, the small random writes for inode
> writeback might only retire a hundred inodes a second, and log
> recovery will block until all those inodes are completely written
> back (i.e. the same problem that can lead to unmount taking hours when
> there are hundreds of thousands of dirty inodes in memory).

That's a nightmare (fortunately rarely happening here).

Thanks for explanations.

> Cheers,
> Dave.

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: very long log recovery at mount
  2015-10-23  7:22   ` Arkadiusz Miśkiewicz
@ 2015-10-23 21:30     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2015-10-23 21:30 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: xfs

On Fri, Oct 23, 2015 at 09:22:04AM +0200, Arkadiusz Miśkiewicz wrote:
> On Friday 23 of October 2015, Dave Chinner wrote:
> > On Wed, Oct 21, 2015 at 11:27:52AM +0200, Arkadiusz Miśkiewicz wrote:
> > > Hi.
> > > 
> > > I ran into the following situation: fresh boot, 4.1.10 kernel, init
> > > scripts start mounting filesystems. One fs wasn't very lucky:
> > > 
> > > [   15.979538] XFS (md3): Mounting V4 Filesystem
> > > [   16.256316] XFS (md3): Ending clean mount
> > > [   28.343346] XFS (md4): Mounting V4 Filesystem
> > > [   28.629918] XFS (md4): Ending clean mount
> > > [   28.662125] XFS (md5): Mounting V4 Filesystem
> > > [   28.980142] XFS (md5): Ending clean mount
> > > [   29.049421] XFS (md6): Mounting V4 Filesystem
> > > [   29.447725] XFS (md6): Starting recovery (logdev: internal)
> > > [ 4517.327332] XFS (md6): Ending recovery (logdev: internal)
> > > 
> > > It took over 1h to mount md6 filesystem.
> > > 
> > > Questions:
> > > - is it possible to log how much data needs to be recovered
> > > from the log?
> > 
> > Yes.
> > 
> > > Some data that would give a hint on how big this is (and thus a
> > > rough estimate of how long it will take). Not sure if that's known
> > > at the time when this message is printed.
> > 
> > It's not known then, and can't be known until recovery has parsed
> > the log and read all the objects it needs to recover from disk.
> 
> So I assume it's not available early enough to be usable.

No, it's not.

> > > - such a long mount time is almost insane, so I wonder what the
> > > reason could be. Is the process multithreaded or single-threaded?
> > > The CPUs were idle.
> > 
> > What kernel? 
> 
> "> > I got such situation, fresh boot, 4.1.10 kernel"

Sorry, my fault, I missed that.

> > We now have readahead which minimises the IO latency of
> > pulling objects into the kernel for recovery, but if you are
> > recovering a couple of million individual inode changes (e.g. from a
> > 'chproj -R /path/with/millions/of/files') then it takes a long time
> > to read in all the inodes and write them all back out.
> 
> It was something like 10 rsnapshot runs there (so tons of files
> copied/hardlinked, then rsynced, etc.).

Ok, so lots of hardlinks, which will result in lots of individual
inode cores being logged due to the link count change.

> > A single
> > inode in the log like this only consumes about 200 bytes of log
> > space, so there can easily be 5000 inodes to recover per megabyte
> > of log space you have. And if you have a 2GB log, then that could
> > contain 10 million inode core changes that need to be recovered....
> 
> Ok, so what I'm looking for is any kind of indication (in dmesg, probably)
> that it will take a long time (hence my question about log recovery size).
> Right now it's hard to estimate how long the downtime will be, and the
> ability to estimate such things is important.

Well, we don't count such things at present. We make multiple passes
over the log during recovery - the first pass records all the
cancelled buffers (i.e. freed metadata buffers) so that we can avoid
replaying into them. xlog_recover_commit_pass1() doesn't count or
look at anything else, but I suspect we could add some kind of "this
many objects to recover" accounting output at the end of that phase.
We can't do any time estimates, though, because that's entirely
dependent on the underlying storage, which we know nothing about.
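A hypothetical sketch of that idea, in Python rather than kernel C (the real
pass logic lives in fs/xfs/xfs_log_recover.c; the names and data shapes here
are invented for illustration): pass 1 records cancelled buffers and could
cheaply tally the remaining items, reportable before replay starts:

```python
# Hypothetical sketch of two-pass log recovery with pass-1 accounting.
# Structures are invented for illustration; the real code operates on
# transaction log items, not dicts.

def recovery_pass1(log_items):
    """Pass 1: record cancelled buffers; tally items that may need replay."""
    cancelled = set()
    to_recover = 0
    for item in log_items:
        if item["type"] == "buf_cancel":
            cancelled.add(item["blkno"])
        else:
            to_recover += 1          # rough count, reportable after pass 1
    return cancelled, to_recover

def recovery_pass2(log_items, cancelled):
    """Pass 2: replay everything except writes into cancelled buffers."""
    return [i for i in log_items
            if i["type"] != "buf_cancel" and i.get("blkno") not in cancelled]

log = [
    {"type": "inode", "blkno": 8},
    {"type": "buf_cancel", "blkno": 16},
    {"type": "buf", "blkno": 16},    # hits a cancelled buffer: not replayed
    {"type": "inode", "blkno": 24},
]
cancelled, count = recovery_pass1(log)
print(f"objects to recover: ~{count}")   # printable before replay begins
replayed = recovery_pass2(log, cancelled)
```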

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-10-23 21:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-21  9:27 very long log recovery at mount Arkadiusz Miśkiewicz
2015-10-21 18:39 ` Michael L. Semon
2015-10-23  7:05 ` Dave Chinner
2015-10-23  7:22   ` Arkadiusz Miśkiewicz
2015-10-23 21:30     ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox