From: "Jim Schutt" <jaschut@sandia.gov>
To: Greg Farnum <greg@inktank.com>
Cc: Wido den Hollander <wido@42on.com>, ceph-devel@vger.kernel.org
Subject: Re: CephFS Space Accounting and Quotas
Date: Mon, 18 Mar 2013 08:19:07 -0600
Message-ID: <5147225B.5060702@sandia.gov>
In-Reply-To: <0B3FC8A87058441CAB834F4995E6C8C6@inktank.com>

On 03/15/2013 05:17 PM, Greg Farnum wrote:
> [Putting list back on cc]
>
> On Friday, March 15, 2013 at 4:11 PM, Jim Schutt wrote:
>
>> On 03/15/2013 04:23 PM, Greg Farnum wrote:
>>> As I come back and look at these again, I'm not sure what the context
>>> for these logs is. Which test did they come from, and which behavior
>>> (slow or not slow, etc) did you see? :) -Greg
>>
>>
>>
>> They come from a test where I had debug mds = 20 and debug ms = 1
>> on the MDS while writing files from 198 clients. It turns out that
>> for some reason I need debug mds = 20 during writing to reproduce
>> the slow stat behavior later.
>>
>> strace.find.dirs.txt.bz2 contains the log of running
>> strace -tt -o strace.find.dirs.txt find /mnt/ceph/stripe-4M -type d -exec ls -lhd {} \;
>>
>> From that output, I believe that the stat of at least these files is slow:
>> zero0.rc11
>> zero0.rc30
>> zero0.rc46
>> zero0.rc8
>> zero0.tc103
>> zero0.tc105
>> zero0.tc106
>> I believe that log shows slow stats on more files, but those are the first few.
>>
>> mds.cs28.slow-stat.partial.bz2 contains the MDS log from just before the
>> find command started, until just after the fifth or sixth slow stat from
>> the list above.
>>
>> I haven't yet tried to find other ways of reproducing this, but so far
>> it appears that something happens during the writing of the files that
>> ends up causing the condition that results in slow stat commands.
>>
>> I have the full MDS log from the writing of the files, as well, but it's
>> big....
>>
>> Is that what you were after?
>>
>> Thanks for taking a look!
>>
>> -- Jim
>
> I just was coming back to these to see what new information was
> available, but I realized we'd discussed several tests and I wasn't
> sure what these ones came from. That information is enough, yes.
>
> If in fact you believe you've only seen this with high-level MDS
> debugging, I believe the cause is as I mentioned last time: the MDS
> is flapping a bit and so some files get marked as "needsrecover", but
> they aren't getting recovered asynchronously, and the first thing
> that pokes them into doing a recover is the stat.

OK, that makes sense.

> That's definitely not the behavior we want and so I'll be poking
> around the code a bit and generating bugs, but given that explanation
> it's a bit less scary than random slow stats are so it's not such a
> high priority. :) Do let me know if you come across it without the
> MDS and clients having had connection issues!

No problem - thanks!

-- Jim

> -Greg
>
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>
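
The slow-stat check described in the quoted message (running find under
strace -tt and looking for calls that take a long time) could be roughly
automated along the following lines. This is only a sketch under stated
assumptions: it assumes the strace output has one call per line, each
prefixed with an HH:MM:SS.ffffff timestamp as -tt produces, and it treats
the gap between the start of one call and the start of the next as an
approximation of the first call's duration. The function name slow_calls
and the one-second threshold are illustrative choices, not anything taken
from the thread.

```python
import re
import sys
from datetime import datetime

# strace -tt prefixes each line with a wall-clock time, e.g. 08:19:07.123456
TS_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s+(.*)$")


def slow_calls(lines, threshold_s=1.0):
    """Return (gap_seconds, call_text) for every call that is followed by a
    gap of at least threshold_s before the next traced call starts.

    The gap is only an approximation of the call's duration; it also
    includes any time the traced process spent outside system calls.
    """
    results = []
    prev_ts = None
    prev_text = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap < 0:
                # -tt has no date, so a negative gap means we crossed midnight
                gap += 24 * 60 * 60
            if gap >= threshold_s:
                results.append((gap, prev_text))
        prev_ts, prev_text = ts, m.group(2)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python slow_calls.py strace.find.dirs.txt
    with open(sys.argv[1], errors="replace") as f:
        for gap, text in slow_calls(f):
            print(f"{gap:8.3f}s  {text.rstrip()}")
```

Adding -T to the strace invocation would record the time spent in each
call directly, which would avoid the approximation made here.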