From: Fan Yong <yong.fan@whamcloud.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] New test results for "ls -Ul"
Date: Mon, 30 May 2011 16:11:59 +0800
Message-ID: <4DE3514F.2050903@whamcloud.com>
In-Reply-To: <BA5D598A-2A89-48DF-A67A-4ACDD8B1F409@whamcloud.com>

Inline comments follow:

On 5/30/11 1:51 PM, Jinshan Xiong wrote:
>
> On May 26, 2011, at 6:01 AM, Eric Barton wrote:
>
>> Nasf,
>> Interesting results.  Thank you - especially for graphing the results 
>> so thoroughly.
>> I'm attaching them here and cc-ing lustre-devel since these are of 
>> general interest.
>> I don't think your conclusion number (1), to say CLIO locking is 
>> slowing us down
>> is as obvious from these results as you imply.  If you just compare 
>> the 1.8 and
>> patched 2.x per-file times and how they scale with #stripes you get this...
>> <image001.png>
>> The gradients of these lines should correspond to the additional time 
>> per stripe required
>> to stat each file and I've graphed these times below (ignoring the 
>> 0-stripe data for this
>> calculation because I'm just interested in the incremental per-stripe 
>> overhead).
>> <image004.png>
>> They show per-stripe overhead for 1.8 well above patched 2.x for the 
>> lower stripe
>> counts, but whereas 1.8 gets better with more stripes, patched 2.x 
>> gets worse.  I'm
>> guessing that at high stripe counts, 1.8 puts many concurrent 
>> glimpses on the wire
>> and does it quite efficiently.  I'd like to understand better how you 
>> control the #
>> of glimpse-aheads you keep on the wire - is it a single fixed number, 
>> or a fixed
>> number per OST or some other scheme?  In any case, it will be 
>> interesting to see
>> measurements at higher stripe counts.
>>
>>     Cheers,
>>                        Eric
>>
>> *From:*Fan Yong [mailto:yong.fan at whamcloud.com]
>> *Sent:*12 May 2011 10:18 AM
>> *To:*Eric Barton
>> *Cc:*Bryon Neitzel; Ian Colle; Liang Zhen
>> *Subject:*New test results for "ls -Ul"
>>
>> I have improved the statahead load-balancing mechanism to distribute 
>> statahead load across more CPUs on the client, and adjusted AGL 
>> according to the CLIO lock state machine. After those improvements, 'ls 
>> -Ul' runs faster than with the old patches, especially on large SMP nodes.
>>
>> On the other hand, as the degree of parallelism increases, the 
>> lower network scheduler becomes the performance bottleneck. So I 
>> combined my patches with Liang's SMP patches in the test.
>>
>>
>>               | client (fat-intel-4, 24 cores) | server (client-xxx, 4 OSSes, 8 OSTs on each OSS)
>> --------------+--------------------------------+-------------------------------------------------
>> b2x_patched   | my patches + SMP patches       | my patches
>> b18           | original b1_8                  | share the same server with "b2x_patched"
>> b2x_original  | original b2_x                  | original b2_x
>>
>> Some notes:
>>
>> 1) Stripe count strongly affects traversal performance, and the impact 
>> is more than linear. Even with all the patches applied on b2_x, 
>> the impact of stripe count is still larger than on b1_8. It is 
>> related to the complex CLIO lock state machine and its tedious 
>> iteration/repeat operations. It is not easy to make it run as 
>> efficiently as b1_8.
>
>
> Hi there,
>
> I did some tests to investigate the overhead of clio lock state 
> machine and glimpse lock, and I found something new.
>
> Basically I did the same thing as what Nasf had done, but I only cared 
> about the overhead of glimpse locks. For this purpose, I ran 'ls -lU' 
> twice for each test. The first run is only used to populate the IBITS 
> UPDATE lock cache for the files; then I dropped cl_locks and ldlm_locks 
> from the client-side cache by setting lru_size of the ldlm namespaces to 
> zero, and ran 'ls -lU' once again. In the second run of 'ls -lU', the 
> statahead thread will always find a cached IBITS lock (we can check the mdc 
> lock_count to be sure), so the elapsed time of ls will be glimpse related.
>
> This is what I got from the test:
>
>
>
>
>
> Description and test environment:
> - `ls -Ul time' means the time to finish the second run;
> - 100K means 100K files under the same directory; 400K means 400K 
> files under the same directory;
> - there are two OSSes in my test, and each OSS has 8 OSTs; OSTs are 
> crossed over on two OSSes, i.e., OST0, 2, 4,.. are on OSS0; 1, 3, 5, 
> .. are on OSS1;
> - each node has 12G memory, 4 CPU cores;
> - latest lustre-master build, b140
>
> and, prorated per stripe overhead:
>
>
>
>
>
> From the above test, it's very hard to conclude that 
> cl_lock causes the ls time to increase with the stripe count.
>
> Here is the test script I used to do the test, and test output is 
> attached as well. Please let me know if I missed something.


In theory, the glimpse RPCs for the stripes of a single file should be 
processed in parallel. So a higher stripe count should mean a lower average 
per-stripe overhead; at least, that is the expectation. A flat line does 
not show that the overhead is small enough. I suggest running the same 
tests on b1_8 for comparison.
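
To make the expectation concrete, here is a toy model (not measured data). It assumes each file costs a fixed base time plus one fixed-cost round of glimpse RPCs per batch of stripes that can be in flight at once. The names and the numbers (base, rpc_cost, in_flight) are made up purely for illustration.

```python
import math

def per_file_time(stripes, base, rpc_cost, in_flight):
    """Modelled time to stat one file when glimpse RPCs are sent
    in batches of at most `in_flight` (hypothetical parameters)."""
    if stripes == 0:
        return base
    rounds = math.ceil(stripes / in_flight)
    return base + rpc_cost * rounds

def avg_per_stripe_overhead(stripes, base, rpc_cost, in_flight):
    """Average extra time per stripe, relative to the 0-stripe case."""
    return (per_file_time(stripes, base, rpc_cost, in_flight) - base) / stripes

STRIPE_COUNTS = [1, 2, 4, 8, 16, 32]

# in_flight=1: glimpses for one file are fully serialized.
serial = [avg_per_stripe_overhead(n, 10.0, 2.0, 1) for n in STRIPE_COUNTS]
# in_flight=8: up to 8 glimpses for one file proceed in parallel.
parallel = [avg_per_stripe_overhead(n, 10.0, 2.0, 8) for n in STRIPE_COUNTS]

for n, s, p in zip(STRIPE_COUNTS, serial, parallel):
    print(f"{n:2d} stripes: serial {s:.2f}  parallel {p:.2f}")
```

In this model the serialized case gives a flat per-stripe line, while the parallel case drops as the stripe count grows and only flattens once the in-flight window is saturated. So a flat line by itself cannot tell us whether the glimpses are really being processed in parallel.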


>
>
>
>
>
>
> ===================
> Let's take a step back to reconsider what the real cause is in Nasf's test. 
> I tend to think the load on the OSSes might cause that symptom. It's 
> obvious that Async Glimpse Lock produces more stress on the OSS, 
> especially in his test env where multiple OSTs are actually on the 
> same OSS. This will make the ls time increase with the stripe count as 
> well, since the OSS has to handle more RPCs in a given period when the 
> stripe count increases. This problem may be mitigated by 
> distributing OSTs to more OSSes.


Basically, I agree with you that heavy load on the OSSes may be the 
performance bottleneck. As I said in my earlier email, we found that the 
CPU load on the OSSes was quite high when running "ls -Ul" on the 
large-striped cases. It would be easy to verify if we had sufficiently 
powerful OSSes; unfortunately, we do not have them now.
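
As a back-of-the-envelope sketch of that load, the snippet below counts the glimpse RPCs each OSS has to serve for one traversal. It assumes one glimpse RPC per stripe per file and stripes spread evenly over the OSSes; both are simplifications. The 400K files, 32 stripes and 4 OSSes come from my test setup; the 8-OSS case is hypothetical.

```python
def glimpse_rpcs_per_oss(files, stripe_count, oss_count):
    """Glimpse RPCs per OSS for one directory traversal, assuming one RPC
    per stripe per file and an even spread of stripes across OSSes."""
    return files * stripe_count / oss_count

FILES = 400_000   # 400K files in the directory
OSS_COUNT = 4     # OSSes in my test environment

# Per-OSS load at a few stripe counts.
load = {n: glimpse_rpcs_per_oss(FILES, n, OSS_COUNT) for n in (1, 4, 32)}

# Hypothetical: the same 32-striped directory served by 8 OSSes.
more_osses = glimpse_rpcs_per_oss(FILES, 32, 8)

for n, rpcs in load.items():
    print(f"{n:2d} stripes on {OSS_COUNT} OSSes: {rpcs:,.0f} RPCs per OSS")
print(f"32 stripes on 8 OSSes: {more_osses:,.0f} RPCs per OSS")
```

Under these assumptions the per-OSS RPC count grows linearly with the stripe count and halves when the number of OSSes doubles, which is why spreading the OSTs over more OSSes should help if the OSS CPU is indeed the bottleneck.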

Cheers,
--
Nasf


>
> Thanks,
> Jinshan
>
>>
>> 2) Patched b2_x is much faster than original b2_x: for traversing a 
>> 400K * 32-striped directory, it is improved by 100 times or more.
>>
>> 3) Patched b2_x is also faster than b1_8. In our tests, patched 
>> b2_x is at least 4X faster than b1_8, which meets the requirement 
>> in the ORNL contract.
>>
>> 4) Original b2_x is faster than b1_8 only for small-striped cases, 
>> with no more than 4 stripes. For large-striped cases it is slower than 
>> b1_8, which is consistent with the ORNL test result.
>>
>> 5) The largest stripe count in our test is 32. We do not have enough 
>> resources to test larger stripe counts. I also wonder whether 
>> it is worth testing larger-striped directories at all: how 
>> many customers want to use large, fully striped directories, meaning 
>> 1M * 160-striped items in a single directory? If that is a rare 
>> case, then spending lots of time on it is not worthwhile.
>>
>> We need to confirm with ORNL what the final acceptance test cases 
>> and environment are, including:
>> a) stripe count
>> b) item count
>> c) network latency, w/o lnet router, suggest without router.
>> d) OST count on each OSS
>>
>>
>> Cheers,
>> --
>> Nasf
>> <result_20110512.xls>_______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org <mailto:Lustre-devel@lists.lustre.org>
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>


