From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Whitehouse
Date: Wed, 02 Jan 2008 10:19:28 +0000
Subject: [Cluster-devel] Re: Why the gfs2 performance regressed?
In-Reply-To: <91b13c310801012353i2f57a6c4o884b3e9aeab5970a@mail.gmail.com>
References: <91b13c310801012353i2f57a6c4o884b3e9aeab5970a@mail.gmail.com>
Message-ID: <1199269168.22038.29.camel@quoit>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

On Wed, 2008-01-02 at 15:53 +0800, Cheng Renquan wrote:
> hello, Steven:
>
> I have tested gfs2 in the out-of-box RHEL51 and in the latest gfs2-nmw
> git repository, and found that the latest gfs2 performance has regressed.
>
> The testing environment is RHEL5.1, default kernel + gfs2 + samba. I
> use the preinstalled samba-3.0.25b to share a gfs2 mounted directory.
> On the Windows clients I've tested, its performance is very good! It can
> reach up to 100MB/s on writing, almost the limit of the local
> Gigabit Ethernet network.
>
> Now I want to test the latest gfs2 from the gfs2-nmw.git repository. I
> just compiled the gfs2-nmw kernel to replace the default, using the
> .config from /boot/config-$(uname -r)
>
> But I found the performance is bad:
>
> Writers on Windows can only reach up to 6 MB/s. At that time I noticed
> the following on the Linux side:
>
> 1. from this I know the smbd thread 3627 is serving the Windows client
> 192.168.76.226:
> # lsof -nPi4
> smbd 3627 fstest 22u IPv4 7806 TCP
> 192.168.76.200:445->192.168.76.226:1369 (ESTABLISHED)
>
> 2. list all open files of thread 3627, finding that /mnt/gfs2/x.dat
> is currently being accessed by the Windows client testing utilities
> through samba:
> # lsof -nP -a -p 3627
> smbd 3627 fstest 18uR REG 253,0 328704000 66392 /mnt/gfs2/x.dat
> smbd 3627 fstest 22u IPv4 7806 TCP
> 192.168.76.200:445->192.168.76.226:1369 (ESTABLISHED)
>
> 3. strace it and record the time of every system call, writing the
> results to a file:
> # strace -T -p 3627 -o fcntl64.3267
>
> 4. some time later, interrupt it and analyze the result.
>
> 5. from the result, I found that some fcntl64 system calls take 0.9 seconds:
> fcntl64(18, F_GETLK64, {type=F_UNLCK, whence=SEEK_SET,
> start=325263360, len=65536, pid=0}) = 0 <0.915883>
> fcntl64(18, F_GETLK64, {type=F_UNLCK, whence=SEEK_SET, start=22093824,
> len=65536, pid=0}) = 0 <0.916389>
> ...
>
> 6. if thread 3627 is straced without the output redirected, each
> fcntl64 call shows up as a visible pause.
>
> 7. Since the default-kernel+gfs2+samba serves very efficiently, and
> the latest gfs2-nmw does not, this should be some problem in gfs2-nmw?
>
> 8. I noticed that thread 3627 sometimes goes into D state
> (uninterruptible). I used /proc/sysrq-trigger to record the call trace
> in kernel space:
> smbd D 00000000 2080 3627 3620
> 00000000 00200082 00000001 00000000 c042148e 00000000 19b1e094 000002b6
> f68f92c0 f68f94f0 c3616d80 00000000 f6dda040 000004ab 00000000 00000003
> f605ae9c f8c54340 c043bc87 f34fa0c0 f605af14 f605ae9c f488fe00 f8c4fbea
> Call Trace:
> [] __wake_up_common+0x32/0x5c
> [] prepare_to_wait+0x24/0x3f
> [] gdlm_plock_get+0xeb/0x16b [lock_dlm]
> [] autoremove_wake_function+0x0/0x35
> [] gdlm_plock_get+0x0/0x16b [lock_dlm]
> [] gfs2_lm_plock_get+0x30/0x39 [gfs2]
> [] gfs2_lock+0x78/0xb5 [gfs2]
> [] gfs2_lock+0x0/0xb5 [gfs2]
> [] vfs_test_lock+0x18/0x23
> [] fcntl_getlk64+0x5f/0x108
> [] sys_fcntl64+0x38/0x6d
> [] sysenter_past_esp+0x5f/0x85
>
> 9. Could someone tell me whether this is a problem in gfs2-nmw or not? Or
> is there a problem with my configuration?
>
> The two attachments are the results of strace and dmesg, fcntl64.3267
> and dmesg.1
>

Are you running a single node? If so, then use lock_nolock rather than
lock_dlm, as it will be much faster for fcntl locks. Even if you intend to
run as a cluster eventually, a single-node comparison against lock_nolock
would be useful to try to eliminate some possibilities,

Steve.
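
As a rough aid for the strace analysis in steps 3 to 5 of the quoted report, the sketch below totals the time spent per system call from `strace -T` output. It is a minimal Python sketch and not something from the original thread: the function name `summarize`, the regular expression, and the `SAMPLE` lines (copied from the report and joined onto single lines) are assumptions made for illustration. It assumes each completed call is logged on one line ending in `<seconds>`.

```python
import re
from collections import defaultdict

# Matches one completed call in `strace -T` output, for example:
#   fcntl64(18, F_GETLK64, {...}) = 0 <0.915883>
# An optional leading PID column (present with `strace -f`) is tolerated.
LINE_RE = re.compile(r'^(?:\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$')


def summarize(lines):
    """Return {syscall_name: (count, total_seconds, max_seconds)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip signals, unfinished calls, blank lines
        name, seconds = match.group(1), float(match.group(2))
        entry = stats[name]
        entry[0] += 1
        entry[1] += seconds
        entry[2] = max(entry[2], seconds)
    return {name: tuple(values) for name, values in stats.items()}


# Hypothetical sample input: the two calls quoted in the report.
SAMPLE = [
    "fcntl64(18, F_GETLK64, {type=F_UNLCK, whence=SEEK_SET, "
    "start=325263360, len=65536, pid=0}) = 0 <0.915883>",
    "fcntl64(18, F_GETLK64, {type=F_UNLCK, whence=SEEK_SET, "
    "start=22093824, len=65536, pid=0}) = 0 <0.916389>",
]

# Print syscalls ordered by total time, slowest first.
for name, (count, total, worst) in sorted(
        summarize(SAMPLE).items(), key=lambda item: item[1][1], reverse=True):
    print(f"{name}: calls={count} total={total:.6f}s max={worst:.6f}s")
```

Running the same summary on traces captured under lock_dlm and under lock_nolock would give a direct comparison of fcntl64 latency between the two lock modules.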