From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] seq_file: Use larger buffer to reduce time traversing lists
Date: Fri, 01 Jun 2012 15:18:03 +0100
Message-ID: <1338560283.2708.59.camel@menhir>
In-Reply-To: <1338557229.2760.1520.camel@edumazet-glaptop>
Hi,
On Fri, 2012-06-01 at 15:27 +0200, Eric Dumazet wrote:
> On Fri, 2012-06-01 at 14:14 +0100, Steven Whitehouse wrote:
>
> > Here it is (with the patch):
> >
> > [root@chywoon mnt]# time dd if=/sys/kernel/debug/gfs2/unity\:myfs/glocks
> > of=/dev/null bs=4k
> > 0+5726 records in
> > 0+5726 records out
> > 23107575 bytes (23 MB) copied, 82.3082 s, 281 kB/s
> >
> > real 1m22.311s
> > user 0m0.013s
> > sys 1m22.231s
> >
> > So that's slow, as promised :-)
> >
> > > I can't reproduce this slow behavior you have, using /proc/net seq
> > > files.
> > >
> > > Isn't it a problem with this particular file ?
> > >
> > Well, yes and no. The problem would affect any file with lots of records
> > in it, but there may not be many with that number of records. Do any of
> > your net files have numbers of entries in the region of hundreds of
> > thousands or more?
> >
> > > Does it want to output a single record ( m->op->show(m, p) ) much larger
> > > than 4KB ?
> > >
> > No. That appears to work ok, so far as I can tell, anyway. What we have
> > are lots of relatively short records. Here is an example of a few lines.
> > Each line starting G: is a new record, so this is 5 calls to ->show():
> >
> > G: s:SH n:5/1da5e f:Iqob t:SH d:EX/0 a:0 v:0 r:2 m:200
> > H: s:SH f:EH e:0 p:6577 [(ended)] gfs2_inode_lookup+0x116/0x2d0 [gfs2]
> > G: s:SH n:2/a852 f:IqLob t:SH d:EX/0 a:0 v:0 r:2 m:200
> > I: n:9712/43090 t:8 f:0x00 d:0x00000000 s:0
> > G: s:SH n:2/8bcd f:IqLob t:SH d:EX/0 a:0 v:0 r:2 m:200
> > I: n:2584/35789 t:8 f:0x00 d:0x00000000 s:0
> > G: s:SH n:2/1eea7 f:IqLob t:SH d:EX/0 a:0 v:0 r:2 m:200
> > I: n:58968/126631 t:8 f:0x00 d:0x00000000 s:0
> > G: s:SH n:2/12fbd f:IqLob t:SH d:EX/0 a:0 v:0 r:2 m:200
> > I: n:11120/77757 t:8 f:0x00 d:0x00000000 s:0
> >
> >
> > The key here is that we have a lot of them. My example using just over
> > 400k records is in fact a fairly modest example - it is not unusual to
> > see millions of records in this file. We use it for debug purposes only,
> > and this patch was prompted by people reporting it taking a very long
> > time to dump the file.
> >
> > The issue is not the time taken to create each record, or to copy the
> > data, but the time taken each time we have to find our place again in
> > the list of glocks (actually a hash table, but same thing applies as we
> > traverse it as a set of lists)
> >
> > I don't think there is really much we can easily do in the case of
> > readers requesting small reads of the file. At least we can make it much
> > more efficient when they request larger reads though,
>
> Issue is your seq_file provider has O(N^2) behavior
>
Well, that is a slight simplification. The provider has O(N) behaviour
with respect to the number of entries when streaming data, and O(N)
behaviour when seeking to a specific file offset. It is the combination
of repeated calls of the seeking function that results in O(N^2)
behaviour overall.
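
To make the cost argument concrete, here is a minimal sketch of that
combination. It is a toy model, not the GFS2 or seq_file code: it assumes
the reader's position is just a record index, so every read() has to walk
from the head of the list to find its place before it can stream the next
batch. The function name and the records-per-read figures are illustrative
assumptions; roughly 70 records per 4 KB read follows from about 400k
records spread over the 5726 reads quoted above.

```python
def traversal_steps(num_records, records_per_read):
    """Count list nodes visited when every read() re-walks from the head.

    Assumed model: the position is only a record index, so each read()
    first skips `pos` nodes to find its place, then emits up to
    `records_per_read` records.
    """
    steps = 0
    pos = 0
    while pos < num_records:
        steps += pos                      # re-seek: walk from the head to pos
        batch = min(records_per_read, num_records - pos)
        steps += batch                    # stream: visit each emitted record
        pos += batch
    return steps


small = traversal_steps(400_000, 70)      # roughly a 4 KB buffer of short records
large = traversal_steps(400_000, 7_000)   # a buffer about 100x larger
print(small, large, small / large)
```

In this model the streaming part stays linear in the number of records,
while the re-seek part grows with the number of records times the number
of read() calls. A larger buffer reduces the number of calls, which is why
it helps even though the per-seek cost is unchanged.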
> We used to have same issues in network land, and we fixed this some time
> ago, and we only use 4KB as seq_file buffer, not a huge one.
>
> Check commit a8b690f98baf9fb1 ( tcp: Fix slowness in
> read /proc/net/tcp ) for an example
>
So far as I can tell, you are just caching the last hash bucket, but are
still seeking down each hash chain. That will only work well when the
hash buckets are reasonably empty. We can try that in GFS2; however, I'm
not sure that is the whole story.
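
As a rough illustration of why chain length matters, the sketch below
models a reader that remembers which bucket it stopped in but still has to
skip the already-emitted records in that bucket's chain on the next read().
This is a simplified model written for this discussion, not the code from
the tcp commit mentioned above; the function name, the chain lengths, and
the figure of 70 records per read are all assumptions.

```python
def steps_with_bucket_cache(chain_lengths, records_per_read):
    """Count nodes visited when the bucket index is cached between reads.

    Assumed model: each read() jumps straight to the bucket it stopped in,
    but must still skip the records already emitted from that chain.
    """
    steps = 0
    bucket, offset = 0, 0                 # cached bucket and offset within its chain
    while bucket < len(chain_lengths):
        steps += offset                   # re-seek within the cached bucket only
        budget = records_per_read
        while budget > 0 and bucket < len(chain_lengths):
            take = min(budget, chain_lengths[bucket] - offset)
            steps += take                 # stream records from this chain
            budget -= take
            offset += take
            if offset == chain_lengths[bucket]:
                bucket, offset = bucket + 1, 0   # chain exhausted, move on
    return steps


# Same 400k records, spread thinly or packed into long chains.
short_chains = steps_with_bucket_cache([4] * 100_000, 70)
long_chains = steps_with_bucket_cache([4_000] * 100, 70)
print(short_chains, long_chains)
```

With short chains the re-seek cost per read() is negligible, so the total
stays close to the number of records. With long chains the quadratic term
comes back inside each bucket.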

/proc/net/unix still appears to suffer from this O(N^2) problem, even if
/proc/net/tcp does not. I did a quick test using a program to create a
bunch of TCP sockets, and with my kernel patch applied ended up with
these results:

[root@chywoon mnt]# time dd if=/proc/net/tcp of=/dev/null bs=4k
0+1046 records in
0+1046 records out
4236300 bytes (4.2 MB) copied, 1.20739 s, 3.5 MB/s

real 0m1.310s
user 0m0.003s
sys 0m1.220s

[root@chywoon mnt]# time dd if=/proc/net/tcp of=/dev/null bs=1M
0+5 records in
0+5 records out
4236300 bytes (4.2 MB) copied, 0.331139 s, 12.8 MB/s

real 0m0.470s
user 0m0.001s
sys 0m0.374s

So even with the current tcp scheme, this appears to speed things up by
nearly 3x. Also, that was with only 28000 entries in the file.
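
The ratios can be checked directly from the two dd runs above. The short
script below only re-does that arithmetic; the dictionary layout and the
variable names are illustrative, and the entry count is the approximate
figure of 28000 given in the text.

```python
size_bytes = 4_236_300
entries = 28_000                    # approximate count given in the message

runs = {
    "bs=4k": {"reads": 1046, "copy_s": 1.20739, "real_s": 1.310, "sys_s": 1.220},
    "bs=1M": {"reads": 5, "copy_s": 0.331139, "real_s": 0.470, "sys_s": 0.374},
}

for name, r in runs.items():
    bytes_per_read = size_bytes / r["reads"]
    entries_per_read = entries / r["reads"]
    mb_per_s = size_bytes / r["copy_s"] / 1e6
    print(f"{name}: {bytes_per_read:.0f} bytes/read, "
          f"{entries_per_read:.0f} entries/read, {mb_per_s:.1f} MB/s")

# Speedup of the large-buffer run over the 4k run, by wall clock and by sys time.
real_speedup = runs["bs=4k"]["real_s"] / runs["bs=1M"]["real_s"]
sys_speedup = runs["bs=4k"]["sys_s"] / runs["bs=1M"]["sys_s"]
print(f"real: {real_speedup:.2f}x, sys: {sys_speedup:.2f}x")
```

The wall-clock ratio is the one that comes out just under 3x; the sys-time
ratio is somewhat higher.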
Steve.