cluster-devel.redhat.com archive mirror
From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] seq_file: Use larger buffer to reduce time traversing lists
Date: Fri, 01 Jun 2012 11:39:53 +0100	[thread overview]
Message-ID: <1338547193.2708.16.camel@menhir> (raw)


I've just been taking a look at the seq_read() code, since we've noticed
that dumping files with large numbers of records can take some
considerable time. This is because seq_read() uses a buffer which is at
most a single page in size, and because it has to find its place in the
record list again on every call. That makes it rather inefficient.

As an example, I created a GFS2 filesystem with 100k inodes in it, and
then ran ls -l to get a decent number of cached inodes. This resulted in
there being approx 400k lines in the debugfs file containing GFS2's
glocks. I then timed how long it takes to read this file:

[root@chywoon mnt]# time dd if=/sys/kernel/debug/gfs2/unity\:myfs/glocks
of=/dev/null bs=1M
0+5769 records in
0+5769 records out
23273958 bytes (23 MB) copied, 63.3681 s, 367 kB/s

real    1m3.371s
user    0m0.005s
sys     1m3.317s

I tried this several times and it seems pretty consistent. Then I tried
again with the below patch applied:

[root@chywoon mnt]# time dd if=/sys/kernel/debug/gfs2/unity\:myfs/glocks
of=/dev/null bs=1M
0+23 records in
0+23 records out
23107575 bytes (23 MB) copied, 1.64175 s, 14.1 MB/s

real	0m1.644s
user	0m0.000s
sys	0m1.643s

So that is a speed up of approx 38x, which isn't too bad I think. I
noticed that the current code caches the buffer page rather than doing a
free/alloc on each seq_read() call. Since my patch can make that buffer
rather large, I was not sure that this was a good idea in general, as it
could allow a DoS attack which pins down large amounts of kmalloced
memory. So my patch frees the buffer at the end of each cycle unless it
is only a single page in size. That way we won't pin down any more
memory than the current code does, even when using a larger buffer. I
suspect that any time spent doing the alloc/free will be more than made
up for by the reduction in time spent traversing the records in the
file.

Also, since multipage kmalloc allocations are likely to fail in low
memory conditions, the allocation is done with __GFP_NOWARN so that it
fails silently, and the code falls back to a single page buffer if a
larger one cannot be allocated.


Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


diff --git a/fs/seq_file.c b/fs/seq_file.c
index 0cbd049..feb86f2 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -153,7 +153,9 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	size_t n;
 	void *p;
 	int err = 0;
+	unsigned bsize = clamp(size, PAGE_SIZE, KMALLOC_MAX_SIZE);
 
+	bsize = round_up(bsize, PAGE_SIZE);
 	mutex_lock(&m->lock);
 
 	/*
@@ -169,6 +171,16 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	 */
 	m->version = file->f_version;
 
+	if ((m->size < bsize) || !m->buf) {
+		kfree(m->buf);
+		m->size = 0;
+		m->buf = kmalloc(m->size = bsize, GFP_KERNEL|__GFP_NOWARN);
+		if (!m->buf)
+			m->buf = kmalloc(m->size = PAGE_SIZE, GFP_KERNEL);
+		if (!m->buf)
+			goto Enomem;
+	}
+
 	/* Don't assume *ppos is where we left it */
 	if (unlikely(*ppos != m->read_pos)) {
 		while ((err = traverse(m, *ppos)) == -EAGAIN)
@@ -277,6 +289,11 @@ Done:
 		m->read_pos += copied;
 	}
 	file->f_version = m->version;
+	if (m->size > PAGE_SIZE) {
+		kfree(m->buf);
+		m->buf = NULL;
+		m->size = 0;
+	}
 	mutex_unlock(&m->lock);
 	return copied;
 Enomem:

Thread overview: 8+ messages
2012-06-01 10:39 Steven Whitehouse [this message]
     [not found] ` <1338552626.2760.1510.camel@edumazet-glaptop>
2012-06-01 12:24   ` [Cluster-devel] seq_file: Use larger buffer to reduce time traversing lists Steven Whitehouse
     [not found]     ` <1338554890.2760.1517.camel@edumazet-glaptop>
2012-06-01 13:14       ` Steven Whitehouse
     [not found]         ` <1338557229.2760.1520.camel@edumazet-glaptop>
2012-06-01 14:18           ` Steven Whitehouse
     [not found]             ` <1338562627.2760.1526.camel@edumazet-glaptop>
2012-06-01 15:28               ` Steven Whitehouse
     [not found]               ` <1338562897.2760.1528.camel@edumazet-glaptop>
     [not found]                 ` <1338563900.2760.1529.camel@edumazet-glaptop>
2012-06-01 15:30                   ` Steven Whitehouse
     [not found]   ` <1338552870.2760.1512.camel@edumazet-glaptop>
2012-06-01 12:26     ` Steven Whitehouse
2012-06-01 15:54 ` Joe Perches
