From: Andrew Morton <akpm@digeo.com>
To: Alex Tomas <bzzz@tmi.comex.ru>
Cc: linux-kernel@alex.org.uk, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: 2.5.59-mm5
Date: Fri, 24 Jan 2003 03:50:17 -0800
Message-ID: <20030124035017.6276002f.akpm@digeo.com>
In-Reply-To: <m3d6mmvlip.fsf@lexa.home.net>
Alex Tomas <bzzz@tmi.comex.ru> wrote:
>
> >>>>> Andrew Morton (AM) writes:
>
> AM> But writes are completely different. There is no dependency
> AM> between them and at any point in time we know where on-disk a lot
> AM> of writes will be placed. We don't know that for reads, which is
> AM> why we need to twiddle thumbs until the application or filesystem
> AM> makes up its mind.
>
>
> it's significant that application doesn't want to wait read completion
> long and doesn't wait for write completion in most cases.
That's correct. Reads are usually synchronous and writes are rarely
synchronous.

The most common place where the kernel forces a user process to wait on
completion of a write is actually in unlink (truncate, really), because
truncate must wait for in-progress I/O to complete before allowing the
filesystem to free (and potentially reuse) the affected blocks.

If there's a lot of writeout happening then truncate can take _ages_. Hence
this patch:


Truncates can take a very long time, especially if there is a lot of
writeout happening, because truncate must wait on in-progress I/O.

And sys_unlink() is performing that truncate while holding the parent
directory's i_sem. This basically shuts down new accesses to the entire
directory until the synchronous I/O completes.

In the testing I've been doing, that directory is /tmp, and this hurts.

So change sys_unlink() to perform the actual truncate outside i_sem.

When there is a continuous streaming write to the same disk, this patch
reduces the time for `make -j4 bzImage' from 370 seconds to 220.
namei.c | 12 ++++++++++++
1 files changed, 12 insertions(+)
diff -puN fs/namei.c~unlink-latency-fix fs/namei.c
--- 25/fs/namei.c~unlink-latency-fix 2003-01-24 02:41:04.000000000 -0800
+++ 25-akpm/fs/namei.c 2003-01-24 02:47:36.000000000 -0800
@@ -1659,12 +1659,19 @@ int vfs_unlink(struct inode *dir, struct
return error;
}
+/*
+ * Make sure that the actual truncation of the file will occur outside its
+ * directory's i_sem. truncate can take a long time if there is a lot of
+ * writeout happening, and we don't want to prevent access to the directory
+ * while waiting on the I/O.
+ */
asmlinkage long sys_unlink(const char * pathname)
{
int error = 0;
char * name;
struct dentry *dentry;
struct nameidata nd;
+ struct inode *inode = NULL;
name = getname(pathname);
if(IS_ERR(name))
@@ -1683,6 +1690,9 @@ asmlinkage long sys_unlink(const char *
/* Why not before? Because we want correct error value */
if (nd.last.name[nd.last.len])
goto slashes;
+ inode = dentry->d_inode;
+ if (inode)
+ inode = igrab(inode);
error = vfs_unlink(nd.dentry->d_inode, dentry);
exit2:
dput(dentry);
@@ -1693,6 +1703,8 @@ exit1:
exit:
putname(name);
+ if (inode)
+ iput(inode); /* truncate the inode here */
return error;
slashes:
_
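
For readers who want to see the idea in isolation, here is a minimal
user-space sketch of the pattern the patch relies on: take an extra
reference before the locked section so that the final, expensive drop
happens after the lock is released. This is not kernel code. Every name
below (toy_dir, toy_inode, toy_grab, toy_put, toy_unlink_old,
toy_unlink_new, toy_run) is a hypothetical stand-in, the "lock" is only a
flag, and the "truncate" is only a pair of booleans recording when it ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct toy_dir {
    bool locked;               /* stands in for the directory's i_sem */
};

struct toy_inode {
    int refcount;              /* references held by names and callers */
    bool truncated;            /* set once the slow teardown has run */
    bool truncated_under_lock; /* was the directory locked at that time? */
    struct toy_dir *parent;
};

/* Take an extra reference, analogous in spirit to igrab(). */
static struct toy_inode *toy_grab(struct toy_inode *inode)
{
    inode->refcount++;
    return inode;
}

/* Drop a reference; the last drop performs the slow "truncate". */
static void toy_put(struct toy_inode *inode)
{
    if (--inode->refcount == 0) {
        inode->truncated = true;
        inode->truncated_under_lock = inode->parent->locked;
    }
}

/* Old shape: the final put happens while the directory is locked. */
static void toy_unlink_old(struct toy_dir *dir, struct toy_inode *inode)
{
    dir->locked = true;
    toy_put(inode);            /* last reference: teardown runs under lock */
    dir->locked = false;
}

/* New shape: pin the inode so the final put happens after the unlock. */
static void toy_unlink_new(struct toy_dir *dir, struct toy_inode *inode)
{
    struct toy_inode *pinned;

    dir->locked = true;
    pinned = toy_grab(inode);  /* keep the inode alive past the unlock */
    toy_put(inode);            /* drops the name's reference, not the last */
    dir->locked = false;
    toy_put(pinned);           /* last reference: teardown runs unlocked */
}

/* Returns true if the teardown ran while the directory was locked. */
static bool toy_run(bool use_new)
{
    struct toy_dir dir = { .locked = false };
    struct toy_inode inode = { .refcount = 1, .parent = &dir };

    if (use_new)
        toy_unlink_new(&dir, &inode);
    else
        toy_unlink_old(&dir, &inode);
    assert(inode.truncated);
    return inode.truncated_under_lock;
}
```

Calling toy_run(false) reports that the teardown ran with the directory
locked, while toy_run(true) reports that it ran after the unlock. The
sketch only models the ordering; it says nothing about the real cost of
truncate or about the other locking rules sys_unlink() has to obey.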