From: Bart Samwel <bart@samwel.tk>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Andrew Morton <akpm@osdl.org>, Sam Vilain <sam@vilain.net>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Write the inode itself in block_fsync()
Date: Fri, 10 Mar 2006 18:32:52 +0100
Message-ID: <4411B844.1080108@samwel.tk>
In-Reply-To: <8764mms4zt.fsf@duaron.myhome.or.jp>
OGAWA Hirofumi wrote:
> Bart Samwel <bart@samwel.tk> writes:
>
>> Andrew Morton wrote:
>>> Sam Vilain <sam@vilain.net> wrote:
>>>> OGAWA Hirofumi wrote:
>>>>
>>>> Ouch... won't that halve performance of database transaction logs?
>>> Yes, it could well cause a lot more seeking to do atime and/or mtime
>>> writes. Which aren't terribly important, really.
>>>
>>> Unless I'm missing something, I suspect we'd be better off without this,
>>> even though it's a correctness fix :(
>> Maybe atime/mtime aren't important, but I would be unhappy if a file
>> size change wasn't written to disk on fsync.
>
> Please don't worry, we should be doing a right thing for normal files
> already. This patch is just for block device file.
Ahhh, I missed that. I interpreted:
>For block device's inode, we don't write a inode's meta data
>itself. But, I think we should write inode's meta data for fsync().
as "for block devices we don't, for normal files, yes", but apparently
that's not what you meant. :-)
>> Anyway, shouldn't databases be using a combination of fixed-size files
>> and fdatasync? fsync doesn't perform well by definition, and I guess the
>> only reason databases still use it is because the kernel failed to
>> implement the sucky part of the behaviour.
>
> Yes, I agree. The changes of atime/mtime only sets I_DIRTY_SYNC, so,
> usually this patch doesn't change fdatasync() at all.
>
> Umm... however, I also can understand what akpm says.... check some databases.
>
> berkeley db 4.4: use fdatasync() if available
> mysql 5.0:       use fdatasync() if available (innobase)
>                  use fsync() (bdb)
> postgresql:      use fdatasync() if available
> sqlite:          use fsync
Nice piece of info. Apparently all of the "large" database engines can
use fdatasync; only the smaller ones (bdb, sqlite) don't. I've done some
extra research:
* From a quick look at the docs it seems to me that bdb can't be
  configured to put its transaction log directly on a block device, so
  bdb won't be affected.
* SQLite definitely can't write logs to a block device: the docs
  explicitly say that the transaction log is a regular file with a
  specific name, so we can write off sqlite as well. (It does seem to
  use fdatasync btw, since version 3.2.6, see
  http://www.sqlite.org/changes.html.)
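
Since the patch only touches fsync() on block device files, an application can tell whether it is affected by checking what its log descriptor points at. A minimal sketch, assuming a POSIX system; fd_is_block_device() is a hypothetical helper name, not something taken from the patch or from any of the databases above:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if fd refers to a block device, 0 if it refers to anything
 * else (regular file, pipe, character device, ...), -1 if fstat() fails.
 */
int fd_is_block_device(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    /* S_ISBLK() tests the file type bits of st_mode. */
    return S_ISBLK(st.st_mode) ? 1 : 0;
}
```

A log that lives in a regular file gives 0 here, which is the bdb and sqlite case described above.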
If we haven't missed any, that leaves only proprietary databases at risk.
But I would be genuinely surprised if a database like Oracle used
fsync. If we assume that Oracle et al. are not a problem, the risks of
this patch are very low.
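
For reference, the "fixed-size files and fdatasync" combination mentioned earlier could look roughly like the sketch below. This is only an illustration under stated assumptions: log_create_fixed() and log_write_sync() are hypothetical names, the code is not taken from any of the databases listed, and it assumes a POSIX system that provides posix_fallocate(), pwrite() and fdatasync().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create the log with its final size up front, so
 * that later writes inside that range never change the file size.
 * Returns an open descriptor, or -1 on failure.
 */
int log_create_fixed(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number directly (0 on success).
     * One full fsync() here pushes out the new size and allocation. */
    if (posix_fallocate(fd, 0, size) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: overwrite part of the preallocated range and
 * flush it. fdatasync() writes the data plus only the metadata needed
 * to read it back, so a pure timestamp update need not hit the disk.
 * Returns 0 on success, -1 on failure.
 */
int log_write_sync(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(fd, buf, len, off);

    if (n < 0 || (size_t)n != len)
        return -1;

    return fdatasync(fd);
}
```

Because the size is fixed at creation time, the per-commit path only ever needs fdatasync(), and the one expensive fsync() happens once when the log is set up.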
Cheers,
Bart