From: Bart Samwel <bart@samwel.tk>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Andrew Morton <akpm@osdl.org>, Sam Vilain <sam@vilain.net>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Write the inode itself in block_fsync()
Date: Fri, 10 Mar 2006 18:32:52 +0100
Message-ID: <4411B844.1080108@samwel.tk>
In-Reply-To: <8764mms4zt.fsf@duaron.myhome.or.jp>

OGAWA Hirofumi wrote:
> Bart Samwel <bart@samwel.tk> writes:
> 
>> Andrew Morton wrote:
>>> Sam Vilain <sam@vilain.net> wrote:
>>>> OGAWA Hirofumi wrote:
>>>>
>>>>  Ouch... won't that halve performance of database transaction logs?
>>> Yes, it could well cause a lot more seeking to do atime and/or mtime
>>> writes.   Which aren't terribly important, really.
>>>
>>> Unless I'm missing something, I suspect we'd be better off without this,
>>> even though it's a correctness fix :(
>> Maybe atime/mtime aren't important, but I would be unhappy if a file 
>> size change wasn't written to disk on fsync.
> 
> Please don't worry, we should be doing a right thing for normal files
> already. This patch is just for block device file.

Ahhh, I missed that. I interpreted:

 >For block device's inode, we don't write a inode's meta data
 >itself. But, I think we should write inode's meta data for fsync().

as "for block devices we don't, for normal files, yes", but apparently 
that's not what you meant. :-)

>> Anyway, shouldn't databases be using a combination of fixed-size files 
>> and fdatasync? fsync doesn't perform well by definition, and I guess the 
>> only reason databases still use it is because the kernel failed to 
>> implement the sucky part of the behaviour.
> 
> Yes, I agree. The changes of atime/mtime only sets I_DIRTY_SYNC, so,
> usually this patch doesn't change fdatasync() at all.
> 
> Umm... however, I also can understand what akpm says.... check some databases.
> 
> 	berkeley db 4.4: use fdatasync() if available
>         mysql 5.0:	 use fdatasync() if available (innobase)
> 			 use fsync() (bdb)
> 	postgresql:	 use fdatasync() if available
> 	sqlite:		 use fsync

Nice piece of info. Apparently all of the "large" database engines can 
use fdatasync; only the smaller ones (bdb, sqlite) don't. I've done some 
extra research:

* From a quick look at the docs it seems to me that bdb can't be 
configured to put its transaction log directly on a block device, so bdb 
won't be affected.

* SQLite definitely can't write logs to a block device: the docs 
explicitly say that the transaction log is a regular file with a 
specific name, so we can write off sqlite as well. (It does seem to use 
fdatasync btw, since version 3.2.6, see http://www.sqlite.org/changes.html.)

If we haven't missed any, that leaves only proprietary databases at risk. 
But I would be genuinely surprised if a database like Oracle used 
fsync. If we assume that Oracle et al. are not a problem, the risks of 
this patch are very low.
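
For reference, the "fixed-size files plus fdatasync" combination mentioned 
above could look roughly like the sketch below. This is only an 
illustration under my own assumptions, not code from any of the databases 
listed: the helper names log_open_fixed() and log_write_record() are made 
up, and it assumes a POSIX system that provides posix_fallocate(), 
pwrite() and fdatasync(). The idea is to preallocate the log once (and 
fsync once so the size reaches the disk), then overwrite records in place 
so that later flushes don't need any size update.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (or create) a log file and preallocate it to
 * a fixed size, so that later record writes never change the file size.
 * One fsync() here pushes the size/allocation metadata to disk up front.
 * Returns a file descriptor, or -1 on failure.
 */
int log_open_fixed(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns an error number directly, not -1/errno. */
    if (posix_fallocate(fd, 0, size) != 0 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: overwrite a record in place inside the
 * preallocated area, then flush with fdatasync(). Since the size does
 * not change, only the data has to go out; timestamp-only updates are
 * not required for the data to be retrievable.
 * Returns 0 on success, -1 on failure.
 */
int log_write_record(int fd, off_t off, const void *buf, size_t len)
{
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;
    return fdatasync(fd);
}
```

A caller would do log_open_fixed() once at startup and then 
log_write_record() per commit, keeping every offset plus length within 
the preallocated size.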

Cheers,
Bart

