From: Andrew Morton <akpm@osdl.org>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices
Date: Thu, 20 Oct 2005 13:50:14 -0700 [thread overview]
Message-ID: <20051020135014.2289fa01.akpm@osdl.org> (raw)
In-Reply-To: <871x2gf8f5.fsf@devron.myhome.or.jp>
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> Hi,
>
> This adds new "flush" option on experiment for hotplug devices.
>
> Current implementation of "flush" option does,
>
> - synchronizing data pages at ->release() (last close(2))
> - if user's work seems to be done (fs is not active), all
> metadata syncs by pdflush()
Seems like a sensible thing to do.
> This option would provide kind of sane progress, and dirty buffers is
> flushed more frequently (if fs is not active).
Your implementation doesn't really do this. bdi_write_congested() only
returns true if the device is super-busy. To determine whether it's "not
active" we'd need to peek at the queue's disk_stat accounting, or at the
queue's outstanding read/write requests. We covered this a couple of weeks
ago in the context of Con's swap prefetch work.
> This option doesn't
> provide any robustness (robustness is provided by other options), but
> probably the option is proper for hotplug devices.
Well... It does a full fsync_super() - that's pretty robust.
> +EXPORT_SYMBOL(filemap_write_and_wait);
_GPL please.
> +EXPORT_SYMBOL(fsync_super);
Ditto
> +/*
> + * Copyright (C) 2005, OGAWA Hirofumi
> + * Released under GPL v2.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/blkdev.h>
> +#include <linux/writeback.h>
> +#include <linux/msdos_fs.h>
> +
> +int fat_sync_fdata(struct inode *inode, struct file *filp)
> +{
> + int err = 0;
> +
> + if (filp->f_mode & FMODE_WRITE) {
> +#if 1
> + current->flags |= PF_SYNCWRITE;
> + err = filemap_write_and_wait(inode->i_mapping);
> + current->flags &= ~PF_SYNCWRITE;
> +#else
> + down(&inode->i_sem);
> +#if 1
> + err = generic_osync_inode(inode, inode->i_mapping, OSYNC_DATA);
> +#else
> + err = filp->f_op->fsync(filp, filp->f_dentry, 1);
> +#endif
> + up(&inode->i_sem);
> +#endif
> + }
> + return err;
> +}
Can't we just split up do_fsync() a bit and use that?
> +static void fat_pdflush_handler(unsigned long arg)
> +{
> + struct super_block *sb = (struct super_block *)arg;
> + fsync_super(sb);
> +}
It would be nice if /proc/sys/vm/dirty_writeback_centisecs was a per-fs
thing. That's non-trivial.
> +static void fat_flush_timer(unsigned long data)
> +{
> + struct super_block *sb = (struct super_block *)data;
> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
> + struct backing_dev_info *bdi = blk_get_backing_dev_info(sb->s_bdev);
> + unsigned long last_flush_jiff;
> +
> + if (bdi_write_congested(bdi)) {
As indicated above, this won't be very effective.
> + mod_timer(&sbi->flush_timer, jiffies + (HZ / 10));
> + return;
> + }
> +
> + last_flush_jiff = sbi->last_flush_jiff;
> +
> + if (!time_after_eq(jiffies, last_flush_jiff + (HZ / 2))) {
> + mod_timer(&sbi->flush_timer, last_flush_jiff + (HZ / 2));
> + return;
> + }
What's the above doing?
> + if (pdflush_operation(fat_pdflush_handler, (unsigned long)sb) < 0)
> + mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +
> +void __fat_mark_flush(struct super_block *sb)
> +{
> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
> +
> + sbi->last_flush_jiff = jiffies;
> + /*
> + * make sure by smb_wmb() that dirty buffers before here is
> + * processed at the timer routine.
> + */
> + smp_wmb();
> +
> + if (!timer_pending(&sbi->flush_timer))
> + mod_timer(&sbi->flush_timer, jiffies + HZ);
> +}
> +EXPORT_SYMBOL(__fat_mark_flush);
_GPL?
> +void fat_flush_stop(struct super_block *sb)
> +{
> + del_timer_sync(&MSDOS_SB(sb)->flush_timer);
> +}
whoops, the pdflush_operation could still be in progress.
To avoid umount races I think the pdflush callback is going to need to take
sb_lock, increment s_count, take ->s_umount, test ->s_root. Like, for
example, __sync_inodes.
> +void fat_flush_init(struct super_block *sb)
> +{
> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
> + init_timer(&sbi->flush_timer);
> + sbi->flush_timer.data = (unsigned long)sb;
> + sbi->flush_timer.function = fat_flush_timer;
-mm has setup_timer()
> +
> +static inline void fat_mark_flush(struct super_block *sb)
> +{
> + if (MSDOS_SB(sb)->options.flush)
> + __fat_mark_flush(sb);
> +}
It'd be nice to make this a more generic thing, so other filesystems can
use it without copying lots of code.
> + case Opt_flush:
MS_FLUSH? I added MS_DIRSYNC a few years ago - it wasn't too complex.
next prev parent reply other threads:[~2005-10-20 20:50 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-10-20 16:41 [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices OGAWA Hirofumi
2005-10-20 20:50 ` Andrew Morton [this message]
2005-10-20 21:08 ` running linux TCP/IP stack on a real slow machine Kallol Biswas
2005-10-22 7:45 ` [EXPERIMENT,RFC] FAT: Add "flush" option for hotplug devices OGAWA Hirofumi
2005-10-22 10:15 ` Christoph Hellwig
2005-10-22 10:45 ` OGAWA Hirofumi
2005-10-22 10:56 ` Christoph Hellwig
2005-10-22 11:24 ` OGAWA Hirofumi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20051020135014.2289fa01.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=hirofumi@mail.parknet.co.jp \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox