* ext3: kjournald and spun-down disks
From: Oliver Xymoron @ 2001-11-24 1:10 UTC
To: linux-kernel

My laptop drive seems to be waking up more often today and I suspect it's
somehow ext3/kjournald that's to blame. Does it obey the timings in
/proc/sys/vm/bdflush or does it have its own flush timer?

There's a more general problem with VM on laptops which is that the system
doesn't have any notion of spun-down disks. Flush intervals should be
short when the disk is running and long when it isn't and decisions about
which pages to discard or swap might be improvable. Pre-emptive swap when
the disk is spun down is a loss..

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

* Re: ext3: kjournald and spun-down disks
From: Andrew Morton @ 2001-11-24 1:25 UTC
To: Oliver Xymoron; +Cc: linux-kernel

Oliver Xymoron wrote:
>
> My laptop drive seems to be waking up more often today and I suspect it's
> somehow ext3/kjournald that's to blame. Does it obey the timings in
> /proc/sys/vm/bdflush or does it have its own flush timer?

It has its own flush timer. This is something we need to crunch
on and think about. There's an untested patch here which may suffice.

> There's a more general problem with VM on laptops which is that the system
> doesn't have any notion of spun-down disks. Flush intervals should be
> short when the disk is running and long when it isn't and decisions about
> which pages to discard or swap might be improvable. Pre-emptive swap when
> the disk is spun down is a loss..

Yup. The current VM is a bit too swap-happy, IMO. In try_to_free_pages(),
replace `priority = DEF_PRIORITY' with `priority = DEF_PRIORITY + 2'.

Also, if we had appropriate hooks into the request layer, we could detect
when the disk was being spun up for a read, and opportunistically flush
out any pending writes.

Tell me if this is joyful:

--- linux-2.4.15/fs/buffer.c	Thu Nov 22 23:02:58 2001
+++ linux-akpm/fs/buffer.c	Fri Nov 23 17:21:04 2001
@@ -119,6 +119,12 @@ union bdflush_param {
 int bdflush_min[N_PARAM] = { 0, 10, 5, 25, 0, 1*HZ, 0, 0, 0};
 int bdflush_max[N_PARAM] = {100,50000, 20000, 20000,10000*HZ, 6000*HZ, 100, 0, 0};
 
+int dirty_buffer_flush_interval(void)
+{
+	return bdf_prm.b_un.interval;
+}
+EXPORT_SYMBOL(dirty_buffer_flush_interval);
+
 void unlock_buffer(struct buffer_head *bh)
 {
 	clear_bit(BH_Wait_IO, &bh->b_state);
--- linux-2.4.15/fs/jbd/transaction.c	Thu Nov 22 23:02:59 2001
+++ linux-akpm/fs/jbd/transaction.c	Fri Nov 23 17:21:37 2001
@@ -43,6 +43,8 @@ extern spinlock_t journal_datalist_lock;
  * processes trying to touch the journal while it is in transition.
  */
 
+extern int dirty_buffer_flush_interval(void);
+
 static transaction_t * get_transaction (journal_t * journal, int is_try)
 {
 	transaction_t * transaction;
@@ -56,7 +58,7 @@ static transaction_t * get_transaction (
 	transaction->t_journal = journal;
 	transaction->t_state = T_RUNNING;
 	transaction->t_tid = journal->j_transaction_sequence++;
-	transaction->t_expires = jiffies + journal->j_commit_interval;
+	transaction->t_expires = jiffies + dirty_buffer_flush_interval();
 
 	/* Set up the commit timer for the new transaction. */
 	J_ASSERT (!journal->j_commit_timer_active);

* Re: ext3: kjournald and spun-down disks
From: Oliver Xymoron @ 2001-11-24 1:58 UTC
To: Andrew Morton; +Cc: linux-kernel

On Fri, 23 Nov 2001, Andrew Morton wrote:

> Oliver Xymoron wrote:
> >
> > My laptop drive seems to be waking up more often today and I suspect it's
> > somehow ext3/kjournald that's to blame. Does it obey the timings in
> > /proc/sys/vm/bdflush or does it have its own flush timer?
>
> It has its own flush timer. This is something we need to crunch
> on and think about.

Ok. I think we'll probably end up needing per-device flush timers. Flushes
to jffs should work differently than flushes to disk, or to network
attached storage (iSCSI, nbd).

> > There's a more general problem with VM on laptops which is that the system
> > doesn't have any notion of spun-down disks. Flush intervals should be
> > short when the disk is running and long when it isn't and decisions about
> > which pages to discard or swap might be improvable. Pre-emptive swap when
> > the disk is spun down is a loss..
>
> Yup. The current VM is a bit too swap-happy, IMO. In try_to_free_pages(),
> replace `priority = DEF_PRIORITY' with `priority = DEF_PRIORITY + 2'.
>
> Also, if we had appropriate hooks into the request layer, we could detect
> when the disk was being spun up for a read, and opportunistically flush
> out any pending writes.

I think if the disk wakes up, then the time to next flush gets shortened
from long_interval to short_interval. If short_interval makes the next
flush in the past, it happens now. But if we sleep the disk and wake it up
immediately, we don't necessarily want to trigger a flush.

> Tell me if this is joyful:

Haven't tried it yet, but I'm afraid I don't see what makes it actually
sync with the dirty buffer flush. Wouldn't it be better to export a chain
of flush funcs hung off a timer?

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

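The long_interval/short_interval rule described in the message above can be
sketched in user-space C. This is only an illustration under stated
assumptions, not kernel code: struct flush_timer, next_flush_due() and
flush_due_now() are hypothetical names, and times are plain tick counts.

```c
#include <stdbool.h>

/* Hypothetical per-device flush state; all times are tick counts. */
struct flush_timer {
	unsigned long last_flush;     /* tick of the most recent flush */
	unsigned long short_interval; /* interval while the disk is spinning */
	unsigned long long_interval;  /* interval while the disk is spun down */
};

/* Tick at which the next flush falls due for the given disk state. */
unsigned long next_flush_due(const struct flush_timer *t, bool spinning)
{
	return t->last_flush + (spinning ? t->short_interval : t->long_interval);
}

/* True when a flush should run now. If the disk has just woken up and the
 * short interval already puts the deadline in the past, this returns true
 * immediately. */
bool flush_due_now(const struct flush_timer *t, bool spinning,
		   unsigned long now)
{
	/* A signed difference keeps the comparison valid if the tick
	 * counter wraps around. */
	return (long)(now - next_flush_due(t, spinning)) >= 0;
}
```

With this rule a disk that is put to sleep and woken again right after a
flush does not flush again, because the short interval has not yet elapsed
since last_flush.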
* Re: ext3: kjournald and spun-down disks
From: Andrew Morton @ 2001-11-24 2:32 UTC
To: Oliver Xymoron; +Cc: linux-kernel

Oliver Xymoron wrote:
>
> > Tell me if this is joyful:
>
> Haven't tried it yet, but I'm afraid I don't see what makes it actually
> sync with the dirty buffer flush. Wouldn't it be better to export a chain
> of flush funcs hung off a timer?

It doesn't sync with kupdate.

If you want to do that, just defeat the journal timer altogether. So:

	transaction->t_expires = jiffies + 1000000000;

in get_transaction(). That way, kupdate's write_super() will
run a commit every bdf_prm.b_un.interval jiffies.

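As a rough check on why an offset of 1000000000 jiffies behaves like "never",
the arithmetic can be worked through in a small user-space sketch. It assumes
HZ is 100 (the usual i386 value for 2.4 kernels); SKETCH_HZ, jiffies_to_days()
and still_pending() are hypothetical names, not kernel symbols.

```c
/* Assumption: 100 ticks per second, as on i386 with a 2.4 kernel. */
#define SKETCH_HZ 100UL

/* Whole days covered by a timeout expressed in jiffies. */
unsigned long jiffies_to_days(unsigned long ticks)
{
	return ticks / SKETCH_HZ / (24UL * 60UL * 60UL);
}

/* True if `expires` still lies in the future relative to `now`. The signed
 * difference tolerates counter wraparound as long as the two values are
 * less than half the counter range apart. */
int still_pending(unsigned long now, unsigned long expires)
{
	return (long)(expires - now) > 0;
}
```

Under that assumption the timeout is about 115 days, and 1000000000 is below
2^31, so the expiry still compares as "in the future" even with a 32-bit
counter.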
* Re: ext3: kjournald and spun-down disks
From: Oliver Xymoron @ 2001-11-26 3:15 UTC
To: Andrew Morton; +Cc: linux-kernel

On Fri, 23 Nov 2001, Andrew Morton wrote:

> Oliver Xymoron wrote:
> >
> > > Tell me if this is joyful:
> >
> > Haven't tried it yet, but I'm afraid I don't see what makes it actually
> > sync with the dirty buffer flush. Wouldn't it be better to export a chain
> > of flush funcs hung off a timer?
>
> It doesn't sync with kupdate.
>
> If you want to do that, just defeat the journal timer altogether. So:
>
> 	transaction->t_expires = jiffies + 1000000000;
>
> in get_transaction(). That way, kupdate's write_super() will
> run a commit every bdf_prm.b_un.interval jiffies.

Ok, so what's the theory behind the journal timer? Why would we want
ext3 journal flushed more or less often than ext2 metadata given that
they're of equivalent importance?

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

* Re: ext3: kjournald and spun-down disks
From: Andrew Morton @ 2001-11-26 3:34 UTC
To: Oliver Xymoron; +Cc: linux-kernel

Oliver Xymoron wrote:
>
> Ok, so what's the theory behind the journal timer? Why would we want
> ext3 journal flushed more or less often than ext2 metadata given that
> they're of equivalent importance?

umm, err.. If your machine crashes, ext3 will restore its state
to that which pertained between zero and five seconds before the crash.

With ext2+fsck, things are not as clear. Your data will be restored
to that which pertained from zero to thirty seconds prior to crash.
inodes and superblock to that which pertained from zero to thirty five
seconds before the crash, stuff like that.

A five second window is short enough for you to be confident that
everything you want is still there. With thirty seconds, uncertainty
creeps in.

Yes, it needs to be configurable.

* Re: ext3: kjournald and spun-down disks
From: Oliver Xymoron @ 2001-11-26 15:22 UTC
To: Andrew Morton; +Cc: linux-kernel

On Sun, 25 Nov 2001, Andrew Morton wrote:

> Oliver Xymoron wrote:
> >
> > Ok, so what's the theory behind the journal timer? Why would we want
> > ext3 journal flushed more or less often than ext2 metadata given that
> > they're of equivalent importance?
>
> umm, err.. If your machine crashes, ext3 will restore its state
> to that which pertained between zero and five seconds before the crash.
>
> With ext2+fsck, things are not as clear. Your data will be restored
> to that which pertained from zero to thirty seconds prior to crash.

And that's my point exactly. In terms of integrity, each timer serves the
same purpose - get the filesystem on disk in sync with what's in memory.
Obviously ext3 does a better job of this than ext2 in terms of recovering
from partial transactions, but in both cases the flush is accomplishing
the same thing. I can see no a priori reason why the ext3 journal flush
would be timed differently than the ext2 metadata flush. If the flush time
for ext3 should be shorter, then so should the time for everything else.
See?

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

* Re: ext3: kjournald and spun-down disks
From: Daniel Kobras @ 2001-11-26 23:25 UTC
To: Andrew Morton; +Cc: Oliver Xymoron, linux-kernel

On Fri, Nov 23, 2001 at 05:25:46PM -0800, Andrew Morton wrote:
> Also, if we had appropriate hooks into the request layer, we could detect
> when the disk was being spun up for a read, and opportunistically flush
> out any pending writes.

Actually you can't. SCSI spinup code isn't very useful anyway, and IDE disks
mostly handle spinup themselves. The kernel has to issue a reset to get a
disk back alive from sleep mode, but revival from standby doesn't involve
the kernel at all. When using the disk's internal timer, it isn't involved in
spindown either. Teaching the request layer about disk state might therefore
turn out to become rather messy, I suspect.

> Tell me if this is joyful:
[...]
> -	transaction->t_expires = jiffies + journal->j_commit_interval;
> +	transaction->t_expires = jiffies + dirty_buffer_flush_interval();

This change doesn't take care of kupdated's most interesting feature, i.e.
that you can entirely stop it (with a flush interval of zero and/or a
SIGSTOP). Now, if kjournald honoured SIGSTOP/SIGCONT, I could teach noflushd
to handle the spindown issue in userland. Uh, at least for one small detail:
Is there a way to tell which kjournald process is associated to which
partition? A fake cmdline, or an fd to the partition's device node that
shows up in /proc/<pid>/fd would indeed be quite helpful.

Regards,

Daniel.

* Re: ext3: kjournald and spun-down disks
From: Andrew Morton @ 2001-11-26 23:40 UTC
To: Daniel Kobras; +Cc: Oliver Xymoron, linux-kernel

Daniel Kobras wrote:
>
> On Fri, Nov 23, 2001 at 05:25:46PM -0800, Andrew Morton wrote:
> > Also, if we had appropriate hooks into the request layer, we could detect
> > when the disk was being spun up for a read, and opportunistically flush
> > out any pending writes.
>
> Actually you can't. SCSI spinup code isn't very useful anyway, and IDE disks
> mostly handle spinup themselves. The kernel has to issue a reset to get a
> disk back alive from sleep mode, but revival from standby doesn't involve
> the kernel at all. When using the disk's internal timer, it isn't involved in
> spindown either. Teaching the request layer about disk state might therefore
> turn out to become rather messy, I suspect.

Much simpler approach:

	if (we're about to read from the disk) {
		if (we have dirty data which is > 10 seconds old) {
			write_it_now();
		}
	}

> > Tell me if this is joyful:
> [...]
> > -	transaction->t_expires = jiffies + journal->j_commit_interval;
> > +	transaction->t_expires = jiffies + dirty_buffer_flush_interval();
>
> This change doesn't take care of kupdated's most interesting feature, i.e.
> that you can entirely stop it (with a flush interval of zero and/or a
> SIGSTOP).

yup.

> Now, if kjournald honoured SIGSTOP/SIGCONT, I could teach noflushd
> to handle the spindown issue in userland. Uh, at least for one small detail:
> Is there a way to tell which kjournald process is associated to which
> partition? A fake cmdline, or an fd to the partition's device node that
> shows up in /proc/<pid>/fd would indeed be quite helpful.

Andreas has a patch which puts the device major/minor into kjournald's
process name.

Simply setting the journal timer to infinity happens to work out OK.
Commits are triggered by kupdate. This is because kupdate's superblock
writeout runs a commit. Because ext3 is unable to distinguish it from
a sys_sync(). Sigh.

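The "much simpler approach" pseudocode above amounts to an age check made at
read time. A minimal user-space sketch, assuming a tick counter running at
100 ticks per second; SKETCH_HZ, PIGGYBACK_AGE and flush_with_read() are
hypothetical names chosen for illustration:

```c
#include <stdbool.h>

/* Assumption: 100 ticks per second. */
#define SKETCH_HZ 100UL
/* Dirty data older than this rides along with the next read (10 seconds,
 * the figure used in the message above). */
#define PIGGYBACK_AGE (10UL * SKETCH_HZ)

/* Decide, just before a read is issued, whether pending writes should be
 * flushed while the disk is being accessed anyway. */
bool flush_with_read(bool have_dirty, unsigned long oldest_dirty,
		     unsigned long now)
{
	if (!have_dirty)
		return false;
	/* Unsigned subtraction yields the age even if the counter wrapped. */
	return now - oldest_dirty > PIGGYBACK_AGE;
}
```

The check needs no knowledge of drive power state, which sidesteps the
objection that standby and spinup happen without the kernel's involvement.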
* Re: ext3: kjournald and spun-down disks
From: Andreas Dilger @ 2001-11-27 0:18 UTC
To: Andrew Morton; +Cc: Daniel Kobras, Oliver Xymoron, linux-kernel

On Nov 26, 2001 15:40 -0800, Andrew Morton wrote:
> Daniel Kobras wrote:
> > Is there a way to tell which kjournald process is associated to which
> > partition? A fake cmdline, or an fd to the partition's device node that
> > shows up in /proc/<pid>/fd would indeed be quite helpful.
>
> Andreas has a patch which puts the device major/minor into kjournald's
> process name.

It is in CVS HEAD, but appears not to be in the branches. It is below.
This should not have a problem with the 16-byte command length, because
kdevname() only returns strings of the form mm:nn, so my system has:

root         8     1  0 08:58 ?        00:00:11 [kjournald-03:07]
root        39     1  0 08:58 ?        00:00:00 [kjournald-03:05]
root        40     1  0 08:58 ?        00:00:00 [kjournald-03:09]
root        41     1  0 08:58 ?        00:00:00 [kjournald-03:0a]
root      1219     1  0 09:23 ?        00:00:02 [kjournald-3a:01]

Which are all within 16 bytes (including NUL), until we get larger
major/minor numbers.

Cheers, Andreas
===========================================================================
diff -u -u -r1.11.2.2 -r1.52
--- fs/jbd/journal.c	2001/11/11 05:11:06	1.11.2.2
+++ fs/jbd/journal.c	2001/11/27 00:10:39	1.52
@@ -210,7 +176,7 @@
 	recalc_sigpending(current);
 	spin_unlock_irq(&current->sigmask_lock);
 
-	sprintf(current->comm, "kjournald");
+	sprintf(current->comm, "kjournald-%s", kdevname(journal->j_dev));
 
 	/* Set up an interval timer which can be used to trigger a
 	   commit wakeup after the commit interval expires */
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

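The 16-byte claim above can be checked with a small user-space sketch. It
assumes kdevname() produces two-digit hex "major:minor", which matches the ps
output in the message; COMM_LEN and format_kjournald_name() are hypothetical
names, and snprintf() is used here in place of the patch's sprintf() so that
an oversized name is truncated rather than overflowing the buffer.

```c
#include <stdio.h>
#include <stdbool.h>

/* Size of the command-name buffer, including the terminating NUL. */
#define COMM_LEN 16

/* Build "kjournald-mm:nn" into out. Returns true if the whole name fit,
 * false if it had to be truncated. */
bool format_kjournald_name(char out[COMM_LEN], unsigned dev_major,
			   unsigned dev_minor)
{
	int n = snprintf(out, COMM_LEN, "kjournald-%02x:%02x",
			 dev_major, dev_minor);
	return n >= 0 && n < COMM_LEN;
}
```

"kjournald-" is 10 characters and "mm:nn" is 5, so the name uses 15 bytes
plus the NUL. A major or minor number that needs three hex digits pushes the
name to 16 characters and no longer fits.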
* Re: ext3: kjournald and spun-down disks
From: Andre Hedrick @ 2001-11-27 0:03 UTC
To: Daniel Kobras; +Cc: Andrew Morton, Oliver Xymoron, linux-kernel

On Tue, 27 Nov 2001, Daniel Kobras wrote:

> On Fri, Nov 23, 2001 at 05:25:46PM -0800, Andrew Morton wrote:
> > Also, if we had appropriate hooks into the request layer, we could detect
> > when the disk was being spun up for a read, and opportunistically flush
> > out any pending writes.
>
> Actually you can't. SCSI spinup code isn't very useful anyway, and IDE disks
> mostly handle spinup themselves. The kernel has to issue a reset to get a
> disk back alive from sleep mode, but revival from standby doesn't involve
> the kernel at all. When using the disk's internal timer, it isn't involved in
> spindown either. Teaching the request layer about disk state might therefore
> turn out to become rather messy, I suspect.

No messier than corrupted data --

> > Tell me if this is joyful:
> [...]
> > -	transaction->t_expires = jiffies + journal->j_commit_interval;
> > +	transaction->t_expires = jiffies + dirty_buffer_flush_interval();
>
> This change doesn't take care of kupdated's most interesting feature, i.e.
> that you can entirely stop it (with a flush interval of zero and/or a
> SIGSTOP). Now, if kjournald honoured SIGSTOP/SIGCONT, I could teach noflushd
> to handle the spindown issue in userland. Uh, at least for one small detail:
> Is there a way to tell which kjournald process is associated to which
> partition? A fake cmdline, or an fd to the partition's device node that
> shows up in /proc/<pid>/fd would indeed be quite helpful.

LOL

The low-level spindles cannot walk backwards to find a partition because
of the bogus aliased/virtual LBA(0)s that litter a spindle.

The LBA(0) count == Number of Partitions + 1;

This is utter crap but it is scheduled to be fixed in 2.5, now that it has
started.

Solution: do not partition; use the entire raw device. But that will not
work because of the real LBA 0 -- EEK

Cheers,

Andre Hedrick
CEO/President, LAD Storage Consulting Group
Linux ATA Development
Linux Disk Certification Project

* Re: ext3: kjournald and spun-down disks
From: Oliver Xymoron @ 2001-11-27 0:21 UTC
To: Daniel Kobras; +Cc: Andrew Morton, linux-kernel

On Tue, 27 Nov 2001, Daniel Kobras wrote:

> On Fri, Nov 23, 2001 at 05:25:46PM -0800, Andrew Morton wrote:
> > Also, if we had appropriate hooks into the request layer, we could detect
> > when the disk was being spun up for a read, and opportunistically flush
> > out any pending writes.
>
> Actually you can't. SCSI spinup code isn't very useful anyway, and IDE disks
> mostly handle spinup themselves. The kernel has to issue a reset to get a
> disk back alive from sleep mode, but revival from standby doesn't involve
> the kernel at all. When using the disk's internal timer, it isn't involved in
> spindown either. Teaching the request layer about disk state might therefore
> turn out to become rather messy, I suspect.

Depends on how far you want to take it. The kernel can of course query to
discover whether a device is on standby and delay writes if possible
before actually initiating a flush.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."

Thread overview: 12+ messages
2001-11-24  1:10 ext3: kjournald and spun-down disks Oliver Xymoron
2001-11-24  1:25 ` Andrew Morton
2001-11-24  1:58   ` Oliver Xymoron
2001-11-24  2:32     ` Andrew Morton
2001-11-26  3:15       ` Oliver Xymoron
2001-11-26  3:34         ` Andrew Morton
2001-11-26 15:22           ` Oliver Xymoron
2001-11-26 23:25   ` Daniel Kobras
2001-11-26 23:40     ` Andrew Morton
2001-11-27  0:18       ` Andreas Dilger
2001-11-27  0:03     ` Andre Hedrick
2001-11-27  0:21     ` Oliver Xymoron