public inbox for linux-kernel@vger.kernel.org
* RFC: Exporting NOCMTIME to userspace
@ 2010-06-21  6:36 Aleksandr Koltsoff
  2010-06-21  8:56 ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Aleksandr Koltsoff @ 2010-06-21  6:36 UTC
  To: linux-kernel

Hello all,

I'm currently investigating various techniques to lessen the block erase
load on consumer nand flash devices (usb-ftl + nand and sd-ftl + nand).
While SSDs are of no current interest to me, this use case applies to
them as well.

The use case I'm working with is a set of pre-allocated files that get
updated periodically with data that overwrites some of the existing
data. The files never grow/shrink. (rrdtool is a good example of such
behaviour).

I've been looking for a mechanism to avoid writes going to the inodes of
the files, but have been unsuccessful so far. At least ctime will get
updated periodically.
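
The behaviour described above can be observed from userspace. What follows is
only a minimal sketch in Python, assuming a Linux system where os.utime()
accepts a file descriptor. The helper names preallocate and
overwrite_in_place, and the backdated timestamp, are made up for
illustration. It overwrites bytes inside a fixed-size file and reports that
the size stays the same while mtime moves, which is what dirties the inode:

```python
import os

# Arbitrary timestamp in the past (mid-2010), in nanoseconds. Assumption:
# used only so that the mtime change is visible whatever the filesystem's
# timestamp granularity is.
BACKDATED_NS = 1_277_000_000 * 10**9


def preallocate(path, size):
    """Create a file of fixed size filled with zero bytes."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)


def overwrite_in_place(path, offset, payload):
    """Overwrite bytes inside an existing file.

    Returns ((size, mtime_ns) before, (size, mtime_ns) after).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        atime_ns = os.fstat(fd).st_atime_ns
        # Backdate mtime. ctime cannot be set from userspace, so only
        # mtime is compared here.
        os.utime(fd, ns=(atime_ns, BACKDATED_NS))
        before = os.fstat(fd)
        os.pwrite(fd, payload, offset)   # in-place update, file does not grow
        after = os.fstat(fd)
    finally:
        os.close(fd)
    return (before.st_size, before.st_mtime_ns), (after.st_size, after.st_mtime_ns)
```

Calling overwrite_in_place() on a file made with preallocate() should return
two tuples with equal sizes and a newer mtime in the second one.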

In this use case, the timestamps are really irrelevant (the file content
contains timestamps), and I'd like to avoid updating the inodes
altogether. What happens now is extra block erasures when they aren't
actually needed or wanted.

While there is a flag in the inode structure to stop m/ctime updates,
this flag is not currently exported to userspace (the way O_NOATIME is
for atime). I feel that the least intrusive way would be such a flag,
since this is certainly an application/file-specific problem and a
mount-point flag would not be so useful.
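
For comparison, this is how the existing per-open atime flag looks from
userspace. It is a sketch only: os.O_NOATIME is Linux-specific, the open
fails with EPERM unless the caller owns the file (or has CAP_FOWNER), and the
helper name read_with_noatime is made up for illustration. Nothing here
affects mtime or ctime; it only shows the atime analogue of the flag being
proposed:

```python
import os


def read_with_noatime(path, nbytes=64):
    """Read from path through an O_NOATIME descriptor.

    Returns (atime_ns before the read, atime_ns after the read).
    """
    # O_NOATIME asks the kernel not to update the access time for reads
    # done through this descriptor.
    fd = os.open(path, os.O_RDONLY | os.O_NOATIME)
    try:
        before = os.fstat(fd).st_atime_ns
        os.read(fd, nbytes)
        after = os.fstat(fd).st_atime_ns
    finally:
        os.close(fd)
    return before, after
```

On a local filesystem the two returned values are expected to be equal, since
the read does not touch atime.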

How do people feel about this? I'm aware of some of the ramifications of
not updating ctime (regarding dumping filesystem changes, and making
life more difficult for rsync/mtime-based backups). However, I still
feel quite strongly that providing an option for applications to avoid
extra nand erases would be a good thing.

On the other hand, if someone can suggest a way to avoid timestamp
updates/causing inode writes, I'm all ears and eyes. (using the
block-layer directly or writing a custom fs is not really an elegant
solution, IMO).

Thank you for your time & best regards,

Aleksandr Koltsoff

* Re: RFC: Exporting NOCMTIME to userspace
  2010-06-21  6:36 RFC: Exporting NOCMTIME to userspace Aleksandr Koltsoff
@ 2010-06-21  8:56 ` Andi Kleen
  2010-06-21  9:12   ` Aleksandr Koltsoff
  0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2010-06-21  8:56 UTC
  To: Aleksandr Koltsoff; +Cc: linux-kernel

Aleksandr Koltsoff <aleksandr.koltsoff@ebts.fi> writes:
>
> On the other hand, if someone can suggest a way to avoid timestamp
> updates/causing inode writes, I'm all ears and eyes. (using the
> block-layer directly or writing a custom fs is not really an elegant
> solution, IMO).

I recently looked at this for some other reason. One of the reasons
c/m time became a problem recently is sub-second time stamps
in newer file systems, which can be a performance problem
under some extreme loads (updating the time stamp requires taking
locks and takes CPU time).

I think it would be better to have flush intervals that specify
that m/c time is only flushed at longer intervals (similar to the
deferred atime that's now in there).

This would still cause the inode to be written if it gets flushed from
memory on low memory and occasionally depending on the interval, but
most of the writes would be gone. All still with the same semantics.
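
The effect of such a flush interval can be illustrated with a toy model. This
is not kernel code and not an existing interface: the class
DeferredTimestampInode, the simulate helper and all parameters are
hypothetical, and flushes caused by memory pressure are left out. The
in-memory timestamp is updated on every data write, but it is only counted as
written to storage once the interval has elapsed:

```python
class DeferredTimestampInode:
    """Toy model of an inode whose mtime is flushed at most once per interval."""

    def __init__(self, flush_interval):
        self.flush_interval = flush_interval  # seconds between timestamp flushes
        self.mtime = 0          # in-memory timestamp, always current
        self.on_disk_mtime = 0  # last timestamp that reached storage
        self.last_flush = 0
        self.inode_writes = 0   # number of inode writes to storage

    def data_write(self, now):
        """Record a data write at time `now` (seconds)."""
        self.mtime = now
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        """Write the in-memory timestamp to storage."""
        self.on_disk_mtime = self.mtime
        self.last_flush = now
        self.inode_writes += 1


def simulate(updates, period, flush_interval):
    """Run `updates` data writes, one every `period` seconds."""
    inode = DeferredTimestampInode(flush_interval)
    for i in range(1, updates + 1):
        inode.data_write(now=i * period)
    return inode
```

With one update per minute for a day, simulate(1440, 60, 3600) counts 24
inode writes, while simulate(1440, 60, 0) counts 1440. The in-memory mtime is
identical in both runs, which is the "same semantics" point: only a crash
would expose the older on-disk value.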

I think doing it this way would be preferable over just
disabling it.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: RFC: Exporting NOCMTIME to userspace
  2010-06-21  8:56 ` Andi Kleen
@ 2010-06-21  9:12   ` Aleksandr Koltsoff
  2010-06-21 10:32     ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Aleksandr Koltsoff @ 2010-06-21  9:12 UTC
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> Aleksandr Koltsoff <aleksandr.koltsoff@ebts.fi> writes:
>> On the other hand, if someone can suggest a way to avoid timestamp
>> updates/causing inode writes, I'm all ears and eyes. (using the
>> block-layer directly or writing a custom fs is not really an elegant
>> solution, IMO).
> 
> I think it would be better to have flush intervals that specify
> that m/c time is only flushed at longer intervals (similar to the
> deferred atime that's now in there).
> 
> This would still cause the inode to be written if it gets flushed from
> memory on low memory and occasionally depending on the interval, but
> most of the writes would be gone. All still with the same semantics.

While this might solve the performance aspect of the problem, it will
only mitigate the reliability aspect with NANDs, since the inodes will
eventually be flushed anyway (thus causing irreversible wear).

Also, a tunable like you're suggesting would affect all files on a
single filesystem, instead of just a subset of files. Having an option
to "disable" m/ctime updates per file would be optimal in our case,
since we're talking about several years of runtime with the use case.
There are other issues with rrdtool which make this hard, but those are
all solvable without kernel modifications.

That said, a mitigating solution would be better than none :-).

Regards,

ak.

* Re: RFC: Exporting NOCMTIME to userspace
  2010-06-21  9:12   ` Aleksandr Koltsoff
@ 2010-06-21 10:32     ` Andi Kleen
  0 siblings, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2010-06-21 10:32 UTC
  To: Aleksandr Koltsoff; +Cc: Andi Kleen, linux-kernel

> While this might solve the performance aspect of the problem, it will
> only mitigate the reliability aspect with NANDs, since the inodes will
> eventually be flushed anyway (thus causing irreversible wear).

As long as you don't run out of memory it will happen very rarely.
Completely eliminating inode writes is not a good goal imho.

If you don't want inodes at all, then perhaps consider using an LVM
volume or similar instead of a file system.

> Also, a tunable like you're suggesting would affect all files on a
> single filesystem, instead of just a subset of files. Having an option
> to "disable" m/ctime updates per file would be optimal in our case,
> since we're talking about several years of runtime with the use case.
> There are other issues with rrdtool which make this hard, but those are
> all solvable without kernel modifications.

Yes, but the nice thing is that the semantics are the same, so unless
you crash you won't really notice.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
