* solving(?) the updatedb problem w/ the kernel cache
@ 2007-07-27 12:55 Douglas J Hunley
  2007-07-27 16:42 ` Ray Lee
  0 siblings, 1 reply; 5+ messages in thread
From: Douglas J Hunley @ 2007-07-27 12:55 UTC
  To: slocate; +Cc: linux-kernel

I've been following lkml for a little while (not understanding it all, but 
following nonetheless <g>) and I've noticed that in a lot of the talks about 
schedulers, elevators, and performance, the issue of running updatedb and its 
effects on the kernel's fs cache seems to recur. I've also yet to see anyone 
present a solution that others think is worth pursuing. I'm curious why we're 
trying to solve the problem, when we can simply avoid the problem to begin 
with by making use of inotify and introducing a new user-space 
daemon, 'located'. 

This daemon would be started by the init scripts, register one or more inotify 
listeners based on /etc/updatedb.conf and then go to sleep. Whenever the fs 
is modified, an inotify event is triggered and 'located' wakes up, 
adjusts 'locate.db', and then goes back to sleep. It's about as simple as a 
daemon can get and still be of use, I think.
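
A minimal sketch of the loop described above, not an actual 'located'
implementation. It assumes Linux and calls inotify through ctypes; the
function names (parse_events, apply_event, run) are hypothetical, and an
in-memory set of paths stands in for 'locate.db'.

```python
import ctypes
import os
import struct

# Event mask bits from <sys/inotify.h>.
IN_MOVED_FROM = 0x040
IN_MOVED_TO = 0x080
IN_CREATE = 0x100
IN_DELETE = 0x200
WATCH_MASK = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO

# Fixed header of struct inotify_event: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Yield (wd, mask, name) for each event in a buffer read from an inotify fd."""
    offset = 0
    while offset + EVENT_HEADER.size <= len(buf):
        wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name field is NUL-padded up to name_len bytes.
        raw_name = buf[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        yield wd, mask, os.fsdecode(raw_name)


def apply_event(index, directory, mask, name):
    """Update the path set (assumption: a stand-in for adjusting locate.db)."""
    path = os.path.join(directory, name)
    if mask & (IN_CREATE | IN_MOVED_TO):
        index.add(path)
    elif mask & (IN_DELETE | IN_MOVED_FROM):
        index.discard(path)


def run(directories, index):
    """Hypothetical daemon loop: block in read(), wake up when events arrive."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    watches = {}
    for directory in directories:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), WATCH_MASK)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed: " + directory)
        watches[wd] = directory
    while True:
        for wd, mask, name in parse_events(os.read(fd, 65536)):
            if wd in watches:  # skips events with no known watch, e.g. queue overflow
                apply_event(index, watches[wd], mask, name)
```

The sketch takes one watch per directory and does not add watches for
directories created later, so a real daemon would also have to walk the
tree and watch new subdirectories as they appear.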

Add a little 'configure' foo to detect if the system has inotify support and 
if it does, build/install the new daemon. If it doesn't, then don't build it 
and use the old behavior. With the daemon up and running, you can rm the 
updatedb entry in cron.daily and the negative impact it causes on the 
kernel's fs simply goes away. 

You wouldn't need to modify 'locate' itself, as it would still read 
from 'locate.db' like it does now. As an added bonus, you'd now have 
real-time results from 'locate'. If your Sys Admin just 
installed /usr/bin/foo, then 'locate foo' would find it "immediately". No 
more waiting until the next run of 'updatedb' for accurate results.

All the major distros support inotify at this point, I believe. If I could 
code, I'd have attached the daemon here myself :)

I must admit, though, I'm a little perplexed why greater minds than mine 
haven't already implemented this. I must be missing some technical issue. Can 
anyone enlighten me on the same?
-- 
Douglas J Hunley (doug at hunley.homeip.net)


* Re: solving(?) the updatedb problem w/ the kernel cache
  2007-07-27 12:55 solving(?) the updatedb problem w/ the kernel cache Douglas J Hunley
@ 2007-07-27 16:42 ` Ray Lee
  2007-07-27 17:09   ` Michael Tharp
  2007-07-27 17:22   ` Kevin Lindsay
  0 siblings, 2 replies; 5+ messages in thread
From: Ray Lee @ 2007-07-27 16:42 UTC
  To: Douglas J Hunley; +Cc: slocate, linux-kernel

On 7/27/07, Douglas J Hunley <doug@hunley.homeip.net> wrote:
> I've been following lkml for a little while (not understanding it all, but
> following nonetheless <g>) and I've noticed that in a lot of the talks about
> schedulers, elevators, and performance, the issue of running updatedb and its
> effects on the kernel's fs cache seems to recur. I've also yet to see anyone
> present a solution that others think is worth pursuing. I'm curious why we're
> trying to solve the problem, when we can simply avoid the problem to begin
> with by making use of inotify and introducing a new user-space
> daemon, 'located'.

inotify doesn't scale for lots of directories. I have about 18,000
directories under ~ on my laptop, and that's with a few source trees
that I use infrequently tarballed up.

But yes, if we had a full filesystem events notifier, then we could
just toss updatedb aside and have the benefit of a live index into the
system. It's been suggested before, at least by me. Other projects
want this as well, such as an on-demand virus scanner, or a live
backup to another site, or beagle/tracker who would like to index
documents on the fly. beagled already uses inotify, I think, but as it
takes over my system (in a bad way) whenever I tried to run it, I had
no choice but to remove it.

Perhaps it was choking on the 18k subdirectories, dunno.
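
Since inotify watches are taken per directory rather than per tree, the
scale in question can be estimated by counting directories and comparing
the result with the per-user watch limit. The following is a rough sketch;
the function names are hypothetical, and it assumes a Linux system that
exposes the limit at /proc/sys/fs/inotify/max_user_watches.

```python
import os


def count_directories(root):
    """Count root plus every directory below it, without following symlinks."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        total += 1  # os.walk yields once per directory it visits
    return total


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    root = os.path.expanduser("~")
    needed = count_directories(root)
    limit = read_watch_limit()
    print(f"{needed} directories under {root}; watch limit: {limit}")
```

This only counts how many watches a watch-everything daemon would need. It
says nothing about the memory or CPU cost of holding that many watches.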

Ray


* Re: solving(?) the updatedb problem w/ the kernel cache
  2007-07-27 16:42 ` Ray Lee
@ 2007-07-27 17:09   ` Michael Tharp
  2007-07-27 17:24     ` J. Bruce Fields
  2007-07-27 17:22   ` Kevin Lindsay
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Tharp @ 2007-07-27 17:09 UTC
  To: Ray Lee; +Cc: Douglas J Hunley, slocate, linux-kernel

Ray Lee wrote:
> But yes, if we had a full filesystem events notifier, then we could
> just toss updatedb aside and have the benefit of a live index into the
> system. It's been suggested before, at least by me. Other projects
> want this as well, such as an on-demand virus scanner, or a live
> backup to another site, or beagle/tracker who would like to index
> documents on the fly. beagled already uses inotify, I think, but as it
> takes over my system (in a bad way) whenever I tried to run it, I had
> no choice but to remove it.

Beagle's problem is that it inspects the file contents, often far too
closely. I, too, had to uninstall it after it started indexing 40GB raw
huffyuv video files (probably treating them as text) and driving load
averages through the roof. Just watching for structure changes won't be
nearly as painful, assuming inotify can handle watching the entire
filesystem tree.


-- m. tharp


* Re: solving(?) the updatedb problem w/ the kernel cache
  2007-07-27 16:42 ` Ray Lee
  2007-07-27 17:09   ` Michael Tharp
@ 2007-07-27 17:22   ` Kevin Lindsay
  1 sibling, 0 replies; 5+ messages in thread
From: Kevin Lindsay @ 2007-07-27 17:22 UTC
  To: Ray Lee; +Cc: Douglas J Hunley, slocate, linux-kernel

On Fri, Jul 27, 2007 at 09:42:27AM -0700, Ray Lee wrote:

> On 7/27/07, Douglas J Hunley <doug@hunley.homeip.net> wrote:
> > I've been following lkml for a little while (not understanding it all, but
> > following nonetheless <g>) and I've noticed that in a lot of the talks about
> > schedulers, elevators, and performance, the issue of running updatedb and its
> > effects on the kernel's fs cache seems to recur. I've also yet to see anyone
> > present a solution that others think is worth pursuing. I'm curious why we're
> > trying to solve the problem, when we can simply avoid the problem to begin
> > with by making use of inotify and introducing a new user-space
> > daemon, 'located'.
> 
> inotify doesn't scale for lots of directories. I have about 18,000
> directories under ~ on my laptop, and that's with a few source trees
> that I use infrequently tarballed up.
> 
> But yes, if we had a full filesystem events notifier, then we could
> just toss updatedb aside and have the benefit of a live index into the
> system. It's been suggested before, at least by me. Other projects
> want this as well, such as an on-demand virus scanner, or a live
> backup to another site, or beagle/tracker who would like to index
> documents on the fly. beagled already uses inotify, I think, but as it
> takes over my system (in a bad way) whenever I tried to run it, I had
> no choice but to remove it.
> 
> Perhaps it was choking on the 18k subdirectories, dunno.

The interface for inotify requires you to explicitly watch files and folders.
As Ray suggests, I am also skeptical that using inotify to watch 18k inodes
is very efficient, although it would be nice to be wrong. Possibly someone
needs to take a peek at OS X's fsevents mechanism?

The other problem with this approach is that the locate DB uses incremental
encoding. Each change to the filesystem may require re-encoding large portions
of the database, or possibly all of it. An alternative DB format would need to
be considered.
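
To illustrate the re-encoding concern, here is a simplified sketch of
incremental (front) encoding, in which each entry stores only the length of
the prefix it shares with the previous entry plus the remaining suffix. The
function names are hypothetical, and the real locate DB format differs in
its details.

```python
def front_encode(paths):
    """Encode a sorted path list as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for path in paths:
        shared = 0
        limit = min(len(previous), len(path))
        while shared < limit and previous[shared] == path[shared]:
            shared += 1
        encoded.append((shared, path[shared:]))
        previous = path
    return encoded


def front_decode(encoded):
    """Rebuild the full paths; each entry depends on the one before it."""
    paths = []
    previous = ""
    for shared, suffix in encoded:
        path = previous[:shared] + suffix
        paths.append(path)
        previous = path
    return paths


if __name__ == "__main__":
    paths = ["/usr/bin/bar", "/usr/bin/baz", "/usr/lib/x"]
    print(front_encode(paths))
    # Removing the first path changes how the next entry is encoded.
    print(front_encode(paths[1:]))
```

Because every entry is decoded relative to its predecessor, deleting or
inserting one path changes the encoding of the entry that follows it, and
in a packed on-disk file everything after the edit also has to be shifted.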

Kevin


* Re: solving(?) the updatedb problem w/ the kernel cache
  2007-07-27 17:09   ` Michael Tharp
@ 2007-07-27 17:24     ` J. Bruce Fields
  0 siblings, 0 replies; 5+ messages in thread
From: J. Bruce Fields @ 2007-07-27 17:24 UTC
  To: Michael Tharp; +Cc: Ray Lee, Douglas J Hunley, slocate, linux-kernel

On Fri, Jul 27, 2007 at 01:09:13PM -0400, Michael Tharp wrote:
> Ray Lee wrote:
> > But yes, if we had a full filesystem events notifier, then we could
> > just toss updatedb aside and have the benefit of a live index into the
> > system. It's been suggested before, at least by me. Other projects
> > want this as well, such as an on-demand virus scanner, or a live
> > backup to another site, or beagle/tracker who would like to index
> > documents on the fly. beagled already uses inotify, I think, but as it
> > takes over my system (in a bad way) whenever I tried to run it, I had
> > no choice but to remove it.
> 
> Beagle's problem is that it inspects the file contents, often far too
> closely. I, too, had to uninstall it after it started indexing 40GB raw
> huffyuv video files (probably treating them as text) and driving load
> averages through the roof. Just watching for structure changes won't be
> nearly as painful, assuming inotify can handle watching the entire
> filesystem tree.

Events notification only helps while you've got someone around to listen
to the events.  If you reboot (or even just log out?  I don't know),
then when you come back the only completely reliable way to find out
what's changed may be to re-read everything.

So I'd think that far more important and basic than events notification
would be ways to reliably tell when a file has changed by looking just
at the attributes.

You can just use everything that "stat" gives you and figure that if a
file is still at the same path, with the same ctime, mtime, size,
permissions, owner, inode number, etc., then it's probably the same
file.
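
The stat-based comparison above could be sketched roughly as follows. The
names (Signature, signature, snapshot, changed) are hypothetical, and the
choice of fields is an assumption about what "everything that stat gives
you" should include.

```python
import os
from collections import namedtuple

# Attributes compared between scans (an assumed, not authoritative, selection).
Signature = namedtuple("Signature", "ino mode uid gid size mtime_ns ctime_ns")


def signature(path):
    """Return the stat-derived signature of path, without following symlinks."""
    st = os.lstat(path)
    return Signature(st.st_ino, st.st_mode, st.st_uid, st.st_gid,
                     st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def snapshot(paths):
    """Map each existing path to its signature; missing paths are left out."""
    result = {}
    for path in paths:
        try:
            result[path] = signature(path)
        except FileNotFoundError:
            pass
    return result


def changed(old, new):
    """Return the sorted paths that were added, removed, or look different."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

A file whose signature is identical in both snapshots is only probably
unchanged: a modification that leaves the size alone and lands within the
timestamp granularity would go unnoticed by this check.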

If that's not enough, then maybe you want a change attribute (that's
guaranteed to change even when changes happen within less than the
granularity of ctime), and generation number (that's bumped whenever an
inode number is reused).

--b.


