From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Barton <eeb@whamcloud.com>
Date: Tue, 11 Oct 2011 14:04:29 +0100
Subject: [Lustre-devel] Erratum about indexes in robinhood DB
In-Reply-To: <4E8D8A99.1010804@cea.fr>
References: <4E84346B.8060300@cea.fr> <038c01cc835f$2f30f090$8d92d1b0$@com>
	<4E8D8A99.1010804@cea.fr>
Message-ID: <06d901cc8816$50808600$f1819200$@com>
List-Id: <lustre-devel-lustre.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: lustre-devel@lists.lustre.org

Thomas,

Interesting point about changelog entries requiring a 'stat'.

Nathan, what's your take on making changelogs tell you what has
changed - even if only on "easy" changes?

          Cheers,
                   Eric

> -----Original Message-----
> From: LEIBOVICI Thomas [mailto:thomas.leibovici at cea.fr]
> Sent: 06 October 2011 12:02 PM
> To: Eric Barton
> Cc: lustre-devel at lists.lustre.org
> Subject: Re: Erratum about indexes in robinhood DB
> 
> Hello Eric,
> 
> With a fast enough feeder, the ingest rate robinhood can currently
> sustain is between 50,000/sec and 100,000/sec
> (depending on the insert/update/remove ratio) with a basic MySQL DB stored
> on a local disk.
> This can certainly still be improved with MySQL tuning and/or better HW
> and/or an enterprise-class DB,
> but for now, we find it is easily high enough for reading an MDT
> changelog stream on a petaflop-scale system.
> 
> This rate is actually lower when processing Lustre MDT changelogs (but I
> have no measurements) because of the "stat" operations needed to get file
> attributes (unfortunately, changelogs do not give the new value of what
> has just changed, e.g. the new uid for a chown operation, or the new size
> and mtime for an mtime event).
> SOM will probably improve that point, but it could be a good idea to add
> more info to changelogs.
> 
> Handling changelogs from multiple MDTs is indeed a very interesting point
> to address.
> The main issue is database scaling in terms of operation rate,
> volume and entry location.
> One solution could be to use an existing clustered DB engine (MySQL
> Cluster, NoSQL DBs...),
> so we are going to look at the different alternatives and see
> if they match the need.
> For that, it would be interesting to know how records will be split
> across the multiple changelog streams:
> is a given fid always reported by the same stream? What about the parent
> fid (as in create/unlink operations)?
> If you have a document about the DNE design, I think it would give a more
> precise idea of
> which events and fids are supposed to be reported by each MDT.
> 
> Thanks,
> Thomas
> 
> Eric Barton wrote:
> > Thomas,
> >
> > Thanks a lot and I hope you don't mind me cc-ing lustre-devel as this
> > seems to be of general interest.
> >
> > Do you have a feel (or measurements :) for the rate at which a changelog
> > can be ingested into robinhood?  And I'm wondering about DNE and multiple
> > changelogs coming from multiple MDTs.  I'd be very interested to know if
> > you've thought about this and have views on what the maximum ingest rate
> > could be and whether there will be issues coordinating/merging events
> > across multiple feeds.
> >
> >           Cheers,
> >                    Eric
> >
> > Eric Barton
> > CTO Whamcloud, Inc.
> >
> >
> >> -----Original Message-----
> >> From: LEIBOVICI Thomas [mailto:thomas.leibovici at cea.fr]
> >> Sent: 29 September 2011 10:04 AM
> >> To: Eric Barton
> >> Subject: Erratum about indexes in robinhood DB
> >>
> >> Hello Eric,
> >>
> >> Thinking again about your question on indexes in the robinhood DB, my answer
> >> was incomplete.
> >> Actually, there are indexes on user/group/type/status, but they are not
> >> on the main table:
> >>
> >> 1) As I told you, on the main table (the one that lists all FS entries),
> >> there are as few indexes as possible (just fid as primary key, and
> >> parent fid)
> >> in order to preserve a good insert/update rate on this table whatever
> >> the FS size (the deeper the DB index trees, the slower those requests).
> >>
> >> 2) There is a secondary table where robinhood maintains aggregated
> >> statistics such as number of entries and volume per
> >> user/group/type/(hsm)status, which is updated on the fly.
> >> This one has indexes on almost all of its fields, which makes it possible to
> >> get instantaneous stats per user, etc. without penalizing the insert/update
> >> rate on the main table.
> >> Indexes on this secondary table are less expensive, given that the set
> >> of users is much more restricted than the number of entries.
> >>
> >> This time you have a more complete answer.
> >>
> >> Best regards
> >> Thomas
> >>
> >
> >
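
For readers who want to picture the two-table layout described in the
original message above (a lightly indexed main table plus a small, heavily
indexed statistics table updated on the fly), here is a minimal sketch.
It is an illustration only, not robinhood's actual schema: the table names
(entries, acct_stat), the column names and the helper functions are all
hypothetical, and SQLite from the Python standard library stands in for
MySQL so that the example is self-contained.

```python
import sqlite3


def open_db():
    """Create an in-memory DB with the two hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Main table: one row per FS entry, as few indexes as possible
        -- (fid as primary key, plus parent fid).
        CREATE TABLE entries (
            fid        TEXT PRIMARY KEY,
            parent_fid TEXT,
            owner      TEXT,
            grp        TEXT,
            ftype      TEXT,
            size       INTEGER
        );
        CREATE INDEX entries_parent ON entries (parent_fid);

        -- Secondary table: one row per (owner, group, type) combination,
        -- so it stays small and can afford indexes on most fields.
        CREATE TABLE acct_stat (
            owner      TEXT,
            grp        TEXT,
            ftype      TEXT,
            nb_entries INTEGER NOT NULL DEFAULT 0,
            volume     INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (owner, grp, ftype)
        );
        CREATE INDEX acct_grp ON acct_stat (grp);
        CREATE INDEX acct_ftype ON acct_stat (ftype);
    """)
    return db


def _adjust(db, owner, grp, ftype, d_count, d_volume):
    """Apply a delta to the aggregated row, creating it if needed."""
    db.execute(
        "INSERT OR IGNORE INTO acct_stat (owner, grp, ftype) VALUES (?, ?, ?)",
        (owner, grp, ftype),
    )
    db.execute(
        "UPDATE acct_stat SET nb_entries = nb_entries + ?, volume = volume + ? "
        "WHERE owner = ? AND grp = ? AND ftype = ?",
        (d_count, d_volume, owner, grp, ftype),
    )


def insert_entry(db, fid, parent_fid, owner, grp, ftype, size):
    """Insert an entry and update the aggregated stats in one transaction."""
    with db:
        db.execute(
            "INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?)",
            (fid, parent_fid, owner, grp, ftype, size),
        )
        _adjust(db, owner, grp, ftype, 1, size)


def remove_entry(db, fid):
    """Remove an entry (if present) and subtract it from the stats."""
    with db:
        row = db.execute(
            "SELECT owner, grp, ftype, size FROM entries WHERE fid = ?", (fid,)
        ).fetchone()
        if row is None:
            return
        db.execute("DELETE FROM entries WHERE fid = ?", (fid,))
        _adjust(db, row[0], row[1], row[2], -1, -row[3])


def stats_for_user(db, owner):
    """Return (number of entries, volume) for a user from the small table."""
    row = db.execute(
        "SELECT COALESCE(SUM(nb_entries), 0), COALESCE(SUM(volume), 0) "
        "FROM acct_stat WHERE owner = ?",
        (owner,),
    ).fetchone()
    return row[0], row[1]
```

The per-user query only touches acct_stat, whose row count is bounded by
the number of distinct user/group/type combinations rather than by the
number of FS entries, which is the cost argument made in point 2 above.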