From: Jamie Lokier <jamie@shareable.org>
To: jlnance@unity.ncsu.edu
Cc: "Pavel Machek" <pavel@ucw.cz>,
	"Jörn Engel" <joern@wohnheim.fh-wedel.de>,
	mj@ucw.cz, jack@ucw.cz,
	"Patrick J. LoPresti" <patl@users.sourceforge.net>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cowlinks v2
Date: Mon, 5 Apr 2004 13:35:56 +0100	[thread overview]
Message-ID: <20040405123556.GB19842@mail.shareable.org> (raw)
In-Reply-To: <20040405111033.GA1456@ncsu.edu>

jlnance@unity.ncsu.edu wrote:
> Perhaps diff would run faster but that
> seems like a very special case thing, and diff will certainly work w/o it.

We are talking about a difference between 20 minutes and 1 second.
It's quite significant, when it's a regular part of your diffing &
patching day.

I agree with your general sentiment that we shouldn't expose
filesystem details, e.g. a 32-bit integer.  See below for an
alternative interface.

> Tar might also be faster creating archives if it had this information
> available.  However to make tar useful wrt cowlinks, it will need to be
> able to create these links at extract time from tarfiles which were created
> on non-cowlink filesystems, so I don't think there is a pressing need.

I agree.  The purpose of cowlinks is to be semantically invisible.  If tar
or some other archiver/transferer wanted to use this information, it
should really be checking for equivalent files in general (like cmp)
and use this call as an optimisation only.

Btw, when we treat cowlinks as semantically invisible, there is no
problem searching an entire filesystem for files with identical
content and linking them together to save space, in a cron job.  It's
invisible to applications, except that space is saved and sometimes
the first write takes longer.
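
As a rough illustration of the scanning half of such a cron job, here is a minimal sketch that groups files by size and then by SHA-1 digest. The function names are hypothetical, and the linking step is left out on purpose, since it depends on whatever cowlink interface the kernel ends up providing. A careful job would also compare candidates byte for byte before linking them.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(root):
    """Return sorted lists of paths under root that share size and SHA-1."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Consider regular files only; symlinks are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[sha1_of(path)].append(path)
        # Keep only digests shared by two or more files.
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files never need to be read at all, which matters when the job walks an entire filesystem.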

That still permits the get_data_id() optimisation, but that now
strictly means "kernel knows and returns a unique id of the data
(unique in this filesystem)".

Instead of get_data_id(), we'd use a POSIX attribute called "data-id"
returned by getxattr().  An absence of the attribute indicates that no
data-id is known.  Otherwise, it's a unique id for that data in the
current filesystem.
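
A userspace tool such as diff could use the attribute roughly as follows. This is only a sketch: the attribute name "user.data-id" is an assumption (no such attribute exists today, and a real name would need whatever namespace prefix is chosen), and the helper names are made up. Matching ids on the same filesystem are taken as proof of identical data; anything else falls back to a full comparison, as cmp would do.

```python
import filecmp
import os

# Hypothetical attribute name; the proposal only calls it "data-id".
DATA_ID_ATTR = "user.data-id"


def get_data_id(path):
    """Return the data-id bytes for path, or None if no data-id is known."""
    getxattr = getattr(os, "getxattr", None)  # only available on Linux
    if getxattr is None:
        return None
    try:
        return getxattr(path, DATA_ID_ATTR)
    except OSError:
        # Attribute absent or xattrs unsupported: treat as "no id known".
        # (A real tool might want to distinguish other errors.)
        return None


def same_content(path_a, path_b):
    """Return True if both files hold identical data."""
    id_a = get_data_id(path_a)
    id_b = get_data_id(path_b)
    same_fs = os.stat(path_a).st_dev == os.stat(path_b).st_dev
    if same_fs and id_a is not None and id_a == id_b:
        # Fast path: ids are only comparable within one filesystem.
        return True
    # Missing or differing ids prove nothing, so compare the bytes.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Note that differing ids do not imply differing content: two files may hold the same data without having been linked yet, so the slow path is still needed for correctness.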

It's a short byte string (another reason for making it a POSIX
attribute).  On ext2/ext3, it's just the bytes of the shared inode
number plus a filesystem-wide generation number.  On a hypothetical
httpfs, it could be the host name and ETag (a strong validator).  On
any filesystem, it could be the SHA1 digest if that is known.  It
would have the nice property of working over NFSv4, too.
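
To make the "short byte string" idea concrete, here is a sketch of how the three kinds of id might be encoded. The field widths, byte order, and separator are assumptions chosen for illustration, not a proposed on-disk or wire format, and the function names are hypothetical.

```python
import hashlib
import struct


def inode_data_id(inode_number, generation):
    """ext2/ext3 style: shared inode number plus a generation number.

    Assumed layout: 64-bit inode number, then 32-bit generation, big-endian.
    """
    return struct.pack(">QI", inode_number, generation)


def http_data_id(host, etag):
    """httpfs style: host name and ETag, joined by a NUL byte (assumed)."""
    return host.encode("utf-8") + b"\0" + etag.encode("utf-8")


def sha1_data_id(content):
    """Content-addressed style: the raw 20-byte SHA-1 digest of the data."""
    return hashlib.sha1(content).digest()
```

Callers would treat all of these as opaque bytes and only ever test two ids from the same filesystem for equality, which is what lets each filesystem pick its own scheme.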

-- Jamie
