From mboxrd@z Thu Jan 1 00:00:00 1970
From: Denis Vlasenko
Subject: Re: Finding hardlinks
Date: Fri, 12 Jan 2007 00:35:37 +0100
Message-ID: <200701120035.37337.vda.linux@googlemail.com>
References: <4593890C.8030207@panasas.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Cc: Mikulas Patocka, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	nfsv4@ietf.org
Received: from ug-out-1314.google.com ([66.249.92.172]:1242 "EHLO
	ug-out-1314.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932813AbXAKXhU (ORCPT ); Thu, 11 Jan 2007 18:37:20 -0500
Received: by ug-out-1314.google.com with SMTP id 44so643061uga for ;
	Thu, 11 Jan 2007 15:37:19 -0800 (PST)
To: Benny Halevy
In-Reply-To: <4593890C.8030207@panasas.com>
Content-Disposition: inline
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Thursday 28 December 2006 10:06, Benny Halevy wrote:
> Mikulas Patocka wrote:
> >>> If user (or script) doesn't specify that flag, it doesn't help. I think
> >>> the best solution for these filesystems would be either to add new syscall
> >>> 	int is_hardlink(char *filename1, char *filename2)
> >>> (but I know adding syscall bloat may be objectionable)
> >> it's also the wrong api; the filenames may have been changed under you
> >> just as you return from this call, so it really is a
> >> "was_hardlink_at_some_point()" as you specify it.
> >> If you make it work on fd's.. it has a chance at least.
> >
> > Yes, but it doesn't matter --- if the tree changes under "cp -a" command,
> > no one guarantees you what you get.
> > 	int fis_hardlink(int handle1, int handle 2);
> > Is another possibility but it can't detect hardlinked symlinks.

It also suffers from combinatorial explosion.
cp -a on 10^6 files will require ~0.5 * 10^12 compares...
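
For reference, that figure is just the number of unordered pairs,
n * (n - 1) / 2, which is the worst case when the only primitive available
is a pairwise "are these two the same file?" check. A minimal sketch of the
arithmetic in C; the helper name pair_count() is made up for illustration
and is not part of any proposed API:

```c
#include <stdint.h>

/* Number of unordered pairs among n files: n * (n - 1) / 2.
 * This is the worst-case number of calls a pairwise
 * is_hardlink()-style check would need.
 * Hypothetical helper, for illustration only. */
uint64_t pair_count(uint64_t n)
{
    /* Exactly one of n and n - 1 is even; halve that one first
     * so the intermediate product overflows as late as possible. */
    if (n % 2 == 0)
        return (n / 2) * (n - 1);
    return n * ((n - 1) / 2);
}
```

For n = 1000000 this gives 499999500000, i.e. roughly 0.5 * 10^12.
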
> It seems like the posix idea of unique doesn't
> hold water for modern file systems and that creates real problems for
> backup apps which rely on that to detect hard links.

Yes, and it should have been obvious at the 32->64bit inode# transition.
Unfortunately people tend to think "ok, NOW this new shiny BIGNUM-bit
field is big enough for everybody". Then the cycle repeats in five years...

I think the solution is that inode "numbers" should become
opaque _variable-length_ hashes. They are already just hash values,
this is nothing new. All problems stem from the fixed width of inode# only.
--
vda
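
To make the variable-length idea a bit more concrete, here is a rough
sketch in C. Everything in it is an assumption for illustration: struct
file_id, FILE_ID_MAX, file_id_from_stat() and file_id_equal() are
hypothetical names, not an existing kernel or libc interface. The only
real pieces are struct stat and its st_dev and st_ino fields, which is
what tools use today; the sketch packs them into an opaque byte string
to show that the current fixed-width scheme is just one special case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

#define FILE_ID_MAX 64  /* arbitrary cap chosen for this sketch */

/* Hypothetical opaque, variable-length file identifier.
 * Callers only compare or hash the bytes; they never interpret them. */
struct file_id {
    size_t len;                       /* number of valid bytes */
    unsigned char bytes[FILE_ID_MAX];
};

/* Today's fixed-width case expressed in the opaque form:
 * the identifier is simply st_dev followed by st_ino. */
void file_id_from_stat(const struct stat *st, struct file_id *id)
{
    memset(id, 0, sizeof(*id));
    memcpy(id->bytes, &st->st_dev, sizeof(st->st_dev));
    memcpy(id->bytes + sizeof(st->st_dev), &st->st_ino, sizeof(st->st_ino));
    id->len = sizeof(st->st_dev) + sizeof(st->st_ino);
}

/* Two identifiers match only if length and contents both match. */
bool file_id_equal(const struct file_id *a, const struct file_id *b)
{
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

Because such an identifier is a plain byte string, a copy tool could keep
using it as a hash-table key and find hardlinks in a single pass, with no
pairwise comparisons. A filesystem that needs more bits would return a
longer string rather than force another field-width transition.
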