linux-c-programming.vger.kernel.org archive mirror
From: wwp <subscript@free.fr>
To: linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Thu, 30 Dec 2004 13:04:32 +0100
Message-ID: <20041230130432.4f93abe2@tethys.montpellier.4js.com>
In-Reply-To: <16851.58845.416665.971234@gargle.gargle.HOWL>

Hello Glynn et al,


On Thu, 30 Dec 2004 11:26:21 +0000 Glynn Clements <glynn@gclements.plus.com> wrote:

> 
> Andy wrote:
> 
> > Does anybody know of any useful code or c commands that
> > you can use to search for duplicate files within a linux/unix directory and
> > its subdirectories and remove them?
> 
> First, you have to define "duplicate". Also, once you've decided that
> two or more files are duplicates, you have to decide which one you
> wish to keep and which ones are to be deleted.
> 
> If you consider any files with identical MD5 hashes as duplicates, and
> don't care about which one is kept, you could use something like:
> 
> find . -type f -exec md5sum {} \; | sort | uniq -w32 -d | cut -b 35- | \
> 	while read file ; do rm -- "$file" ; done
> 
> However, if you have 3 or more copies of a given file, the above will
> only delete one of them ("uniq -d ..." only prints one instance of
> each duplicate line; "uniq -D ..." prints all instances, which would
> result in all copies being removed; there isn't an all-but-one option).

Another idea would be to remove all but one copy and create hard links in
place of the removed ones. Of course, that depends on Andy's answers to the
questions Glynn raised: what does 'duplicate' mean, and what should be done
with the dups? (And why are you checking for dups in the first place, and
what do you expect to be able to do afterwards?)
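
To make the hard-link idea concrete, here is a rough sketch rather than a
finished tool. The function name dedupe_hardlink is made up for this example.
It assumes GNU md5sum and sort, a shell whose test supports -ef (bash and
dash do), that all files live on one filesystem, and that no file name
contains a newline or a backslash. It keeps the first file of each MD5 group
(in sort order) and turns the others into hard links to it, so it also covers
the "3 or more copies" case:

```shell
# Hypothetical helper: replace duplicate files under "$1" with hard links.
# Assumptions: GNU md5sum/sort, one filesystem, no newlines or
# backslashes in file names.
dedupe_hardlink() {
    prev_hash=''
    keep=''
    find "$1" -type f -exec md5sum {} + | sort |
    while IFS= read -r line; do
        hash=${line%% *}    # the MD5 digest, up to the first space
        file=${line#*  }    # the name, after the two-space separator
        if [ "$hash" = "$prev_hash" ]; then
            # Already the same inode: nothing to do.
            if [ "$keep" -ef "$file" ]; then
                continue
            fi
            # Compare contents too, so a hash collision is not
            # mistaken for a duplicate.
            if cmp -s -- "$keep" "$file"; then
                ln -f -- "$keep" "$file"
            fi
        else
            # First file of a new hash group: this is the copy we keep.
            prev_hash=$hash
            keep=$file
        fi
    done
}
```

You would call it as "dedupe_hardlink /some/dir". Keep in mind that hard
links share one inode, so afterwards the "copies" also share owner,
permissions and timestamps, and writing through one name changes them all.
Whether that is acceptable goes back to what the dups are for.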


Regards,

-- 
wwp
