From: wwp <subscript@free.fr>
To: linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Thu, 30 Dec 2004 13:04:32 +0100
Message-ID: <20041230130432.4f93abe2@tethys.montpellier.4js.com>
In-Reply-To: <16851.58845.416665.971234@gargle.gargle.HOWL>

Hello Glynn et al,


On Thu, 30 Dec 2004 11:26:21 +0000 Glynn Clements <glynn@gclements.plus.com> wrote:

> 
> Andy wrote:
> 
> > Does anybody know of any useful code or c commands that
> > you can use to search for duplicate files within a linux/unix directory and
> > its subdirectories and remove them?
> 
> First, you have to define "duplicate". Also, once you've decided that
> two or more files are duplicates, you have to decide which one you
> wish to keep and which ones are to be deleted.
> 
> If you consider any files with identical MD5 hashes as duplicates, and
> don't care about which one is kept, you could use something like:
> 
> find . -type f -exec md5sum {} \; | sort | uniq -w32 -d | cut -b 35- | \
> 	while read file ; do rm -- "$file" ; done
> 
> However, if you have 3 or more copies of a given file, the above will
> only delete one of them ("uniq -d ..." only prints one instance of
> each duplicate line; "uniq -D ..." prints all instances, which would
> result in all copies being removed; there isn't an all-but-one option).
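
For the all-but-one case, awk can do what uniq can't: remember each hash it has already seen and print only the later copies. Here is a minimal sketch. It assumes GNU md5sum output (32 hex digits, two separator characters, then the name) and file names without newlines or backslashes, because md5sum escapes those and that shifts the columns. The function name dedupe_all_but_one is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: delete every copy except one for each group of
# files under a directory whose MD5 sums match.
# Assumes GNU md5sum output ("<32 hex digits>  <name>") and names without
# newlines or backslashes (md5sum escapes those, which shifts the columns).
dedupe_all_but_one() {
    dir=${1:-.}
    # 1. Hash every regular file.
    # 2. Sort in the C locale so the copy that is kept (the first path
    #    in sorted order) is predictable.
    # 3. awk prints a name only if its hash was already seen, i.e. every
    #    file of a group except the first; substr($0, 35) is the name.
    # 4. Remove those names; IFS= and -r keep leading spaces intact.
    find "$dir" -type f -exec md5sum {} + |
        LC_ALL=C sort |
        awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }' |
        while IFS= read -r file; do
            rm -- "$file"
        done
}
```

Call it as "dedupe_all_but_one /some/dir". The sort is not needed for correctness, since the awk array does the grouping, but it makes the surviving copy the one whose path sorts first. Try it on a scratch copy first, or swap "rm --" for "echo" to see what it would delete.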

Another idea would be to remove all but one copy and create hard links in
place of the removed ones, maybe? Of course, it would depend on Andy's answers
to the questions raised by Glynn: what "duplicate" means, and what to do with
those dups (and even: why are you checking for dups, and what do you expect
to be able to do?).
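
The hard-link variant is a small change: instead of deleting each extra copy, replace it with a link to the copy that is kept. Again, this is only a sketch. It makes the same md5sum assumptions as above, and it also assumes there are no tabs in names, since a tab separates the two paths. relink_duplicates is a made-up name. Note that hard links only work within one filesystem. Afterwards all the names share one inode, so editing one "copy" changes them all.

```shell
#!/bin/sh
# Hypothetical helper: keep one copy of each group of files with matching
# MD5 sums and replace the others with hard links to it.
# Same md5sum assumptions as above, plus no tab characters in names,
# since a tab separates the two paths passed from awk to the loop.
relink_duplicates() {
    dir=${1:-.}
    tab=$(printf '\t')
    # awk remembers the first name seen for each hash and, for every later
    # file with that hash, prints "<kept name><TAB><duplicate name>".
    find "$dir" -type f -exec md5sum {} + |
        LC_ALL=C sort |
        awk '{
            h = substr($0, 1, 32)
            f = substr($0, 35)
            if (h in keep) { print keep[h] "\t" f } else { keep[h] = f }
        }' |
        while IFS=$tab read -r kept dup; do
            # Skip pairs that are already hard links to each other (GNU ln -f
            # refuses those); otherwise replace dup with a link to kept.
            if [ ! "$kept" -ef "$dup" ]; then
                ln -f -- "$kept" "$dup"
            fi
        done
}
```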


Regards,

-- 
wwp
