From: Glynn Clements <glynn@gclements.plus.com>
To: Andy <andy_webb@onetel.com>
Cc: linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Thu, 30 Dec 2004 11:26:21 +0000
Message-ID: <16851.58845.416665.971234@gargle.gargle.HOWL>
In-Reply-To: <003f01c4e9ba$7679d020$316c4ed5@j0s6l8>
Andy wrote:
> Does anybody know of any useful code or c commands that
> you can use to search for duplicate files within a linux/unix directory and
> its subdirectories and remove them?
First, you have to define "duplicate". Also, once you've decided that
two or more files are duplicates, you have to decide which one you
wish to keep and which ones are to be deleted.
If you consider any files with identical MD5 hashes as duplicates, and
don't care about which one is kept, you could use something like:
    find . -type f -exec md5sum {} + | sort | uniq -w32 -d | cut -b 35- | \
    while IFS= read -r file ; do rm -- "$file" ; done
However, if you have 3 or more copies of a given file, the above will
only delete one of them ("uniq -d ..." only prints one instance of
each duplicate line; "uniq -D ..." prints all instances, which would
result in all copies being removed; there isn't an all-but-one option).
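If you do want all-but-one behaviour, one possible approach is to replace
the "uniq | cut" stage with a small awk filter that skips the first line
seen for each hash and prints the filename from every later line. The
following is only a sketch: the function names are made up for
illustration, it assumes GNU md5sum's output layout (32 hex digits, two
separator characters, filename from column 35), and it will not cope
with filenames containing newlines or backslashes, because md5sum
escapes those lines.

```shell
#!/bin/sh
# Hypothetical helper: print every file under "$1" except the first
# (in sort order) of each group of files sharing an MD5 hash.
# Assumes GNU md5sum output: 32 hex digits, 2 separator chars, filename.
list_extra_copies() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }'
}

# Hypothetical helper: delete the extras, keeping one copy per hash.
remove_extra_copies() {
    list_extra_copies "$1" |
        while IFS= read -r file ; do rm -- "$file" ; done
}
```

Running list_extra_copies on its own first lets you review what would be
deleted before calling remove_extra_copies. Which copy survives is
simply whichever name sorts first, so this still assumes you don't care
which one is kept.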
--
Glynn Clements <glynn@gclements.plus.com>