From: wwp <subscript@free.fr>
To: linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Thu, 30 Dec 2004 13:04:32 +0100
Message-ID: <20041230130432.4f93abe2@tethys.montpellier.4js.com>
In-Reply-To: <16851.58845.416665.971234@gargle.gargle.HOWL>
Hello Glynn et al,
On Thu, 30 Dec 2004 11:26:21 +0000 Glynn Clements <glynn@gclements.plus.com> wrote:
>
> Andy wrote:
>
> > Does anybody know of any useful code or c commands that
> > you can use to search for duplicate files within a linux/unix directory and
> > its subdirectories and remove them?
>
> First, you have to define "duplicate". Also, once you've decided that
> two or more files are duplicates, you have to decide which one you
> wish to keep and which ones are to be deleted.
>
> If you consider any files with identical MD5 hashes as duplicates, and
> don't care about which one is kept, you could use something like:
>
> find . -type f -exec md5sum {} \; | sort | uniq -w32 -d | cut -b 35- | \
> while IFS= read -r file ; do rm -- "$file" ; done
>
> However, if you have 3 or more copies of a given file, the above will
> only delete one of them ("uniq -d ..." only prints one instance of
> each duplicate line; "uniq -D ..." prints all instances, which would
> result in all copies being removed; there isn't an all-but-one option).
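For the all-but-one case, something along these lines might do. This is only a sketch: it assumes GNU md5sum's output layout (32 hex digits, two spaces, then the file name), it does not cope with file names containing newlines or backslashes, and list_extra_copies is just a name picked for illustration. It only prints the extra copies, so the list can be checked before anything is passed to rm:

```shell
#!/bin/sh
# list_extra_copies DIR (hypothetical helper name):
# print every file under DIR whose MD5 hash already appeared on an
# earlier line of the sorted listing, i.e. all copies except the first.
list_extra_copies() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }'
}
```

The awk counter plays the role of the missing "all-but-one" option: the first line of each hash group is skipped and every later one is printed. Which copy counts as "first" is simply whichever sorts first by name.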
Another idea would be to remove all but one and create hard links in place of
the removed ones, maybe? Of course it would depend on Andy's answers to the
questions raised by Glynn: what 'duplicate' means and what to do with those
dups (and even why you are checking for dups, and what you expect to be
able to do).
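A rough sketch of the hard link idea, under a few assumptions: GNU md5sum output (hash, two spaces, file name), all files on the same filesystem, no newlines or backslashes in file names, and a shell whose test supports -ef (bash and dash do). The name link_duplicates is made up for illustration. Candidates are compared with cmp before linking, so a hash match alone is never trusted:

```shell
#!/bin/sh
# link_duplicates DIR (hypothetical helper name):
# keep the first file of each group with identical content and
# replace the other copies with hard links to it.
link_duplicates() {
    prev_hash=
    keep=
    find "$1" -type f -exec md5sum {} + | sort |
    while IFS= read -r line; do
        hash=${line%% *}          # text before the first space
        file=${line#"$hash  "}    # text after the two-space separator
        if [ "$hash" = "$prev_hash" ] && cmp -s -- "$keep" "$file"; then
            # Skip pairs that already share an inode.
            if [ ! "$keep" -ef "$file" ]; then
                ln -f -- "$keep" "$file"
            fi
        else
            # New hash (or differing content): this file becomes the keeper.
            prev_hash=$hash
            keep=$file
        fi
    done
}
```

Note that after this, changing one of the names changes all of them, since they share the same data, so it only makes sense if the copies are meant to stay identical.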
Regards,
--
wwp