From: "Andy" <andy_webb@onetel.com>
To: wwp <subscript@free.fr>, linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Sat, 1 Jan 2005 11:09:37 -0000
Message-ID: <000b01c4eff2$623ff7e0$8a7e4ed5@j0s6l8>
In-Reply-To: 20041230130432.4f93abe2@tethys.montpellier.4js.com
Gents,
Thanks for this; the script seems to do the trick.
What I meant by "duplicate", but didn't clarify, was identical
name and file size; dates do not matter.
What does md5sum {} do?
Thanks
Andy
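
On the md5sum {} question: find replaces {} with the path of each file it
finds, so md5sum {} runs md5sum on that file and prints the 32-hex-digit MD5
hash of its contents followed by its name. With "duplicate" defined as same
name and same size, no hashing is needed. A minimal sketch under these
assumptions: GNU find (for -printf), file names without tabs or newlines,
and a made-up function name list_name_size_dups:

```shell
# list_name_size_dups DIR  (hypothetical helper, not from the thread)
# Print every file under DIR whose basename and size match an earlier
# file, i.e. all but the first member of each (name, size) group.
# Assumes GNU find (-printf) and file names without tabs or newlines.
list_name_size_dups() {
    # One line per file: basename, size in bytes, full path.
    find "$1" -type f -printf '%f\t%s\t%p\n' |
        # Byte-wise sort puts equal (name, size) prefixes next to each other.
        LC_ALL=C sort |
        # Print the path of every line whose key repeats the previous key.
        awk -F '\t' '{ key = $1 FS $2 } key == prev { print $3 } { prev = key }'
}
```

It only lists candidates. The copy that is kept is the one whose path sorts
first; after checking the list, it could be piped into a loop such as
while IFS= read -r f ; do rm -- "$f" ; done.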
----- Original Message -----
From: "wwp" <subscript@free.fr>
To: <linux-c-programming@vger.kernel.org>
Sent: Thursday, December 30, 2004 12:04 PM
Subject: Re: file deletion
> Hello Glynn et al,
>
>
> On Thu, 30 Dec 2004 11:26:21 +0000 Glynn Clements <glynn@gclements.plus.com> wrote:
>
> >
> > Andy wrote:
> >
> > > Does anybody know of any useful code or c commands that
> > > you can use to search for duplicate files within a linux/unix
> > > directory and its subdirectories and remove them?
> >
> > First, you have to define "duplicate". Also, once you've decided that
> > two or more files are duplicates, you have to decide which one you
> > wish to keep and which ones are to be deleted.
> >
> > If you consider any files with identical MD5 hashes as duplicates, and
> > don't care about which one is kept, you could use something like:
> >
> > find . -type f -exec md5sum {} \; | sort | uniq -w32 -d | cut -b 35- | \
> > while IFS= read -r file ; do rm -- "$file" ; done
> >
> > However, if you have 3 or more copies of a given file, the above will
> > only delete one of them ("uniq -d ..." only prints one instance of
> > each duplicate line; "uniq -D ..." prints all instances, which would
> > result in all copies being removed; there isn't an all-but-one option).
>
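
The missing all-but-one behaviour can be had by swapping uniq for a short awk
filter. A minimal sketch, assuming GNU md5sum output (32 hex digits, two
separator characters, then the name) and file names without backslashes or
newlines; the function name list_md5_dups is made up for illustration:

```shell
# list_md5_dups DIR  (hypothetical helper, not from the thread)
# Print all but the first file of every group sharing an MD5 hash.
# Assumes md5sum lines of the form "<32 hex digits><2 chars><name>" and
# file names without backslashes or newlines.
list_md5_dups() {
    find "$1" -type f -exec md5sum {} + |
        # Byte-wise sort groups identical hashes together.
        LC_ALL=C sort |
        # Remember the previous hash; print the name when it repeats.
        awk '{ hash = substr($0, 1, 32) }
             hash == prev { print substr($0, 35) }
             { prev = hash }'
}
```

With three identical copies this prints two of them, so the one whose path
sorts first is left alone. It only prints names; deleting them is a separate
step, e.g. a while IFS= read -r f ; do rm -- "$f" ; done loop on its output.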
> An idea would also be to remove all-but-one and create hardlinks in place of
> the removed ones, maybe? Of course it would depend on Andy's answers to the
> questions raised by Glynn: what 'duplicate' means and what to do w/ those
> dups (even why are you checking for dups and what are you expecting to be
> able to do?).
>
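
The hardlink idea could look roughly like the sketch below. It assumes that
all files live on one filesystem (hardlinks cannot cross filesystems), that
identical MD5 hashes are an acceptable definition of "duplicate", and that
file names contain no backslashes or newlines; the function name
hardlink_md5_dups is made up for illustration:

```shell
# hardlink_md5_dups DIR  (hypothetical helper, not from the thread)
# Keep the first file of each MD5 group and replace every other member
# with a hardlink to it. Assumes one filesystem and file names without
# backslashes or newlines.
hardlink_md5_dups() {
    find "$1" -type f -exec md5sum {} + | LC_ALL=C sort | {
        prev_hash=
        keep=
        while IFS= read -r line; do
            hash=${line%% *}          # text before the first space: the hash
            file=${line#"$hash"??}    # drop the hash and the 2 separator chars
            if [ "$hash" = "$prev_hash" ]; then
                # Same contents as the kept file: replace with a hardlink.
                ln -f -- "$keep" "$file"
            else
                prev_hash=$hash
                keep=$file
            fi
        done
    }
}
```

Afterwards every name in a group refers to the same data, so editing the file
through one name changes it under all of them.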
>
> Regards,
>
> --
> wwp