From: "Andy" <andy_webb@onetel.com>
To: wwp <subscript@free.fr>, linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Sat, 1 Jan 2005 11:09:37 -0000
Message-ID: <000b01c4eff2$623ff7e0$8a7e4ed5@j0s6l8>
In-Reply-To: 20041230130432.4f93abe2@tethys.montpellier.4js.com

Gents,
Thanks for this; the script seems to do the trick.
Actually, what I meant by "duplicate" (but didn't clarify)
was identical name and file size; dates do not matter.
What does md5sum {} do?
Thanks
Andy
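
Under the "identical name and file size" definition given above, no hash is needed at all. A minimal sketch, assuming GNU find (for -printf) and file names that contain no tabs or newlines; list_name_size_dups is a hypothetical helper name, not something from the thread. It only prints candidates and deletes nothing:

```shell
# Print every file under DIR whose base name and size match an earlier file,
# i.e. all but the first member of each (size, name) group.
# Assumes GNU find and file names without tabs or newlines.
list_name_size_dups() {
    # One line per file: size <TAB> base name <TAB> full path.
    find "${1:-.}" -type f -printf '%s\t%f\t%p\n' |
        # A bytewise sort puts lines sharing the same size+name prefix together.
        LC_ALL=C sort |
        # Print the path of each line whose key equals the previous line's key.
        awk -F '\t' '{ key = $1 FS $2 } key == prev { print $3 } { prev = key }'
}
```

Once the printed list has been reviewed, it could be piped into a loop such as `while IFS= read -r f ; do rm -- "$f" ; done`. Note that two files with the same name and size may still differ in content.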
----- Original Message -----
From: "wwp" <subscript@free.fr>
To: <linux-c-programming@vger.kernel.org>
Sent: Thursday, December 30, 2004 12:04 PM
Subject: Re: file deletion


> Hello Glynn et al,
>
>
> On Thu, 30 Dec 2004 11:26:21 +0000 Glynn Clements <glynn@gclements.plus.com> wrote:
>
> >
> > Andy wrote:
> >
> > > Does anybody know of any useful code or C commands that
> > > you can use to search for duplicate files within a Linux/Unix
> > > directory and its subdirectories and remove them?
> >
> > First, you have to define "duplicate". Also, once you've decided that
> > two or more files are duplicates, you have to decide which one you
> > wish to keep and which ones are to be deleted.
> >
> > If you consider any files with identical MD5 hashes as duplicates, and
> > don't care about which one is kept, you could use something like:
> >
> > find . -type f -exec md5sum {} \; | sort | uniq -w32 -d | cut -b 35- | \
> > while IFS= read -r file ; do rm -- "$file" ; done
> >
> > However, if you have 3 or more copies of a given file, the above will
> > only delete one of them ("uniq -d ..." only prints one instance of
> > each duplicate line; "uniq -D ..." prints all instances, which would
> > result in all copies being removed; there isn't an all-but-one option).
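
In the pipeline quoted above, find substitutes each file's path for `{}`, so `md5sum {}` prints one line per file: a 32-character hex digest, two spaces, then the path. That is why `uniq -w32` compares only the first 32 characters and `cut -b 35-` keeps just the path. An all-but-one selection can be approximated by replacing uniq with a small awk filter. This is only a sketch, assuming GNU md5sum and file names without newlines or backslashes (md5sum escapes those); list_md5_dups is a hypothetical helper name:

```shell
# Print every file under DIR whose MD5 digest matches an earlier file,
# i.e. all but the first member of each group of identical digests.
# Assumes file names without newlines or backslashes.
list_md5_dups() {
    # "{} +" batches many paths per md5sum call; "{} \;" works as well.
    find "${1:-.}" -type f -exec md5sum {} + |
        # Sorting brings lines with equal digests next to each other.
        LC_ALL=C sort |
        # Columns 1-32 hold the digest; the path starts at column 35.
        awk '{ hash = substr($0, 1, 32) }
             hash == prev { print substr($0, 35) }
             { prev = hash }'
}
```

The output can be inspected first and then fed to the same `while IFS= read -r file ; do rm -- "$file" ; done` loop, which leaves exactly one copy per digest.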
>
> An idea would also be to remove all but one and create hardlinks in place
> of the removed ones, maybe? Of course, it would depend on Andy's answers to
> the questions raised by Glynn: what does 'duplicate' mean, and what should be
> done with those dups (and why are you checking for dups, and what are you
> expecting to be able to do)?
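
The hardlink idea could look roughly like the sketch below. It assumes GNU md5sum, file names without newlines or backslashes, and that all files live on one filesystem (hard links cannot cross filesystems); hardlink_md5_dups is a hypothetical helper name. It treats equal MD5 digests as equal content, so a byte-for-byte check with cmp before linking would be the more cautious choice:

```shell
# Replace each file whose MD5 digest matches an earlier file with a hard
# link to that earlier file. The body runs in a subshell, so the helper
# variables do not leak into the caller.
hardlink_md5_dups() (
    prev_hash=
    keep=
    find "${1:-.}" -type f -exec md5sum {} + |
        LC_ALL=C sort |
        while IFS= read -r line; do
            hash=${line%% *}    # text before the first space: the digest
            path=${line#*  }    # text after the first two spaces: the path
            if [ "$hash" = "$prev_hash" ]; then
                # Skip pairs that already share an inode; otherwise relink.
                [ "$keep" -ef "$path" ] || ln -f -- "$keep" "$path"
            else
                prev_hash=$hash
                keep=$path
            fi
        done
)
```

After a run such as `hardlink_md5_dups .`, every name still exists, but identical files share one copy of the data, so editing one of them in place changes all of them.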
>
>
> Regards,
>
> --
> wwp

