From: Jan-Benedict Glaw <jbglaw@lug-owl.de>
To: linux-c-programming@vger.kernel.org
Subject: Re: file deletion
Date: Thu, 23 Dec 2004 14:49:52 +0100
Message-ID: <20041223134951.GY2460@lug-owl.de>
In-Reply-To: <003f01c4e9ba$7679d020$316c4ed5@j0s6l8>

On Fri, 2004-12-24 13:14:11 -0000, Andy <andy_webb@onetel.com>
wrote in message <003f01c4e9ba$7679d020$316c4ed5@j0s6l8>:
> Does anybody know of any useful code or c commands that
> you can use to search for duplicate files within a linux/unix directory and
> its subdirectories and remove them?

As long as you believe in cryptographic hashes, something like this
should do the trick:

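SOME_DIR=/some/directory

# sha1sum prints a 40-character hash, two separator characters and the
# file name. "uniq -w 40 -D" (GNU extensions) compares only the hash and
# prints every line whose hash occurs more than once.
find "${SOME_DIR}" -type f -exec sha1sum {} + | \
	sort | \
	uniq -w 40 -D | \
	cut -c 43- | \
	while IFS= read -r DOUBLE_FILE_NAME; do
		# find already printed the full path, so don't prepend SOME_DIR
		rm -f -- "${DOUBLE_FILE_NAME}"
	done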

That's untested, but should work (it's essentially a one-liner).
However, it removes *all* instances of files which are believed to be
identical...
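
To keep one copy of each file instead of removing every instance, a
minimal sketch along the same lines could look like the following. It
assumes GNU sha1sum and file names without newlines or backslashes
(sha1sum escapes those in its output). The function name
remove_duplicates is made up for illustration, and equal SHA-1 hashes
are still taken to mean identical contents. The copy that survives is
simply the one whose path sorts first.

```shell
#!/bin/sh
# Sketch: remove all but the first file of each group with equal SHA-1.
# remove_duplicates is a hypothetical helper name, not an existing tool.
remove_duplicates() {
	find "$1" -type f -exec sha1sum {} + |
		LC_ALL=C sort |
		awk '{
			hash = substr($0, 1, 40)  # SHA-1 hex digest
			name = substr($0, 43)     # skip digest and 2 separator chars
			if (hash == prev)         # second or later copy of a group
				print name
			prev = hash
		}' |
		while IFS= read -r DUPLICATE; do
			rm -f -- "${DUPLICATE}"
		done
}
```

Call it as "remove_duplicates /some/directory". Replacing the rm line
with an echo first gives a dry run that only lists what would be
deleted.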

Regards, JBG

-- 
Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger" | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));