From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: linux-kernel@vger.kernel.org, randy_dunlap <rdunlap@xenotime.net>
Subject: CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code
Date: Wed, 13 Jul 2005 03:47:08 -0300
Message-ID: <20050713064708.GA5988@dmt.cnet>
FYI, a binary-only version of CP-Miner has been released
for academic use.
http://opera.cs.uiuc.edu/Projects/ARTS/CP-Miner.htm
By Zhenmin Li, Shan Lu, Suvda Myagmar and Yuanyuan Zhou
Published in Proceedings of the Sixth Symposium on Operating Systems Design
and Implementation (OSDI '04), Dec. 2004
Copy-pasted code is very common in large software because programmers
prefer reusing code via copy-paste in order to reduce programming effort.
Copy-paste is prone to introducing bugs. Recent studies show that a
significant portion of operating system bugs concentrate in copy-pasted
code. Unfortunately, it is very challenging to efficiently identify
copy-pasted code in large software. Existing copy-paste detection tools are
either not scalable to large software, or cannot handle small modifications
in copy-paste. Furthermore, few tools are available to detect copy-paste
related bugs.
In this paper, we propose a tool, called CP-Miner, that uses data mining
techniques to efficiently identify copy-pasted code in large software
including operating systems, and detect copy-paste related bugs.
Specifically, it takes less than 20 minutes for CP-Miner to identify
190,000 and 150,000 copy-pasted segments in Linux and FreeBSD,
respectively. The copy-pasted code accounts for 20-22% of code in Linux and
FreeBSD. Similarly, CP-Miner also identifies many copy-pasted segments in
the Apache Web Server and PostgreSQL, which account for 17.7-22% of the code
in these applications.
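
To give a rough feel for what detecting copy-pasted code "with small
modifications" involves, here is a minimal sketch. It is not CP-Miner's
algorithm (the abstract only says it uses data mining techniques); it is a
much simpler stand-in that renames every identifier to ID and every number
to NUM, then looks for runs of consecutive lines that become identical
after that renaming. The function names, the keyword list and the window
size are all assumptions made for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword list; a real tool would use the
# full keyword set of the language and a proper tokenizer.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}


def normalize(line):
    """Map identifiers to ID and integer literals to NUM, drop whitespace."""
    line = re.sub(r"\b\d+\b", "NUM", line)

    def rename(match):
        word = match.group(0)
        return word if word in KEYWORDS or word == "NUM" else "ID"

    line = re.sub(r"\b[A-Za-z_]\w*\b", rename, line)
    return re.sub(r"\s+", "", line)


def find_duplicates(lines, window=3):
    """Group start indices whose `window` consecutive normalized lines match."""
    norm = [normalize(l) for l in lines]
    seen = defaultdict(list)
    for i in range(len(norm) - window + 1):
        seen[tuple(norm[i:i + window])].append(i)
    return [starts for starts in seen.values() if len(starts) > 1]


code = [
    "for (i = 0; i < n; i++) {",
    "    total += a[i];",
    "}",
    "x = 1;",
    "for (j = 0; j < m; j++) {",
    "    sum += b[j];",
    "}",
]
duplicates = find_duplicates(code)
```

With this input, the two loops are reported as one group even though
every variable was renamed in the second copy. Unlike the tool described
in the abstract, this sketch does not tolerate inserted or deleted
statements and makes no attempt to flag bugs in the copies.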
Moreover, CP-Miner has detected 49 and 31 copy-paste related bugs in the
latest versions of Linux and FreeBSD, respectively. Some of these bugs have
been reported to the open source community and have since been fixed by the
developers.