今天在做如何从两个数组中取出相同的元素时碰到了一个问题,想知道下面哪种
算法更快,(听说是HashSet 的会更快,但是想知道为什么会更快呢?):
具体测试程序:
/**
* get the same element in two arrays
*/
import java.util.*;
import java.io.*;
public class AlgorithmTest{
public static void main(String[] args)throws Exception{
// read the data from file
FileReader fr = new FileReader("C:/test_data_1.txt");
int temp;
StringBuilder sb= new StringBuilder();
while((temp=fr.read())!=-1){
sb.append((char)temp);
}
// System.out.println(sb);
String[] a = sb.toString().split(" ");
Integer[] in1 = new Integer[a.length];
for(int i=0;i<a.length;i++){
in1[i] = Integer.parseInt(a[i]);
}
Integer[] in2 = new Integer[10000];
for(int i=0;i<10000;i++){
in2[i] = in1[i*2];
}
long before,after;
// Algorithm 1
Set bs = new HashSet(Arrays.asList(in2));
List<Integer> result = new ArrayList<Integer>();
before = System.currentTimeMillis();
for(int i=0;i<in1.length;i++){
if(bs.contains(in1[i])){
// System.out.println(a[i]);
result.add(in1[i]);
}
}
after = System.currentTimeMillis();
System.out.println("HashSet Result :" + result.size());
System.out.println("HashSet Algorithm using time :" + (after-before));
// Algorithm 2
result = new ArrayList<Integer>();
before = System.currentTimeMillis();
for(int i=0;i<in1.length;i++)
for(int j=0;j<in2.length;j++)
{
if(in1[i].equals(in2[j])){
result.add(in1[i]);
}
}
after = System.currentTimeMillis();
System.out.println("two loops Result :" + result.size());
System.out.println("two loops Algorithm using time :" + (after-before));
}
}
其中,test data file 是用这个类生成的
/**
* generate the large test data
*/
import java.io.*;
public class GenTestData{
public static void main(String[] args)throws Exception{
String fn = "C:/test_data_1.txt";
FileWriter fs = new FileWriter(fn);
for(int i=0;i<100000;i++){
// 直接写字符串
fs.write(i+" ");
}
}
}
运行结果:
HashSet Result :10001
HashSet Algorithm using time :15
two loops Result :10001
two loops Algorithm using time :11060
结果分析:
java 的HashSet 的contains(Object o) 方法内部采用的是map.containsKey()方法。
private transient HashMap<E,Object> map;
public boolean contains(Object o) {
return map.containsKey(o);
}
而containsKey 方法的具体实现:
/**
* Returns <tt>true</tt> if this map contains a mapping for the
* specified key.
*
* @param key The key whose presence in this map is to be tested
* @return <tt>true</tt> if this map contains a mapping for the specified
* key.
*/
public boolean containsKey(Object key) {
Object k = maskNull(key);
int hash = hash(k.hashCode());
int i = indexFor(hash, table.length);
Entry e = table[i];
while (e != null) {
if (e.hash == hash && eq(k, e.key))
return true;
e = e.next;
}
return false;
}
上面的具体实现用到了哈希算法,即根据一个对象的hashCode 可以用(1)的时间判断该对象是否在这个map 里,如果在,只需要跟一些hashCode 值跟该对象一样的对象比较一下内容,就可得出答案,而这个在双重循环里需要一个一个跟所有的对象比较,显然效率不一样。
在上面的代码实现中用到了API一些函数,列出在此了解参考
/**
* Check for equality of non-null reference x and possibly-null y.
*/
static boolean eq(Object x, Object y) {
return x == y || x.equals(y);
}
Entry class : static class Entry<K,V> implements Map.Entry<K,V>
// Returns internal representation for key. Use NULL_
KEY if key is null.
static <T> T maskNull(T key) {
return key == null ? (T)NULL_KEY : key;
}