Java 集合_JAVA_程序员俱乐部

您所在的位置：程序员俱乐部 > 编程开发 > JAVA > Java 集合

Java 集合

2012/4/23 12:40:11 naouguhtaeyeti 程序员俱乐部我要评论(0)

摘要：今天在做如何从两个数组中取出相同的元素时碰到了一个问题，想知道下面哪种算法更快，（听说是HashSet的会更快，但是想知道为什么会更快呢？）：具体测试程序：/***getthesameelementintwoarrays*/importjava.util.*;importjava.io.*;publicclassAlgorithmTest{publicstaticvoidmain(String[]args)throwsException
标签：Java

今天在做如何从两个数组中取出相同的元素时碰到了一个问题，想知道下面哪种算法更快，（听说是HashSet 的会更快，但是想知道为什么会更快呢？）：
具体测试程序：

/**
 * get the same element in two arrays 
 */
import java.util.*;
import java.io.*;
public class AlgorithmTest{
	public static void main(String[] args)throws Exception{
		
		// read the data from file
		FileReader fr = new FileReader("C:/test_data_1.txt");
		int temp;
		StringBuilder sb= new StringBuilder();
		while((temp=fr.read())!=-1){
			sb.append((char)temp);
		}
	//	System.out.println(sb);
		String[] a = sb.toString().split(" ");
		Integer[] in1 = new Integer[a.length];
		for(int i=0;i<a.length;i++){
		 	in1[i] = Integer.parseInt(a[i]);
			}

		Integer[] in2 = new Integer[10000];
		for(int i=0;i<10000;i++){
			in2[i] = in1[i*2];
		}
		long before,after;
		
	
		
		
		// Algorithm 1
		Set bs = new HashSet(Arrays.asList(in2));
		List<Integer> result = new ArrayList<Integer>();
		before = System.currentTimeMillis();
		for(int i=0;i<in1.length;i++){
			if(bs.contains(in1[i])){
			//	System.out.println(a[i]);
				result.add(in1[i]);
			}
		}
		after = System.currentTimeMillis();
		System.out.println("HashSet Result :" + result.size());
		System.out.println("HashSet Algorithm using time :" + (after-before));
		
		// Algorithm 2
		result = new ArrayList<Integer>();
		before = System.currentTimeMillis();
		for(int i=0;i<in1.length;i++)
			for(int j=0;j<in2.length;j++)
			{
				if(in1[i].equals(in2[j])){
					result.add(in1[i]);
				}
			}
		after = System.currentTimeMillis();
		System.out.println("two loops Result :" + result.size());
		System.out.println("two loops Algorithm using time :" + (after-before));
	}
}


其中，test data file 是用这个类生成的
/**
 * generate the large test data
 */
import java.io.*;
public class GenTestData{
	public static void main(String[] args)throws Exception{
	  String fn = "C:/test_data_1.txt";
		FileWriter fs = new FileWriter(fn);
		for(int i=0;i<100000;i++){
			// 直接写字符串
			fs.write(i+" ");
		}	
	}
}

运行结果：

HashSet Result :10001
HashSet Algorithm using time :15
two loops Result :10001
two loops Algorithm using time :11060

结果分析：

java 的HashSet 的contains(Object o) 方法内部采用的是map.containsKey()方法。

private transient HashMap<E,Object> map;
public boolean contains(Object o) {
	return map.containsKey(o);
}
而containsKey 方法的具体实现：
/**
     * Returns <tt>true</tt> if this map contains a mapping for the
     * specified key.
     *
     * @param   key   The key whose presence in this map is to be tested
     * @return <tt>true</tt> if this map contains a mapping for the specified
     * key.
*/
public boolean containsKey(Object key) {
        Object k = maskNull(key);

        int hash = hash(k.hashCode());
        int i = indexFor(hash, table.length);
        Entry e = table[i]; 
        while (e != null) {
            if (e.hash == hash && eq(k, e.key)) 
                return true;
            e = e.next;
        }
        return false;
}

上面的具体实现用到了哈希算法，即根据一个对象的hashCode 可以用(1)的时间判断该对象是否在这个map 里，如果在，只需要跟一些hashCode 值跟该对象一样的对象比较一下内容，就可得出答案，而这个在双重循环里需要一个一个跟所有的对象比较，显然效率不一样。

在上面的代码实现中用到了API一些函数，列出在此了解参考
    /**
     * Check for equality of non-null reference x and possibly-null y.
     */
    static boolean eq(Object x, Object y) {
       return x == y || x.equals(y);
    }
Entry class : static class Entry<K,V> implements Map.Entry<K,V>
        // Returns internal representation for key. Use NULL_KEY if key is null.

static <T> T maskNull(T key) {
        return key == null ? (T)NULL_KEY : key;
    }

上一篇：浅谈Volatile 下一篇： java千万级别数据处理(2)-千万级别FTP下载