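Today a friend mentioned a library called ReflectASM, which provides high-performance reflection for Java. It works by generating code dynamically.
In an earlier post I wrote that dynamic code generation is also a key part of how Oracle/Sun JDK 6 implements reflective method invocation.
But today's protagonist is not ReflectASM. It is the sun.misc.Unsafe class in the Oracle/Sun JDK, and specifically its getInt(Object, long) method:
Quote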
/**
* Fetches a value from a given Java variable.
* More specifically, fetches a field or array element within the given
* object <code>o</code> at the given offset, or (if <code>o</code> is
* null) from the memory address whose numerical value is the given
* offset.
* <p>
* The results are undefined unless one of the following cases is true:
* <ul>
* <li>The offset was obtained from {@link #objectFieldOffset} on
* the {@link java.lang.reflect.Field} of some Java field and the object
* referred to by <code>o</code> is of a class compatible with that
* field's class.
*
* <li>The offset and object reference <code>o</code> (either null or
* non-null) were both obtained via {@link #staticFieldOffset}
* and {@link #staticFieldBase} (respectively) from the
* reflective {@link Field} representation of some Java field.
*
* <li>The object referred to by <code>o</code> is an array, and the offset
* is an integer of the form <code>B+N*S</code>, where <code>N</code> is
* a valid index into the array, and <code>B</code> and <code>S</code> are
* the values obtained by {@link #arrayBaseOffset} and {@link
* #arrayIndexScale} (respectively) from the array's class. The value
* referred to is the <code>N</code><em>th</em> element of the array.
*
* </ul>
* <p>
* If one of the above cases is true, the call references a specific Java
* variable (field or array element). However, the results are undefined
* if that variable is not in fact of the type returned by this method.
* <p>
* This method refers to a variable by means of two parameters, and so
* it provides (in effect) a <em>double-register</em> addressing mode
* for Java variables. When the object reference is null, this method
* uses its offset as an absolute address. This is similar in operation
* to methods such as {@link #getInt(long)}, which provide (in effect) a
* <em>single-register</em> addressing mode for non-Java variables.
* However, because Java variables may have a different layout in memory
* from non-Java variables, programmers should not assume that these
* two addressing modes are ever equivalent. Also, programmers should
* remember that offsets from the double-register addressing mode cannot
* be portably confused with longs used in the single-register addressing
* mode.
*
* @param o Java heap object in which the variable resides, if any, else
* null
* @param offset indication of where the variable resides in a Java heap
* object, if any, else a memory address locating the variable
* statically
* @return the value fetched from the indicated Java variable
* @throws RuntimeException No defined exceptions are thrown, not even
* {@link NullPointerException}
*/
public native int getInt(Object o, long offset);
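(Note: sun.misc.Unsafe is a private implementation detail of the Oracle/Sun JDK, not a public API of the Java standard library. Weigh the trade-offs before using it.)
My friend used Unsafe.getInt() to implement fast reflective field access and wrote this post about it:
An Optimization for Java Reflective Calls (original title: Java 反射调用的一种优化)
At the end of the post he compares the performance of calling Unsafe.getInt() directly with that of ordinary reflective field access:
aliveTimeID wrote:
Testing shows it is 3 times as efficient as the ordinary java.lang.reflect.Field.get(Object). Of course, when it comes to performance, you are better off testing it yourself.
So the questions arise: is it also three times faster on my machine? Why "three times faster"? And what exactly is three times faster than what?
Here is the original test code (with minor changes):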
import java.io.Serializable;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/**
 * @author haitao.yao Dec 14, 2010
 */
public class ReflectionCompare {
    private static final int count = 10000000;

    /**
     * @param args
     */
    public static void main(String[] args) {
        long duration = testIntCommon();
        System.out.println("int common test for " + count
                + " times, duration: " + duration);
        duration = testUnsafe();
        System.out.println("int unsafe test for " + count
                + " times, duration: " + duration);
    }

    private static long testUnsafe() {
        long start = System.currentTimeMillis();
        sun.misc.Unsafe unsafe = getUnsafe();
        int temp = count;
        Field field = getIntField();
        long offset = unsafe.objectFieldOffset(field);
        while (temp-- > 0) {
            unsafe.getInt(new TestBean(), offset);
        }
        return System.currentTimeMillis() - start;
    }

    private static long testIntCommon() {
        long start = System.currentTimeMillis();
        int temp = count;
        getIntField().setAccessible(true);
        try {
            while (temp-- > 0) {
                TestBean bean = new TestBean();
                getIntField().get(bean);
            }
            return System.currentTimeMillis() - start;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return -1;
    }

    private static final sun.misc.Unsafe unsafe;
    static {
        sun.misc.Unsafe value = null;
        try {
            Class<?> clazz = Class.forName("sun.misc.Unsafe");
            Field field = clazz.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            value = (Unsafe) field.get(null);
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException("error to get theUnsafe", e);
        }
        unsafe = value;
    }

    public static final sun.misc.Unsafe getUnsafe() {
        return unsafe;
    }

    private static final Field intField;
    private static final Field stringField;
    static {
        try {
            intField = TestBean.class.getDeclaredField("age");
            stringField = TestBean.class.getDeclaredField("name");
        } catch (Exception e) {
            e.printStackTrace();
            throw new IllegalStateException("failed to init testbean field", e);
        }
    }

    public static final Field getIntField() {
        return intField;
    }

    public static final Field getStringField() {
        return stringField;
    }

    /**
     * @author haitao.yao
     * Dec 14, 2010
     */
    static class TestBean implements Serializable {
        /**
         *
         */
        private static final long serialVersionUID = -5994966479456252766L;
        private String name;
        private int age;

        /**
         * @return the name
         */
        public String getName() {
            return name;
        }

        /**
         * @param name the name to set
         */
        public void setName(String name) {
            this.name = name;
        }

        /**
         * @return the age
         */
        public int getAge() {
            return age;
        }

        /**
         * @param age the age to set
         */
        public void setAge(int age) {
            this.age = age;
        }
    }
}
Running it in my test environment:
$ java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)
$ java -cp . ReflectionCompare
int common test for 10000000 times, duration: 616
int unsafe test for 10000000 times, duration: 10
$ java -cp . ReflectionCompare
int common test for 10000000 times, duration: 425
int unsafe test for 10000000 times, duration: 274
$ java -cp . ReflectionCompare
int common test for 10000000 times, duration: 502
int unsafe test for 10000000 times, duration: 10
$ java -cp . ReflectionCompare
int common test for 10000000 times, duration: 659
int unsafe test for 10000000 times, duration: 9
$ java -cp . ReflectionCompare
int common test for 10000000 times, duration: 646
int unsafe test for 10000000 times, duration: 10
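These timings do not look like "three times". They look more like something has gone wrong.
Before revealing the answer, let's come at it from the side and see what we can find.
We already know that the two sides being compared are Unsafe.getInt() and Field.get(). Unsafe.getInt() is a native method, so that is as far as the Java side goes.
In fact, Unsafe.getInt() is an intrinsic method in the HotSpot VM and receives special optimization. For details, see the LibraryCallKit::inline_unsafe_access() function in the HotSpot server compiler, which can inline the original Unsafe.getInt() call at the call site and completely eliminate the JNI call overhead.
But surely this optimization makes no difference, and both sides are on equal footing? More on that later.
So how is Field.get() implemented in Oracle/Sun JDK 6?
Following the implementations of the relevant methods in Field, ReflectionFactory, UnsafeFieldAccessorFactory, UnsafeFieldAccessorImpl, and UnsafeIntegerFieldAccessorImpl, we find that when we read an int field through reflection, the work is ultimately done in UnsafeIntegerFieldAccessorImpl, again by Unsafe.getInt():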
public int getInt(Object obj) throws IllegalArgumentException {
    ensureObj(obj);
    return unsafe.getInt(obj, fieldOffset);
}
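In other words, although Oracle/Sun JDK 6 uses dynamically generated code to implement reflective method invocation, for reflective field access it chose not to generate code and instead uses the native back door Unsafe directly.
So the advantage of calling Unsafe.getInt() yourself over Field.get() is that the former is direct while the latter passes through several layers of indirection and wrapping, which is why the former is faster. But is that enough to explain the timing differences seen above?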
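One part of that wrapping is visible from the public API alone: Field.get(Object) is declared to return Object, so an int field comes back boxed as an Integer, while Field.getInt(Object) returns the primitive. The sketch below only demonstrates that difference. The class FieldGetLayers and its Bean are hypothetical names made up for the illustration, and nothing here measures performance.

```java
import java.lang.reflect.Field;

public class FieldGetLayers {
    // Hypothetical bean, standing in for TestBean from the test code.
    public static class Bean {
        private int age = 42;
    }

    // Looks up the private "age" field and makes it accessible.
    private static Field ageField() {
        try {
            Field f = Bean.class.getDeclaredField("age");
            f.setAccessible(true);
            return f;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Field.get(Object) returns Object, so the int value is boxed on the way out.
    public static Object readBoxed(Bean bean) {
        try {
            return ageField().get(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Field.getInt(Object) returns the primitive value without boxing.
    public static int readPrimitive(Bean bean) {
        try {
            return ageField().getInt(bean);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Bean bean = new Bean();
        System.out.println(readBoxed(bean).getClass().getName()); // java.lang.Integer
        System.out.println(readPrimitive(bean));                  // 42
    }
}
```

The original test calls Field.get(), so its reflective side pays for this boxing in addition to the accessor layers listed above.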
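Now let's look at what is really going on.
Switching to the fastdebug build of Oracle JDK 6 update 25 build 03, we can use -XX:+PrintOptoAssembly to see what the code generated by the JIT compiler looks like: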
$ ~/jdk/6u25b03_x64_debug/fastdebug/bin/java -cp . -XX:+PrintOptoAssembly ReflectionCompare
The main timing loop in the test's ReflectionCompare.testUnsafe() method is compiled into this code:
060 B6: # B6 B7 <- B5 B6 Loop: B6-B6 inner stride: not constant Freq: 999998
060
060
060 decl RBP # int
062 cmpl RBP, #-1
065 jg,s B6 # loop end P=1.000000 C=12203.000000
Expressed as equivalent Java code, that is:
while (temp-- > 0) {
}
But what the original test code says is:
while (temp-- > 0) {
unsafe.getInt(new TestBean(), offset);
}
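And this is exactly what the title of this post says:
Don't benchmark an empty loop.
If even the thing being measured has been optimized away, the microbenchmark is completely meaningless.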
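As mentioned earlier, Unsafe.getInt() is an intrinsic method of the HotSpot VM, so the VM knows all of its properties: whether it has side effects, whether it can be safely inlined, whether it can be optimized further after inlining until it is eliminated entirely, and so on.
So do not underestimate the effect that optimizing a native method can have on a microbenchmark.
Looking at ReflectionCompare.testIntCommon() from the original test in the same way, we can see that the entire call sequence, including the logic of the inlined unsafe.getInt(), is still fully present in the JIT-compiled code.
Comparing the performance of the two sides under these conditions naturally has no reference value whatsoever.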
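One common way to keep a measured read from being discarded is to make its result observable, for example by accumulating it and returning the sum. The following is a minimal sketch of that idea, not the original author's test: KeepResultAlive, Bean, and sumAges are hypothetical names, the read goes through Field.getInt() rather than Unsafe, and the beans are allocated before the loop so that allocation is not part of what is measured. A JIT compiler is still free to transform the loop in other ways, so a dedicated harness such as JMH is the more robust choice for real measurements.

```java
import java.lang.reflect.Field;

public class KeepResultAlive {
    // Hypothetical bean with one int field, similar in spirit to TestBean.
    public static class Bean {
        private int age;

        public Bean(int age) {
            this.age = age;
        }
    }

    // Power of two, so (i & (BEANS - 1)) is always a valid array index.
    private static final int BEANS = 1024;

    // Reads "age" reflectively in a loop and returns the sum, so every read
    // contributes to a value the caller can observe.
    public static long sumAges(int iterations) {
        try {
            Field field = Bean.class.getDeclaredField("age");
            field.setAccessible(true);

            // Allocate the beans up front; bean k holds age k.
            Bean[] beans = new Bean[BEANS];
            for (int i = 0; i < BEANS; i++) {
                beans[i] = new Bean(i);
            }

            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                sum += field.getInt(beans[i & (BEANS - 1)]);
            }
            return sum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Coarse timing: the setup inside sumAges is included in the figure.
        long start = System.nanoTime();
        long sum = sumAges(10000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        // Printing the sum makes the result of the loop externally visible.
        System.out.println("sum=" + sum + ", elapsed ms: " + elapsedMs);
    }
}
```

Reading from different beans on each iteration is a deliberate choice here: with a single bean the read would be loop-invariant, which gives the compiler another opportunity to move it out of the loop.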
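Of course, we can still take something away from this.
In this test both sides ultimately rely on Unsafe.getInt() to do the work, but the direct call was optimized away while the call through Field.get() was not.
This nicely illustrates the problem with the many layers of wrapping in a general-purpose API: too much wrapping introduces too many layers of indirection, and with that many layers the compiler has a hard time optimizing.
Imagine that someone discovers that reading fields by calling Unsafe.getInt() directly is very fast and kindly decides to wrap it in a general-purpose reflection library for everyone to use. Quite likely they would pile the layers of indirection back up, so that the end user gets roughly the same performance as with plain reflection.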
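To picture how the layers come back, here is a rough sketch of what such a wrapper might look like. GenericFieldReader is a hypothetical class invented for this illustration and does not correspond to any real library. It obtains Unsafe through the same theUnsafe field used in the test code, which relies on a JDK-internal class that newer JDKs may warn about or restrict, and it only handles non-static int fields to stay short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import sun.misc.Unsafe;

// Hypothetical wrapper library, for illustration only; not a real API.
public class GenericFieldReader {
    // Hypothetical bean used by the usage example in main().
    public static class Bean {
        private int age = 42;
    }

    private static final Unsafe UNSAFE = loadUnsafe();

    private final Class<?> owner;
    private final Map<String, Long> intFieldOffsets = new HashMap<String, Long>();

    // Records the offset of every non-static int field declared by the class.
    public GenericFieldReader(Class<?> owner) {
        this.owner = owner;
        for (Field f : owner.getDeclaredFields()) {
            if (f.getType() == int.class && !Modifier.isStatic(f.getModifiers())) {
                intFieldOffsets.put(f.getName(), UNSAFE.objectFieldOffset(f));
            }
        }
    }

    // The "convenient" generic entry point: look up by name, return Object.
    public Object get(Object target, String fieldName) {
        if (!owner.isInstance(target)) {                       // layer 1: type check
            throw new IllegalArgumentException("not an instance of " + owner.getName());
        }
        Long offset = intFieldOffsets.get(fieldName);          // layer 2: hash lookup
        if (offset == null) {
            throw new IllegalArgumentException("no int field named " + fieldName);
        }
        int value = UNSAFE.getInt(target, offset.longValue()); // the actual read
        return Integer.valueOf(value);                         // layer 3: boxing
    }

    // Same trick as the test code: read the private static theUnsafe field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new IllegalStateException("cannot obtain sun.misc.Unsafe", e);
        }
    }

    public static void main(String[] args) {
        GenericFieldReader reader = new GenericFieldReader(Bean.class);
        System.out.println(reader.get(new Bean(), "age")); // 42
    }
}
```

Each layer is reasonable for a general-purpose API, yet together they surround the single Unsafe.getInt() call with a type check, a map lookup, and a boxing step, which is the same kind of wrapping Field.get() already has.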
That said, ordinary programmers writing Java programs do not need to agonize over this kind of problem. Just read this story and have a laugh.