HBase uses leases to limit how long each scanner is allowed to operate.
1. Lease thread initialization
HRegionServer's run method calls preRegistrationInitialization once, and when initializeThreads is then called, a new Leases object is created:
this.leases = new Leases((int) conf.getLong(
    HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY,
    HConstants.DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD),
    this.threadWakeFrequency);
The default lease expiration period here is 60s:
public static String HBASE_REGIONSERVER_LEASE_PERIOD_KEY =
"hbase.regionserver.lease.period";
public static long DEFAULT_HBASE_REGIONSERVER_LEASE_PERIOD = 60000;
The default interval at which the lease thread wakes up to check for expired leases is 10s:
/** Parameter name for how often threads should wake up */
public static final String THREAD_WAKE_FREQUENCY = "hbase.server.thread.wakefrequency";
/** Default value for thread wake frequency */
public static final int DEFAULT_THREAD_WAKE_FREQUENCY = 10 * 1000;
Finally, HRegionServer's startServiceThreads starts the lease thread:
this.leases.setName(n + ".leaseChecker");
this.leases.start();
2. Lease creation
Leases are created in openScanner and addRowLock.
In openScanner, createLease is called for each new scanner:
protected long addScanner(RegionScanner s) throws LeaseStillHeldException {
  long scannerId = -1L;
  while (true) {
    scannerId = rand.nextLong();
    if (scannerId == -1) continue;
    String scannerName = String.valueOf(scannerId);
    RegionScanner existing = scanners.putIfAbsent(scannerName, s);
    if (existing == null) {
      this.leases.createLease(scannerName, new ScannerListener(scannerName));
      break;
    }
  }
  return scannerId;
}
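The id allocation in addScanner relies on putIfAbsent being atomic: a random id is only accepted if no other scanner already holds that name. Here is a minimal standalone sketch of that retry loop. The class ScannerIdSketch and its methods are hypothetical names for illustration, and a plain Object stands in for RegionScanner; this is not the HBase code itself.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative sketch only; names are hypothetical, not HBase classes. */
final class ScannerIdSketch {
  private final ConcurrentMap<String, Object> scanners = new ConcurrentHashMap<>();
  private final Random rand;

  ScannerIdSketch(Random rand) {
    this.rand = rand;
  }

  /** Registers the scanner under a fresh random id and returns that id. */
  long addScanner(Object scanner) {
    while (true) {
      long id = rand.nextLong();
      if (id == -1L) {
        continue; // -1 is skipped, as in the code above
      }
      // putIfAbsent is atomic: it returns null only for the caller that claimed the name.
      if (scanners.putIfAbsent(String.valueOf(id), scanner) == null) {
        return id;
      }
      // Otherwise the id is already taken, so loop and draw another one.
    }
  }

  int openScanners() {
    return scanners.size();
  }

  public static void main(String[] args) {
    ScannerIdSketch sketch = new ScannerIdSketch(new Random());
    long id = sketch.addScanner(new Object());
    System.out.println("allocated scanner id " + id);
  }
}
```

Because the id doubles as the lease name, a collision-free id also guarantees that createLease never sees a name that is still held.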
In the end, the lease is added to the DelayQueue with the scannerId as its name:
public void addLease(final Lease lease) throws LeaseStillHeldException {
  if (this.stopRequested) {
    return;
  }
  lease.setExpirationTime(System.currentTimeMillis() + this.leasePeriod);
  synchronized (leaseQueue) {
    if (leases.containsKey(lease.getLeaseName())) {
      throw new LeaseStillHeldException(lease.getLeaseName());
    }
    leases.put(lease.getLeaseName(), lease);
    leaseQueue.add(lease);
  }
}
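3. Lease expiration
The lease thread polls leaseQueue, waiting up to 10s per call for a lease to expire. leaseQueue is a java.util.concurrent.DelayQueue, a BlockingQueue built on a priority queue (PriorityQueue) that orders its elements by their expiration time.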
public void run() {
  while (!stopRequested || (stopRequested && leaseQueue.size() > 0) ) {
    Lease lease = null;
    try {
      lease = leaseQueue.poll(leaseCheckFrequency, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      continue;
    } catch (ConcurrentModificationException e) {
      continue;
    } catch (Throwable e) {
      LOG.fatal("Unexpected exception killed leases thread", e);
      break;
    }
    if (lease == null) {
      continue;
    }
    // A lease expired. Run the expired code before removing from queue
    // since its presence in queue is used to see if lease exists still.
    if (lease.getListener() == null) {
      LOG.error("lease listener is null for lease " + lease.getLeaseName());
    } else {
      lease.getListener().leaseExpired();
    }
    synchronized (leaseQueue) {
      leases.remove(lease.getLeaseName());
    }
  }
  close();
}
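To see why the run loop above only ever receives expired leases, it helps to try DelayQueue on its own. The sketch below uses a hypothetical TimedItem class as a stand-in for Lease. It shows that poll with a timeout returns null while nothing has expired, and otherwise hands back the element with the earliest expiration first.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; TimedItem is a stand-in for HBase's Lease. */
final class DelayQueueSketch {

  /** An element that becomes available once its expiration time has passed. */
  static final class TimedItem implements Delayed {
    final String name;
    final long expirationMillis;

    TimedItem(String name, long ttlMillis) {
      this.name = name;
      this.expirationMillis = System.currentTimeMillis() + ttlMillis;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      // Remaining time until expiry; zero or negative means "expired".
      return unit.convert(expirationMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  /** Waits up to waitMillis for an expired item; returns its name, or null if none expired. */
  static String pollName(DelayQueue<TimedItem> queue, long waitMillis) {
    try {
      TimedItem item = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
      return item == null ? null : item.name;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    DelayQueue<TimedItem> queue = new DelayQueue<>();
    queue.add(new TimedItem("slow", 5000));
    queue.add(new TimedItem("fast", 300));
    System.out.println(pollName(queue, 50));   // nothing has expired yet
    System.out.println(pollName(queue, 1000)); // the item with the earliest expiry
  }
}
```

Note that poll returns as soon as an element expires, so the 10s wake frequency is an upper bound on how long one call blocks, not a delay added to every expiration.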
The poll method takes an expired lease off the queue, and the thread then runs the expiration method of its listener:
public void leaseExpired() {
  RegionScanner s = scanners.remove(this.scannerName);
  if (s != null) {
    LOG.info("Scanner " + this.scannerName + " lease expired on region "
        + s.getRegionInfo().getRegionNameAsString());
    try {
      HRegion region = getRegion(s.getRegionInfo().getRegionName());
      if (region != null && region.getCoprocessorHost() != null) {
        region.getCoprocessorHost().preScannerClose(s);
      }
      s.close();
      if (region != null && region.getCoprocessorHost() != null) {
        region.getCoprocessorHost().postScannerClose(s);
      }
    } catch (IOException e) {
      LOG.error("Closing scanner for "
          + s.getRegionInfo().getRegionNameAsString(), e);
    }
  } else {
    LOG.info("Scanner " + this.scannerName + " lease expired");
  }
}
The expiration method removes the scanner from memory and closes it.
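The expiration path can be reproduced in a few lines. In the sketch below, every name (LeaseExpirySketch, FakeScanner, checkOnce) is hypothetical, and the coprocessor hooks are left out. The point is only to show one pass of the checker: poll the queue, and if a lease came out, let its listener drop the scanner from the map and close it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; all names are hypothetical, not HBase classes. */
final class LeaseExpirySketch {
  interface LeaseListener {
    void leaseExpired();
  }

  /** Stand-in for RegionScanner that only records whether it was closed. */
  static final class FakeScanner {
    volatile boolean closed;

    void close() {
      closed = true;
    }
  }

  static final class SimpleLease implements Delayed {
    final long expirationMillis;
    final LeaseListener listener;

    SimpleLease(long ttlMillis, LeaseListener listener) {
      this.expirationMillis = System.currentTimeMillis() + ttlMillis;
      this.listener = listener;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(expirationMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  final Map<String, FakeScanner> scanners = new ConcurrentHashMap<>();
  private final DelayQueue<SimpleLease> leaseQueue = new DelayQueue<>();

  /** Registers a scanner whose lease expires after ttlMillis. */
  void openScanner(String name, FakeScanner scanner, long ttlMillis) {
    scanners.put(name, scanner);
    leaseQueue.add(new SimpleLease(ttlMillis, () -> {
      // On expiry: forget the scanner, then release its resources.
      FakeScanner expired = scanners.remove(name);
      if (expired != null) {
        expired.close();
      }
    }));
  }

  /** One pass of the checker loop; returns true if a lease expired and its listener ran. */
  boolean checkOnce(long waitMillis) {
    try {
      SimpleLease lease = leaseQueue.poll(waitMillis, TimeUnit.MILLISECONDS);
      if (lease == null) {
        return false;
      }
      lease.listener.leaseExpired();
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    LeaseExpirySketch sketch = new LeaseExpirySketch();
    FakeScanner scanner = new FakeScanner();
    sketch.openScanner("42", scanner, 300);
    sketch.checkOnce(1000);
    System.out.println("closed after expiry: " + scanner.closed);
  }
}
```

Once the listener has run, the server no longer knows the scanner id, which is exactly the state a late client call runs into.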
4. Common error
2013-11-06 16:16:38,684 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '-2408052186420749395' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2783)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
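This common error occurs because the lease has expired while the client has probably not closed its scanner. When the client calls next with the old scanner id, the server goes through a step that renews the lease, as follows: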
1.
lease = this.leases.removeLease(scannerName);
In the next method, the lease is first removed, to see whether it can be removed normally:
Lease removeLease(final String leaseName) throws LeaseException {
  Lease lease = null;
  synchronized (leaseQueue) {
    lease = leases.remove(leaseName);
    if (lease == null) {
      throw new LeaseException("lease '" + leaseName + "' does not exist");
    }
    leaseQueue.remove(lease);
  }
  return lease;
}
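If the lease exists, it is removed normally. Once the lease has expired, however, a LeaseException is thrown.
In the normal case, after the lease has been removed, a lease is added back at the end so that later next calls can keep working. The lease name is still the original scanner id: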
if (this.scanners.containsKey(scannerName)) {
  if (lease != null) this.leases.addLease(lease);
}
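Put together, each next call removes the lease, does its work, and then adds the lease back with a fresh expiration time. The sketch below replays that sequence with explicit timestamps so the failure is easy to reproduce. All names are hypothetical, IllegalStateException stands in for LeaseException, and the use of a finally block for the re-add is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not HBase's API. */
final class NextCallLeaseSketch {
  private final Map<String, Long> leaseExpiry = new HashMap<>();
  private final long leasePeriodMillis;

  NextCallLeaseSketch(long leasePeriodMillis) {
    this.leasePeriodMillis = leasePeriodMillis;
  }

  /** Grants a lease that expires leasePeriodMillis after nowMillis. */
  synchronized void addLease(String name, long nowMillis) {
    leaseExpiry.put(name, nowMillis + leasePeriodMillis);
  }

  /** Stand-in for the checker thread: drops every lease that expired at or before nowMillis. */
  synchronized void expire(long nowMillis) {
    leaseExpiry.values().removeIf(expiry -> expiry <= nowMillis);
  }

  /** Removes the lease, failing with the message from the log above if it is already gone. */
  synchronized void removeLease(String name) {
    if (leaseExpiry.remove(name) == null) {
      throw new IllegalStateException("lease '" + name + "' does not exist");
    }
  }

  /** Serves one next() call: take the lease out while working, then re-add it. */
  String next(String scannerName, long nowMillis) {
    removeLease(scannerName); // throws if the lease already expired
    try {
      return "rows for scanner " + scannerName; // placeholder for the real scan work
    } finally {
      addLease(scannerName, nowMillis); // re-adding resets the expiration time
    }
  }

  public static void main(String[] args) {
    NextCallLeaseSketch server = new NextCallLeaseSketch(60000);
    server.addLease("42", 0);
    System.out.println(server.next("42", 50000)); // within the lease period, so it is renewed
    server.expire(200000);                        // client stayed idle for too long
    try {
      server.next("42", 200001);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The second call fails because the idle gap between two next calls exceeded the lease period, not because any single call was slow.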
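To deal with the error above:
1. Check that hbase.rpc.timeout (default 60000ms) is greater than or equal to hbase.regionserver.lease.period (default 60000ms). It has to be greater than or equal.
2. Check whether any scanner has been left unclosed.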
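The first check is easy to automate. The sketch below assumes the two settings have already been read into a plain Map of strings (for example, parsed from hbase-site.xml by some other code). The class and method names are hypothetical, and the 60000ms fallback is the default quoted above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; it works on a plain Map, not on HBase's Configuration. */
final class LeaseConfigCheckSketch {
  static final String RPC_TIMEOUT_KEY = "hbase.rpc.timeout";
  static final String LEASE_PERIOD_KEY = "hbase.regionserver.lease.period";
  static final long DEFAULT_MILLIS = 60000L; // default quoted in the text for both settings

  /** Reads a millisecond setting, falling back to the default when it is absent. */
  static long getMillis(Map<String, String> conf, String key) {
    String value = conf.get(key);
    return value == null ? DEFAULT_MILLIS : Long.parseLong(value.trim());
  }

  /** True when hbase.rpc.timeout is greater than or equal to the lease period. */
  static boolean rpcTimeoutCoversLease(Map<String, String> conf) {
    return getMillis(conf, RPC_TIMEOUT_KEY) >= getMillis(conf, LEASE_PERIOD_KEY);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(RPC_TIMEOUT_KEY, "30000");
    System.out.println("settings consistent: " + rpcTimeoutCoversLease(conf));
  }
}
```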