I've recently been putting quite a bit of effort into Lucene and have picked up a few lessons, so I'm writing this article to record them. One goal is to share them with everyone; the other is that the next time I need Lucene, rereading my own article should save me some detours. OK, on to the main content.
I won't go over Lucene's features and uses here; there is plenty about that online. I'll just briefly give my own understanding.
With a normal SQL query such as name like '%继中%', you surely know the database cannot use an index, so on a table with many rows the response time gets slow, because the database is scanning row by row. So naturally we ask: how can we make this kind of lookup use an index?
One solution: Lucene.
In essence, Lucene splits your text into keywords (terms), so that a keyword search can jump straight from the term to the documents that contain it and return them quickly. Put even more bluntly: instead of a SQL like, the query becomes the equivalent of name = '继中', which can use an index, and that is the whole reason it is fast.
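To make the idea concrete, here is a toy sketch of an inverted index in plain Java. This only illustrates the concept, not Lucene's actual implementation; the class name and the sample documents are made up for the example:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: an inverted index maps each term to the documents that
// contain it, so a keyword lookup is a direct map hit instead of a full scan.
public class InvertedIndexDemo {
    public static void main(String[] args) {
        String[] docs = { "Lucene in action", "Lucene for search", "SQL full scan" };

        // term -> ids of the documents containing that term
        Map<String, List<Integer>> index = new HashMap<String, List<Integer>>();
        for (int id = 0; id < docs.length; id++) {
            for (String term : docs[id].toLowerCase().split("\\s+")) {
                List<Integer> postings = index.get(term);
                if (postings == null) {
                    postings = new ArrayList<Integer>();
                    index.put(term, postings);
                }
                postings.add(id);
            }
        }

        // the "name = 'lucene'"-style lookup: one map access, no row-by-row scan
        System.out.println("docs containing 'lucene': " + index.get("lucene")); // [0, 1]
    }
}

Lucene's real index adds analysis, scoring, positions, compression and so on, but the lookup principle is the same.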
Now for the main topic: configuring Lucene under the Spring framework, Lucene version 3.0.3. Straight to the code; I'll explain what each part does through the code and its comments.
mvc-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
           http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-3.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-3.0.xsd"
       default-autowire="byName">

    <bean class="org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter">
        <property name="messageConverters">
            <list>
                <bean class="org.springframework.http.converter.StringHttpMessageConverter">
                    <property name="supportedMediaTypes">
                        <list><value>text/plain;charset=UTF-8</value></list>
                    </property>
                </bean>
            </list>
        </property>
    </bean>

    <context:component-scan base-package="com.jizhong" />
    <mvc:annotation-driven/>

    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="prefix" value="/" />
        <property name="suffix" value=".jsp" />
    </bean>

    <!-- LUCENE SEARCH CONFIG -->

    <!-- Maximum field length; UNLIMITED means no truncation. Note that UNLIMITED
         is a static constant, not a class, so it must be exposed through
         util:constant rather than a plain <bean class="..."> definition. -->
    <util:constant id="MAXFIELDLENGTH2"
                   static-field="org.apache.lucene.index.IndexWriter$MaxFieldLength.UNLIMITED" />

    <!-- The analyzer, shared by the IndexWriter and the QueryParser. Since we
         mainly search Chinese text, pick a good Chinese analyzer; I chose paoding. -->
    <bean id="luceneAnalyzer" class="net.paoding.analysis.analyzer.PaodingAnalyzer" />

    <!-- The Lucene directory, i.e. where the index data files live on disk.
         In this case the location is hard-coded; it could also come from a
         properties file. -->
    <bean id="luceneDirectory" class="org.apache.lucene.store.SimpleFSDirectory">
        <constructor-arg>
            <bean class="java.io.File">
                <constructor-arg value="D:\\common\\hahaha" />
            </bean>
        </constructor-arg>
    </bean>

    <!-- The IndexWriter, wired to the directory and analyzer defined above.
         create="false" appends to an existing index instead of overwriting it. -->
    <bean id="indexWriter" class="org.apache.lucene.index.IndexWriter">
        <constructor-arg ref="luceneDirectory" />
        <constructor-arg ref="luceneAnalyzer" />
        <constructor-arg name="create" value="false" />
        <constructor-arg ref="MAXFIELDLENGTH2" />
    </bean>

    <!-- The IndexSearcher, reading from the same directory. -->
    <bean id="indexSearcher" class="org.apache.lucene.search.IndexSearcher">
        <constructor-arg ref="luceneDirectory" />
    </bean>
</beans>
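If the XML feels opaque, here is roughly the object graph Spring builds from it, written by hand against the plain Lucene 3.0 API. This is a minimal sketch: the class name LuceneWiring is made up, and the index path is the one configured above. Note that create=false assumes an index already exists at that path; the very first run needs create=true.

import java.io.File;

import net.paoding.analysis.analyzer.PaodingAnalyzer;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.SimpleFSDirectory;

// Hand-written equivalent of the bean definitions in the XML above.
public class LuceneWiring {
    public static void main(String[] args) throws Exception {
        // luceneDirectory: the folder holding the index files
        SimpleFSDirectory directory = new SimpleFSDirectory(new File("D:\\common\\hahaha"));

        // luceneAnalyzer: shared by the writer and, later, the query parsers
        Analyzer analyzer = new PaodingAnalyzer();

        // indexWriter: create=false appends to an existing index,
        // MaxFieldLength.UNLIMITED disables field-length truncation
        IndexWriter writer = new IndexWriter(directory, analyzer, false,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // indexSearcher: reads from the same directory
        IndexSearcher searcher = new IndexSearcher(directory);

        // ... index and search here ...

        searcher.close();
        writer.close();
    }
}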
That's the Lucene-related part of the Spring configuration file. Looks simple enough, right?
Let's move on to the Java code.
package com.jizhong.mmmmm.controller;

import java.io.IOException;
import java.io.StringReader;

import javax.servlet.http.HttpServletRequest;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class LuceneController {

    private static Logger logger = Logger.getLogger(LuceneController.class);

    // I use required = false so the bean is only pulled in when needed;
    // without it an error is reported on startup. If you have a better
    // solution, please leave a comment.
    @Autowired(required = false)
    private Analyzer myAnalyzer;

    @Autowired(required = false)
    private IndexWriter indexWriter;

    @Autowired(required = false)
    private IndexSearcher searcher;

    @RequestMapping(value = "search.do", method = RequestMethod.GET)
    public String testsSearch(HttpServletRequest request, ModelMap modelMap) throws Exception {
        search();
        return "test";
    }

    @RequestMapping(value = "idSearch.do", method = RequestMethod.GET)
    public String idSearch(HttpServletRequest request, ModelMap modelMap) throws Exception {
        idSearch();
        return "test";
    }

    @RequestMapping(value = "moreSearch.do", method = RequestMethod.GET)
    public String moreSearch(HttpServletRequest request, ModelMap modelMap) throws Exception {
        searchMore();
        return "test";
    }

    @RequestMapping(value = "create.do", method = RequestMethod.GET)
    public String testsCreate(HttpServletRequest request, ModelMap modelMap) throws Exception {
        create("整形值添加");
        // create(request.getParameter("name"));
        return "test";
    }

    @RequestMapping(value = "delete.do", method = RequestMethod.GET)
    public String delete(HttpServletRequest request, ModelMap modelMap) throws Exception {
        delete("id", request.getParameter("id"));
        return "test";
    }

    @RequestMapping(value = "optimize.do", method = RequestMethod.GET)
    public String optimize(HttpServletRequest request, ModelMap modelMap) throws Exception {
        // Index optimization is expensive; don't call it frequently.
        // Running it once in a while is enough.
        indexWriter.optimize();
        return "test";
    }

    // One caveat about updating a document: although Lucene provides
    // updateDocument, I believe it deletes the old document and adds a new one,
    // so you must write ALL fields again even if only one of them changed.
    @RequestMapping(value = "update.do", method = RequestMethod.GET)
    public String update(HttpServletRequest request, ModelMap modelMap) throws Exception {
        Term term = new Term("id", "1999991");
        Document doc = new Document();
        doc.add(new Field("id", String.valueOf(1999991), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("name", 555555 + "555555" + 555555, Store.YES, Index.ANALYZED));
        doc.add(new Field("level1", String.valueOf(555555), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("level2", String.valueOf(555555), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("level3", String.valueOf(555555), Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("brand_id", String.valueOf(555555 + 100000), Store.YES, Index.NOT_ANALYZED));
        indexWriter.updateDocument(term, doc);
        // Every operation that changes the index must be committed to take effect.
        indexWriter.commit();
        return "test";
    }

    // Deletion -- nothing special here.
    private void delete(String field, String text) throws CorruptIndexException, IOException {
        Term term1 = new Term(field, text);
        indexWriter.deleteDocuments(term1);
        indexWriter.commit();
    }

    public void create(String string) throws Exception {
        long begin = System.currentTimeMillis();
        for (int m = 604; m < 605; m++) {
            for (int i = m * 10000; i < (m + 1) * 10000; i++) {
                Document doc = new Document();
                // doc.add(new Field("id", String.valueOf(i), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                // Not recommended: from experience it is best to index every value
                // as a plain string, whatever its type, otherwise mismatches at
                // query time make the data unfindable. Prefer string fields like
                // the ones below.
                NumericField field = new NumericField("id", 6, Field.Store.YES, false);
                field.setIntValue(i);
                doc.add(field);
                // Indexing strategy: analyze the fields that need fuzzy (full-text)
                // search; leave the rest unanalyzed.
                doc.add(new Field("name", i + string + i, Store.YES, Index.ANALYZED));
                doc.add(new Field("level1", String.valueOf(3), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("level2", String.valueOf(2), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("level3", String.valueOf(1), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("brand_id", String.valueOf(i + 100000), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("hehe", String.valueOf(i + 100000), Store.YES, Index.NOT_ANALYZED_NO_NORMS));
                indexWriter.addDocument(doc);
            }
            System.out.println(m);
        }
        indexWriter.commit();
        System.out.println("create cost:" + (System.currentTimeMillis() - begin) / 1000 + "s");
    }

    // This query searches for documents whose "name" field contains the keyword
    // "整形" AND whose "level3" field equals 1.
    public void search() throws Exception {
        long begin = System.currentTimeMillis();
        String[] queryString = { "整形", "1" };   // values must line up one-to-one with the fields below
        String[] fields = { "name", "level3" };   // fields must line up one-to-one with the values above
        // MUST + MUST expresses an AND relationship; see the docs for the other combinations.
        BooleanClause.Occur[] clauses = { BooleanClause.Occur.MUST, BooleanClause.Occur.MUST };
        Query query = MultiFieldQueryParser.parse(Version.LUCENE_30, queryString, fields, clauses, myAnalyzer);
        IndexReader readerNow = searcher.getIndexReader();
        // This check matters: if we have just written documents and want to see
        // them in the results, the reader must be reopened, because a reader does
        // not see changes made to the index after it was opened.
        if (!readerNow.isCurrent()) {
            searcher = new IndexSearcher(readerNow.reopen());
        }
        System.out.println(searcher.maxDoc());
        Sort sort = new Sort();
        sort.setSort(new SortField("id", SortField.INT, true));
        TopDocs topDocs = searcher.search(query, null, 53, sort); // sorting strategy
        // TopDocs topDocs = searcher.search(query, 50);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println("id:" + doc.get("id"));
            System.out.println("name:" + doc.get("name"));
            System.out.println("level3:" + doc.get("level3"));
            System.out.println("new field:" + doc.get("hehe"));
        }
        System.out.println("search cost:" + (System.currentTimeMillis() - begin) / 1000 + "s");
    }

    private void idSearch() throws ParseException, CorruptIndexException, IOException {
        long begin = System.currentTimeMillis();
        QueryParser qp = new QueryParser(Version.LUCENE_30, "id", myAnalyzer);
        Query query = qp.parse("4040011");
        IndexReader readerNow = searcher.getIndexReader();
        if (!readerNow.isCurrent()) {
            searcher = new IndexSearcher(readerNow.reopen());
        }
        TopDocs topDocs = searcher.search(query, null, 53);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println("id:" + doc.get("id"));
            System.out.println("name:" + doc.get("name"));
            System.out.println("level3:" + doc.get("level3"));
            System.out.println("new field:" + doc.get("hehe"));
        }
        System.out.println("search cost:" + (System.currentTimeMillis() - begin) / 1000 + "s");
    }

    public void searchMore() throws Exception {
        long begin = System.currentTimeMillis();
        String[] queryStringOne = { "kkk", "222222" };
        String[] queryStringTwo = { "99980", "222222" };
        String[] fields = { "name", "level2" };
        BooleanClause.Occur[] clauses = { BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD };
        Query queryOne = MultiFieldQueryParser.parse(Version.LUCENE_30, queryStringOne, fields, clauses, myAnalyzer);
        Query queryTwo = MultiFieldQueryParser.parse(Version.LUCENE_30, queryStringTwo, fields, clauses, myAnalyzer);
        // Combine the two sub-queries; MUST + MUST means both have to match.
        BooleanQuery booleanQuery = new BooleanQuery();
        booleanQuery.add(queryOne, BooleanClause.Occur.MUST);
        booleanQuery.add(queryTwo, BooleanClause.Occur.MUST);
        IndexReader readerNow = searcher.getIndexReader();
        if (!readerNow.isCurrent()) {
            searcher = new IndexSearcher(readerNow.reopen());
        }
        System.out.println(searcher.maxDoc());
        Sort sort = new Sort();
        sort.setSort(new SortField("id", SortField.INT, true));
        TopDocs topDocs = searcher.search(booleanQuery, null, 53, sort);
        // TopDocs topDocs = searcher.search(query, 50);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println("id:" + doc.get("id"));
            System.out.println("name:" + doc.get("name"));
            System.out.println("level3:" + doc.get("level3"));
            System.out.println("new field:" + doc.get("hehe"));
        }
        System.out.println("search cost:" + (System.currentTimeMillis() - begin) / 1000 + "s");
    }

    // Prints how the analyzer tokenizes a sample string -- handy for checking
    // what terms actually end up in the index.
    @RequestMapping(value = "result.do", method = RequestMethod.GET)
    public void getAnalyzerResult() throws IOException {
        StringReader reader = new StringReader("爱国者mp3");
        TokenStream ts = myAnalyzer.tokenStream("name", reader);
        ts.addAttribute(TermAttribute.class);
        while (ts.incrementToken()) {
            TermAttribute ta = ts.getAttribute(TermAttribute.class);
            System.out.println(ta.term());
        }
    }
}
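One caution about the reopen pattern used in search(), idSearch() and searchMore() above: IndexReader.reopen() returns a new reader when the index has changed, and the code never closes the stale one, so file handles can leak over time. Below is a minimal sketch of a safer refresh helper; the class and method names are my own invention, and it assumes all searches go through a single shared searcher reference:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical helper: refreshes an IndexSearcher after index writes without
// leaking the stale reader (the controller above skips the close step).
public class SearcherRefresher {

    public static synchronized IndexSearcher refresh(IndexSearcher current) throws IOException {
        IndexReader oldReader = current.getIndexReader();
        if (oldReader.isCurrent()) {
            return current;                          // index unchanged: keep the existing searcher
        }
        IndexReader newReader = oldReader.reopen();  // new view of the index
        if (newReader != oldReader) {
            oldReader.close();                       // release the stale reader's files
        }
        return new IndexSearcher(newReader);
    }
}

In the controller this would replace the three copies of the isCurrent()/reopen() block with a single call such as searcher = SearcherRefresher.refresh(searcher);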