JSPWiki: Searching the Contents of Attachment Files

 2012/9/8 11:52:11  squll369  程序员俱乐部

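Our project team recently needed to set up a wiki, and after evaluating the options we chose JSPWiki (plenty of comparisons are available online). Once it was running, we found that it does not search the contents of attachments: if a wiki page has doc, xls, or similar files uploaded to it, the text inside those files cannot be found by search. However, jspwiki.properties contains the following settings: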

jspwiki.searchProvider = LuceneSearchProvider
jspwiki.lucene.analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer

In other words, JSPWiki also uses Lucene for search. So I downloaded its source code, looked at the class com.ecyrd.jspwiki.search.LuceneSearchProvider, and found the following method:

protected String getAttachmentContent( Attachment att )
    {
        AttachmentManager mgr = m_engine.getAttachmentManager();
        //FIXME: Add attachment plugin structure

        String filename = att.getFileName();

        if(filename.endsWith(".txt") ||
           filename.endsWith(".xml") ||
           filename.endsWith(".ini") ||
           filename.endsWith(".html"))
        {
            InputStream attStream;

            try
            {
                attStream = mgr.getAttachmentStream( att );

                StringWriter sout = new StringWriter();
                FileUtil.copyContents( new InputStreamReader(attStream), sout );

                attStream.close();
                sout.close();

                return sout.toString();
            }
            catch (ProviderException e)
            {
                log.error("Attachment cannot be loaded", e);
                return null;
            }
            catch (IOException e)
            {
                log.error("Attachment cannot be loaded", e);
                return null;
            }
        } 
......

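So content search is already supported for plain-text attachments (txt, xml, ini, html). I tried it by uploading a txt file, and its contents did indeed show up in the search results.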
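
The text-file branch above boils down to copying a character stream into a string. A minimal stand-alone sketch of that step, using only the JDK, might look like the following. `TextAttachmentReader` and `readText` are hypothetical names, not JSPWiki API; the manual copy loop stands in for `FileUtil.copyContents`, and decoding as UTF-8 is an assumption (the JSPWiki method shown above relies on the platform default charset).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

class TextAttachmentReader {

    /**
     * Hypothetical helper: reads the whole stream as UTF-8 text and closes it.
     * Returns null on an I/O failure, mirroring the JSPWiki method's behaviour.
     */
    static String readText(InputStream in) {
        StringWriter out = new StringWriter();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            // read() returns -1 at end of stream
            while ((n = reader.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            return null;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for mgr.getAttachmentStream(att)
        byte[] fake = "hello attachment".getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(new ByteArrayInputStream(fake)));
    }
}
```

Binary formats such as doc and xls cannot be decoded this way, which is why they need a dedicated extractor.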

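So I decided to operate on this class and add support for doc and xls files. I added the code below (the Office 2003 and Office 2007 file formats differ, so each gets its own branch), rebuilt the jar (the built jar is attached), started JSPWiki, and tested it: the contents of Word and Excel files can now be found by search. The new branches use Apache POI classes (WordExtractor, XWPFWordExtractor, HSSFWorkbook, XSSFWorkbook), so the POI jars must be on the classpath.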
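
As background on why the two Office generations need separate handling: the older .doc and .xls files are OLE2 compound documents, while .docx and .xlsx are ZIP packages. The sketch below, which is not part of the patch, tells the two containers apart by their leading signature bytes instead of by file extension. `OfficeFormatSniffer`, `Container`, and `sniff` are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class OfficeFormatSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // Signature of the OLE2 compound file format used by .doc and .xls
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK\3\4"), the container of .docx and .xlsx
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Reads up to 8 bytes from the stream and classifies the container. */
    static Container sniff(InputStream in) {
        byte[] head = new byte[8];
        int total = 0;
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (total >= 8 && Arrays.equals(head, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (total >= 4 && Arrays.equals(Arrays.copyOf(head, 4), ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] zipLike = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        System.out.println(sniff(new ByteArrayInputStream(zipLike)));
    }
}
```

Note that `sniff` consumes the bytes it inspects, so real code would have to buffer the stream (for example with mark/reset) before handing it to an extractor.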

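Note that this method is called when an attachment is uploaded and Lucene builds its index entry. To test the change you therefore have to upload the files again; files uploaded before the class was modified will not be found.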

......       
        else if(filename.endsWith(".doc")){
            InputStream attStream = null;
            try {
                attStream = mgr.getAttachmentStream(att);
                // WordExtractor (Apache POI) reads the binary Word 97-2003 format
                WordExtractor extractor = new WordExtractor(attStream);
                String s = extractor.getText();
                log.debug("Extracted text: " + s + " from attachment: " + filename);
                return s;
            } catch (Exception e) {
                log.error("Attachment cannot be loaded", e);
                return null;
            } finally {
                if(attStream != null){
                    try {
                        attStream.close();
                    }
                    catch (IOException e) {
                        log.warn("Couldn't close attachment stream for " + filename, e);
                    }
                }
            }
        }
        
        else if(filename.endsWith(".docx")){
            InputStream attStream = null;
            try {
                attStream = mgr.getAttachmentStream(att);
                // XWPFWordExtractor (Apache POI) reads the XML-based Word 2007 format
                XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(attStream));
                String s = extractor.getText();
                log.debug("Extracted text: " + s + " from attachment: " + filename);
                return s;
            } catch (Exception e) {
                log.error("Attachment cannot be loaded", e);
                return null;
            } finally {
                if(attStream != null){
                    try {
                        attStream.close();
                    }
                    catch (IOException e) {
                        log.warn("Couldn't close attachment stream for " + filename, e);
                    }
                }
            }
        }
        
        else if(filename.endsWith(".xls")){
            InputStream attStream = null;
            try {
                attStream = mgr.getAttachmentStream(att);
                HSSFWorkbook workbook = new HSSFWorkbook(attStream);
                StringBuilder sb = new StringBuilder();
                for(int i = 0; i < workbook.getNumberOfSheets(); i++) {
                    HSSFSheet sheet = workbook.getSheetAt(i);
                    if(sheet == null){
                        continue;
                    }
                    // Iterate up to getLastRowNum() (a zero-based index): getPhysicalNumberOfRows()
                    // counts only defined rows, so rows after a gap would be skipped
                    for (int j = 0; j <= sheet.getLastRowNum(); j++) {
                        HSSFRow row = sheet.getRow(j);
                        if(row == null){
                            continue;
                        }
                        for (int k = 0; k < row.getLastCellNum(); k++) {
                            if(row.getCell(k) == null){
                                continue; // undefined cell: avoid indexing the literal "null"
                            }
                            sb.append(row.getCell(k));
                            sb.append(" ");
                        }
                    }
                }
                String s = sb.toString();
                log.debug("Extracted text: " + s + " from attachment: " + filename);
                return s;

            } catch (Exception e) {
                log.error("Attachment cannot be loaded", e);
                return null;
            } finally {
                if(attStream != null){
                    try {
                        attStream.close();
                    }
                    catch (IOException e) {
                        log.warn("Couldn't close attachment stream for " + filename, e);
                    }
                }
            }
        }
        else if(filename.endsWith(".xlsx")){
            InputStream attStream = null;
            try {
                attStream = mgr.getAttachmentStream(att);
                XSSFWorkbook workbook = new XSSFWorkbook(attStream);
                StringBuilder sb = new StringBuilder();
                for(int i = 0; i < workbook.getNumberOfSheets(); i++) {
                    XSSFSheet sheet = workbook.getSheetAt(i);
                    if(sheet == null){
                        continue;
                    }
                    // Iterate up to getLastRowNum() (a zero-based index): getPhysicalNumberOfRows()
                    // counts only defined rows, so rows after a gap would be skipped
                    for (int j = 0; j <= sheet.getLastRowNum(); j++) {
                        XSSFRow row = sheet.getRow(j);
                        if(row == null){
                            continue;
                        }
                        for (int k = 0; k < row.getLastCellNum(); k++) {
                            if(row.getCell(k) == null){
                                continue; // undefined cell: avoid indexing the literal "null"
                            }
                            sb.append(row.getCell(k));
                            sb.append(" ");
                        }
                    }
                }
                String s = sb.toString();
                log.debug("Extracted text: " + s + " from attachment: " + filename);
                return s;

            } catch (Exception e) {
                log.error("Attachment cannot be loaded", e);
                return null;
            } finally {
                if(attStream != null){
                    try {
                        attStream.close();
                    }
                    catch (IOException e) {
                        log.warn("Couldn't close attachment stream for " + filename, e);
                    }
                }
            }
        }
......
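
The original method carries a "FIXME: Add attachment plugin structure" comment, and the patch above grows the if/else chain by one branch per file type. One possible alternative, sketched here under stated assumptions and not taken from JSPWiki, is a small registry that maps file extensions to extractor functions. `ExtractorRegistry`, `register`, `extract`, and `plainText` are hypothetical names; only a plain-text extractor is implemented, and the POI-backed extractors would be registered in the same way. Unlike the `endsWith` checks, this sketch matches extensions case-insensitively.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class ExtractorRegistry {

    // Lower-case extension (without the dot) -> function that turns a stream into text
    private final Map<String, Function<InputStream, String>> byExtension = new HashMap<>();

    void register(String extension, Function<InputStream, String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extracted text, or null when no extractor handles the file. */
    String extract(String filename, InputStream in) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<InputStream, String> extractor = byExtension.get(ext);
        return extractor == null ? null : extractor.apply(in);
    }

    /** Plain-text extractor; UTF-8 is an assumption made for this sketch. */
    static String plainText(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        for (String ext : new String[] {"txt", "xml", "ini", "html"}) {
            registry.register(ext, ExtractorRegistry::plainText);
        }
        // A POI-backed extractor would be registered the same way, e.g. for "doc" or "xlsx"
        InputStream in = new ByteArrayInputStream("wiki notes".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.extract("NOTES.TXT", in));
    }
}
```

With such a registry, supporting a new file type means registering one more function rather than editing the method body again.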

  • JSPWiki.jar (1019.4 KB)