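Lucene turns out to be impressively object-oriented, and its query syntax alone shows how well it is designed. A query can be as expressive as title:"The Right Way" AND text:go, which is exactly the kind of rich search I badly wanted to implement back when I was working on O/R mapping. When building an index, the following table of Field factory methods is the important thing to keep in mind: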
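Method | Tokenized | Indexed | Stored | Typical use |
---|---|---|---|---|
Field.Text(String name, String value) | Yes | Yes | Yes | Tokenized, indexed and stored, e.g. title and body fields |
Field.Text(String name, Reader value) | Yes | Yes | No | Tokenized and indexed but not stored, e.g. META information that must be searchable but is never returned for display |
Field.Keyword(String name, String value) | No | Yes | Yes | Indexed as a single term and stored, e.g. date fields |
Field.UnIndexed(String name, String value) | No | No | Yes | Stored only, not indexed, e.g. file paths |
Field.UnStored(String name, String value) | Yes | Yes | No | Full-text indexed only, not stored |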
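
To make the three switches in the table concrete, here is a small toy sketch in plain Java. It is not Lucene code: the class FieldFlagsDemo and its helpers text, keyword, unIndexed and unStored are hypothetical names chosen to mirror the table rows, and the tokenizer is a crude split on non-word characters. It only illustrates what "tokenized", "indexed" and "stored" mean for searching and for retrieving values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory index; NOT Lucene, only an illustration of the three switches. */
class FieldFlagsDemo {
    /** A field value plus the three switches from the table. */
    static final class ToyField {
        final String name, value;
        final boolean tokenized, indexed, stored;
        ToyField(String name, String value, boolean tokenized, boolean indexed, boolean stored) {
            this.name = name; this.value = value;
            this.tokenized = tokenized; this.indexed = indexed; this.stored = stored;
        }
    }

    // Hypothetical helpers named after the table rows they imitate.
    static ToyField text(String n, String v)      { return new ToyField(n, v, true,  true,  true);  }
    static ToyField keyword(String n, String v)   { return new ToyField(n, v, false, true,  true);  }
    static ToyField unIndexed(String n, String v) { return new ToyField(n, v, false, false, true);  }
    static ToyField unStored(String n, String v)  { return new ToyField(n, v, true,  true,  false); }

    // "field:term" -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> stored values that can be returned with a hit
    private final List<Map<String, String>> storedValues = new ArrayList<>();

    int add(ToyField... fields) {
        int docId = storedValues.size();
        Map<String, String> stored = new HashMap<>();
        for (ToyField f : fields) {
            if (f.indexed) {
                // Tokenized fields become lowercase words; others are one exact term.
                String[] terms = f.tokenized
                        ? f.value.toLowerCase().split("\\W+")
                        : new String[] { f.value };
                for (String t : terms) {
                    if (!t.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + t, k -> new TreeSet<>()).add(docId);
                    }
                }
            }
            if (f.stored) {
                stored.put(f.name, f.value);
            }
        }
        storedValues.add(stored);
        return docId;
    }

    Set<Integer> search(String field, String term) {
        Set<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    String get(int docId, String field) {
        return storedValues.get(docId).get(field);
    }

    public static void main(String[] args) {
        FieldFlagsDemo index = new FieldFlagsDemo();
        int id = index.add(text("title", "The Right Way"), keyword("date", "2005-03-01"),
                unIndexed("path", "/docs/a.xml"), unStored("content", "go is fun"));
        System.out.println(index.search("title", "right"));      // tokenized: single words match
        System.out.println(index.search("date", "2005-03-01"));  // keyword: only the exact value matches
        System.out.println(index.search("path", "/docs/a.xml")); // unindexed: never matches
        System.out.println(index.get(id, "content"));            // unstored: nothing to return
    }
}
```

The point of the sketch is the trade-off behind each row: storing costs space but lets a hit show the value, indexing makes the field searchable, and tokenizing decides whether a search matches individual words or only the whole value.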
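
Deleting a document by its key is done with a linear scan over the index:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    /** Deletes the first live document whose "objectId" field equals keyId. */
    public static void deleteDocument(String indexPath, String keyId) throws LuceneException {
        IndexReader reader = null;
        try {
            reader = IndexReader.open(indexPath);
            // Document numbers run up to maxDoc(). numDocs() excludes deleted
            // documents, so looping to numDocs() can miss the tail of the index.
            // If objectId is indexed as a Keyword field, IndexReader.delete(Term)
            // would avoid this scan altogether.
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) {
                    continue;
                }
                Document doc = reader.document(i);
                // keyId first, so a document without an objectId cannot cause an NPE
                if (keyId.equals(doc.get("objectId"))) {
                    reader.delete(i);
                    break;
                }
            }
        } catch (IOException e) {
            throw new LuceneException(e.getMessage(), e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }

Still, it is good enough for now. Searching XML documents is basically working at this point, and the function below does most of the work of building the index from XML.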
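
    import java.io.File;
    import java.io.IOException;
    import javax.xml.parsers.ParserConfigurationException;
    import org.apache.lucene.index.IndexWriter;
    import org.xml.sax.SAXException;

    /** Recursively indexes every file under the given file or directory. */
    public static void indexDocs(IndexWriter writer, File file)
            throws ParserConfigurationException, SAXException, IOException {
        if (file.isDirectory()) {
            String[] files = file.list();
            if (files == null) {
                return; // the directory could not be read
            }
            for (int i = 0; i < files.length; i++) {
                indexDocs(writer, new File(file, files[i]));
            }
        } else {
            System.out.println("adding " + file);
            XMLDocumentHandlerSAX hdlr = new XMLDocumentHandlerSAX(file);
            writer.addDocument(hdlr.getDocument());
            // For DOM, use
            // XMLDocumentHandlerDOM hdlr = new XMLDocumentHandlerDOM();
            // writer.addDocument(hdlr.createXMLDocument(file));
        }
    }

You can build whatever index you need by modifying the XMLDocumentHandlerSAX class. Searching is handled by the separate search function.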
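
The XMLDocumentHandlerSAX class itself is not shown in this post, so the following is only a rough sketch of the idea behind such a handler, not its actual code. It uses the JDK's SAX parser to collect the text of each leaf element into a name-to-value map. The class name XmlFieldCollector, its parse method and the sample element names are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler: collects the text of each leaf element as a field. */
class XmlFieldCollector extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the newly opened element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may deliver one text node in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // element name becomes the field name
        }
        text.setLength(0);
    }

    /** Parses an XML string and returns element name -> text for its leaf elements. */
    static Map<String, String> parse(String xml) {
        try {
            XmlFieldCollector handler = new XmlFieldCollector();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<news><objectId>42</objectId><title>The Right Way</title>"
                + "<author>someone</author></news>";
        System.out.println(parse(xml));
    }
}
```

In a real handler, each collected entry would then be turned into a Lucene field using one of the factory methods from the table, for example Field.Text for a title or Field.Keyword for an identifier. Which method fits which element is the part worth customizing.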
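
    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;

    /** Searches the title (as a phrase) and the default "content" field, printing each hit. */
    public static void search(String indexPath, String keyWords) throws LuceneException {
        Searcher searcher = null;
        try {
            searcher = new IndexSearcher(indexPath);
            Analyzer analyzer = new StandardAnalyzer();
            // Phrase match on title, plus the bare keywords against "content"
            Query query = QueryParser.parse(
                    "title:\"" + keyWords + "\" " + keyWords, "content", analyzer);
            System.out.println("Searching for: " + query.toString());
            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " total matching documents");
            final int HITS_PER_PAGE = 10;
            // Walk through the hits one page at a time
            for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
                int end = Math.min(hits.length(), start + HITS_PER_PAGE);
                for (int i = start; i < end; i++) {
                    Document doc = hits.doc(i);
                    String name = doc.get("objectId");
                    System.out.println(name);
                    System.out.println(doc.get("author"));
                    System.out.println(doc.get("title"));
                }
            }
        } catch (IOException e) {
            throw new LuceneException(e.getMessage(), e);
        } catch (ParseException e) {
            throw new LuceneException(e.getMessage(), e);
        } finally {
            if (searcher != null) {
                try {
                    searcher.close();
                } catch (IOException e1) {
                    // ignored: nothing useful can be done if close fails
                }
            }
        }
    }

This afternoon I need to integrate all of this with the news module.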
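
One thing to watch in the search function is that the query string is built by plain concatenation, so keywords containing query-syntax characters such as a double quote, a colon or parentheses can make the parser throw a ParseException. A minimal sketch of one way to guard against that is to backslash-escape those characters before building the query. The class QueryEscapeDemo and its methods are hypothetical, and the sketch assumes the Lucene version in use honours backslash escapes in its query syntax.

```java
/** Hypothetical helper: backslash-escapes query-syntax characters in user input. */
class QueryEscapeDemo {
    // Characters treated as special here: + - & | ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns s with every special character preceded by a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds the same query shape as the search function, from escaped input. */
    static String buildQuery(String keyWords) {
        String safe = escape(keyWords);
        return "title:\"" + safe + "\" " + safe;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("go"));
        System.out.println(buildQuery("C++ (draft)"));
    }
}
```

The result of buildQuery would be passed to QueryParser.parse in place of the concatenated string. Escaping keeps user input literal, at the cost of no longer letting users type operators themselves.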