阿菜ing: Finished Lucene search over XML this morning.

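I've found that Lucene's object-oriented design is very powerful. Its query syntax alone shows how carefully it was designed: title:"The Right Way" AND text:go. Back when I was working on O/R mapping, I badly wanted to implement complex queries like this. When building an index, this table is the important part: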

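Method	Tokenized	Indexed	Stored	Use
Field.Text(String name, String value)	Yes	Yes	Yes	Tokenized, indexed and stored, e.g. title and content fields
Field.Text(String name, Reader value)	Yes	Yes	No	Tokenized and indexed but not stored, e.g. META information that is not returned for display but still has to be searchable
Field.Keyword(String name, String value)	No	Yes	Yes	Indexed as a single term without tokenizing, and stored, e.g. date fields
Field.UnIndexed(String name, String value)	No	No	Yes	Not indexed, only stored, e.g. file paths
Field.UnStored(String name, String value)	Yes	Yes	No	Full-text indexed only, not stored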
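To make the three flags in the table concrete, here is a minimal sketch in plain Java. It is a toy in-memory index, not Lucene code: the class FieldFlagsDemo, the nested ToyField and its factory methods are names I made up to mirror the rows above, and the tokenizer is just a split on non-word characters (so it is only meant for ASCII text).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy in-memory index that models the Tokenized / Indexed / Stored flags. Not Lucene code. */
class FieldFlagsDemo {
    /** One field of a document plus the three flags from the table (hypothetical type). */
    static final class ToyField {
        final String name;
        final String value;
        final boolean tokenized, indexed, stored;

        ToyField(String name, String value, boolean tokenized, boolean indexed, boolean stored) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
            this.indexed = indexed;
            this.stored = stored;
        }

        // Factories named after the rows of the table.
        static ToyField text(String n, String v)      { return new ToyField(n, v, true, true, true); }
        static ToyField keyword(String n, String v)   { return new ToyField(n, v, false, true, true); }
        static ToyField unIndexed(String n, String v) { return new ToyField(n, v, false, false, true); }
        static ToyField unStored(String n, String v)  { return new ToyField(n, v, true, true, false); }
    }

    // "field:term" -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> values of the fields that were flagged as stored
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    int add(ToyField... fields) {
        int docId = storedFields.size();
        Map<String, String> stored = new HashMap<>();
        for (ToyField f : fields) {
            if (f.stored) {
                stored.put(f.name, f.value);
            }
            if (!f.indexed) {
                continue;
            }
            // Tokenized fields become lower-cased words; other fields are one exact term.
            String[] terms = f.tokenized
                    ? f.value.toLowerCase(Locale.ROOT).split("\\W+")
                    : new String[] { f.value };
            for (String t : terms) {
                if (!t.isEmpty()) {
                    postings.computeIfAbsent(f.name + ":" + t, k -> new LinkedHashSet<>()).add(docId);
                }
            }
        }
        storedFields.add(stored);
        return docId;
    }

    /** Ids of documents whose field contains the exact term. */
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.<Integer>emptySet());
    }

    /** Stored value of a field, or null if the field was not stored. */
    String get(int docId, String field) {
        return storedFields.get(docId).get(field);
    }
}
```

With this model, a field added through unStored can be found by search but get returns null for it, while a field added through unIndexed comes back from get but never matches a search.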

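Updating an existing index is awkward. According to the official explanation, an update is simply a delete followed by an add ^_^. Deleting is not very convenient either; my delete code here feels clumsy: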

public static void deleteDocument(String indexPath, String keyId)
        throws LuceneException {
        IndexReader reader = null;

        try {
            reader = IndexReader.open(indexPath);

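            // Document ids run up to maxDoc(); numDocs() excludes deleted
            // documents and would end the scan too early after a delete.
            for (int i = 0; i < reader.maxDoc(); i++) {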
                if (reader.isDeleted(i)) {
                    continue;
                }

                Document doc = reader.document(i);

                // keyId first, so a document without an objectId field cannot cause a NullPointerException
                if (keyId.equals(doc.get("objectId"))) {
                    reader.delete(i);

                    break;
                }
            }
        } catch (IOException e) {
            throw new LuceneException(e.getMessage(), e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }
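The scan above has to load every document to find one key. If objectId is indexed as a single untokenized term (the Field.Keyword row in the table), the index itself already knows which documents carry that key, and as far as I recall Lucene 1.x's IndexReader also offers a delete that takes a Term for this purpose; I have not switched to it here. The sketch below is a plain Java model of the idea, not Lucene code: KeyedIndexDemo and its methods are made-up names. It also shows why the range of document ids (maxDoc) stays larger than the number of live documents (numDocs) once something has been deleted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "update = delete by key, then add". Not Lucene code. */
class KeyedIndexDemo {
    private final List<String> docs = new ArrayList<>();                // doc id -> content
    private final Map<String, List<Integer>> byKey = new HashMap<>();   // objectId -> doc ids
    private final BitSet deleted = new BitSet();                        // ids marked as deleted

    /** Appends a document and returns its id. Ids are never reused. */
    int add(String objectId, String content) {
        int id = docs.size();
        docs.add(content);
        byKey.computeIfAbsent(objectId, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Marks every live document with this key as deleted; returns how many were marked. */
    int delete(String objectId) {
        List<Integer> ids = byKey.remove(objectId);
        if (ids == null) {
            return 0;
        }
        for (int id : ids) {
            deleted.set(id);
        }
        return ids.size();
    }

    /** The "official" update: delete the old version, then add the new one. */
    int update(String objectId, String content) {
        delete(objectId);
        return add(objectId, content);
    }

    int maxDoc() { return docs.size(); }                          // one past the highest id
    int numDocs() { return docs.size() - deleted.cardinality(); } // live documents only
    boolean isDeleted(int id) { return deleted.get(id); }
    String get(int id) { return deleted.get(id) ? null : docs.get(id); }
}
```

Adding two documents and then updating the first one leaves three ids in use but only two live documents, so a loop bounded by the live count would never reach the newest id.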

Still, it works. Searching XML documents is now basically done. The function below does most of the work of building the index from XML.

public static void indexDocs(IndexWriter writer, File file)
        throws ParserConfigurationException, SAXException, IOException {
        if (file.isDirectory()) {
            String[] files = file.list();

            for (int i = 0; i < files.length; i++)
                indexDocs(writer, new File(file, files[i]));
        } else {
            System.out.println("adding " + file);

            XMLDocumentHandlerSAX hdlr = new XMLDocumentHandlerSAX(file);
            writer.addDocument(hdlr.getDocument());

            // For DOM, use
            // XMLDocumentHandlerDOM hdlr = new XMLDocumentHandlerDOM();
            // writer.addDocument(hdlr.createXMLDocument(file));
        }
    }
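XMLDocumentHandlerSAX itself is not shown in this post, so the following is only a rough sketch of the kind of SAX handler it could be, written against the JDK's own SAX API. The class XmlFieldCollector is a made-up name, and it collects element text into a plain Map instead of a Lucene Document; in the real handler each entry would presumably become one of the Field types from the table. It assumes flat XML where every field is a leaf element, and it wraps parser errors in an unchecked exception just to keep the sketch short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the text of each leaf element into an element-name -> text map. */
class XmlFieldCollector extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the element that just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may deliver one text node in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // container elements end with empty text and are skipped
        }
        text.setLength(0);
    }

    /** Parses an XML string and returns the collected fields in document order. */
    static Map<String, String> parse(String xml) {
        XmlFieldCollector handler = new XmlFieldCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.fields;
    }
}
```

Calling XmlFieldCollector.parse on a small news item with objectId, title and content children returns a map with exactly those three keys; the field names are simply whatever the XML uses.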

Modify the XMLDocumentHandlerSAX class to build whatever index you need. The search function:

public static void search(String indexPath, String keyWords) throws LuceneException {
        Searcher searcher = null;

        try {
            searcher = new IndexSearcher(indexPath);

            Analyzer analyzer = new StandardAnalyzer();

            Query query = QueryParser.parse("title:\"" + keyWords + "\"  " +
                    keyWords, "content", analyzer);
            System.out.println("Searching for: " + query.toString());

            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " total matching documents");

            final int HITS_PER_PAGE = 10;

            for (int start = 0; start < hits.length();
                    start += HITS_PER_PAGE) {
                int end = Math.min(hits.length(), start + HITS_PER_PAGE);

                for (int i = start; i < end; i++) {
                    Document doc = hits.doc(i);
                    String name = doc.get("objectId");
                    System.out.println(name);
                    System.out.println(doc.get("author"));
                    System.out.println(doc.get("title"));
                }
            }

           
        } catch (IOException e) {
            throw new LuceneException(e.getMessage(), e);
        } catch (ParseException e) {
            throw new LuceneException(e.getMessage(), e);
        } finally {
            if (searcher != null) {
                try {
                    searcher.close();
                } catch (IOException e1) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }
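One thing to watch in search(): keyWords is pasted straight into the query string, so input containing a quote, a colon or another operator character can change the meaning of the query or make parsing fail. As I understand the Lucene query syntax, such characters can be escaped with a backslash. The helper below is a hypothetical sketch in plain Java (QueryStrings, escape and titleOrContent are my own names, not part of Lucene) that builds the same title-phrase-plus-keywords string with the input escaped first.

```java
/** Builds the query string used by search(), with user input escaped. Hypothetical helper. */
class QueryStrings {
    // Characters that the Lucene query syntax treats as operators.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Puts a backslash in front of every query-syntax character. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Phrase match on the title field, plus the same words against the default field. */
    static String titleOrContent(String keyWords) {
        String safe = escape(keyWords.trim());
        return "title:\"" + safe + "\" " + safe;
    }
}
```

The result of titleOrContent(keyWords) would then be passed to QueryParser.parse in place of the hand-built string, with "content" still given as the default field.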


This afternoon I'll be wiring this into the rest of the news module.
