Lucene Tutorial: Indexing and Searching Example

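Learn how to index and search documents with Apache Lucene 6. Lucene is used for data indexing and search by many modern search platforms (such as Apache Solr and Elasticsearch) and crawling platforms (such as Apache Nutch).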

Contents

Lucene Maven Dependencies
Lucene Write Index Example
Lucene Search Example
Download Source Code

Lucene Maven Dependencies

After creating a Maven project in Eclipse, add the following Lucene dependencies to pom.xml. I am using Lucene version 6.6.0.

<properties>
	<lucene.version>6.6.0</lucene.version>
</properties>

<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-core</artifactId>
	<version>${lucene.version}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-analyzers-common</artifactId>
	<version>${lucene.version}</version>
</dependency>
<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-queryparser</artifactId>
	<version>${lucene.version}</version>
</dependency>

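Lucene Write Index Example

Create IndexWriter

The org.apache.lucene.index.IndexWriter class creates and maintains an index. Its constructor takes two arguments: a Directory (here an FSDirectory pointing at the index folder) and an IndexWriterConfig, which holds settings such as the Analyzer. Note that once a writer has been created, the given config instance cannot be passed to another writer.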

private static IndexWriter createWriter() throws IOException 
{
	FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR));
	IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
	IndexWriter writer = new IndexWriter(dir, config);
	return writer;
}
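The tutorial clears the index with writer.deleteAll() before adding documents. IndexWriterConfig can express a similar intent up front through its open mode: OpenMode.CREATE discards any existing index, while the default, OpenMode.CREATE_OR_APPEND, keeps it and appends. The sketch below assumes the Lucene 6.6 dependencies above and uses an in-memory RAMDirectory instead of FSDirectory so it leaves no files behind; the class and method names are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class OpenModeSketch
{
	// Opens a writer in the given mode, adds one document, commits,
	// and returns the writer's document count afterwards.
	static int addOneDocument(Directory dir, OpenMode mode) throws IOException
	{
		IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
		config.setOpenMode(mode);
		try (IndexWriter writer = new IndexWriter(dir, config))
		{
			Document doc = new Document();
			doc.add(new StringField("id", "1", Field.Store.YES));
			writer.addDocument(doc);
			writer.commit();
			return writer.numDocs();
		}
	}

	// Document counts after: the first write, an appending write, a recreating write
	public static int[] run() throws IOException
	{
		try (Directory dir = new RAMDirectory())
		{
			int first = addOneDocument(dir, OpenMode.CREATE_OR_APPEND);
			int appended = addOneDocument(dir, OpenMode.CREATE_OR_APPEND);
			int recreated = addOneDocument(dir, OpenMode.CREATE);
			return new int[] { first, appended, recreated };
		}
	}

	public static void main(String[] args) throws IOException
	{
		int[] counts = run();
		System.out.println(counts[0] + " " + counts[1] + " " + counts[2]); // 1 2 1
	}
}
```

Unlike deleteAll(), which is an operation on an already-open writer, OpenMode.CREATE takes effect when the writer is opened.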

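Create Document

The org.apache.lucene.document.Document class represents a document in a Lucene index. A StringField indexes its whole value as a single, unanalyzed term (suitable for IDs), while a TextField is run through the analyzer so that individual words become searchable.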

private static Document createDocument(Integer id, String firstName, String lastName, String website) 
{
	Document document = new Document();
	document.add(new StringField("id", id.toString() , Field.Store.YES));
	document.add(new TextField("firstName", firstName , Field.Store.YES));
	document.add(new TextField("lastName", lastName , Field.Store.YES));
	document.add(new TextField("website", website , Field.Store.YES));
	return document;
}
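To see what the analyzer does to a TextField value, it helps to print the terms it produces. With StandardAnalyzer a name such as "Brian Schultz" becomes the lowercase terms brian and schultz, whereas a StringField value would be indexed verbatim as one term. The sketch below assumes lucene-analyzers-common 6.6; the helper names tokens and standardTokens are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerTokensSketch
{
	// Runs text through the analyzer as if it were the value of the given field
	// and collects the resulting terms in order.
	public static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException
	{
		List<String> terms = new ArrayList<>();
		try (TokenStream stream = analyzer.tokenStream(field, text))
		{
			CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
			stream.reset();                 // required before the first incrementToken()
			while (stream.incrementToken())
			{
				terms.add(term.toString());
			}
			stream.end();
		}
		return terms;
	}

	// Convenience wrapper using the same analyzer as the tutorial's IndexWriterConfig
	public static List<String> standardTokens(String text) throws IOException
	{
		try (Analyzer analyzer = new StandardAnalyzer())
		{
			return tokens(analyzer, "firstName", text);
		}
	}

	public static void main(String[] args) throws IOException
	{
		System.out.println(standardTokens("Brian Schultz")); // [brian, schultz]
	}
}
```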

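Write Documents to the Index

To write Lucene documents to the index, use the IndexWriter.addDocuments(documents) method (or addDocument(document) for a single document). Calling commit() makes the changes durable and visible to readers opened afterwards.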

package com.how2codex.demo.lucene;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class LuceneWriteIndexExample 
{
	private static final String INDEX_DIR = "c:/temp/lucene6index";

	public static void main(String[] args) throws Exception 
	{
		//try-with-resources closes the writer even if indexing fails
		try (IndexWriter writer = createWriter())
		{
			List<Document> documents = new ArrayList<>();
			
			Document document1 = createDocument(1, "Lokesh", "Gupta", "how2codex.com");
			documents.add(document1);
			
			Document document2 = createDocument(2, "Brian", "Schultz", "example.com");
			documents.add(document2);
			
			//Let's clean everything first
			writer.deleteAll();
			
			writer.addDocuments(documents);
			writer.commit();
		}
	}

	private static Document createDocument(Integer id, String firstName, String lastName, String website) 
	{
		Document document = new Document();
		document.add(new StringField("id", id.toString(), Field.Store.YES));
		document.add(new TextField("firstName", firstName, Field.Store.YES));
		document.add(new TextField("lastName", lastName, Field.Store.YES));
		document.add(new TextField("website", website, Field.Store.YES));
		return document;
	}

	private static IndexWriter createWriter() throws IOException 
	{
		FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR));
		IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
		IndexWriter writer = new IndexWriter(dir, config);
		return writer;
	}
}
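Clearing the whole index with deleteAll() is fine for a demo, but when documents carry a unique key they can be replaced one at a time instead. IndexWriter.updateDocument(Term, document) deletes every document containing the given term and then adds the new one, which relies on the id being indexed as an exact StringField term. A minimal in-memory sketch, assuming Lucene 6.6; the person helper and the replacement value "Lokesh Kumar" are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class UpsertSketch
{
	// Hypothetical helper: exact-match id plus an analyzed first name
	static Document person(String id, String firstName)
	{
		Document doc = new Document();
		doc.add(new StringField("id", id, Field.Store.YES));
		doc.add(new TextField("firstName", firstName, Field.Store.YES));
		return doc;
	}

	// Adds id 1, replaces it, and returns "<live docs>:<stored firstName of id 1>"
	public static String run() throws IOException
	{
		try (Directory dir = new RAMDirectory();
			IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer())))
		{
			writer.addDocument(person("1", "Lokesh"));
			writer.commit();

			// Deletes every document whose id term is "1", then adds the new version
			writer.updateDocument(new Term("id", "1"), person("1", "Lokesh Kumar"));
			writer.commit();

			try (DirectoryReader reader = DirectoryReader.open(dir))
			{
				IndexSearcher searcher = new IndexSearcher(reader);
				TopDocs hits = searcher.search(new TermQuery(new Term("id", "1")), 10);
				String firstName = searcher.doc(hits.scoreDocs[0].doc).get("firstName");
				return reader.numDocs() + ":" + firstName;
			}
		}
	}

	public static void main(String[] args) throws IOException
	{
		System.out.println(run()); // 1:Lokesh Kumar
	}
}
```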

After running the code above, you will see the Lucene index files created in the configured folder (c:/temp/lucene6index).

Figure: Lucene index files on disk

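Lucene Search Example

Create IndexSearcher

The org.apache.lucene.search.IndexSearcher class searches the Lucene documents in an index. Its constructor takes an IndexReader; here the reader is opened with DirectoryReader.open() on a Directory that points to the index folder.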

private static IndexSearcher createSearcher() throws IOException 
{
	Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
	IndexReader reader = DirectoryReader.open(dir);
	IndexSearcher searcher = new IndexSearcher(reader);
	return searcher;
}
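A DirectoryReader, and therefore any IndexSearcher built on it, sees the index as it was when the reader was opened; documents committed later stay invisible until the reader is reopened. DirectoryReader.openIfChanged() returns a new reader if the index has changed and null otherwise. The sketch below assumes Lucene 6.6 and an in-memory RAMDirectory; the class name and the idDoc helper are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ReaderRefreshSketch
{
	static Document idDoc(String id)
	{
		Document doc = new Document();
		doc.add(new StringField("id", id, Field.Store.YES));
		return doc;
	}

	// Visible document counts: before the second commit, after it (stale), after reopening
	public static int[] run() throws IOException
	{
		try (Directory dir = new RAMDirectory();
			IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer())))
		{
			writer.addDocument(idDoc("1"));
			writer.commit();

			DirectoryReader reader = DirectoryReader.open(dir);
			try
			{
				int before = reader.numDocs();

				writer.addDocument(idDoc("2"));
				writer.commit();
				int stale = reader.numDocs();      // still the old point-in-time view

				DirectoryReader newer = DirectoryReader.openIfChanged(reader);
				if (newer != null)                 // null means nothing changed
				{
					reader.close();
					reader = newer;
				}
				return new int[] { before, stale, reader.numDocs() };
			}
			finally
			{
				reader.close();                    // closes whichever reader is current
			}
		}
	}

	public static void main(String[] args) throws IOException
	{
		int[] counts = run();
		System.out.println(counts[0] + " " + counts[1] + " " + counts[2]); // 1 1 2
	}
}
```

In a long-running application the same idea applies to the searcher: build a new IndexSearcher on the refreshed reader.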

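Search Lucene Documents

To search Lucene documents, create an org.apache.lucene.search.Query instance, here with the org.apache.lucene.queryparser.classic.QueryParser class. IndexSearcher.search(Query, int n) then returns an org.apache.lucene.search.TopDocs object holding the top n matching results.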

IndexSearcher searcher = createSearcher();

TopDocs foundDocs = searchById(1, searcher);

private static TopDocs searchById(Integer id, IndexSearcher searcher) throws Exception
{
	QueryParser qp = new QueryParser("id", new StandardAnalyzer());
	Query idQuery = qp.parse(id.toString());
	TopDocs hits = searcher.search(idQuery, 10);
	return hits;
}
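Because id is a StringField, its value is indexed as one exact, unanalyzed term. Parsing the id with QueryParser and StandardAnalyzer works here because a plain number like 1 passes through the analyzer unchanged; an id containing uppercase letters or punctuation, such as the hypothetical ABC-1, would typically be lowercased and split by the analyzer and then fail to match. A TermQuery looks up the exact term instead. A minimal in-memory sketch, assuming Lucene 6.6; the class and the countHits helper are hypothetical.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class TermQuerySketch
{
	// Exact-match lookup on the unanalyzed id field; no QueryParser involved
	static TopDocs searchById(String id, IndexSearcher searcher) throws IOException
	{
		return searcher.search(new TermQuery(new Term("id", id)), 10);
	}

	// Indexes the ids "ABC-1" and "2", then returns the hit count for the given id
	public static int countHits(String id) throws IOException
	{
		try (Directory dir = new RAMDirectory())
		{
			try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer())))
			{
				for (String value : new String[] { "ABC-1", "2" })
				{
					Document doc = new Document();
					doc.add(new StringField("id", value, Field.Store.YES));
					writer.addDocument(doc);
				}
			} // close() commits the added documents by default

			try (DirectoryReader reader = DirectoryReader.open(dir))
			{
				return searchById(id, new IndexSearcher(reader)).totalHits;
			}
		}
	}

	public static void main(String[] args) throws IOException
	{
		System.out.println(countHits("ABC-1")); // 1
	}
}
```

Note that TermQuery is case-sensitive: searching for abc-1 finds nothing, because the StringField value was stored exactly as ABC-1.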

Lucene Search Index Example

package com.how2codex.demo.lucene;

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneReadIndexExample 
{
	private static final String INDEX_DIR = "c:/temp/lucene6index";

	public static void main(String[] args) throws Exception 
	{
		IndexSearcher searcher = createSearcher();
		
		//Search by ID
		TopDocs foundDocs = searchById(1, searcher);
		
		System.out.println("Total Results :: " + foundDocs.totalHits);
		
		for (ScoreDoc sd : foundDocs.scoreDocs) 
		{
			Document d = searcher.doc(sd.doc);
			System.out.println(d.get("firstName"));
		}
		
		//Search by firstName
		TopDocs foundDocs2 = searchByFirstName("Brian", searcher);
		
		System.out.println("Total Results :: " + foundDocs2.totalHits);
		
		for (ScoreDoc sd : foundDocs2.scoreDocs) 
		{
			Document d = searcher.doc(sd.doc);
			System.out.println(d.get("id"));
		}
		
		//Release the index files held open by the reader
		searcher.getIndexReader().close();
	}
	
	private static TopDocs searchByFirstName(String firstName, IndexSearcher searcher) throws Exception
	{
		QueryParser qp = new QueryParser("firstName", new StandardAnalyzer());
		Query firstNameQuery = qp.parse(firstName);
		TopDocs hits = searcher.search(firstNameQuery, 10);
		return hits;
	}

	private static TopDocs searchById(Integer id, IndexSearcher searcher) throws Exception
	{
		QueryParser qp = new QueryParser("id", new StandardAnalyzer());
		Query idQuery = qp.parse(id.toString());
		TopDocs hits = searcher.search(idQuery, 10);
		return hits;
	}

	private static IndexSearcher createSearcher() throws IOException {
		Directory dir = FSDirectory.open(Paths.get(INDEX_DIR));
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		return searcher;
	}
}

Output:

Total Results :: 1
Lokesh
Total Results :: 1
2
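The default field passed to the QueryParser constructor is only used for terms without an explicit field. The classic query syntax also accepts field:value pairs and the AND/OR operators, so one query string can span several fields. A sketch under the same assumptions as above (Lucene 6.6, in-memory index, hypothetical class and helper names) that indexes the two people from the write example and queries them with such strings:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class QuerySyntaxSketch
{
	static Document person(String firstName, String lastName)
	{
		Document doc = new Document();
		doc.add(new TextField("firstName", firstName, Field.Store.YES));
		doc.add(new TextField("lastName", lastName, Field.Store.YES));
		return doc;
	}

	// Returns the number of documents matching the query string
	public static int countHits(String queryText) throws IOException, ParseException
	{
		try (Analyzer analyzer = new StandardAnalyzer();
			Directory dir = new RAMDirectory())
		{
			try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer)))
			{
				writer.addDocument(person("Lokesh", "Gupta"));
				writer.addDocument(person("Brian", "Schultz"));
			} // close() commits

			try (DirectoryReader reader = DirectoryReader.open(dir))
			{
				// "firstName" is the default field for terms without a field prefix
				Query query = new QueryParser("firstName", analyzer).parse(queryText);
				return new IndexSearcher(reader).search(query, 10).totalHits;
			}
		}
	}

	public static void main(String[] args) throws IOException, ParseException
	{
		System.out.println(countHits("Brian OR lastName:Gupta"));  // 2
		System.out.println(countHits("Brian AND lastName:Gupta")); // 0
	}
}
```

Using the same analyzer for indexing and query parsing keeps the query terms (lowercased here) consistent with the indexed terms.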

Download Source Code

Use the link below to download the source code for this tutorial.

saigon has written 1440 articles
