SimpleLucene – Lucene.net made easy

Posted by ben 23. October 2010 00:01

What is SimpleLucene?

SimpleLucene is an open source project designed to make it easier for you to wire up Lucene.net in your applications.

It started off as a test project whilst I was learning Lucene.net. I quickly found that there were very few complete examples of using Lucene.net in an application. Most just covered simple indexing and querying requirements.

SimpleLucene provides an extra layer of abstraction around common Lucene tasks such as creating, searching and maintaining an index. It is designed to make it easy to index your domain objects and return strongly typed results.

So to begin, head over to http://simplelucene.codeplex.com and download the latest source code.

Add a reference to both SimpleLucene.dll and Lucene.NET.dll.

Creating an Index

To create an index we need two things: an instance of IIndexWriter and an instance of IIndexDefinition<TEntity>.

IIndexWriter gives the index service the information it needs to create the index.

IIndexDefinition<TEntity> is how we tell Lucene.net how to index your entity.

So let’s start with a simple Product class:

    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

Now let’s create an index definition:

    public class ProductIndexDefinition : IIndexDefinition<Product> {
        public Document Convert(Product p) {
            var document = new Document();
            document.Add(new Field("id", p.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("name", p.Name, Field.Store.YES, Field.Index.ANALYZED));
            return document;
        }

        public Term GetIndex(Product p) {
            return new Term("id", p.Id.ToString());
        }
    }

Although SimpleLucene does a lot of the work for you, you still need some understanding of Lucene objects in order to create your definitions and construct your queries.

In this definition we tell SimpleLucene how to create a Lucene document from an instance of our Product class.

The GetIndex method should return a Term that uniquely identifies your entity. This is required so we can handle deletions.
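
To see why a unique term matters, here is a rough sketch of how such a term can drive updates and deletions in raw Lucene.net. It assumes Lucene.net 2.9.x (the version referenced later in this post) and an in-memory RAMDirectory. It is not SimpleLucene's actual implementation, and the TermDemo class is made up for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical demo class; not part of SimpleLucene.
public static class TermDemo
{
    public static void Main()
    {
        // Assumption: Lucene.net 2.9.x, in-memory index for illustration only.
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        var document = new Document();
        document.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        document.Add(new Field("name", "Football", Field.Store.YES, Field.Index.ANALYZED));

        // The same kind of term the index definition returns for a product.
        var term = new Term("id", "1");

        // UpdateDocument deletes any document matching the term, then adds the
        // new one, so indexing the same product twice leaves no duplicates.
        writer.UpdateDocument(term, document);
        writer.UpdateDocument(term, document);

        // DeleteDocuments removes every document that contains the term.
        writer.DeleteDocuments(term);

        writer.Commit();
        writer.Close();
    }
}
```

Because the id field is stored as NOT_ANALYZED, the term matches the indexed value exactly, which is what makes it safe to use as a key.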

Although you can create your own IIndexWriter, we have already created a DirectoryIndexWriter for writing a standard Lucene index. It takes two parameters:

  1. indexLocation – a System.IO.DirectoryInfo object specifying where you wish to create the index
  2. createIndex – boolean value indicating whether to create the index if it does not exist.

In this case we do want to create the index so we create our writer like so:

	var writer = new DirectoryIndexWriter(
		new DirectoryInfo(@"c:\index"), true);

Now we need some products to index. I’m using a static list of products but the same techniques would apply if you were retrieving your entities from a database.

    public class Repository
    {
        public IList<Product> Products {
            get {
                return new List<Product> {
                    new Product { Id = 1, Name = "Football" },
                    new Product { Id = 2, Name = "Coffee Cup"},
                    new Product { Id = 3, Name = "Nike Trainers"},
                    new Product { Id = 4, Name = "Apple iPod Nano"},
                    new Product { Id = 5, Name = "Asus eeePC"},
                };
            }
        }
    }

SimpleLucene comes with a default IndexService and SearchService, implementing IIndexService and ISearchService respectively. These default implementations will suit most people but you can always roll your own if you want.

So finally, to index our products run the following code:

    var writer = new DirectoryIndexWriter(
        new DirectoryInfo(@"c:\index"), true);

    var service = new IndexService();
    service.IndexEntities(writer, new Repository().Products, new ProductIndexDefinition());

Here we pass our index writer, list of products and index definition to the index service.

After running the above you should find an index has been created in the specified directory.

Lucene Luke

Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways.

You can download Luke from http://code.google.com/p/luke/. Luke effectively allows you to open a Lucene index and view its contents. Very handy for making sure your data has been indexed as expected. You can also search the index and make optimizations. For a full list of features check out the project website.

Searching your index

So now we have an index created, how do we search it?

To perform a search you first need an IIndexSearcher. IIndexSearcher gives the SearchService the information it needs to search an index.

Again you can roll your own, but we have provided a DirectoryIndexSearcher to save you the trouble. Let’s new one up:

	var searcher = new DirectoryIndexSearcher(
		new DirectoryInfo(@"c:\index"), true);

This is very similar to how we create a writer. Again we pass in the location of the index and a boolean value indicating whether to open the index in read-only mode (this should normally be true).

To search the index we need to create a Lucene query object. There are a number of different Query types you can use. In this example we will use a TermQuery.

    var query = new TermQuery(new Term("name", "Football"));

Here we simply specify the field we want to search on and a search term.
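
One thing to watch: a TermQuery uses its text exactly as given, while an analyzed field such as name is usually indexed in tokenized, lowercased form (StandardAnalyzer lowercases, for example). If the index was built that way, a search for "Football" may find nothing where "football" would match. The sketch below compares the two query styles. It assumes Lucene.net 2.9.x and a StandardAnalyzer; which analyzer SimpleLucene uses by default is not confirmed here, and TermQueryDemo is a made-up class name.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical demo class; not part of SimpleLucene.
public static class TermQueryDemo
{
    public static void Main()
    {
        // Assumption: Lucene.net 2.9.x and a "name" field indexed with StandardAnalyzer.
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

        // TermQuery takes the text as-is; nothing is lowercased or tokenized.
        Query exact = new TermQuery(new Term("name", "Football"));

        // QueryParser runs the input through the analyzer before building the query.
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "name", analyzer);
        Query analyzed = parser.Parse("Football");

        // Printing both queries makes the difference in the search terms visible.
        Console.WriteLine(exact.ToString());
        Console.WriteLine(analyzed.ToString());
    }
}
```

If the two printed queries differ, the parsed one is the version that lines up with how the analyzer wrote the field into the index.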

The default SearchService has two methods you can use for searching. The first returns a standard list of Lucene documents. The second returns a strongly typed collection, built using a converter that you pass in.

So to query our index and return a list of Products we use the following:

    var searcher = new DirectoryIndexSearcher(
        new DirectoryInfo(@"c:\index"), true);

    var query = new TermQuery(new Term("name", "Football"));

    var searchService = new SearchService();

    // Convert each matching Lucene document back into a Product
    Func<Document, Product> converter = (doc) => {
        return new Product {
            Id = int.Parse(doc.GetValues("id")[0]),
            Name = doc.GetValues("name")[0]
        };
    };

    IList<Product> results = searchService.SearchIndex(searcher, query, converter);

Constructing Complex Queries

In most “real world” cases, your queries will be more complex than this.

SimpleLucene provides a base query class “QueryBase” you can use to chain together queries. An example implementation is shown below:

    public class ProductQuery : QueryBase
    {
        public ProductQuery(Query query) : base(query) { }

        public ProductQuery() { }

        public ProductQuery WithKeywords(string keywords)
        {
            if (!string.IsNullOrEmpty(keywords))
            {
                string[] fields = { "name", "description", "sku" };
                var parser = new MultiFieldQueryParser(Version.LUCENE_29,
                    fields, new StandardAnalyzer(Version.LUCENE_29));
                Query multiQuery = parser.Parse(keywords);

                this.AddQuery(multiQuery);
            }
            return this;
        }

        public ProductQuery WithCategory(int categoryId)
        {
            if (categoryId > 0) {
                this.AddQuery(new TermQuery(new Term("category", categoryId.ToString())));
            }
            return this;
        }
    }

You can then construct your query like so:

	var query = new ProductQuery()
			.WithKeywords(keywords)
			.WithCategory(categoryId);

	var results = searchService.SearchIndex(searcher, query.Query, converter);
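
QueryBase's internals are not shown in this post, so the following is only a guess at the general idea: one plausible way to chain queries in raw Lucene.net is to add each one to a BooleanQuery as a required clause. The sketch assumes Lucene.net 2.9.x, where the clause type is BooleanClause.Occur, and the ChainedQueryDemo class and CombineAll helper are invented for illustration.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical demo class; not part of SimpleLucene.
public static class ChainedQueryDemo
{
    // Combines the given queries so that a document must match all of them.
    public static Query CombineAll(params Query[] queries)
    {
        var combined = new BooleanQuery();
        foreach (var query in queries)
        {
            // Assumption: Lucene.net 2.9.x, where MUST lives on BooleanClause.Occur.
            combined.Add(query, BooleanClause.Occur.MUST);
        }
        return combined;
    }

    public static void Main()
    {
        Query keywords = new TermQuery(new Term("name", "football"));
        Query category = new TermQuery(new Term("category", "3"));

        // Both clauses are required, so only documents matching each one are returned.
        Query both = CombineAll(keywords, category);
        Console.WriteLine(both.ToString());
    }
}
```

Using MUST gives "AND" behaviour; swapping in SHOULD would make the clauses optional and behave more like "OR".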

That covers the basics. In the next article I will cover how to maintain your index using SimpleLucene’s index management features.

Categories SimpleLucene | C# | Development
