Searching with a Lucene.NET wrapper
For a project we’re currently working on we needed to be able to search on different fields that are shown to the end user. We didn’t want to rely fully on the SQL server Full Text Search capabilities. Luckily I knew the Lucene.NET engine from some previous work with the Umbraco CMS.
We just wanted to receive from our search engine the ID’s of the objects where the search term in was found rather then complete objects. When working with a database it just doesn’t feel right to store all your data on 2 different places (the database and the search index store). We only wanted to add the fields where we would search on to the search index and let Lucene decide the match, afterwards we’ll pull the objects from the database using Entity Framework.
I’ve followed a few of those blog posts I could find. Off course these posts give you a starting point and for the simplicity all code is written in one or two classes. After the first implementation I started with some refactoring’s and decided it would be easier for future implementations if I created my own search DLL or package.
At the bottom you’ll find the link to the Github repository where you can browse and download the package.
If you want to be able to search in Lucene you’ll have to add the parameter to a Lucene document. To simplify the creation of such documents I’ve created an abstract class ADocument where you have to inherit from.
You’ll see that I’ve added the public property “Id” so that every class that will inherit should have that Id. In the setter of the ID you’ll see I add the field to the Lucene.Net Document with the AddParameterToDocument method.
Next to the private AddParameterToDocument method you’ll find two other methods, one with the possibility to store the parameter in the Lucene index and one to just analyze the parameter but not to store it in the Lucene index.
You’ll see in the ADocument class that I’ve decorated the “Id” with the “SearchField” attribute. This attribute was created to simplify the search on multiple fields that you’ll see when implementing the BaseSearcher. You can find the implementation of this attribute below.
Both the writer as the searcher have to have access to the index that is written to the local file system. To avoid multiple implementations (check if the directory exists, load a FSDirectory object, …) I’ve created a BaseSearch class. Not much to see in this class, just some basic settings of the folder where the index is stored.
before we can search we’ll have to add some items into the search index off course. Following the DRY principle we’ve created a new abstract base class, the “BaseWriter”. In this class we added the methods to add and update new or existing item(s) of the type of ADocument. Next to adding and updating we added the corresponding delete methods. Because we delete everything based on the Id property of the ADocument implementation it’s important its value is always set in the deriving class.
NOTE: the log messages are added to be used with Log4Net.
Adding items to the Lucene index wasn’t that hard. When I first started to implement the search methods I wanted to be able to search on a specific property or on all properties available on that object.
To be able to search on all properties you’ll have to user a MultifieldQueryParser instead of the default QueryParser. But with the MultifieldQueryParser you have to enter all the fields where Lucene have to search on. I really didn’t wanted to have to implement the same search code for every type we have to add to the index.
We’ll no choice then to turn to reflection to fetch all the parameters. But the class could have more properties then the one we are searching on. To avoid these extra properties (and the errors because we’ll be searching on properties that are not indexed) I’ve added the SearchField attribute.
So the first thing we’ll do when searching is to fetch all properties from the class and add them in a list. By using the T parameter we can easily reuse the same method for different classes as long they inherit from ADocument.
With this list we can now implement the actual searching. If we search on a specific field we’ll use the default QueryParser and if we are searching on more then one field, the MultifieldQueryParser.
When you look into the code, you’ll see that even when searching on a specific field, there’s still a possibility that the MultiFieldQueryParser is used. I’ve added the option to search on multiple parameters in the Lucene index if they are related to each other.
For example: you have a registration number for each person that always starts with the year of registration, a number (sequence) and a suffix: 2014-0023-aaa. When you add this complete registration number to the index and you search on 2014 (without wildcards) the Lucene engine will return no results. To avoid that the end user have to use wildcards you can store the registration number in three different parts. But when you want to search on the field RegistrationNumber you’ll have to indicate that multiple fields have to be used.
Therefor can the SearchField parameter contain a array of strings that contain the other searchfields that have to be taken into account. (confused, see the TestApp project on github)
The search term that the end user enters has to be translated to a Lucene.NET Query. There a re build in methods to convert a string to a Query object. Although the method can throw a ParseException if invalid characters are used. Therefor I added a private ParseQuery method to catch those exceptions and to filter out the invalid characters.
Because we only want to return the Ids of the objects we’re searching for we can use a generic SearchResult class to return the results. The amount of hits and the search term are added to the result to be shown in the UI.
The above classes (except a custom exception) are all the parts we need to start testing our search wrapper. On the Github repository you’ll find a TestApp project in the solution where the classes below are implemented.
The example used is of a Person that will register for a service. For simplicity sake local classes are used as repository instead of a database but you’ll get the point.
We”’ll start with the Person class that is used in some application. All default stuff and an override for the ToString method to print out Person class.
It are these Person classes we want to add to our search index. To add these we have to create a PersonDocument class that implements from our ADocument abstract class. We”’ll add all properties with private backing fields so we can call the ‘AddParameterToDocument’ methods. Next to adding the property to the index we’ll have to decorate the properties we want to search on. (See the RegistrationString property where we added multiple SearchFields)
At the bottom I added a operator method to cast the person object to a PersonDocument object so I don’t have to repeat the cast in the business logic.
The PersonWriter class will be not that difficult to implement now our PersonDocument class is defined. We just have to add the add and update methods and the delete methods that will call the base class methods.
Also the PersonSearcher class will not be that difficult with the Search method implemented in the BaseSeacher class.
In the program.cs file you’ll find the creation of the multiple Person objects an how they are added to the index. Underneath you’ll find the different search methods and their results.
Although Lucene.NET has far more options then showed here in this blog post, will the created wrapper at least give you the basic search possibilities. Off course can you extend the base classes to add more search options, index options etc.
With the basic settings in the Document class that inherits from ADocument you’ll avoid to create numerous searchers or indexers.
All source code can be found on GitHub and feel free to fork or download.