The pairing of Solr and Spark was a popular topic at the All Things Open 2016 conference. These tools can re-index content in real time based on what people search for, how they search, and which keywords or products they engage with.
When used together, Solr and Spark can improve the relevance of a website's search results and product recommendations, which in turn can increase its visitors and conversions.
Solr is an open-source enterprise search platform, built on Apache Lucene, that creates a software-based index of a site’s content. Spark is a big data engine for massively parallel processing, i.e. running jobs in parallel across a cluster of machines to answer analytical questions over large volumes of data. Combining the two lets a site show visitors the most relevant results for internal site searches, or display links to the most popular articles or products.
This works because Spark processes visitors’ activity data (their searches, clicks, and purchases, often called signals) and feeds the results back to Solr as relevance scores. Together, the two programs maintain a popularity catalog that is updated in real time.
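As a rough illustration of the aggregation step, the sketch below rolls raw activity events up into per-query boost weights. In a real deployment this would run as a Spark job over a signals DataFrame; plain Python is used here so it runs anywhere. The event fields (`query`, `doc_id`, `type`), the `SIGNAL_WEIGHTS` values, and the log damping are assumptions for illustration, not the format of any particular product.

```python
from collections import defaultdict
from math import log1p

# Hypothetical weights: a purchase says more about relevance than a click.
SIGNAL_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def aggregate_signals(events):
    """Roll raw events up into {query: {doc_id: boost}}.

    Each event is a dict with the (assumed) keys "query", "doc_id" and "type".
    Event types without a weight are skipped.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for event in events:
        weight = SIGNAL_WEIGHTS.get(event["type"])
        if weight is None:
            continue
        query = event["query"].strip().lower()  # normalize query text
        totals[query][event["doc_id"]] += weight
    # log1p damps the totals so one hugely popular document cannot swamp the rest
    return {
        query: {doc: round(log1p(total), 3) for doc, total in docs.items()}
        for query, docs in totals.items()
    }
```

The resulting weights could then be written back to Solr, for example as a field on each document or as query-time boosts; the exact mechanism depends on the schema.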
In practice, this lets Solr and Spark power front-end interfaces that flag certain keywords or links as trending or popular, based on how often past visitors have searched for or acted on them.
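One simple way to decide what counts as "trending" is to count recent searches inside a time window. The sketch below does this in plain Python; the `(timestamp, query)` log format, the one-hour window, and the `min_count` threshold are assumptions chosen for illustration, and a production system would more likely compute this in Spark over much larger logs.

```python
from collections import Counter


def trending_queries(search_log, now, window_seconds=3600, top_k=3, min_count=2):
    """Return the most frequent queries in the window (now - window_seconds, now].

    `search_log` is an iterable of (timestamp_seconds, query) pairs. Queries are
    lowercased and stripped so "Shin Guards" and "shin guards " count together.
    """
    cutoff = now - window_seconds
    counts = Counter(
        query.strip().lower()
        for ts, query in search_log
        if cutoff < ts <= now
    )
    # most_common() sorts by count, highest first; drop anything seen too rarely
    return [(q, n) for q, n in counts.most_common(top_k) if n >= min_count]
```

Searches older than the window simply drop out, so a query that was popular yesterday stops being flagged once interest fades.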
For example, if most people searching for soccer shin guards end up clicking the seventh search result, the system automatically raises that result’s relevance for future visitors. It’s like A/B testing on steroids.
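The shin-guard example can be sketched as a re-ranking step that blends each result's original position with its observed click-through rate (CTR). The scoring formula, the smoothing prior, and the `weight` parameter below are hypothetical choices made for illustration, not how Solr or Spark actually computes relevance.

```python
def rerank(results, clicks, impressions, weight=2.0, prior_clicks=1, prior_views=10):
    """Re-order `results` (doc ids, best first) using smoothed click-through rate.

    score = 1 / position + weight * CTR, where CTR is smoothed with a small
    prior so that a single lucky click cannot dominate the ranking.
    """
    def score(item):
        position, doc = item
        ctr = (clicks.get(doc, 0) + prior_clicks) / (impressions.get(doc, 0) + prior_views)
        return 1.0 / position + weight * ctr

    ranked = sorted(enumerate(results, start=1), key=score, reverse=True)
    return [doc for _, doc in ranked]


results = [f"doc{i}" for i in range(1, 11)]
impressions = {doc: 100 for doc in results}
clicks = {doc: 2 for doc in results}
clicks["doc7"] = 60  # most searchers click the seventh result
print(rerank(results, clicks, impressions))
```

With 60 clicks per 100 impressions, the seventh result climbs to the top while the others keep their original order. In Solr itself, weights like these would more typically be applied at query time, for instance through the edismax `bq` (boost query) parameter, rather than by re-sorting in application code.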
Lucidworks, whose team includes longtime Solr committers, also created the spark-solr library, which lets Spark read from and write to Solr as native DataFrames. Thanks to this library, moving data between the two in either direction requires very little translation, which is a large part of why they work so well together.
Spark offers Solr other benefits as well. For instance, Spark can update Solr indexes in parallel, which keeps search results fresh.
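To make the parallel-update idea concrete, the sketch below splits documents into JSON batches and hands them to a pool of worker threads. Threads in one process stand in for Spark executors here, and `send_batch` is a hypothetical caller-supplied function; in a real setup it might POST each JSON body to a collection's `/update` handler (for example `http://localhost:8983/solr/<collection>/update`, which accepts a JSON array of documents).

```python
import json
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def parallel_index(docs, send_batch, batch_size=500, workers=4):
    """Serialize documents into JSON batches and send them concurrently.

    `send_batch(body)` is supplied by the caller and must return the number of
    documents it sent. Returns the total number of documents sent.
    """
    payloads = [json.dumps(batch) for batch in batches(docs, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() re-raises the first worker exception when results are consumed
        sent_counts = list(pool.map(send_batch, payloads))
    return sum(sent_counts)
```

Passing the sender in as a function keeps the batching logic testable without a running Solr server.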
A Spark cluster can also be scaled up or down as needed during rapid re-indexing, and the same pairing comes in handy for offline computation of PageRank, popularity scores, or trends.
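For the offline PageRank computation mentioned above, the basic algorithm is a power iteration. The plain-Python sketch below is for illustration only: the link graph is hypothetical, the damping factor of 0.85 is the conventional choice, and at scale one would use a distributed implementation instead (Spark's GraphX library, for instance, provides PageRank).

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets the "random jump" share first
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


site = {
    "home": ["shin-guards", "cleats"],
    "shin-guards": ["home"],
    "cleats": ["home"],
    "blog": ["home"],
}
for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{page:12s} {score:.3f}")
```

The resulting scores could be stored on each Solr document and used as a boost at query time.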
The uses of Solr and Spark go beyond web signals. Email trackers, UI trackers inside ads, or any other data that can be collected as signals give developers further options. One example is tracking clicks by recipients of marketing emails to influence which products or search results appear in future communications and remarketing campaigns.
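Signals from different channels can be folded into one score per product before being sent to Solr. The sketch below weights each channel differently and discounts older clicks with an exponential half-life; the channel names, the `CHANNEL_WEIGHTS` values, the seven-day half-life, and the `(channel, product_id, timestamp_days)` event shape are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-channel weights: how much one click from each source counts.
CHANNEL_WEIGHTS = {"web": 1.0, "email": 2.0, "ad": 0.5}


def merge_signals(events, now, half_life_days=7.0):
    """Combine click events from several channels into one score per product.

    Each event is (channel, product_id, timestamp_days). A click loses half its
    weight every `half_life_days`, so recent interest counts the most.
    """
    scores = defaultdict(float)
    for channel, product_id, ts in events:
        weight = CHANNEL_WEIGHTS.get(channel)
        if weight is None:
            continue  # ignore sources we have no weight for
        age = max(0.0, now - ts)
        scores[product_id] += weight * 0.5 ** (age / half_life_days)
    return dict(scores)
```

Sorting these scores in descending order gives a candidate list of products to feature in the next email or remarketing campaign.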
There isn’t much word yet on how a Solr and Spark setup could account for bot searches and clicks, but for now the technology can still be very useful to developers who want more interactivity between the front and back ends of web applications.
The Lucidworks team has put up a demo repo on GitHub so you can test drive Solr and Spark running together. Even better, the app uses Flask as its front end to route traffic into and out of the solution.