Lucene Solr Revolution 2013 Presentations

Day 1 Presentations

KEYNOTE: Search is not a solved problem

Presented by Hilary Mason, Chief Scientist, bitly

Watch session video.

Language support and linguistics in Lucene/Solr and its eco-system
Presented by Christian Moen & Gaute Lambertsen, Atilika Inc.

In search, language handling is often key to a good search experience. This talk gives an overview of the language handling and linguistics functionality in Lucene/Solr, and best practices for using it in Western, Asian and multi-language deployments. Pointers and references within the open source and commercial ecosystems for more advanced linguistics and their applications are also discussed.

The presentation is a mix of overview and hands-on best practices that the audience can apply immediately in their Lucene/Solr deployments. The ecosystem part is meant to inspire how more advanced functionality can be developed by means of the open source technologies available within the Apache ecosystem (predominantly), while also highlighting some of the commercial options.
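
As a concrete taste of the Asian-language handling this talk covers, the following is a minimal sketch using Lucene's Japanese analyzer (kuromoji, contributed to Lucene by Atilika). It assumes Lucene 4.x; the field name and sample text are ours:

    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class JapaneseTokens {
      public static void main(String[] args) throws Exception {
        // Kuromoji segments Japanese text, which has no spaces between words
        JapaneseAnalyzer analyzer = new JapaneseAnalyzer(Version.LUCENE_43);
        TokenStream ts = analyzer.tokenStream("body", new StringReader("関西国際空港"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term.toString()); // expect segments like 関西 / 国際 / 空港
        }
        ts.end();
        ts.close();
      }
    }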

Video & slides are available here.

State Decoded: Empowering The Masses with Open Source State Law Search
Presented by Doug Turnbull, Search and Big Data Architect, OpenSource Connections

The Law has traditionally been a topic dominated by an elite group of experts. Watch how State Decoded has transformed the law from a scary, academic topic to a friendly resource that empowers everyone using Apache Solr. This talk is a call to action for discovery and design to break open ivory towers of expertise by baking rich discovery into your UI and data structures.

Video &  slides are available here.

Writing Custom Queries: Scorers’ Diversity and Traps
Presented by Mikhail Khludnev, Grid Dynamics

Lucene has a number of built-in queries, but sometimes a developer needs to write custom ones, which can be challenging. We'll start from the basics: learn how Lucene searches, look into a few built-in query implementations, and cover two basic approaches to query evaluation. Then I'll share the experience my team gained building an eCommerce search platform: we'll look at a sample custom query (or even a few), and discuss the potential problems and caveats along the way.

Video & slides are available here.

Building a Real-time, Big Data Analytics Platform with Solr
Presented by Trey Grainger, Search Technology Development Manager, CareerBuilder

Having “big data” is great, but turning that data into actionable intelligence is where the real value lies. This talk will demonstrate how you can use Solr to build a highly scalable data analytics engine to enable customers to engage in lightning fast, real-time knowledge discovery.

At CareerBuilder, we utilize these techniques to report the supply and demand of the labor force, compensation trends, customer performance metrics, and many live internal platform analytics. You will walk away from this talk with an advanced understanding of faceting, including pivot faceting, geo/radius faceting, time-series faceting, function faceting, and multi-select faceting. You’ll also get a sneak peek at some new faceting capabilities that are just wrapping up development, including distributed pivot facets and percentile/stats faceting, which will be open-sourced.

The presentation will be a technical tutorial, along with real-world use-cases and data visualizations. After this talk, you'll never see Solr as just a text search engine again.
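
As a warm-up for the session, here is a minimal SolrJ sketch of the pivot and multi-select (tag/exclude) faceting described above. The collection and field names are illustrative, not CareerBuilder's:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetDemo {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/jobs");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                            // facet counts only, no documents
        q.setFacet(true);
        q.addFacetField("{!ex=st}state");        // multi-select: exclude our own filter
        q.addFacetPivotField("industry,state");  // hierarchical (pivot) facet
        q.addFilterQuery("{!tag=st}state:CA");   // the user's current selection, tagged

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetPivot());
      }
    }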

Videos & Slides are available here.

Personalized Search on the Largest Flash Sale Site in America
Presented by Adrian Trenaman, Senior Software Engineer, Gilt Groupe

Gilt Groupe is an innovative online shopping destination offering its members special access to the most inspiring merchandise, culinary offerings, and experiences every day, many at insider prices. Every day new merchandise is offered for sale at discounts of up to 70%. Sales start at 12 noon EST, resulting in an avalanche of hits to the site, so delivering a rich user experience requires substantial technical innovation.

Implementing search for a flash-sales business, where inventory is limited and changes rapidly as our sales go live to a stampede of members every noon, poses a number of technical challenges. For example, with small amounts of fast-moving inventory, we want to be sure that search results reflect those products we still have available for sale. Personalizing search – where search listings may contain exclusive items that are available only to certain users – was also a big challenge.

Gilt has built out keyword search using Scala, Play Framework and Apache Solr / Lucene. The solution, which involves less than 4,000 lines of code, comfortably provides search results to members in under 40ms. In this talk, we'll give a tour of the logical and physical architecture of the solution, the approach to schema definition for the search index, and how we use custom filters to perform personalization and enforce product availability windows. We'll discuss lessons learnt, and describe how we plan to adopt Solr to power sale, brand, category and search listings throughout all of Gilt's estate.

Video & slides are available here.

Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Presented by John Berryman, Search Architect, Opensource Connections

In a recent project with the US Patent and Trademark Office, Opensource Connections was asked to prototype the next generation of patent search using Solr and Lucene. An important aspect of this project was the implementation of BRS, a specialized search syntax used by patent examiners during the examination process.

In this fast-paced session we will relate our experiences and describe how we used a combination of Parboiled (a Parsing Expression Grammar [PEG] parser), Lucene Queries and SpanQueries, and an extension of Solr's QParserPlugin to build BRS search functionality in Solr. First we will characterize the patent search problem and then define the BRS syntax itself. We will then introduce the Parboiled parser and discuss various considerations that one must make when designing a syntax parser.

Following this we will describe the methodology used to implement the search functionality in Lucene/Solr. Finally, we will include an overview of our syntactic and semantic testing strategies. The audience will leave this session with an understanding of how Solr, Lucene, and Parboiled may be used to implement their own custom search parser.
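
To make the extension point concrete, here is a minimal sketch of a QParserPlugin against the Solr 4.1+ API. The BrsParser class is hypothetical, standing in for the Parboiled-generated PEG parser; this is our illustration, not the presenters' code:

    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.QParserPlugin;
    import org.apache.solr.search.SyntaxError;

    // Registered in solrconfig.xml via <queryParser name="brs" class="BrsQParserPlugin"/>
    // and invoked from a request as q={!brs}<BRS expression>
    public class BrsQParserPlugin extends QParserPlugin {
      @Override
      public void init(NamedList args) {}

      @Override
      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
          @Override
          public Query parse() throws SyntaxError {
            // BrsParser is hypothetical: a Parboiled PEG parser whose parse
            // tree is walked to emit Lucene Query and SpanQuery objects.
            return new BrsParser(req.getSchema()).toLuceneQuery(qstr);
          }
        };
      }
    }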

Video & slides are available here.

Analytics in OLAP with Lucene and Hadoop
Presented by Dragan Milosevic, Senior Architect, zanox

Analytics powered by Hadoop is a powerful tool, and this talk addresses its application in an OLAP system built on top of Lucene. Many applications use Lucene indexes also for storing data, to alleviate challenges concerned with external data sources. Analyzing queries can reveal stored fields that are in most cases accessed together. If one binary compressed field replaces those fields, the amount of data to be loaded is reduced and query processing is boosted. Furthermore, documents that are frequently loaded together can be identified. If those documents are saved at nearly adjacent positions in Lucene's stored-field files, file-system caches are used more effectively and loading of documents is noticeably faster.
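
The field-packing idea takes only a few lines of Lucene 4.x. The layout below (length-prefixed UTF-8 strings plus a long) and the field names are our illustration; compression of the blob is omitted for brevity:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import org.apache.lucene.document.StoredField;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.util.BytesRef;

    public class PackedFields {
      // Instead of storing title/price/url as three stored fields, pack them
      // into one binary blob so a document load touches a single field.
      static StoredField pack(String title, long price, String url) {
        byte[] t = title.getBytes(StandardCharsets.UTF_8);
        byte[] u = url.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + t.length + 8 + 4 + u.length);
        buf.putInt(t.length).put(t).putLong(price).putInt(u.length).put(u);
        return new StoredField("packed", buf.array()); // a real system would compress this
      }

      static Object[] unpack(IndexReader reader, int docId) throws Exception {
        BytesRef raw = reader.document(docId).getBinaryValue("packed");
        ByteBuffer buf = ByteBuffer.wrap(raw.bytes, raw.offset, raw.length);
        byte[] t = new byte[buf.getInt()]; buf.get(t);
        long price = buf.getLong();
        byte[] u = new byte[buf.getInt()]; buf.get(u);
        return new Object[] { new String(t, StandardCharsets.UTF_8), price,
                              new String(u, StandardCharsets.UTF_8) };
      }
    }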

Large-scale search applications typically deploy sharding and partition documents by hashing. The implemented OLAP system has shown that such hash-based partitioning is not always optimal. An alternative partitioning, supported by analytics, has been developed. It places documents that are frequently used together in the same shards, which maximizes the amount of work that can be done locally and reduces the communication overhead among searchers. As an extra bonus, it also identifies slow queries that typically point to emerging trends, and suggests the addition of optimized searchers for handling similar queries.

Video & Slides are now available here.

Make your GUI Shine with AJAX-Solr
Presented by Troy Thomas, Senior Manager, Internet Enabled Services, Synopsys & Koorosh Vakhshoori, Software Architect, Synopsys

With AJAX-Solr, you can implement widgets like faceting, auto-complete, spellchecker and pagination quickly and elegantly. AJAX-Solr is a JavaScript library that uses the Solr REST-like API to display search results in an interactive user interface. Come learn why we chose AJAX-Solr and Solr 4 for the SolvNet search project. Get an overview of the AJAX-Solr framework (Manager, Parameters, Widgets and Theming). Get a deeper understanding of the technical concepts using real-world examples. Best practices and lessons learned will also be presented.

Video & slides are available here.

CMS Integration of Apache Solr - How we did it.
Presented by Ingo Renner, Software Engineer, Infield Design

TYPO3 is an Open Source Content Management System that is very popular in Europe, especially in the German market, and gaining traction in the U.S., too.

TYPO3 is a good example of how to integrate Solr with a CMS. The challenges we faced are typical of any CMS integration. We came up with solutions and ideas to these challenges and our hope is that they might be of help for other CMS integrations as well.

That includes content indexing, file indexing, keeping track of content changes, handling multi-language sites, search and faceting, access restrictions, result presentation, and how to keep all these things flexible and re-usable for many different sites.

For all these things we used a couple of additional Apache projects, and we would like to show how we use them and how we contributed back to them while building our Solr integration.

Video & slides are available here.

From Text to Truth: Real-World Facets for Multilingual Search
Presented by Benson Margulies, Executive Vice President and Chief Technology Officer, Basis Technology

Solr's ability to facet search results gives end-users a valuable way to drill down to what they want. But for unstructured documents, deriving facets such as the persons mentioned requires advanced analytics. Even if names can be extracted from documents, the user doesn't want a "George Bush" facet that intermingles documents mentioning either the 41st or the 43rd U.S. President, nor does she want separate facets for "George W. Bush" or even "乔治·沃克·布什" (a Chinese translation) that are each limited to just one string. We'll explore the benefits and challenges of empowering Solr users with real-world facets.

Video & slides are available here.

Batch Indexing and Near Real Time, keeping things fast
Presented by Marc Sturlese, Architect, Backend engineer, Trovit

In this talk I will explain how we combine a mixed architecture, using Hadoop for batch indexing and Storm, HBase and ZooKeeper to keep our indexes updated in near real time. I will talk about why we didn't just choose a default SolrCloud and its near-real-time feature (mainly to avoid hitting merges while serving queries on the slaves), and about the advantages and complexities of having a mixed architecture. Both parts of the infrastructure, and how they are coordinated, will be explained in detail. Finally, I will mention future lines of work, such as how we plan to use Lucene's real-time features.

Video & Slides are available here.

Next Generation Electronic Medical Records and Search: A Test Implementation in Radiology
Presented by David Piraino, Chief Imaging Information Officer, Imaging Institute, Cleveland Clinic
& Daniel Palmer, Chief Imaging Information Officer, Imaging Institute, Cleveland Clinic

Most patient-specific medical information is document-oriented with varying amounts of associated metadata. Most patient medical information is textual and semi-structured. Electronic Medical Record (EMR) systems are not optimized to present textual information to users in the most understandable ways. Present EMRs show information to the user in a reverse-time-oriented, patient-specific manner only. This talk describes the construction and use of Solr search technologies to provide relevant historical information at the point of care while interpreting radiology images.

Radiology reports over a 4-year period were extracted from our Radiology Information System (RIS) and passed through a text processing engine to extract the results, impression, exam description, location, history, and date. Fifteen cases reported during clinical practice were used as test cases to determine if "similar" historical cases were found. The results were evaluated by the number of searches that returned any result in less than 3 seconds, and by the number of cases that illustrated the questioned diagnosis in the top 10 results returned, as determined by a bone and joint radiologist. Methods to better optimize the search results were also reviewed.

An average of 7.8 out of the 10 highest-rated reports showed a similar case highly related to the present case. The best search showed 10 out of 10 cases that were good examples, and the lowest-match search showed 2 out of 10. The talk will highlight this specific use case and the issues and advances of using Solr search technology in medicine, with a focus on point-of-care applications.

Video & slides are available here.

Semantic Search in the Cloud

Presented by Roberto Masiero, Vice President ADP Innovation Lab, ADP

In this presentation we will cover ADP's Semantic Search strategy and implementation: from the use cases, to the design that supports semantic searches over a vast set of data, to crawling data from hundreds of data sources. We will also cover our architecture for scaling the search service in a multi-tenant SaaS environment.

Video & slides are available here.

How to make a simple cheap high-availability self-healing Solr cluster
Presented by Stephane Gamard, Chief Technology Officer, Searchbox

In this presentation we aim to show how to build a highly available SolrCloud cluster on Solr 4.1 using only Solr and a few bash scripts. The goal is to present an infrastructure that is self-healing, using only cheap instances based on ephemeral storage. We will start by providing a comprehensive overview of the relationships between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering using ZooKeeper, with a particular emphasis on cluster state status/monitoring and Solr collection configuration. The core of our presentation will be demonstrated on a live cluster.

We will show how to use cron and bash to monitor the state of the cluster and of its nodes. We will then show how we can extend our monitoring to automatically generate new nodes, attach them to the cluster, and assign them shards (choosing between missing shards or additional replicas for high availability). We will show that, with a high replication factor, it is possible to use ephemeral storage for shards without risk of data loss, greatly reducing the cost and management of the architecture. Future work, which might be pursued as an open source effort, includes monitoring the activity of individual nodes so as to scale the cluster according to traffic and usage.
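
The talk's monitor is cron plus bash; for readers who prefer SolrJ, the same liveness probe looks roughly like the sketch below. The node URLs are placeholders, and a real monitor would read the live cluster state from ZooKeeper instead of a fixed list:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.SolrPingResponse;

    public class ClusterProbe {
      public static void main(String[] args) {
        List<String> nodes = Arrays.asList(
            "http://10.0.0.1:8983/solr/collection1",
            "http://10.0.0.2:8983/solr/collection1");

        for (String url : nodes) {
          try {
            SolrPingResponse ping = new HttpSolrServer(url).ping();
            System.out.println(url + " OK in " + ping.getElapsedTime() + " ms");
          } catch (Exception e) {
            // This is where the self-healing logic would kick in: spin up a
            // replacement instance and attach it to the cluster as a replica.
            System.out.println(url + " DOWN: " + e.getMessage());
          }
        }
      }
    }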

Video & Slides are available here.

Text Tagging with Finite State Transducers
Presented by David Smiley, Software Systems Engineer, Lead, MITRE

OpenSextant is an unstructured-text geotagger. A core component of OpenSextant is a general-purpose text tagger that scans a text document for matching multi-word substrings from a large dictionary. Harnessing Lucene’s state-of-the-art finite state transducer (FST) technology, the text tagger uses over 40x less memory than was estimated for a leading in-memory alternative. Lucene’s FSTs are elusive due to their technical complexity, but overcoming the learning curve can pay off handsomely.
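
For a taste of the API behind this, here is a minimal dictionary build and exact-match lookup with Lucene's FST classes (assuming a recent Lucene 4.x). The tagger itself layers multi-word scanning on top of a structure like this:

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.IntsRef;
    import org.apache.lucene.util.fst.Builder;
    import org.apache.lucene.util.fst.FST;
    import org.apache.lucene.util.fst.PositiveIntOutputs;
    import org.apache.lucene.util.fst.Util;

    public class FstDictionary {
      public static void main(String[] args) throws Exception {
        // Maps dictionary terms (which must be added in sorted order) to long ids
        String[] terms = { "new york", "new york city", "paris" };
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);

        IntsRef scratch = new IntsRef();
        for (int i = 0; i < terms.length; i++) {
          builder.add(Util.toIntsRef(new BytesRef(terms[i]), scratch), (long) i);
        }
        FST<Long> fst = builder.finish(); // compact automaton with shared prefixes/suffixes

        Long id = Util.get(fst, new BytesRef("new york city"));
        System.out.println("id = " + id); // prints: id = 1
      }
    }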

Video & slides are available here.

Internalizing location services with GeoNames
Presented by John Marc Imbrescia, Senior Software Engineer, Etsy.com

Etsy recently chose to bring our location services in-house. We used the open source GeoNames data set and built the tools we needed to use that data to allow members to select their location, to show translations of place names, and to feed data into our search database for local, regional, and country-based searches.

This talk will cover the implementation details and decisions we made along the way: how we mapped places from our old data set to the GeoNames data; the internal tools we built, including a Solr core for location place-name autosuggest; and the modifications to our Listings Search and Shop Search cores, plus the different ways we use location-based search around the site, both distance-based and region-based using the GeoNames hierarchy data.

There will also be a discussion of our choice to release some of the tools we built for this project as open source, and of the decisions behind the non-search (display, etc.) elements of the project and the tools we chose for them and why.

Video & slides are available here.

Building a Near Real-time Search Engine and Analytics for logs using Solr

Presented by Rahul Jain, System Analyst (Software Engineer), IVY Comptech Pvt Ltd

Consolidating and indexing logs so they can be searched in real time poses an array of challenges when you have hundreds of servers producing terabytes of logs every day. Log events are mostly small, around 200 bytes to a few KB, which makes them harder to handle: the smaller the log event, the greater the number of documents to index. In this session, we will discuss the challenges we faced and the solutions we developed to overcome them. The talk will cover the following items:

  • Methods to collect logs in real time
  • How Lucene was tuned to achieve an indexing rate of 1 GB in 46 seconds (a generic tuning sketch follows this list)
  • Tips and techniques used to manage distributed index generation and search over multiple shards
  • How choosing a layer-based partitioning strategy helped us bring down search response times
  • Log analysis and generation of analytics using Solr
  • Design and architecture used to build the search platform
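
The talk's exact settings are not in this abstract; as a generic starting point, the usual Lucene 4.x throughput knobs look like this (values are illustrative):

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class FastIndexer {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(
            Version.LUCENE_43, new StandardAnalyzer(Version.LUCENE_43));
        cfg.setRAMBufferSizeMB(256); // larger in-RAM buffer -> fewer, bigger flushes
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("/tmp/log-index")), cfg);
        // Feed the writer from many producer threads (IndexWriter is thread-safe);
        // with tiny log-event documents, batching dominates any single setting.
        writer.close();
      }
    }
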
Video & Slides are available here.

Get On The Spot Solutions To Your Real Life Lucene/Solr Challenges

Presented by Chris Hostetter, LucidWorks

Date: May 1, 2013
Time: 4:00-5:15 PT
Room: Crystal Ballroom

Got a tough problem with your Solr or Lucene application? Facing challenges that you'd like some advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the results you expected? Don't know where to get started? Then this session is for you.

Now you can get your questions answered live, in front of an audience of hundreds of Lucene Revolution attendees! Back again by popular demand, "Stump the Chump" at Lucene Revolution 2013 puts Chris Hostetter (aka Hoss) in the hot seat to tackle questions live.

All you need to do is send your questions to stump@lucenerevolution.org. Please describe in detail the challenge you have faced and any approaches you have taken to solve the problem. Anything related to Solr/Lucene is fair game. You can ask anything you like, but consider topics in areas like:

  • Data modelling
  • Query parsing
  • Tricky faceting
  • Text analysis
  • Scalability

Our moderator, Steve Rowe, will read the questions, and Hoss will have to formulate a solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will be awarded by the panel for the best question - and for those deemed to have "Stumped the Chump".

Watch session video.

Day 2 Presentations

KEYNOTE: Lucene / Solr road map

Presented by Steve Rowe, Lucene/Solr Committer and PMC Chair, LucidWorks

Video and slides are available here.

KEYNOTE: Solr - Past, Present & Future

Presented by Yonik Seeley, Lucene/Solr Committer and PMC Chair, LucidWorks

Video and slides are available here.

KEYNOTE: Skills, Reputation and Search

Presented by Peter Skomoroch, Principal Data Scientist, LinkedIn

Video and Slides are available here.

Lucene / Solr 4 Spatial Deep Dive

Presented by David Smiley, Software Systems Engineer, Lead, MITRE

Lucene’s former spatial contrib is gone, and in its place is an entirely new spatial module developed by several well-known names in the Lucene/Solr spatial community. The heart of this module is an approach in which spatial geometries are indexed as edge-ngram-tokenized geohashes and searched with a prefix-tree/trie recursive algorithm. It sounds cool and it is! In this presentation, you’ll see how it works, why it’s fast, and what new things you can do with it. Key features include support for multi-valued fields, indexing shapes with area (even polygons), and various spatial predicates such as “Within”. You’ll see a live demonstration and a visual representation of geohash-indexed shapes. Finally, the session will conclude with a look at the future direction of the module.
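
A minimal sketch of the module's API (Lucene 4.x spatial plus Spatial4j; the field name and coordinates are illustrative):

    import com.spatial4j.core.context.SpatialContext;
    import com.spatial4j.core.distance.DistanceUtils;
    import com.spatial4j.core.shape.Point;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
    import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
    import org.apache.lucene.spatial.query.SpatialArgs;
    import org.apache.lucene.spatial.query.SpatialOperation;

    public class SpatialSketch {
      public static void main(String[] args) {
        SpatialContext ctx = SpatialContext.GEO;
        GeohashPrefixTree grid = new GeohashPrefixTree(ctx, 11); // ~meter precision
        RecursivePrefixTreeStrategy strategy =
            new RecursivePrefixTreeStrategy(grid, "location");

        // Indexing a point; polygons and multi-valued shapes work the same way
        Document doc = new Document();
        Point pt = ctx.makePoint(-73.99, 40.73); // lon, lat
        for (Field f : strategy.createIndexableFields(pt)) {
          doc.add(f);
        }

        // Search: everything intersecting a 10 km circle around the same point
        double radiusDeg =
            DistanceUtils.dist2Degrees(10, DistanceUtils.EARTH_MEAN_RADIUS_KM);
        SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects,
            ctx.makeCircle(-73.99, 40.73, radiusDeg));
        Filter filter = strategy.makeFilter(args);
        // pass `filter` to IndexSearcher.search(query, filter, n)
      }
    }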

Video & slides are available here.

CommerceSearch: Moving from FAST to Solr on ATG

Presented by Ricardo Merizalde, Software Development Manager, Backcountry.com

The intent of this presentation is to describe an implementation of an open source search framework for e-commerce sites. The presentation will focus on Oracle ATG but can be extended to other platforms. First, a brief introduction to what CommerceSearch is (an open source integration and framework for eCommerce sites). We then review the challenges we had with FAST Impulse; the main search artifacts merchandisers can manage through CommerceSearch; how CommerceSearch integrates with ATG to deploy changes in near real time; and CommerceSearch's integration and automated test frameworks (Selenium). Finally, we summarize the benefits we got by moving from FAST to Solr.

Video and Slides are available here.

Solr Powered Libraries: A survey of the world's knowledge bases

Presented by Erik Hatcher, Lucene/Solr Committer and PMC member, Co-founder, LucidWorks

Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems encountered by the implementers are complex. For example, many of the applications built on these technologies also thrive on intentionally designed-in serendipitous discovery capabilities, bringing to light previously unknown, yet related and potentially interesting, content.

Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community improving contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.

Video & slides are available here.

Scaling up Solr 4.1 to Power Big Search in Social Media Analytics

Presented by Timothy Potter, Architect, Big Data Analytics, Dachis Group

My presentation focuses on how we implemented Solr 4.1 to be the cornerstone of our social marketing analytics platform. Our platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes. Combined with our Hadoop cluster, we have achieved throughput rates greater than 8,000 documents per second. Our index currently contains more than 500,000,000 documents and is growing by 3 to 4 million documents per day.

The presentation will include details about:

  • Designing a SolrCloud cluster for scalability and high availability using sharding and replication with ZooKeeper
  • Operational concerns, like how to handle a failed node, and monitoring
  • How we deal with indexing big data from Pig/Hadoop, as an example of using CloudSolrServer in SolrJ and managing searchers for high indexing throughput (a minimal indexing sketch follows this list)
  • Example uses of key features like real-time gets, atomic updates, custom hashing, and distributed facets

Attendees will come away from this presentation with a real-world use case that proves Solr 4.1 is scalable, stable, and production-ready. (Note: we are in production on 18 nodes in EC2 with a recent nightly build off branch_4x.)
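
A minimal indexing sketch with CloudSolrServer (SolrJ 4.x); the ZooKeeper addresses, collection, and fields are illustrative:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
      public static void main(String[] args) throws Exception {
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("social");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "post-12345");
        doc.addField("text", "some social chatter");
        solr.add(doc);   // routed using the cluster state kept in ZooKeeper

        solr.commit();   // for bulk loads, prefer autoCommit to per-batch commits
        solr.shutdown();
      }
    }
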
Video & slides are available here.

Solr at Zvents, nearly 6 years later and still going strong

Presented by Amit Nithianandan, Lead Engineer Search/Analytics New Platforms, Zvents/Stubhub

Zvents has been a user of Apache Solr since 2007, when the project was still very young. Since then, the team has made extensive use of its various features and most recently completed an overhaul of the search engine on Solr 4.0. We'll touch on a variety of development and operational topics, including how we manage the build lifecycle of the search application using Maven, release the deployment package using Capistrano, and monitor using NewRelic, as well as our extensive use of virtual machines to simplify node management. We'll also talk about application-level details such as our unique federated search product, and the integration of technologies such as Hypertable, RabbitMQ, and EHCache to power more real-time ranking and filtering based on traffic statistics and ticket inventory.

Video and session slides are available here.

Advanced Query Parsing Techniques

Presented by Paul Nelson, Chief Architect, Search Technologies

Lucene and Solr provide a number of options for query parsing, and these are valuable tools for creating powerful search applications. This presentation will review the role that advanced query parsing can play in building systems, including relevancy customization; taking input from user-interface variables such as the position on a website or geographical indicators; selecting which sources are to be searched; and incorporating 3rd-party data sources. Query parsing can also enhance data security. Best practices for building and maintaining complex query parsing rules will be discussed and illustrated.

Video and slides are available here.

Brahe - Mass scale flexible indexing

Presented by Ben Brown, Software Architect, Cerner Corporation

Our team made their first foray into Solr building out Chart Search, an offering on top of Cerner's primary EMR that helps make search over a patient's chart smarter and easier. After bringing on over 100 client hospitals and indexing many tens of billions of clinical documents and discrete results, we've (thankfully) learned a couple of things.

The traditional approach of hashed document IDs over many shards, with no easily accessible source of truth, doesn't make for a flexible index. Learn the finer points of the strategy in which we shifted our source of truth to HBase: how we deploy new indexes with the click of a button, take an existing index and expand the number of shards on the fly, and several other fancy features we enabled.

Video & Slides are now available here.

SolrCloud: the 'Search First' NoSQL database

Presented by Mark Miller, Software Engineer, Cloudera

As the NoSQL ecosystem looks to integrate great search, great search is naturally beginning to expose many NoSQL features. Will these Goliaths collide? Or will they remain specialized while intermingling – two sides of the same coin?

Come learn where SolrCloud fits into the NoSQL landscape. What can it do? What will it do? And how will the big data, NoSQL, and search ecosystem evolve? If you are interested in big data, NoSQL, distributed systems, the CAP theorem, and other hype-filled terms, then this talk may be for you.

Video & slides are available here.

Using Lucene/Solr to Build Advertising Systems

Presented by Hideharu Hatayama, Rakuten, Inc.

I'll talk about architecture patterns for Solr-centered ad systems and the practical knowledge we have gained by operating such a system with high availability for years; these topics are also applicable to other systems, such as e-commerce or restaurant-recommendation sites. Through the presentation, I aim to give beginners hints on how to design a high-performance system architecture using Solr, and on how to manage and operate such systems while avoiding downtime.

Video and Slides available here.

Multi-faceted responsive search, autocomplete, feeds engine and logging

Presented by Remi Mikalsen, Search Engineer, The Norwegian Centre for ICT in Education

Learn how utdanning.no leverages open source technologies to deliver a blazing fast multi-faceted responsive search experience and a flexible and efficient feeds engine on top of Solr 3.6. Among the key open source projects that will be covered are Solr, Ajax-Solr, SolrPHPClient, Bootstrap, jQuery and Drupal. Notable highlights are ajaxified pivot facets, multiple-parent hierarchical facets, ajax autocomplete with edge-n-grams and grouping, integrating our search widgets on any external website, custom Solr logging, and using Solr to deliver Atom feeds.

utdanning.no is a governmental website that collects, normalizes and publishes study information related to secondary school and higher education in Norway. With 1.2 million visitors each year and 12,000 indexed documents, we focus on precise information and a high degree of usability for students, potential students and counselors.

Video & slides are available here.

Crowd-sourced intelligence built into Search over Hadoop

Presented by Ted Dunning, Chief Application Architect, MapR
& Grant Ingersoll, Chief Technology Officer, LucidWorks

Search has quickly evolved from being an extension of the data warehouse to being run as a real-time decision processing system. Search is increasingly being used to gather intelligence on multi-structured data, leveraging distributed platforms such as Hadoop in the background. This session will provide details on how search engines can be abused to use not text, but mathematically derived tokens, to build models that implement reflected intelligence. In such a system, intelligent or trend-setting behavior of some users is reflected back at other users. More importantly, the mathematics of evaluating these models can be hidden in a conventional search engine like Solr, making the system easy to build and deploy.

The session will describe how to integrate Apache Solr/Lucene with Hadoop. Then we will show how crowd-sourced search behavior can be looped back into analysis, and how constantly self-correcting models can be created and deployed. Finally, we will show how these models can respond with intelligent behavior in real time.
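
The abstract stays deliberately high-level; one common shape of this "reflected intelligence" pattern is to index behavior-derived indicator IDs as ordinary tokens and then query them with a user's recent history. A toy SolrJ sketch, with field names and IDs that are ours, not the speakers':

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ReflectedIntelligence {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/items");

        // Offline (e.g. in Hadoop), co-occurrence analysis picks which items are
        // statistically significant indicators for item-42; they are indexed as
        // plain tokens in an "indicators" field.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-42");
        doc.addField("indicators", "item-7 item-19 item-311");
        solr.add(doc);
        solr.commit();

        // Online, a user's recent history becomes the query; ordinary scoring
        // then reflects the crowd's behavior back as a ranking.
        SolrQuery q = new SolrQuery("indicators:(item-7 item-19)");
        System.out.println(solr.query(q).getResults());
      }
    }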

Video & slides are available here.

Implementing Search with Solr at 7digital

Presented by James Atherton, Search Team Lead, 7digital

A usage/case study describing our journey as we implemented Lucene/Solr and the lessons we learned along the way: how we implemented our instant search/search suggest; and how we handle indexing 400 million tracks and metadata for over 40 countries, comprising over 300GB of data and about 70GB of indexes. Finally, where we hope to go in the future.

Video & slides are available here.

Designing the Search Experience

Presented by Tyler Tate, Cofounder, TwigKit

Search is not just a box and ten blue links. Search is a journey: an exploration where what we encounter along the way changes what we seek. But in order to guide people along this journey, we must understand both the art and science of search. In this talk Tyler Tate, cofounder of TwigKit and coauthor of the new book Designing the Search Experience, weaves together the theories of information seeking with the practice of user interface design, providing a comprehensive guide to designing search.

Pulling from a wealth of research conducted over the last 30 years, Tyler begins by establishing a framework of search and discovery. He outlines cognitive attributes of users, including their level of expertise, cognitive style, and learning style; describes models of information seeking and how they've been shaped by theories such as information foraging and sensemaking; and reviews the role that task, physical, social, and environmental context plays in the search process.

Tyler then moves from theory to practice, drawing on his experience of designing 50+ search user interfaces to provide practical guidance for common search requirements. He describes best practices and demonstrates reams of examples for everything from entering the query (including the search box, as-you-type suggestions, advanced search, and non-textual input), to the layout of search results (such as lists, grids, maps, augmented reality, and voice), to result manipulation (e.g. pagination and sorting) and, last but not least, the ins and outs of faceted navigation. Through it all, Tyler also addresses mobile interface design and how responsive design techniques can be used to achieve cross-platform search. This intensive talk will enable you to create better search experiences by equipping you with a well-rounded understanding of the theories of information seeking, and by providing you with a sweeping survey of search user interface best practices.

Video & slides are available here.

Rapid pruning of search space through hierarchical matching

Presented by Chandra Mouleeswaran, Co Chair at Intellifest.org, ThreatMetrix

This talk will present our experiences using Lucene/Solr for the classification of user and device data. On a daily basis, ThreatMetrix, Inc. handles a huge volume of volatile data. The primary challenge is rapidly and precisely classifying each incoming transaction by searching a huge index within a very strict latency specification. The audience will be taken through the various design choices and the lessons learned. We will present details on introducing a hierarchical search procedure that systematically divides the search space into manageable partitions while maintaining precision.

Video & slides are available here.

Living with Garbage

Presented by Gregg Donovan, Senior Software Engineer, Etsy.com, Inc.

Understanding the impact of garbage collection, both at a single node and a cluster level, is key to developing high-performance, high-availability Solr and Lucene applications. After a brief overview of garbage collection theory, we will review the design and use of the various collectors in the JVM.

At a single-node level, we will explore GC monitoring -- how to understand GC logs, how to monitor what percentage of your Solr request time is spent on GC, how to use VisualGC, YourKit, and other tools, and what to log and monitor. We will review GC tuning and how to measure success.
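
One simple way to watch GC time from inside the JVM, alongside the GC logs and tools mentioned above, is the standard management API; the sampling interval below is arbitrary:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher {
      public static void main(String[] args) throws InterruptedException {
        long lastTime = 0, lastCount = 0;
        while (true) {
          long time = 0, count = 0;
          for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            time += gc.getCollectionTime();   // cumulative ms spent collecting
            count += gc.getCollectionCount(); // cumulative number of collections
          }
          System.out.printf("GC: %d collections, %d ms in last interval%n",
              count - lastCount, time - lastTime);
          lastCount = count; lastTime = time;
          Thread.sleep(10000); // compare the ms figure against request latencies
        }
      }
    }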

At a cluster level, we will review how to design for partial availability -- how to avoid sending requests to a GCing node and how to be resilient to mid-request GC pauses. For application development, we will review common memory-leak scenarios in custom Solr and Lucene application code and how to detect them.

Video and slides are available here.

Beyond simple search – adding business value in the enterprise

Presented by Kathy Phillips, Enterprise Search Services Manager/VP, Wells Fargo & Co.
& Tom Lutmer, eBusiness Systems Consultant, Enterprise Search Services team, Wells Fargo & Co.

What is enterprise search? Is it a single search box that spans all enterprise resources or is it much more than that? Explore how enterprise search applications can move beyond simple keyword search to add unique business value. Attendees will learn about the benefits and challenges to different types of search applications such as site search, interactive search, search as business intelligence, and niche search applications. Join the discussion about the possibilities and future direction of new business applications within the enterprise.

Video & slides are available here.

Tips and Tricks for getting the best out of Solr on Windows Azure

Presented by Brian Benz, Senior Technical Evangelist, Microsoft Open Technologies, Inc.

This session will cover tips and tricks for getting the most out of Solr in Windows Azure. Windows Azure enables quick and easy installation and setup of Solr search functionality in a variety of ways, and lets you focus on managing and operating Solr servers in our managed environment. We’ll cover multiple options for setting up Solr in Windows Azure, including working examples.

Videos and slides are available here.

Beyond TF-IDF: Why, What and How

Presented by Stephen Murtagh, Etsy.com, Inc.

TF-IDF (term frequency, inverse document frequency) is a standard method of weighting query terms for scoring documents, and is the method that is used by default in Solr/Lucene. Unfortunately, TF-IDF is really only a measure of rarity, not quality or usefulness. This means it would give more weight to a useless, rare term, such as a misspelling, than to a more useful, but more common, term.

In this presentation, we will discuss our experiences replacing Lucene's TF-IDF based scoring function with a more useful one using information gain, a standard machine-learning measure that combines frequency and specificity. Information gain is much more expensive to compute, however, so this requires periodically computing the term weights outside of Solr/Lucene and making the results accessible within Solr/Lucene.
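
A minimal sketch of the mechanism (Lucene 4.x): subclass the default TF-IDF similarity and override its IDF hook with externally computed weights. The map and fallback value are our illustration; per the talk, the real weights are computed outside Solr/Lucene and reloaded periodically:

    import java.util.Map;

    import org.apache.lucene.search.CollectionStatistics;
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.TermStatistics;
    import org.apache.lucene.search.similarities.DefaultSimilarity;

    public class InfoGainSimilarity extends DefaultSimilarity {
      private final Map<String, Float> infoGain; // hypothetical precomputed weights

      public InfoGainSimilarity(Map<String, Float> infoGain) {
        this.infoGain = infoGain;
      }

      @Override
      public Explanation idfExplain(CollectionStatistics collectionStats,
                                    TermStatistics termStats) {
        String term = termStats.term().utf8ToString();
        Float ig = infoGain.get(term); // low for misspellings, high for useful terms
        float weight = (ig != null) ? ig : 0.01f;
        return new Explanation(weight, "information gain(" + term + ")");
      }
    }

The class is then wired in through the <similarity> element in schema.xml, or via IndexSearcher.setSimilarity when using Lucene directly.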

Video & slides are available here.

Concept Search for eCommerce with Solr

Presented by Mikhail Khludnev, eCommerce Search Platform, Grid Dynamics

This talk describes our experience in eCommerce search: the challenges we have faced and the approaches we chose. It is not intended to be a full description of the implementation, because too many details would need to be touched on. This talk is more of a problem statement and a description of general solutions, which offer a number of points for technical or even academic discussion. It is focused on the text search use case; structured (or scoped) search is off the agenda, as is faceted navigation.

Video & slides are available here.

Building big social network search system using Lucene

Presented by Aleksey Shevchuk, Lead developer, odnoklassniki.ru

We will explain how the search systems of the social network Odnoklassniki work. Each day, 40 million people use Odnoklassniki to communicate and entertain themselves. These activities are hard to imagine without a proper search system. A dozen big indexes and thousands of small ones respond to more than 4,000 searches per second at peak times. Users can search within specific sections of the site or across the whole site. The search system decides which indexes should be queried and which results to show. To improve relevance we use information from the social graph and various activity statistics available for the indexed entities. Query log analysis? Again, Lucene!

Videos and slides are available.

Edanz Journal Selector: Case Study: a Prototype based on Solr/Nutch/Hadoop

Presented by Liang Shen, Developer, European Bioinformatics Institute

I'm going to introduce a project I built in 2011: Edanz Journal Selector. It's a tool for scholars to find the right journals in which to publish their manuscripts. It will be a typical "How We Did It" development case study.

We built Edanz Journal Selector based on Solr/Lucene/Hadoop/Hive and deployed it on Amazon Web Services. I'm going to share experiences from this project about architecture, the cloud, etc.