The BlackRay Data Engine
The BlackRay Data Engine is an open source, high-performance, in-memory relational database, designed for large data sets and constant performance. Originally designed for directory applications, it offers features such as token search, token position, phonetic search and the combination of these features with leading, trailing and mid-span wildcards. It is built to run on standard hardware, but offers the ability to index data sets of over 100 million rows, with a constant search throughput of several hundred queries per second, even with complex queries.
In this talk we would like to explain our motivations for designing and building BlackRay, and then elaborate on the internal architecture of the data engine. The index structures inside the data engine are designed for low memory consumption and for quickly indexing large amounts of data. A total of five index layers, also called index perspectives, are needed to support our search algorithms. Our smart combination of binary and permuterm-based search offers significant performance benefits over many traditional tree- and trie-based searches. Finally, searching for a token combination within a single table column results in only linear complexity, rather than exponential complexity as in most typical index structures. The option to additionally compress the index further reduces the amount of memory used during operation.
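To give a rough idea of how a permuterm index can be combined with binary search to answer leading, trailing and mid-span wildcard queries, here is a minimal textbook-style sketch in Python. It is not BlackRay's actual index layout, which is not described in this abstract. The function names `build_permuterm` and `wildcard_search`, the `$` end marker, and the restriction to a single `*` per pattern are all assumptions made for illustration.

```python
from bisect import bisect_left


def build_permuterm(terms):
    """Return a sorted list of (rotation, term) pairs.

    Assumption: terms contain neither "$" nor "*".
    """
    entries = []
    for term in set(terms):
        marked = term + "$"  # "$" marks the end of the term
        for i in range(len(marked)):
            # every rotation of "term$" points back to the original term
            entries.append((marked[i:] + marked[:i], term))
    entries.sort()
    return entries


def wildcard_search(entries, pattern):
    """Return sorted terms matching a pattern with at most one "*"."""
    if pattern.count("*") > 1:
        raise ValueError("this sketch supports a single '*' only")
    exact = "*" not in pattern
    if exact:
        prefix = pattern + "$"
    else:
        head, tail = pattern.split("*")
        # rotate the pattern so the wildcard ends up at the very end:
        # head*tail  ->  tail$head*
        prefix = tail + "$" + head

    # binary search for the first rotation >= prefix
    i = bisect_left(entries, (prefix,))
    found = set()
    while i < len(entries) and entries[i][0].startswith(prefix):
        rotation, term = entries[i]
        if not exact or rotation == prefix:
            found.add(term)
        i += 1
    return sorted(found)


if __name__ == "__main__":
    index = build_permuterm(["miller", "mueller", "meyer", "smith", "smithson"])
    for query in ("m*er", "smith*", "*ller", "smith"):
        print(query, wildcard_search(index, query))
```

In this sketch every wildcard position is reduced to a prefix lookup on the sorted rotations, so one binary search plus a short scan answers the query. The cost is memory, since each term is stored once per rotation, which is why a real engine would need a far more compact representation than this list of tuples.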
English
Education:
BSc Computer Engineering, San Jose State University 2000
Work:
2000-2003: Founder, COO PhysicianPal Inc. -- We were the first web-based collaboration platform for physicians, ambulatory surgery centers and patients. Platform built with Java, on Tomcat and MySQL.
2004-now: Director, SoftMethod GmbH -- SoftMethod is a leading provider of high-performance data management technologies. Products include a directory assistance/information provider solution and the BlackRay Open Source data engine.
2009-now: Vice Chairman, Board of Directors, Open Database Alliance (http://www.odba.org)
Cool stuff:
I currently work on the following open source projects:
- BlackRay
- OpenContactCenter
- JaXAs
- phonet4j
All of these are hosted on our forge (https://forge.softmethod.de).
I am especially proud of the BlackRay data engine, which was a commercial project from 2004 until June 2009. In June 2009 we open sourced the entire project, under the GPLv2.
BlackRay was first presented at OSCON 2009 in San Jose, and we gave a talk about it at FrOSCon (http://programm.froscon.org/2009/events/456.en.html), which is on YouTube (http://www.youtube.com/watch?v=Z8xGm6cQhWc). This year, I was invited to speak about BlackRay at MySQL Con 2010 in Santa Clara (http://en.oreilly.com/mysql2010/public/schedule/speaker/45063), and again at FrOSCon 2010 (http://programm.froscon.de/2010/events/643.en.html).
I have over ten years of Java coding experience and do a lot of work in Java performance engineering....
Would be cool to be at Codebits 2010 again!
Friday, 4 December 2009, from 12:00 to 13:00
Cláudio Valente
José Pedro Aguiar Airosa
Luis Neves
Marco Ramos
Marco Sousa
Miguel Figueiredo Mascarenhas Sousa Filipe
pedro mg
Ricardo Ferreira
Ricardo Jorge Martins Piedade
Vitor Gaspar Silva
Estimated head count: 17 people
(based on the total of persons interested in this talk and the universe of people attending Codebits)