Workshop Schedule

12:30-13:30 Informal Get Together

Moderator: Jim Paterson, Glasgow Caledonian University

Special Talk: "Beyond NoSQL with MarkLogic: The Universal Index", Nuno Job

14:00 "Dinner in the sky with MongoDB", Marc Boeker, ONchestra

Biography:

Developed about 24 years ago, bootstrapped with the help of 386 computers and under the continuous care of a penguin, Marc learned early to think in binary. At the age of 14 his parents deployed him to the web.

For his first steps there, a simple SQL database was enough to replace his brain. But as time went by, he recognized that a fixed structure is too inflexible. That's the point where he discovered the benefits of NoSQL databases as his main brain storage.

Talk Abstract:

Boarding -
The presentation will target both types of visitors: those who are already familiar with MongoDB
and those who are new to the NoSQL world and MongoDB. We will provide many real-world examples of problems for which MongoDB is well suited.
We will give a small overview of what NoSQL means and why it is so fascinating to break standards that have been established for decades. Then we will introduce MongoDB and compare it to its main competitor, CouchDB. After the introduction, everyone should have the same level of knowledge to dive into the details of MongoDB.

Welcome passengers -
In this step we will present the internals of MongoDB and its ecosystem, which has evolved over the past months. Let's answer some common questions like:
- How to get MongoDB running on your Mac/*nix OS?
- What drivers are available and how to install them?
- Which tools can be used to administer MongoDB?

Taking off -
As everyone is ready to rumble, we will start with a simple CRUD example. Once we know how to get data in and out of MongoDB, we'll take a deeper look at the different query operators. What are they for, and how can we use them to accomplish the same complex queries as in SQL? The main goal in this step is to get the audience comfortable with a database that doesn't offer SELECT, INSERT, UPDATE and DELETE and doesn't have inflexible schemas. This step is known as blame SQL :)

Having a dinner in the sky -
Once we have convinced everyone to use NoSQL for their next project :) we'll serve some delicious extras that MongoDB offers right out of the box:

- Replication/Sharding
- Index support (single, multiple, unique)
- Some impressive performance benchmarks

One very interesting subproject of MongoDB is GridFS, a filesystem based on MongoDB. It's designed to speed up file management and get rid of slow file system indexes and IO problems when serving a large number of files.
MongoDB is also well suited for processing large amounts of data, as it includes a Map & Reduce function. We'll take a small peek into it by running a simple example.
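
For readers new to the idea, the Map & Reduce pattern can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not MongoDB's own API: the helper `map_reduce`, the functions `emit_tags` and `count`, and the sample `posts` are all hypothetical names made up for this sketch.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_tags(doc):
    # Map step: emit one (tag, 1) pair for every tag on the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def count(key, values):
    # Reduce step: sum all the 1s emitted for the same tag.
    return sum(values)


# Hypothetical sample documents, standing in for a collection.
posts = [
    {"title": "a", "tags": ["nosql", "mongodb"]},
    {"title": "b", "tags": ["nosql"]},
    {"title": "c", "tags": []},
]

tag_counts = map_reduce(posts, emit_tags, count)
print(tag_counts)  # {'nosql': 2, 'mongodb': 1}
```

In a real database the map and reduce steps run inside the server and can be spread over several machines; the sketch only shows the data flow.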

Take down -
If MongoDB now has a few more fans, I'd like to show them a few tricks for migrating their existing
data to MongoDB. If we still have some time, we can do a quick walk-through of converting an SQL
schema into a NoSQL one.
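
One common way to think about such a conversion is to fold the rows of a child table into the parent record as a nested list. The following Python sketch assumes two hypothetical tables, `users` and `emails`, represented as lists of dicts; the helper `embed_children` is invented for illustration and is not part of any MongoDB tool.

```python
def embed_children(parents, children, parent_key, foreign_key, field):
    """Return parent rows as documents with their child rows nested under `field`."""
    by_parent = {}
    for child in children:
        row = dict(child)              # copy so the input rows stay untouched
        owner = row.pop(foreign_key)   # the foreign key is implied by nesting
        by_parent.setdefault(owner, []).append(row)
    return [
        dict(parent, **{field: by_parent.get(parent[parent_key], [])})
        for parent in parents
    ]


# Hypothetical rows, as they might come out of two SQL tables.
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
emails = [
    {"user_id": 1, "address": "ann@example.org"},
    {"user_id": 1, "address": "a@example.org"},
]

documents = embed_children(users, emails, "id", "user_id", "emails")
print(documents[0])
# {'id': 1, 'name': 'Ann', 'emails': [{'address': 'ann@example.org'}, {'address': 'a@example.org'}]}
```

Whether to embed child rows or keep them in a separate collection depends on how the data is read and updated; the sketch shows only the embedding option.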

14:50 "Scale Out vs. Scale In - a face-off between Cassandra and Redis", Tim Lossen, wooga

Biography:

Tim Lossen works as a Backend Developer at wooga, the leading provider of "social games" in Europe. Together with his colleagues he is responsible for the infrastructure of the games, which are played by millions of Facebook users.

Talk Abstract:

There are many NoSQL solutions to choose from -- so which one is best suited for your situation?

In this talk, we will take a closer look at two of the main contenders: Cassandra, which represents the eventually consistent scale-out ring (descended from Amazon's Dynamo), and Redis, a simple but lightning-fast key-value store (inspired by Memcached).

I will highlight the strengths and weaknesses of each, and reveal which one we selected to build the backend for a Facebook game with heavy scaling needs.

15:40 Drink / Coffee Break

16:00 "The GraphDB Landscape and SonesDB", Achim Friedland, Sones

Biography:

Achim Friedland is responsible for Research & Development at the sones startup in Erfurt. He studied computer science at TU Ilmenau and is an independent developer and trainer for different companies. His current focus is on the design and development of distributed data-management systems and their integration into existing system and application landscapes. During his scientific research he also worked on the topic "Future Internet Design".

Talk Abstract:

This talk covers graph databases in general and the sones graph data storage in particular.

Graph databases have a long tradition in a wide range of application domains such as the semantic web, geographic information systems, recommendation systems and bioinformatics. These applications demand features the traditional relational model cannot deliver easily: linked data as a first-class concept, efficient support of sparse semi-structured tables, ongoing schema evolution, versioned information and (semi-)automatic partitioning and distribution. Graph databases can solve these issues and are able to replace expensive operations like a chain of recursive JOINs with much cheaper graph traversals.
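
To make the contrast with recursive JOINs concrete, here is a minimal Python sketch of a breadth-first traversal over an adjacency list. It is not the API of sones GraphDS or of any other graph database; the function `reachable_within` and the sample `graph` are hypothetical and serve only to illustrate the idea.

```python
from collections import deque


def reachable_within(graph, start, max_depth):
    """Breadth-first traversal: map each reachable vertex to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = depth + 1
                queue.append(neighbour)
    del seen[start]  # the start vertex is not part of the result
    return seen


# Hypothetical "knows" relationships stored as an adjacency list.
graph = {
    "alice": ["bob"],
    "bob": ["carol", "alice"],
    "carol": ["dave"],
    "dave": [],
}

print(reachable_within(graph, "alice", 2))  # {'bob': 1, 'carol': 2}
```

In a relational schema the same question (who is within two hops of alice?) would typically need one self-join of the relationship table per hop, whereas the traversal only follows the edges that actually exist.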

The sones GraphDS has a clear focus on object-oriented and semi-structured graph data within a distributed computing environment. The vertices and edges may be modeled with structured or unstructured properties, complex documents or even large binary data. Thus, it is especially well suited for building sophisticated information management applications. To simplify both graph traversals with complex evaluation criteria and the migration from SQL, it comes with its own graph query language called GQL.

The sones graph data storage recently left its private beta and was released as open source software. We will give a short overview of the features of its initial release and provide a forecast of upcoming enhancements and future development efforts.

16:50 "Neo4j for deep spatial and social intelligence", Peter Neubauer, Neo Technology

Biography:

Peter Neubauer is COO of Neo Technology. He has been deeply involved in programming for 10 years and is co-founder of a number of Open Source projects like Neo4j, Tinkerpop, OPS4J and Qi4j. Peter loves connecting things, writing crappy prototypes and throwing together new ideas and projects around graphs and society-scale innovation. If you want to brainstorm, feed him a latte and you are in business.

Talk Abstract:

With the social and spatial domains coming ever closer together, spatial indexing is becoming more and more interesting. Peter will present the structure of the Neo4j Spatial project, which turns Neo4j into a GeoTools-compliant feature store.
Recommendation algorithms are also an interesting aspect of most complex datasets today. Peter will present Tinkerpop Pipes as a fast, expressive graph query language, along with some interesting algorithms, some of which are designed to run in parallel pipes.

17:40 Drink / Coffee Break

18:00 "Mastering Massive Data Volumes with Hypertable", Doug Judd, Hypertable Inc

Biography:

Doug Judd is CEO of Hypertable Inc.

Talk Abstract:

Hypertable is a high-performance, scalable, open source database modeled closely after Bigtable, the database that powers over 100 Google services.

Unique among the prominent open source scalable databases, Hypertable is implemented in C++ for optimum performance. The project was started in March of 2007 by ex-Inktomi search engineers and is sponsored by Baidu (Nasdaq: BIDU), the leading Internet search services provider in China, and Rediff.com (Nasdaq: REDF), the largest India-owned and operated web portal.

In this talk, Doug Judd, the original creator and current maintainer of the project, will give an introduction to Hypertable. He will present an architectural overview and will demonstrate the features that make Hypertable unique and compelling. He will also give an assessment of best-fit use cases and describe several real-world deployments, including Rediff.com's e-mail spam classifier and Zvents' analytics platform.

18:50 "Agile Web Data with CouchDB", Mikeal Rogers, Couchio

Biography:

Mikeal Rogers is a developer at Couchio working on all things in and around CouchDB.

Talk Abstract:

CouchDB is a database built of the web, and replication is the transport for web and mobile offline/online applications.

19:30 Official Part Ends

20:00 NoSQL / ICOODB After Workshop Beer.

Location to be announced.

Workshop Chairs: Stefan Edlich, Jim Paterson