Large-Scale Real-Time Data Management for Engagement and Monetization

Invited talk at the 12th International Workshop on Large-Scale and Distributed Systems for Information Retrieval, co-located with ACM CIKM 2015, October 23, 2015, Melbourne, Australia.

Abstract

Cxense helps companies understand their audience and build great online experiences. Cxense Insight and DMP let customers annotate, filter, segment and target their users based on the consumed content and performed actions in real-time. With more than 5000 active websites, Insight alone tracks more than a billion unique users with more than 15 billion page views per month. To leverage the huge amounts of data in real-time, we have built a large distributed system relying on techniques familiar from databases, information retrieval and data mining. In this talk, we outline our solutions and give some insight into the technology we use and the challenges we face. This introduction should be interesting to undergraduate and PhD students as well as experienced researchers and engineers. [ Extended abstract/description: Preprint PDF, ACM DL ]

Presentation slides

Additional material

Cxense DMP Video

Demo: Associating Users with Audience Segments

Demo: Using Audience Segments to Drive Targeted Advertising with Cxense Display

How to pronounce Cxense?

Yet another year, and yet another JavaZone. As always, lots of food, new people, old friends and many interesting talks. Long story short, here is my top 5 of the most entertaining talks:

  1. Java 8 JVM Memory and Thread Management by Ken Sipe – a very good explanation of the Java memory model and some practical tips.
  2. Coding Culture by Sven Peters – a set of very good suggestions on how to create a great workplace for developers.
  3. The Rule of Three by Kevlin Henney – as always, a hilarious talk about everything and nothing.
  4. Building a Carputer: Java and the Automotive Internet of Things by Simon Ritter – Java, Raspberry Pi and an Audi S3 – how awesome can it be?
  5. Production time profiling On-Demand with Java Flight Recorder by Klara Ward – nuff said!

Two more talks for those especially interested in JVM internals:

  1. The Illusion of Execution by Nitsan Wakart.
  2. Java Concurrency Under The Hood by Gleb Smirnov.

And two more interesting talks for my Norwegian-speaking friends:

  1. Skrivetips for utviklere by June Henriksen.
  2. Anbefalinger fra Apache Mahout til Spark MLlib på 4 steg i FINN.no by Helge Jenssen.

These and other presentations and lightning talks can be found here.

I know lots of people who have been passionate about computers since they were little kids. I wasn’t. At different points in my childhood I wanted to be a chief officer (seaman), a lawyer, a scientist or a graphic designer. The first piece of code I wrote, in 2002, was something like this (this is a CASIO fx graphical calculator with a BASIC dialect).

13 years later I work as a programmer - but it is not the passion for computers that drives me forward. It is helping people to solve their problems, and of course getting a chance to put some simple mathematics into it :)

A few weeks ago I was at the annual JavaZone, where they had many great talks and quite a few cool stands. At one of the stands they had a spinner, a wheel with numbers 1 to 20, and if you were lucky you could win a book. The girl at the stand said, “You can choose one number and spin it twice, or you can choose two numbers but only spin once.” “Is there a difference?” I asked. “No,” the girl replied and shook her head. But really? Let’s look at this…

Assume we had just one number and one spin; the probability of winning is then 1/n, where n is 20. Let’s just call it p. Now, with two numbers and one spin, the probability is 2/n, or 2p.

With one number and two spins, it must be 1 minus the probability of losing on both attempts, that is 1-(1-p)^2. Or, to put it another way, the probability of winning on the first attempt plus the probability of losing the first attempt and then winning on the second, that is p+(1-p)p. Either approach gives 2p-p^2.

The difference of p^2 is actually very interesting. Where does it come from, and what does it represent? Well, both situations can be expressed as an “A or B” outcome, whose probability is P(A) + P(B) - P(A and B). In both situations A and B have the same probability, p, but P(A and B) differs. In the first case (two numbers, one spin) we cannot win with both numbers at the same time, so P(A and B) is 0, while in the second case (one number, two spins) we can win on both spins, so P(A and B) is p^2.
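
The comparison above is easy to check numerically. Here is a minimal sketch in Python, assuming a fair wheel with n equally likely numbers and independent spins; the function names are made up for illustration and the simulation is only a sanity check, not a proof:

```python
from fractions import Fraction
import random


def p_two_numbers_one_spin(n):
    """Two distinct numbers, one spin: the two wins are mutually exclusive, so 2p."""
    return Fraction(2, n)


def p_one_number_two_spins(n):
    """One number, two independent spins: 1 - (1 - p)^2 = 2p - p^2."""
    p = Fraction(1, n)
    return 1 - (1 - p) ** 2


def simulate(n, trials, seed=0):
    """Estimate both winning probabilities by spinning a simulated wheel."""
    rng = random.Random(seed)
    wins_two_numbers = 0
    wins_two_spins = 0
    for _ in range(trials):
        # Strategy 1: bet on numbers 1 and 2, spin once.
        if rng.randint(1, n) in (1, 2):
            wins_two_numbers += 1
        # Strategy 2: bet on number 1, spin up to twice
        # (the second spin only matters if the first one misses).
        if rng.randint(1, n) == 1 or rng.randint(1, n) == 1:
            wins_two_spins += 1
    return wins_two_numbers / trials, wins_two_spins / trials


if __name__ == "__main__":
    n = 20
    a = p_two_numbers_one_spin(n)
    b = p_one_number_two_spins(n)
    print(a)      # 1/10
    print(b)      # 39/400
    print(a - b)  # 1/400, which is p^2
    print(simulate(n, 100_000))
```

For n = 20 the gap is only 1/400, so the two strategies are very hard to tell apart at the stand, but choosing two numbers is never the worse option.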

This means that something we perceive as increasing our chance to win in reality reduces it. That is probably why most people at the stand chose to spin the wheel twice. As a true scientist, I sneaked to the stand a whole 7 times, chose two numbers each time and didn’t win once. Well, probability and luck have nothing to do with each other, I say.

This week a few colleagues and I went to JavaZone, an annual Java/JVM conference held here in Oslo. To me, a conference like this is a great opportunity to learn something new, meet new people and old friends and just have fun. Long story short, here are seven of the best talks, ranked by how good, interesting, funny or relevant they were to me. All seven are in English and are suitable for most (Java) developers:

  1. Practical Considerations For Microservice Architectures – a nice talk with some great slides about architecture, design, collaboration and tools.
  2. Making Steaks from Sacred Cows – a hilarious talk about cargo cult programming, how little we know and the Matrix.
  3. Understanding Java Byte Code - an interesting and brief introduction to the Java byte code format.
  4. JVM tools - a good introduction to a number of tools such as jstat, jmap and mission control.
  5. Building a Big Data Machine Learning Platform - Cliff Click’s presentation of the H2O platform. Quite interesting!
  6. Java 8 Lambdas and Streams for Dummies - a nice but very basic introduction to Java 8. Also mentions using JMH for micro-benchmarking.
  7. 33 Things You Want to Do Better - an interesting talk; towards the end it also presents some useful frameworks, such as Guava and Lombok.

As a bonus, for those who like mathematics and understand Norwegian – Kontroversiell matematikk. Otherwise, more of this year’s presentations and lightning talks can be found here.