OpenAI Gym is a fun toolkit for developing and comparing reinforcement learning algorithms. It provides a variety of environments ranging from classical control problems and Atari games to goal-based robot tasks. Currently it requires a fair amount of effort to install and run on Windows 10. In particular, you need to install Windows Subsystem for Linux, Ubuntu, Anaconda and OpenAI Gym one after another, and do a robot dance to render the simulation back to you. To make things a bit easier later on, we will also set up Jupyter Notebook. In the following you will find a brief step-by-step description as of September 2018, with the end result looking like this:

The end result in action

First we install the Linux subsystem by simply running the following command as Administrator in PowerShell:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

After restarting the computer, install Ubuntu 18.04 from the Windows Store, launch and run system update:

sudo apt-get update
sudo apt-get upgrade

Next we install Anaconda 3 and its dependencies:

sudo apt-get install cmake zlib1g-dev xorg-dev libgtk2.0-0 python-matplotlib swig python-opengl xvfb
wget https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh
chmod +x Anaconda3-5.2.0-Linux-x86_64.sh
./Anaconda3-5.2.0-Linux-x86_64.sh

Then we ensure that the .bashrc file has been modified as necessary. If not, adding . /home/username/anaconda3/etc/profile.d/conda.sh at the very end should do the trick.

After this, we relaunch the terminal and create a new conda environment called gym:

conda create --name gym python=3.5
conda activate gym

and install Gym and its dependencies:

git clone https://github.com/openai/gym
cd gym
pip install -e .[all]

As we are also going to need matplotlib, we install it by running:

pip install matplotlib

Next we install Jupyter Notebook and create a kernel pointing to the gym environment created in the previous step:

conda install jupyter nb_conda ipykernel
python -m ipykernel install --user --name gym

Now we run the magic command which will create a virtual X server environment and launch the notebook server:

xvfb-run -s "-screen 0 1400x900x24" ~/anaconda3/bin/jupyter-notebook --no-browser

The --no-browser flag tells Jupyter not to open a browser but to print the notebook address in the terminal instead. The address can then be pasted into a browser on the Windows 10 side, outside of the Ubuntu environment.

Next we create a new notebook by choosing “New” and then “gym” (thus launching a new notebook with the kernel we created in the steps above), and write something like this:

import gym
env = gym.make('CartPole-v0')

from IPython import display
import matplotlib.pyplot as plt
%matplotlib inline

env.reset()
img = plt.imshow(env.render(mode='rgb_array')) # only call this once
for _ in range(100):
    img.set_data(env.render(mode='rgb_array')) # just update the data
    display.display(plt.gcf())
    display.clear_output(wait=True)
    action = env.action_space.sample()
    env.step(action)

This creates a new cart-pole environment and performs 100 iterations of taking a random action and rendering the environment to the notebook.

If you are lucky, hitting enter will display an animation of a cart pole failing to balance. Congratulations, this is your first simulation! Replace ‘CartPole-v0’ with ‘Breakout-v0’ and rerun - we are gaming! AWESOME!
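By the way, the loop above ignores the values returned by env.step. In this version of Gym, step returns a tuple of (observation, reward, done, info), so a natural extension is to accumulate the episode reward and reset when the episode ends. Here is a minimal sketch (without the notebook rendering, to keep it short):

import gym

env = gym.make('CartPole-v0')
env.reset()
total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()  # pick a random action
    observation, reward, done, info = env.step(action)
    total_reward += reward  # accumulate the reward for this episode
    if done:  # the pole fell over or the cart ran off the track
        print('Episode finished, total reward:', total_reward)
        total_reward = 0.0
        env.reset()  # start a new episode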

Flutter is a great SDK from Google that allows effortless creation of mobile apps with native interfaces on iOS and Android. To install it, follow the official installation instructions. Here are a few additional tips for Mac and Windows:

  • If you are using fish, add set PATH /Users/simonj/flutter/bin $PATH to your .config/fish/config.fish
  • For Android emulation support, install Android Studio and create a new emulated device; let's call it Pixel_2_API_26. To launch the emulator, run ~/Library/Android/sdk/tools/emulator -avd Pixel_2_API_26 on Mac or C:\Users\<name>\AppData\Local\Android\Sdk\emulator\emulator.exe -avd Pixel_2_API_26 on Windows.
  • Disabling Hyper-V may help if you experience Windows crashes when running the Android emulator.
  • Some useful commands:
    • flutter doctor # helps to diagnose problems, install missing components, etc.
    • flutter build apk # builds apk
    • flutter install -d # installs the apk to a device. To use your actual phone, connect the phone with USB debugging enabled
    • flutter devices # to list devices
  • To use Visual Studio Code, follow these instructions or just run code . from your flutter terminal/project and install the Flutter and Dart plugins.
  • Check out awesome-flutter for examples and inspiration!

In the following, I briefly show how coef_, intercept_, decision_function, predict_proba and predict are connected in the case of a binary LogisticRegression model.

Assume we have trained a model like this:

>>> ...
>>> lrmodel = linear_model.LogisticRegression(C=0.1, class_weight='balanced')
>>> lrmodel.fit(X_train, y_train)

The model’s coef_ attribute represents the learned feature weights (w) and intercept_ represents the bias (b). The decision_function is then equivalent to computing x · w + b for each sample x:

>>> (X_test @ lrmodel.coef_[0].T + lrmodel.intercept_)[:5]
    array([-0.09915005,  0.17611527, -0.14162106, -0.03107271, -0.01813942])

>>> lrmodel.decision_function(X_test)[:5]
    array([-0.09915005,  0.17611527, -0.14162106, -0.03107271, -0.01813942])

Now, if we take the sigmoid of the decision function:

>>> def sigmoid(X): return 1 / (1 + np.exp(-X))

>>> sigmoid(X_test @ lrmodel.coef_[0].T + lrmodel.intercept_)[:5]
    array([ 0.47523277,  0.54391537,  0.46465379,  0.49223245,  0.49546527])

>>> sigmoid(lrmodel.decision_function(X_test))[:5]
    array([ 0.47523277,  0.54391537,  0.46465379,  0.49223245,  0.49546527])

it will be equivalent to the output of predict_proba (each row contains the probabilities for classes -1 and 1). We see that these numbers are exactly the second column (the positive class) here:

>>> lrmodel.predict_proba(X_test)[:5]
    array([[ 0.52476723,  0.47523277],
           [ 0.45608463,  0.54391537],
           [ 0.53534621,  0.46465379],
           [ 0.50776755,  0.49223245],
           [ 0.50453473,  0.49546527]])

Finally, the predict function:

>>> lrmodel.predict(X_test)[:5]
    array([-1,  1, -1, -1, -1], dtype=int64)

is equivalent to:

>>> [-1 if np.argmax(p) == 0 else 1 for p in lrmodel.predict_proba(X_test)][:5]
    [-1, 1, -1, -1, -1]

which in our case is:

>>> [1 if p[1] > 0.5 else -1 for p in lrmodel.predict_proba(X_test)][:5]
    [-1, 1, -1, -1, -1]
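To tie all the pieces together, here is a self-contained sketch that verifies each of the equivalences above numerically. Note that it uses a synthetic dataset from make_classification as a stand-in for the original training data:

import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_classification

# A synthetic binary problem with labels -1 and 1
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y == 0, -1, 1)

lrmodel = linear_model.LogisticRegression(C=0.1, class_weight='balanced')
lrmodel.fit(X, y)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

scores = X @ lrmodel.coef_[0] + lrmodel.intercept_  # x . w + b
assert np.allclose(scores, lrmodel.decision_function(X))

probs = sigmoid(scores)  # probability of the positive class
assert np.allclose(probs, lrmodel.predict_proba(X)[:, 1])

preds = np.where(probs > 0.5, 1, -1)  # threshold at 0.5
assert np.array_equal(preds, lrmodel.predict(X))
print('All equivalences hold.')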

Have you ever wondered how to design and run online experiments? In particular, how to implement an experiment dashboard such as the one pictured below (in this case Visual Website Optimizer) and how to use it in your product? Good, let's have a quick look!

Visual Website Optimizer

On the purely technical side, the first thing we have to implement is a way to define an experiment as a set of variables we want to try out and a value mapping for the audience. The most important part here is that, for a single user, the assignment within a single experiment should be stable, i.e., the user always lands in the same one of the two groups. However, it should differ across experiments. Otherwise we run into what is known as a carry-over error, which happens, for example, when the same user is assigned to the same test group across different experiments.

Facebook has previously released PlanOut, a platform for online field experiments. Apart from the language itself, the essence of this project is in random.py, which demonstrates a possible way of mapping users or pages to the random alternatives. In short, each experiment and variable has a salt that is added on top of the user or page id hash to enforce randomization across experiments and variables. The resulting hash is then mapped to the final value through modulo arithmetic and a set of linear transformations. Given this, it is fairly easy to design an API or a library to represent an experiment with a variety of options and to assign those to users in a controlled and consistent fashion. Or you can just use PlanOut or VWO right out of the box.
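For illustration, here is a minimal sketch of such salted-hash assignment (my own simplified take, not PlanOut's actual code, and choose_variant is a hypothetical helper):

import hashlib

def choose_variant(experiment_salt, variable_salt, user_id, variants):
    # Deterministically map a user to one of the variants. The experiment
    # and variable salts make assignments stable within an experiment but
    # independent across experiments, avoiding the carry-over error
    # described above.
    key = '{0}.{1}.{2}'.format(experiment_salt, variable_salt, user_id)
    digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
    index = int(digest, 16) % len(variants)  # modulo maps the hash to a variant
    return variants[index]

# The same user always gets the same variant within one experiment...
print(choose_variant('checkout_test', 'button_color', 'user42', ['red', 'green']))
# ...but is assigned independently in a different experiment.
print(choose_variant('pricing_test', 'button_color', 'user42', ['red', 'green']))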

With the setup and random assignment settled, the next question is how to actually design and run an experiment for your needs. For this, I highly recommend taking a quick look at the original paper describing PlanOut and its presentation, as well as a nice presentation and a great tutorial about implementing and analysing online experiments at Facebook. Furthermore, there is a series of interesting publications from Microsoft (a survey, a paper and another paper) explaining numerous caveats of running controlled experiments on the web. In particular, they explain statistical significance, power, and several types of mistakes it is possible to run into.

If research papers sound just too dry and formal, there are several interesting guides explaining A/B testing and its pitfalls in a very accessible manner.

Finally, when it comes to presenting and interpreting the results, this A/B test analysis tool provides a simple view similar to the one in the screenshot above. Otherwise, there are several online calculators for various computations that do not require a PhD in statistics - sample size, experiment duration, comparing two rates (chi-squared test), comparing two series (t-test).
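For example, comparing two conversion rates with a chi-squared test takes only a few lines with scipy (the counts below are made up):

from scipy.stats import chi2_contingency

# Hypothetical results: [conversions, non-conversions] per variant
control = [180, 1820]  # 9.0% conversion rate
variant = [220, 1780]  # 11.0% conversion rate

chi2, p_value, dof, expected = chi2_contingency([control, variant])
print('p-value: {0:.4f}'.format(p_value))  # below 0.05 suggests a real difference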

Have fun!

Software Engineering at Cxense

A year ago I wrote a post about the things I learned during my PhD at NTNU. Since then, I have wanted to write about my experience as a Software Engineer at Cxense. I am sure that the list of everything I have learned over the past four and a half years would be very long, so instead I would like to summarize it as the five best things about working at Cxense. So here it goes:

Challenges

Engineers love tough problems, and there are plenty of them here. Within a month of starting at Cxense, I was thrown into designing a weighting function that would include dwell time and page popularity when computing the impact of a page's content on a user's interest profile, and into implementing complex filters for a distributed low-level database (solved by implementing bit-wise logical filters and a stack machine). At the same time I reviewed a highly impressive implementation of an incremental inverted index on top of a compressed integer list, with lots of mind-numbing bit-twiddling code. The author was Erik Gorset, to whom I am deeply grateful for all the inspiration and advice over the years. Since then my colleagues and I have worked on thousands of dazzling problems within distributed databases, data mining, machine learning and development operations, not to mention all the bugfixes. Fixing bugs is actually fun, because it feels good to trace down that flaw by the end of the day and fix it. Nothing is perfect, and that is also why you need to write good unit and integration tests. Writing tests is hard and time-consuming, but it always pays off in the long run. Code reviews provide a great opportunity to learn how other people think and tackle challenges, both for the reviewer and the one being reviewed.

Scale

When I met the guys from Cxense for the first time five years ago, they were going live with a large publisher network in Spain. I remember seeing a live chart with the event traffic instantly increasing ten times or so. The guys were very excited about whether the system would handle the challenge, and it did. Since then the numbers have increased by many orders of magnitude. We passed the 20 billion page views per month mark about two years ago. Counting all possible types of events we collect, that would be above 60 billion events per month, or more than one terabyte of data per day. Crawling is also approaching 100 million Web pages per month these days. These numbers are huge, but the transition impresses me even more, because it is impossible to design a system for this kind of growth up front. A solution that is sufficient at one point has to be phased out in favor of a better solution at another point, which in its turn will become obsolete at a later point. Nevertheless, despite this tremendous growth we are still able to serve real-time data with latency counted in just milliseconds.

Tools

We have used Java since the beginning, and previously most of us used Eclipse with a common setup for CheckStyle and FindBugs. Nowadays the majority is using IDEA, but we still heavily rely on a common code style, static checks and a set of best practices. We use git, with a common code repo and another repo for data center configuration. At the moment we use GitLab for code reviews and a homemade builder doing speculative merge builds and lots of awesomeness like stability tests and automatic deployment to the staging environment. Skipping the details, what I love most is that we can deploy anything within just a few minutes, even at the current scale.

Initially we would rent hardware and run things on bare Ubuntu servers ourselves, using Upstart for continuous services and crontab for scheduled tasks. The implementation would be in Java with a BASH or Python wrapping, and most of the internal services would have a fixed HTTP port with JSON input/output. Data pipelines would often be implemented with a few simple components, one of which would, for example, receive data and write it to disk using atomic appends, while another would tail the written data and crunch it or send it to another service (again using JSON over HTTP). This gave us many nice benefits for scaling (add more consumers behind a load balancer), failover (as long as the data is written to disk somewhere, nothing is lost), debugging (just look at the logs) and monitoring.
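As an illustration (my own sketch, not our production code), the tailing side of such a pipeline can be as simple as following a file of newline-delimited JSON records; the file path here is hypothetical:

import json
import time

def tail_json_lines(path):
    # Follow a file of newline-delimited JSON records, yielding new ones.
    # Because the producer appends whole records atomically, the consumer
    # can simply read from the end of the file and pick up what arrives.
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet, wait briefly
                continue
            yield json.loads(line)

# Example consumer: crunch each event or forward it to another service.
# for event in tail_json_lines('/var/log/events.log'):
#     print(event['type'])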

With the growth mentioned earlier, we are now moving over to Mesos (running on top of our own hardware) and running things with Aurora. Nowadays we use gRPC for communication between services, Consul for service discovery, Prometheus for performance metrics and alerting, and Grafana for monitoring dashboards. Things are getting even better, and deployments across multiple data centers and hundreds of machines still take just minutes. In fact, they are only getting faster.

Flow

When I started at Cxense, it was a typical startup – we used JIRA for issue tracking, but it was a mix of roadmap features, devs' own ideas and FIXMEs, support issues reported by customers, and deliverables promised by sales guys, with no clear definition of priorities or deadlines. We would pick the tickets we thought were most urgent or important and do our best to solve them as efficiently as possible. We would discuss this in daily standups and perhaps present some cool new features at regular video calls. Once in a while each developer would have a hell week, when he or she would face all the incoming support issues, chats and alerts. The fun part was that one could hack a set of new features over a weekend or handle a hundred support issues during a hell week. The boring part was figuring out where the company was going after all.

Today everything is much more organized. For the feature part, we have a roadmap for a year ahead, which then trickles down to a specific plan for each quarter for each team. For the support part, we have several lines of highly trained professionals who filter out all the noise and leave only actual issues to be fixed. For most teams, we run two-week iterations, where tickets come from the development roadmap for the given quarter, pre-triaged support issues meeting the bar, or the technical roadmap. The last part covers things like scalability improvements, swapping out components for better alternatives, or making the system more resilient. We separated on-call incidents from the rest of the JIRA issues a long time ago, but for the last year or so we have also been working on growing a highly skilled infrastructure team.

Each iteration starts with a planning phase involving team leads and engineering managers, followed by a meeting where team members are free to pick the issues they would like to work on, with respect to the amount of work they can handle in the next sprint. The sprint ends with a highlight of the most important things the team has achieved and a demonstration of the most interesting features.

People

The most important ingredient of a great workplace is people. Joining Cxense, I had no expectations other than that these would be the best people in the industry. This turned out to be true, and I am lucky to be working with some of the best engineers and business experts in the field. It is a pleasure to be a part of the discussions and changes we have made over the years and those to come.

Cxense is spread across 5 out of the 7 parts of the world. Therefore, it is quite common that my day would start with a quick fix for a support issue from Tokyo, followed by a few quick answers to questions from my colleagues in Samara, Munich, Buenos Aires or San Francisco. The changes I made the day before would be reviewed by a colleague sitting next to me in Oslo and then distributed to the data centers across the world. After lunch I would focus on the next feature, working with data from a major news publisher in the USA, Argentina, Norway or Japan. On the way home, I would comment on the global company chat and receive a funny cat picture from a colleague in Melbourne. The world is very small when it comes to Cxense.

In conclusion

Cxense has grown a lot over the years I have been working here, but it remains a fast-paced company with talented people, exciting challenges and a cool software stack. Things keep getting better, and I can only imagine where we will be in the next five years. While working here I have learned a billion things about software development (and am still learning), and I hope that this post was able to bring some of those insights to you. …ah, and by the way, we are hiring!