You have built a great Streamlit app. So far, you have only run it locally on your computer at localhost:8501. Now you would like to share your app with others, but wonder how. This blog post introduces you to one option: Heroku. Heroku is a platform as a service that allows you to deploy your apps (not just Streamlit apps, but also JVM apps, Ruby apps, etc.). This post will guide you through the deployment of a Streamlit app on Heroku.

In many scenarios, such as a Google search or a product recommendation in an online shop, we have tons of data and limited space to display it. We cannot show all the products of an online shop to the user as a possible next best offer. Nor would a user want to scroll through all the pages indexed by a search engine to find the page that best matches their search keywords. The most relevant content should be on top. Learning to rank (LTR) models are supervised machine learning models that attempt to optimize the order of items. Compared to classification or regression models, they do not care about exact scores or predictions, only about the relative order. LTR models are typically applied in search engines, but have gained popularity in other fields such as product recommendations as well.
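
To make the point about relative order concrete, here is a minimal Python sketch. The item names and scores are made up for illustration, and `rank_items` is a hypothetical helper, not part of any LTR library. Two sets of scores with very different values produce the same ranking, and the ranking is all that matters here.

```python
def rank_items(scores):
    """Return item names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical relevance scores from two different models for the same query.
model_a = {"page_1": 0.9, "page_2": 0.5, "page_3": 0.1}
model_b = {"page_1": 12.0, "page_2": 3.5, "page_3": -4.0}

ranking_a = rank_items(model_a)
ranking_b = rank_items(model_b)

# The absolute scores differ, but the order of the items is identical.
print(ranking_a)
print(ranking_a == ranking_b)
```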

I was recently invited to join a panel discussion among developers to dispel the myths behind the typical BS Buzzword Bingo around machine learning and AI. In this blog post, I will share some of the buzzwords we talked about, each with a short description and links. Oops, I already used some buzzwords. So let’s start. AI (Artificial Intelligence) is the magic potion to fix all problems of all companies and will make us unemployed in the future.

Humans intuitively understand the meaning of words: which words are similar, opposite or related to each other? But our machine learning models do not have this intuition. Word embeddings are numeric vectors that represent text. These vectors are learned through neural networks. The objective when creating these embedding vectors is to capture as much “meaning” as possible: related words should be closer together than unrelated words. Also, they should be able to preserve mathematical relationships between words such as

In this blog post I will share some tips for working with Jupyter Notebooks. These tips greatly improved my productivity, and I wish someone had told me about them earlier. The two main topics of this post are extensions and magic commands.

Jupyter Extensions

Have you ever missed a feature in your Jupyter Notebook that IDEs have? For example, were you hoping for autocompletion or automatic code formatting? Then there might be a Jupyter Notebook extension for you.

Elasticsearch is often the storage engine of choice for storing and querying full-text data. But writing an Elasticsearch query is quite different from querying a relational database in SQL. In this blog post, you will learn some basics you need to understand before working with Elasticsearch. In the second part, you will learn how to write queries in Elasticsearch. Elasticsearch uses many of the same concepts as your SQL database. The terminology is just a little different.
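
As a rough illustration of how the terminology lines up, the sketch below maps common SQL terms to their usual Elasticsearch counterparts and builds the body of a simple `match` query as a plain Python dictionary. The field name `title`, the table name `posts`, the search text, and the helper `build_match_query` are assumptions made for this example. Sending the body to a cluster is left out.

```python
import json

# Rough correspondence between SQL and Elasticsearch terminology.
SQL_TO_ELASTICSEARCH = {
    "table": "index",
    "row": "document",
    "column": "field",
    "schema": "mapping",
}


def build_match_query(field, text):
    """Build the body of a full-text `match` query in the Elasticsearch Query DSL."""
    return {"query": {"match": {field: text}}}


# Roughly comparable to: SELECT * FROM posts WHERE title LIKE '%streamlit%'
# except that Elasticsearch analyzes the text and ranks hits by relevance.
body = build_match_query("title", "streamlit")
print(json.dumps(body, indent=2))
```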

Searching through full-text fields with regexes in relational database systems like PostgreSQL or MySQL is painful: the query latency is high and your results will be unordered, so you have no idea how relevant your query results are. Elasticsearch is often the storage engine of choice for storing and querying full-text data. In Elasticsearch, querying full-text fields is among the least resource-intensive tasks, and your query results are ordered, with the most relevant results on top.

The basic idea of complex datatypes is to store multiple values in a single column. So if you are working with a Hive database and you query a column, but then you notice “This value I need is trapped in a column among other values…”, you have just come across a complex, a.k.a. nested, datatype. There are three types: arrays, maps and structs. First, you have to understand which types are present.
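
As a loose analogy, the three complex types can be pictured with plain Python structures: an array behaves like a list, a map like a dictionary, and a struct like a record with named fields. The row and its column names below are invented for illustration. The comments show the corresponding HiveQL type and access syntax.

```python
from collections import namedtuple

# A struct has a fixed set of named fields.
Address = namedtuple("Address", ["city", "country"])

# One hypothetical row with three complex columns.
row = {
    "skills": ["python", "sql", "hive"],        # ARRAY<STRING>
    "scores": {"math": 90, "physics": 85},      # MAP<STRING, INT>
    "address": Address("Hamburg", "Germany"),   # STRUCT<city:STRING, country:STRING>
}

first_skill = row["skills"][0]       # HiveQL: skills[0]
math_score = row["scores"]["math"]   # HiveQL: scores['math']
city = row["address"].city           # HiveQL: address.city

print(first_skill, math_score, city)
```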

Heike W


Data Scientist @ Xing GmbH

Germany