This post will teach you the intuition behind REST APIs and how you can use them to get interesting datasets for your data projects. First, we will look at the four components of a request. In the second part of this blog post, we will go through an example and access the CoinGecko API via curl.

So let’s start from zero: what is an API? API stands for application programming interface. An API exposes a set of methods that allow communication between your computer and a server. In the case of web APIs, data is sent back and forth using HTTP requests. You can retrieve data (GET), send data (POST, PUT) or delete data (DELETE). You have probably seen the “Like on Facebook” or “Share on Twitter” buttons on various websites. When you click one of these buttons, the site you’re visiting communicates with your Facebook or Twitter account and alters its data by adding a new like or creating a tweet. For you as a data scientist or analyst, APIs provide a great opportunity to retrieve interesting datasets. For example, the Kayak API lets you access a lot of travel data, while the IMDb API contains movie data. Some models, like BERT, are also accessible through an API.

There are different ways and rules to follow when creating APIs. REST stands for “Representational State Transfer” and is just one option. A REST request can have four components:


  1. method
  2. endpoint 
  3. header
  4. body

We’ll look into each component and then piece them together. After this chapter, you will understand the structure of a REST call and how to use it to get data.

1. Method 

Request methods describe the action you want the API to perform. There are four main methods, but only GET matters when downloading data. POST, PUT and DELETE modify data. The following table gives an overview.


| method | what it does |
| --- | --- |
| GET | requests data from a server |
| POST | adds new data to the server |
| PUT | changes existing data |
| DELETE | deletes existing data |

2. Endpoint

An endpoint is the Uniform Resource Locator (URL) where a service is accessed. It is like a navigation system that routes you to your destination. Let’s briefly talk about the different components of a URL by looking at an endpoint. One example from the Twitter API, which looks up a tweet by its ID and includes its location information:


`https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id`
  • Scheme: identifies the protocol used to access the resource, e.g. http or https. The : is the scheme separator. The // marks the start of the domain.
  • Domain name, a.k.a. host: name of the host or web server that is being requested (here: api.twitter.com)
  • Path: route to the resource you want to access. Paths are just like paths on a website. Any colon (:) in a path denotes a variable, which you need to replace with an actual value. E.g. you replace :ID with the ID of the tweet you want to look up. This is one way to pass parameters to a REST request.
  • Query string: query parameters give you the option to modify a request. A query string begins with ? followed by parameters and their values. Multiple parameters are separated with a & (e.g. ?param1=1&param2=2). You find information about the available parameters in the API documentation.

So now you know where to go; the next step is what to do.
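To make the URL components tangible, here is a minimal Python sketch that splits the example endpoint with the standard library's urllib.parse. The tweet ID used to fill in the path variable is a made-up placeholder, not a real tweet.

```python
from urllib.parse import urlsplit, parse_qs

endpoint = "https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id"

# Fill in the path variable; the ID below is a hypothetical placeholder.
url = endpoint.replace(":ID", "1234567890")

parts = urlsplit(url)
query = parse_qs(parts.query)

print(parts.scheme)  # scheme
print(parts.netloc)  # domain name / host
print(parts.path)    # path with the variable filled in
print(query)         # query string parsed into a dict of lists
```

Each query parameter maps to a list because the same parameter name may appear more than once in a query string.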

3. Header

Headers contain meta information, e.g. for authentication or for describing the body content ([see](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers)). With curl, you can add a header by typing --header (or -H for short) followed by a string containing a header field as a colon-separated key-value pair. For example, if we wanted to authenticate when accessing the Twitter API:

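curl 'https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id' --header "Authorization: Bearer $BEARER_TOKEN"

--header introduces a header field. Authorization is the key and Bearer $BEARER_TOKEN is the value (in this case the token) we pass. The header is wrapped in double quotes so that the shell substitutes the BEARER_TOKEN environment variable; inside single quotes, the literal text $BEARER_TOKEN would be sent instead.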
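The same key-value idea carries over to other clients. As a rough sketch, this is how the header could be attached in Python with the standard library's urllib.request. The tweet ID and the fallback token are placeholders for illustration, and the request is only built here, not sent.

```python
import os
from urllib.request import Request

# Read the token from the environment; the fallback is a dummy value.
token = os.environ.get("BEARER_TOKEN", "dummy-token")

# The tweet ID in the URL is a hypothetical placeholder.
request = Request(
    "https://api.twitter.com/2/tweets/1234567890?expansions=geo.place_id",
    headers={"Authorization": f"Bearer {token}"},
)

# Nothing has been sent yet; we only inspect what would be sent.
print(request.get_method())
print(request.header_items())
```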

4. Body

The request body is used to send data to the server (the data you get back arrives in the response body). So instead of passing data/parameters in the URL (path parameters) or as a query string, you can also send information in the body. The body can be in HTML, XML, JSON or other formats. With curl, you can add a body with --data (or -d for short) as in this example from the Twitter API.

curl --request POST \
  --url https://data-api.twitter.com/insights/engagement/totals \
  --header 'accept-encoding: gzip' \
  --header 'authorization: OAuth oauth_consumer_key="consumer-key-for-app",oauth_nonce="generated-nonce",oauth_signature="generated-signature",oauth_signature_method="HMAC-SHA1", oauth_timestamp="generated-timestamp",oauth_token="access-token-for-authed-user", oauth_version="1.0"' \
  --header 'content-type: application/json' \
  --data '{
                "tweet_ids": [
                    "1060976163948904448","1045709644067471360"
                ],
                "engagement_types": [
                    "favorites","replies","retweets","video_views","impressions","engagements"
                ],
                "groupings": {
                    "perTweetMetricsOwned": {
                        "group_by": [
                            "tweet.id",
                            "engagement.type"
                        ]
                    }
                }
            }' 
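For comparison, a JSON body can be prepared in Python as sketched below, again with urllib.request from the standard library. This is a shortened version of the payload above; the OAuth authorization header is left out on purpose, so the request would not be accepted as is, and it is only built, not sent.

```python
import json
from urllib.request import Request

# Shortened version of the payload from the curl example.
payload = {
    "tweet_ids": ["1060976163948904448", "1045709644067471360"],
    "engagement_types": ["favorites", "replies", "retweets"],
}

# The body must be bytes, so serialize the dict to JSON and encode it.
body = json.dumps(payload).encode("utf-8")

request = Request(
    "https://data-api.twitter.com/insights/engagement/totals",
    data=body,
    method="POST",
    headers={"content-type": "application/json"},
)

print(request.get_method())
print(request.data.decode("utf-8"))
```

The content-type header tells the server how to interpret the bytes in the body, which is why it accompanies the JSON payload.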

Example: Sending requests via curl

There are different ways to access APIs. One quick approach is to use the curl command from your terminal. You just need the keyword curl (which stands for client URL) and our four components of the REST call.

Anatomy of a REST call via curl

Let’s recap the different components of a REST call. We need a method, a URL and optionally a header and a body. So you can write a request with the following options:

curl --request + method (GET/POST/DELETE/PUT) + header (--header) + URL (in quotes) + body (--data)

You can use a long or a short version to pass options when making the request.


| short | long | what it does |
| --- | --- | --- |
| -X | --request | HTTP method to be used (can be omitted for GET, which is curl’s default) |
| -d | --data | Data/body to be sent |
| -H | --header | Header to be sent |

Before you make your first request, check if you have curl installed:

curl --version

If you don’t, you can get it here

Amazing, you can start making requests to an API. Oftentimes you will need to request developer access first. Don’t let that scare you off. Usually, you will be allowed to use the API a couple of days later. For this blog post’s example, let’s work with the CoinGecko API, because you can start using it right away. CoinGecko is a website that provides information and metrics on cryptocurrencies.

We will start with a test request to make sure the API server is available:

curl -X GET "https://api.coingecko.com/api/v3/ping" -H "accept: application/json"

returns

{
	"gecko_says": "(V3) To the Moon!"
}

Good, looks like it is running.

Now you can study the documentation to find the data you’re interested in. Let’s say you want to get the current exchange rates of your cryptocurrencies.

  • base URL: https://api.coingecko.com/api/v3/
  • path: /exchange_rates
  • header: accept: application/json (this API returns a response in JSON format by default)

Wrapping this into a curl request:

curl -X GET "https://api.coingecko.com/api/v3/exchange_rates" -H "accept: application/json"

You will get the exchange rates in json format:

{
  "rates": {
    "btc": {
      "name": "Bitcoin",
      "unit": "BTC",
      "value": 1,
      "type": "crypto"
    },
    "eth": {
      "name": "Ether",
      "unit": "ETH",
      "value": 38.915,
      "type": "crypto"
    },
    "ltc": {
      "name": "Litecoin",
      "unit": "LTC",
      "value": 202.191,
      "type": "crypto"
    },
    
    ...
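Once the JSON has arrived, it is ordinary nested data. The sketch below parses the three entries shown above with Python's json module. It assumes that each value is quoted per one bitcoin, as the btc entry with value 1 suggests; the convert helper is a hypothetical name introduced for illustration.

```python
import json

# The three entries from the response excerpt above.
response_text = """
{
  "rates": {
    "btc": {"name": "Bitcoin", "unit": "BTC", "value": 1, "type": "crypto"},
    "eth": {"name": "Ether", "unit": "ETH", "value": 38.915, "type": "crypto"},
    "ltc": {"name": "Litecoin", "unit": "LTC", "value": 202.191, "type": "crypto"}
  }
}
"""

rates = json.loads(response_text)["rates"]


def convert(amount, source, target):
    """Convert an amount between two currencies via their per-bitcoin rates."""
    return amount * rates[target]["value"] / rates[source]["value"]


for code, info in rates.items():
    print(f"{code}: {info['value']} {info['unit']} per BTC")

print(convert(1, "eth", "ltc"))  # how many LTC one ETH corresponds to
```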
    

As a data scientist or analyst, you may be more interested in the historical development of a specific currency. The API documentation shows that you need to pass the coin id as a path parameter and a date as a query parameter; there is also an optional query parameter for localization.

  • base URL is the same as before: https://api.coingecko.com/api/v3/
  • path with a placeholder for the coin id: /coins/{id}/history. Let’s go with bitcoin here.
  • query parameter for the date as a string in dd-mm-yyyy format

Leads to the following request:

curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json"
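If you want to request many coins or dates, it helps to assemble this URL programmatically. A minimal sketch with urllib.parse follows; history_url is a hypothetical helper name, and only the date parameter from the request above is included.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.coingecko.com/api/v3"


def history_url(coin_id, date):
    """Build the history endpoint URL from a coin id and a dd-mm-yyyy date."""
    # Path parameter: escape the coin id so it cannot break the path.
    path = f"/coins/{quote(coin_id, safe='')}/history"
    # Query parameter: urlencode handles escaping and the key=value format.
    query = urlencode({"date": date})
    return f"{BASE_URL}{path}?{query}"


print(history_url("bitcoin", "01-01-2020"))
```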

This returns a lot of data about bitcoin (price, market cap, Facebook likes etc.) on that date.

{
  "id": "bitcoin",
  "symbol": "btc",
  "name": "Bitcoin",
  "image": {
    "thumb": "https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579",
    "small": "https://assets.coingecko.com/coins/images/1/small/bitcoin.png?1547033579"
  },
  "market_data": {
    "current_price": {
      "aed": 26429.239288693556,
      "ars": 430563.0457102599,
      "aud": 10256.814195551682,
      "bch": 35.13945542496315,
      "bdt": 610699.0910671571,
      "bhd": 2712.9039956563156,
      
      ....
      

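Response status codes

Nice, if everything worked well, your requests returned the status code 200. This means your request has succeeded. curl does not print the status code by default; adding -i includes the response headers, which start with the status line. Let’s briefly look at the different status codes that you might see. Codes with a number >= 400 indicate an error.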


| number | what does it mean? | example |
| --- | --- | --- |
| 200+ | request has succeeded | any of the above requests |
| 300+ | request is redirected to another URL | e.g. your request is redirected from https://api.coingecko.com/api/v3/ to an older version https://api.coingecko.com/api/v2/ |
| 400+ | an error on the client side has occurred | e.g. you have a typo in one of your parameter values, such as the coin id in https://api.coingecko.com/api/v3/coins/bitcion/history |
| 500+ | an error on the server side has occurred | e.g. there is a problem with the software running on the CoinGecko server. You can try running the request again, or you will have to wait until someone has restarted the server |
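The grouping by the first digit is easy to express in code. Here is a small Python sketch; describe_status is a hypothetical helper, while http.HTTPStatus comes from the standard library and provides the official short phrase for a code.

```python
from http import HTTPStatus


def describe_status(code):
    """Map a status code to the coarse category used in the table above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```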

Now you might be wondering how to deal with all these JSON printouts in your console. One option is to write them into a file:

curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json" > output.json

There is also the very useful jq tool, which formats and highlights JSON. If you have jq installed, you can pipe the curl output into it (curl ... | jq .):

curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json" | jq .

Or you can make your requests directly from Python and store the results in Python variables, for example with the Python requests package.
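As a rough sketch of what this can look like, the block below uses urllib.request from the standard library instead of requests, so it runs without installing anything. Both function names are hypothetical helpers, and the actual network call is left commented out.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON response body."""
    request = Request(url, headers={"accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def price_on_date(history, currency):
    """Pick one currency out of the market_data.current_price mapping."""
    return history["market_data"]["current_price"][currency]


# Example call (needs network access, so it is left commented out):
# history = fetch_json(
#     "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020"
# )
# print(price_on_date(history, "aud"))
```

Note that urlopen raises an HTTPError for 4xx and 5xx responses, so client and server errors show up as exceptions rather than as return values.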