This post will teach you the intuition behind REST APIs and how you can use them to get interesting datasets for your data projects. First, we will look at the four components of a request. In the second part of this blog post, we will go through an example and access the CoinGecko API via `curl`.
So let’s start from zero: what is an API? API stands for application programming interface. An API provides a set of methods that allow communication between your computer and a server. Web APIs send data back and forth using HTTP requests. You can retrieve data (`GET`), send data (`POST`, `PUT`) or delete data (`DELETE`). You have probably seen the “Like on Facebook” or “Share on Twitter” buttons on various websites. When you click one of these buttons, the site you’re visiting communicates with your Facebook or Twitter account and alters its data by adding a new like or creating a tweet. For you as a data scientist/analyst, APIs provide a great opportunity to retrieve interesting datasets. For example, the Kayak API lets you access a lot of travel data, while the IMDb API contains movie data. Some models, such as BERT, are also accessible through an API.
There are different ways and rules to follow when creating APIs. REST stands for “Representational State Transfer” and is just one option. A REST request can have four components:
- method
- endpoint
- header
- body
We’ll look into each component and then piece them together. After this chapter, you will understand the structure of a REST call and how to use it to get data.
1. Method
Request methods describe the action you want the API to perform. There are four main methods, but only `GET` matters when downloading data; `POST`, `PUT` and `DELETE` modify data. The following table gives an overview.
method | what it does |
---|---|
GET | requests data from a server |
POST | adds new data to the server |
PUT | changes existing data |
DELETE | deletes existing data |
2. Endpoint
An endpoint is the Uniform Resource Locator (URL) where a service is accessed. It works like a navigation system that routes you to your destination. Let’s briefly talk about the different components of a URL by looking at an endpoint. One example from the Twitter API, which looks up a tweet by its ID and expands its place information:
`https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id`
- Scheme: identifies the protocol, e.g. `http` or `https`. The `:` is the scheme separator. The `//` establishes the start of a domain.
- Domain name, a.k.a. host: name of the intended host or web server that is being requested (here: `api.twitter.com`)
- Path: route to the resource you want to access. Paths are just like paths on a website. Any colons (`:`) on a path denote a variable, which you need to replace with an actual value. E.g. you replace `:ID` with the ID of the tweet you want to look up. This is one option to pass parameters to a REST request.
- Querystring: query parameters give you the option to modify a request. A querystring begins with `?` followed by parameters and their values. Multiple parameters are separated with an `&` (e.g. `?param1=1&param2=2`).

You find information about parameters in the API documentation. So now you know where to go; the next step is what to do.
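To make the URL components above tangible, here is a small sketch that splits the example endpoint with Python’s standard library. The tweet ID substituted at the end is only an illustrative value, not something the API documentation prescribes.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id"
parts = urlsplit(url)

print(parts.scheme)  # scheme, e.g. https
print(parts.netloc)  # domain name / host
print(parts.path)    # path, still containing the :ID placeholder
print(parts.query)   # raw querystring without the leading "?"

# parse_qs turns the querystring into a dict of parameter -> list of values
params = parse_qs(parts.query)
print(params)

# Replace the path variable with an actual value (illustrative tweet ID)
path = parts.path.replace(":ID", "1060976163948904448")
print(path)
```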
3. Header
Headers contain meta information, e.g. for authentication or for describing the body content ([see](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers)). With curl, you can add a header by typing `--header` (or `-H` for short) followed by a string containing the header field as a colon-separated key-value pair. For example, if we wanted to authenticate when accessing the Twitter API:
curl 'https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id' --header "Authorization: Bearer $BEARER_TOKEN"
`--header` signals that a header field follows. `Authorization` is the key and `Bearer $BEARER_TOKEN` is the value (in this case the token) we pass.
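The same key-value idea can be sketched in Python with the standard library. This only builds the request object and does not send it. Reading the token from an environment variable named `BEARER_TOKEN`, and the placeholder fallback, are assumptions made for the illustration.

```python
import os
import urllib.request

# Assumption: the token is stored in the BEARER_TOKEN environment variable;
# a placeholder is used when the variable is not set.
token = os.environ.get("BEARER_TOKEN", "my-placeholder-token")

request = urllib.request.Request(
    "https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id",
    headers={"Authorization": f"Bearer {token}"},  # key: value, as with --header
)

# The request is only constructed here, nothing goes over the network.
print(request.get_method())
print(request.get_header("Authorization"))
```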
4. Body
The request body is used to send data to the server. So instead of passing data/parameters in the URL (path parameters) or as a querystring, you can also send information in the body. The body can be in HTML, XML, JSON or other formats. With curl, you can add a body with `--data` (or `-d` for short), as in this example from the Twitter API.
curl --request POST \
  --url https://data-api.twitter.com/insights/engagement/totals \
  --header 'accept-encoding: gzip' \
  --header 'authorization: OAuth oauth_consumer_key="consumer-key-for-app",oauth_nonce="generated-nonce",oauth_signature="generated-signature",oauth_signature_method="HMAC-SHA1", oauth_timestamp="generated-timestamp",oauth_token="access-token-for-authed-user", oauth_version="1.0"' \
  --header 'content-type: application/json' \
  --data '{
    "tweet_ids": [
      "1060976163948904448","1045709644067471360"
    ],
    "engagement_types": [
      "favorites","replies","retweets","video_views","impressions","engagements"
    ],
    "groupings": {
      "perTweetMetricsOwned": {
        "group_by": [
          "tweet.id",
          "engagement.type"
        ]
      }
    }
  }'
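As a rough Python counterpart to the curl call above, the sketch below serializes a shortened version of the JSON body and attaches it to a `POST` request. It deliberately leaves out the OAuth authorization header and never sends the request, so it only illustrates how method, header and body fit together.

```python
import json
import urllib.request

# Shortened version of the body from the curl example
payload = {
    "tweet_ids": ["1060976163948904448", "1045709644067471360"],
    "engagement_types": ["favorites", "replies", "retweets"],
}

request = urllib.request.Request(
    "https://data-api.twitter.com/insights/engagement/totals",
    data=json.dumps(payload).encode("utf-8"),       # body, like --data
    headers={"content-type": "application/json"},  # tells the server the body is JSON
    method="POST",                                  # like --request POST
)

# Inspect the request without sending it
print(request.get_method())
print(request.data.decode("utf-8"))
```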
Example: Sending requests via curl
There are different ways to access APIs. One quick approach is to use the `curl` command from your terminal. You just need the keyword `curl` (which stands for client URL) and our four components of the REST call.
Anatomy of a REST call via curl
Let’s recap the different components of a REST call. We need a method, a URL and optionally a header and a body. So you can write a request with the following options:
`curl --request` + method (`GET`/`POST`/`DELETE`/`PUT`) + header (`--header`) + URL (in quotes) + body (`--data`)
You can use a long or a short version to pass options when making the request.
short | long | what it does |
---|---|---|
-X | --request | HTTP method to be used (can be omitted for GET, which is curl’s default) |
-d | --data | Data/body to be sent |
-H | --header | Header to be sent |
Before you make your first request, check if you have curl installed:
curl --version
If you don’t, you can get it here
Great, you can now start making requests to an API. Often you will need to request developer access first. Don’t let that scare you off: usually, you will be allowed to use the API a couple of days later. For this blog post, let’s work with the CoinGecko API, because you can start using it right away. CoinGecko is a website that provides information and metrics on cryptocurrencies.
We will start with a test request to make sure the API server is available:
curl -X GET "https://api.coingecko.com/api/v3/ping" -H "accept: application/json"
returns
{
  "gecko_says": "(V3) To the Moon!"
}
Good, looks like it is running.
Now you can study the documentation for the data you’re interested in. Let’s say you want to get the current exchange rates of cryptocurrencies.
- base URL: `https://api.coingecko.com/api/v3/`
- path: `/exchange_rates`
- header: `accept: application/json` (this API returns a response in JSON format by default)

Wrapping this into a `curl` request:
curl -X GET "https://api.coingecko.com/api/v3/exchange_rates" -H "accept: application/json"
You will get the exchange rates in json format:
{
  "rates": {
    "btc": {
      "name": "Bitcoin",
      "unit": "BTC",
      "value": 1,
      "type": "crypto"
    },
    "eth": {
      "name": "Ether",
      "unit": "ETH",
      "value": 38.915,
      "type": "crypto"
    },
    "ltc": {
      "name": "Litecoin",
      "unit": "LTC",
      "value": 202.191,
      "type": "crypto"
    },
    ...
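Once you have such a response, you will usually want to pull individual values out of it. The sketch below parses a hard-coded excerpt of the output above with Python’s `json` module. Reading the values as amounts per one Bitcoin is an assumption based on `btc` having the value 1; check the API documentation before relying on it.

```python
import json

# Hard-coded excerpt of the response shown above
response_text = """
{
  "rates": {
    "btc": {"name": "Bitcoin", "unit": "BTC", "value": 1, "type": "crypto"},
    "eth": {"name": "Ether", "unit": "ETH", "value": 38.915, "type": "crypto"},
    "ltc": {"name": "Litecoin", "unit": "LTC", "value": 202.191, "type": "crypto"}
  }
}
"""

rates = json.loads(response_text)["rates"]

# Flatten the nested structure into a simple mapping: currency code -> value
values = {code: info["value"] for code, info in rates.items()}

for code, info in rates.items():
    # Assumption: values are quoted relative to 1 BTC
    print(f"{info['name']}: {info['value']} {info['unit']}")
```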
As a data scientist/analyst, you may be more interested in the historical development of a specific currency. Checking the API documentation, you will find that you need to pass the coin id as a path parameter and a date as a query parameter, plus an optional query parameter for localization.
- base URL is the same as before: `https://api.coingecko.com/api/v3/`
- path with a placeholder for the coin id: `/coins/{id}/history`. Let’s go with `bitcoin` here.
- query parameter for the date as a string in `dd-mm-yyyy` format

This leads to the following request:
curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json"
This returns a lot of data about Bitcoin (price, market cap, Facebook likes etc.) on that date.
{
  "id": "bitcoin",
  "symbol": "btc",
  "name": "Bitcoin",
  "image": {
    "thumb": "https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579",
    "small": "https://assets.coingecko.com/coins/images/1/small/bitcoin.png?1547033579"
  },
  "market_data": {
    "current_price": {
      "aed": 26429.239288693556,
      "ars": 430563.0457102599,
      "aud": 10256.814195551682,
      "bch": 35.13945542496315,
      "bdt": 610699.0910671571,
      "bhd": 2712.9039956563156,
      ....
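If you want to request many coins or dates, typing each URL by hand gets tedious. The following sketch assembles the history URL from the base URL, the path parameter and the date query parameter. The helper name `history_url` is made up for this illustration.

```python
from datetime import date
from urllib.parse import quote, urlencode, urljoin

BASE_URL = "https://api.coingecko.com/api/v3/"


def history_url(coin_id, day):
    """Build the history endpoint URL for one coin and one day (hypothetical helper)."""
    # Path parameter: the coin id replaces the {id} placeholder
    path = f"coins/{quote(coin_id, safe='')}/history"
    # Query parameter: the date as a dd-mm-yyyy string
    query = urlencode({"date": day.strftime("%d-%m-%Y")})
    return f"{urljoin(BASE_URL, path)}?{query}"


url = history_url("bitcoin", date(2020, 1, 1))
print(url)
```

The printed URL matches the one used in the curl request above, so it can be pasted straight into a `curl -X GET "..."` call.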
Request messages
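Nice, if everything worked well, each of your requests should have come back with the status code 200. This means your request has succeeded. curl does not print the status code by default; add `-i` to include the response status line and headers in the output. Let’s briefly look into the different status codes that you might see. Codes of 400 and above indicate an error.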
number | what does it mean? | example |
---|---|---|
200+ | request has succeeded | any of the above requests |
300+ | request is redirected to another URL | e.g. your request is redirected from https://api.coingecko.com/api/v3/ to an older version https://api.coingecko.com/api/v2/ |
400+ | an error on the client side has occurred | e.g. you have a typo in one of your parameter values, such as the coin id in https://api.coingecko.com/api/v3/coins/bitcion/history |
500+ | an error on the server has occurred | e.g. there is a problem with the software running on the CoinGecko server. You can try running the request again, or you will have to wait until someone restarts the server |
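The ranges in the table translate directly into a small helper. This is only a sketch of the classification; the function name and the returned labels are made up for the illustration.

```python
def describe_status(code):
    """Map an HTTP status code to the coarse classes from the table (hypothetical helper)."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 301, 404, 503):
    print(code, describe_status(code))
```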
Now you might be wondering how to deal with all these JSON printouts in your console. One option is to write them into a file:
curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json" > output.json
There is also the very useful `jq` tool, which formats and highlights JSON. If you have jq installed, you can just run `curl ... | jq .` to format the output:
curl -X GET "https://api.coingecko.com/api/v3/coins/bitcoin/history?date=01-01-2020" -H "accept: application/json" | jq .
Alternatively, you can make your requests directly from Python and store the results in Python variables, for example with the `requests` package.
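As a minimal sketch of the Python route, the function below sends a `GET` request with the `accept` header and decodes the JSON response. It uses the standard library’s `urllib` rather than the `requests` package, so that it runs without installing anything. The function name `fetch_json` and its `opener` parameter are assumptions for this illustration; `opener` exists only so that the function can be tried out without network access.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen):
    """Send a GET request to `url` and return the decoded JSON body (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={"accept": "application/json"},
        method="GET",
    )
    # The response object is closed automatically when the block ends
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access, so it is left commented out):
# rates = fetch_json("https://api.coingecko.com/api/v3/exchange_rates")
# print(rates["rates"]["btc"])
```

With `requests` installed, the body of the function would shrink to a call to `requests.get(url, headers=...)` followed by `.json()` on the response.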