Benefits and cons - GraphQL #1

If you have never heard of GraphQL or never used it:

GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data - Wikipedia

In plainer terms, GraphQL is primarily a tool to query data from an API.

Rather than describing GraphQL in long paragraphs, let's look at the problems REST runs into and why GraphQL is a good answer to them.

REST inconvenience

Before we go any further, know that REST is not an obsolete technology, far from it. This is not an attempt to discredit developers who build APIs this way, because REST still has advantages over GraphQL. It always depends on your needs.

Let's say we have a User class:

class User 
{ 
    int Id 
    string FirstName 
    string LastName 
    string NickName 
}

With REST, an endpoint returns an object. In our case, the GET route /api/user/{id} returns the user with the Id {id}.

One endpoint for one object

Since an endpoint always returns exactly the same structure, we are faced with two choices:

Different consumers call the same endpoint, even if they don't use all the attributes of the object.
Create an armada of endpoints, each sending only the attributes a given consumer uses.

The first option is the most common nowadays. In our example, the User object contains 4 attributes, and you only need NickName to fill the menu bar of the website, which means that 3 attributes received from the server will not be used.

The more data you send, the more load you put on your server and on the network. To reduce the size of the exchanges, the data must be filtered on the server side, not on the client side.
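To make the cost of over-fetching concrete, here is a small Python sketch. The user values are invented for the illustration, and the JSON field names are an assumption about how the User class would be serialized. It only compares the size of the full payload with the size of a payload carrying the nickname alone.

```python
import json

# Hypothetical user record; the values are invented for this illustration.
user = {"id": 1, "firstName": "Jane", "lastName": "Doe", "nickName": "jd"}

# What a REST endpoint such as /api/user/{id} would send: the whole object.
full_payload = json.dumps(user).encode("utf-8")

# What the menu bar actually needs: the nickname only.
needed_payload = json.dumps({"nickName": user["nickName"]}).encode("utf-8")

# Every field other than the nickname travels over the wire for nothing.
unused_fields = [name for name in user if name != "nickName"]

print(f"full: {len(full_payload)} bytes, needed: {len(needed_payload)} bytes")
print(f"fields sent but never used: {unused_fields}")
```

On a 4-attribute object the difference is small, but it grows with the size of the object and the number of calls.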

The second point is more delicate. Splitting the global endpoints into a series of secondary endpoints leads to a major problem: maintainability. If you've ever worked on legacy applications, this is your worst nightmare.

One call for each endpoint

To improve the client's page, we now also display all available houses (like a bad copy of Airbnb), using the following object:

class House 
{ 
    string Address 
    string City 
}

The page now calls 2 endpoints:

/api/user/{id} for user information
/api/houses to retrieve all houses.

The page will probably display more than this information, so additional calls are likely. You get the idea: the more endpoints the client has to call, the slower it gets.

One way to improve the situation is to combine the two endpoints into one. Unfortunately, users and houses are not related (the page lists all houses, not the user's houses).
Building endpoints around unrelated objects decreases maintainability.

How can I be so sure of this? Try writing the endpoint route.

/api/user/{id}/houses ? /api/homepage/{id} ? /api/user/{id}/homepage ?

Each proposal is ambiguous. The first one reads as if it returned the houses belonging to the user. The second one reads as if it returned homepage number {id}. The last one reads as if it returned the homepage of user {id}, which implies that the homepage differs for each user, and that is wrong.

GraphQL to save us all

GraphQL was developed internally by Facebook in 2012 before being publicly released in 2015 - Wikipedia

At first, GraphQL was mainly used with Node. Once it became an open-source project, libraries appeared for other frameworks: Symfony has one, and so does .NET Core, so I assume most major frameworks do too.

I use it frequently for .NET Core projects, and it works like a charm.

To make things easier to understand, GraphiQL, a graphical interface developed for GraphQL, lets you query the server directly. It plays the same role as Swagger does for classic APIs. You can try it on this demo:

SWAPI GraphQL API

As you can see, it creates an interface for requests. This is a useful tool if you don't want to use Postman every time you test an API. It also generates documentation about your data, like a glossary of available queries, fields, enums...

Let's dive into it.

How to construct queries

The structure of the query follows the pattern:

rootfield { fieldA(parameters) { subfieldA }}

rootfield defines the type of action (the GraphQL specification calls it the operation type): query to retrieve data, mutation to modify it (insert, update, delete...), and subscription to receive updates pushed by the server, typically over a WebSocket.
field defines the fields to be retrieved in the query.
parameters represents the arguments applied to a field, such as a filter on the data or a limit on the number of rows returned. You define the parameters your API needs.
subfield has the same structure as field, one level below.

To keep the explanations simple, we will only use the root field query. If we apply the previous pattern to the User object, we can write:

query { users { id firstname lastname nickname }} : returns every field of all users
query { users { firstname nickname }} : returns the firstname and nickname of all users
query { users (id:1) { firstname nickname }} : returns the firstname and nickname of the users with id 1, in our case, only one element.

firstname and nickname are subfields of users. They are required because you can't query a whole object the way you can with * in SQL: the subfields specify the exact properties requested.

On the same principle, subfields let you cascade queries through navigation properties. Assuming that User has 2 parents to display on a certificate, you can request them with: query { users { parents { firstname nickname }}} and so on.
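The way subfields pick properties, including nested ones, can be sketched in a few lines of Python. This is not how a real GraphQL server is implemented: the `select` helper, the nested-dict representation of a selection, and the sample data are all assumptions made for this illustration.

```python
def select(value, selection):
    """Keep only the requested fields, recursing into nested selections.

    `selection` maps a field name to None (a plain field) or to another
    selection dict (a field that has subfields of its own).
    """
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    result = {}
    for field, sub in selection.items():
        child = value[field]
        result[field] = child if sub is None else select(child, sub)
    return result


# Hypothetical in-memory data standing in for the database.
users = [
    {
        "id": 1,
        "firstname": "Jane",
        "lastname": "Doe",
        "nickname": "jd",
        "parents": [
            {"id": 2, "firstname": "Mary", "lastname": "Doe",
             "nickname": "mom", "parents": []},
        ],
    },
]

# Rough equivalent of: query { users { nickname parents { firstname } } }
selection = {"nickname": None, "parents": {"firstname": None}}
result = select(users, selection)
print(result)
```

Only the fields named in the selection survive, at every level of nesting, which is the behavior the queries above rely on.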

Well, the armada problem is solved. If you haven't noticed, all requests go to the same endpoint, here /graphql. You no longer need to maintain a million endpoints. The client calls a single endpoint and chooses exactly what it gets back.
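Over HTTP, a GraphQL query is usually sent as a POST request whose JSON body carries the query string. The sketch below builds two such requests with Python's standard library without sending them. The host example.com is a placeholder and `build_request` is a hypothetical helper; only the /graphql path comes from this article.

```python
import json
import urllib.request

GRAPHQL_URL = "https://example.com/graphql"  # placeholder host


def build_request(query, variables=None):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two different needs, two different queries...
menu_request = build_request("query { users(id: 1) { nickname } }")
profile_request = build_request("query { users(id: 1) { firstname nickname } }")

# ...but one and the same endpoint.
same_endpoint = menu_request.full_url == profile_request.full_url
print(same_endpoint)
```

What changes between the two calls is the body, not the route, so there is no new endpoint to create or maintain on the server.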

How to group queries

GraphQL goes even further by allowing multiple queries in a single request. Yes, you read the previous sentence correctly.

query { users (id:1) { nickname } houses { address city }}

The query combines the two queries, the one to get the user's nickname and the one to get all the houses.
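Here is a minimal Python sketch of what the server does with such a grouped query: each top-level field is resolved independently, and the results come back in a single response. The `execute` function, the request representation, and the sample data are assumptions made for this illustration, not the API of any GraphQL library. The house fields follow the House object defined earlier (address and city).

```python
# Hypothetical in-memory data.
USERS = [{"id": 1, "firstname": "Jane", "lastname": "Doe", "nickname": "jd"}]
HOUSES = [
    {"address": "1 Main Street", "city": "Springfield"},
    {"address": "2 Oak Avenue", "city": "Shelbyville"},
]


def pick(rows, fields):
    """Keep only the requested fields of each row."""
    return [{name: row[name] for name in fields} for row in rows]


def resolve_users(args, fields):
    rows = [u for u in USERS if "id" not in args or u["id"] == args["id"]]
    return pick(rows, fields)


def resolve_houses(args, fields):
    return pick(HOUSES, fields)


ROOT_FIELDS = {"users": resolve_users, "houses": resolve_houses}


def execute(request):
    """Resolve every top-level field of one request into one response."""
    data = {}
    for name, (args, fields) in request.items():
        data[name] = ROOT_FIELDS[name](args, fields)
    return {"data": data}


# Rough equivalent of:
# query { users(id: 1) { nickname } houses { address city } }
request = {
    "users": ({"id": 1}, ["nickname"]),
    "houses": ({}, ["address", "city"]),
}
response = execute(request)
print(response)
```

The client makes one round trip and receives both results under their own keys, even though users and houses are unrelated.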

Keep in mind that this is not everything GraphQL can do; I could dedicate another article to the rest (check out Fragments if you are curious).

As we can see, GraphQL answers both REST problems quite well. A single call can carry several queries, and each of them selects exactly the properties you need.

...As far as possible

If the first thing that comes to mind is "Oh, it sounds perfect, why should I bother with REST any longer?", let's get to the point!

Better for data API than functional API

GraphQL is made for data queries first, and is a little less suited to functional APIs.

To better understand it, you need to know how GraphQL is set up.

At startup, it uses a function called CreateSchema, which stores all the classes, types and enumerations of the database classes in the schema. This is how the documentation is generated, and how queries are resolved.

All new queries should also be registered. To continue with the example, let's say we want to display all the house prices. The price attribute is not stored in the database, but is calculated from a function.

The query query { getHousePrice(id:1) { price } } returns a HousePrice object. HousePrice has 2 attributes, price and address, and this query selects only price.

GraphQL knows nothing about HousePrice or getHousePrice, so both have to be added to the schema before they can be used in a query.

Just as you configure a route in REST, you have to configure the schema in GraphQL. It is not a complex task, but it takes more time.
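The registration step can be pictured with a toy registry in Python. The `Schema` class below is a hypothetical stand-in, not the API of any GraphQL library, and the house data and pricing rule are invented for the example. It only shows that a computed query such as getHousePrice is unknown until it has been registered.

```python
class Schema:
    """A toy registry mapping query field names to resolver functions."""

    def __init__(self):
        self._resolvers = {}

    def register(self, name, resolver):
        self._resolvers[name] = resolver

    def execute(self, name, **args):
        if name not in self._resolvers:
            # A field that was never registered cannot be queried.
            return {"data": None,
                    "errors": [{"message": f"Unknown field '{name}'"}]}
        return {"data": {name: self._resolvers[name](**args)}}


# Hypothetical data and pricing rule, invented for this illustration.
HOUSES = {1: {"address": "1 Main Street", "surface_m2": 80}}
PRICE_PER_M2 = 2500


def get_house_price(id):
    house = HOUSES[id]
    # The price is computed by a function, not stored in the database.
    return {"price": house["surface_m2"] * PRICE_PER_M2,
            "address": house["address"]}


schema = Schema()
before = schema.execute("getHousePrice", id=1)  # not registered yet
schema.register("getHousePrice", get_house_price)
after = schema.execute("getHousePrice", id=1)

print(before["errors"][0]["message"])
print(after["data"]["getHousePrice"])
```

Every function-backed query needs this kind of explicit wiring, which is where the extra time goes compared with exposing database classes directly.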

HTTP status code managing

Concerning error handling, things get a bit more complex. Since REST uses one endpoint per call, it's quite easy to manage the status codes returned. With GraphQL, it is not.

The HTTP code returned describes the GraphQL call as a whole, not the objects inside the response.

Imagine your Users table now requires authorization. You expect a 401 status for every client that is not authenticated. In most setups, GraphQL will return 200 anyway, because the GraphQL call itself worked. The data will be empty, and the response body will carry an error entry stating that the user cannot access this data.
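On the client side, this means the status code alone is not enough: the body has to be inspected too. The Python sketch below shows one possible way to do that, assuming the response follows the usual GraphQL shape with a data entry and an optional errors list. The `unwrap` helper, the `GraphQLError` class, and the error message are hypothetical.

```python
import json


class GraphQLError(Exception):
    """Raised when a GraphQL response reports one or more errors."""


def unwrap(status_code, body_text):
    """Return the `data` part of a GraphQL response, or raise on errors."""
    if status_code != 200:
        raise GraphQLError(f"transport error: HTTP {status_code}")
    body = json.loads(body_text)
    errors = body.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    return body.get("data")


# Hypothetical response for an unauthenticated client:
# the HTTP status is 200, but the body reports an error.
denied = json.dumps({
    "data": {"users": None},
    "errors": [{"message": "Not authorized to read users", "path": ["users"]}],
})

try:
    unwrap(200, denied)
    outcome = "ok"
except GraphQLError as exc:
    outcome = f"failed: {exc}"

print(outcome)
```

A REST client could stop at the 401; here the same information only shows up once the body has been parsed.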

If you want to go further, look into how errors should be managed in GraphQL.

Caching queries result

With a single endpoint, query caching is much more difficult, since a cache cannot infer the query from the URL.

One way to make caching easier is to use persisted queries. In a nutshell, this involves pre-registering queries with the server, which returns an identifier for each. Clients then send the identifier instead of the full query string.
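The idea can be sketched in Python as follows. Using a SHA-256 hash of the query text as the identifier is one common choice, but it is an assumption here, as are the `PersistedQueryStore` class, the `cache_key` helper, and the cached value.

```python
import hashlib
import json


class PersistedQueryStore:
    """Toy server-side registry of pre-registered queries."""

    def __init__(self):
        self._queries = {}

    def register(self, query):
        # The identifier is derived from the query text itself.
        query_id = hashlib.sha256(query.encode("utf-8")).hexdigest()
        self._queries[query_id] = query
        return query_id

    def lookup(self, query_id):
        return self._queries[query_id]


def cache_key(query_id, variables):
    # sort_keys makes the key independent of the order of the variables.
    return f"{query_id}:{json.dumps(variables, sort_keys=True)}"


store = PersistedQueryStore()
query_id = store.register("query ($id: Int) { users(id: $id) { nickname } }")

cache = {}
key = cache_key(query_id, {"id": 1})
if key not in cache:
    # A real server would execute store.lookup(query_id) here;
    # the result below is a made-up placeholder.
    cache[key] = {"data": {"users": [{"nickname": "jd"}]}}

print(len(query_id), key in cache)
```

Because the identifier is short and stable, it can serve as a cache key in the way a URL does for a REST endpoint.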

Last but not least, GraphQL has a steeper learning curve than REST. As with every new tool, the development team cannot skip the learning phase.

There are other topics that are more complex to manage, such as file uploads, queries of unbounded depth, or rate limiting, but they are expected to become much simpler in future versions of the libraries.

Conclusion

GraphQL is undeniably a great tool for building APIs. It fits perfectly into a data-driven API, consolidating all requests into a single call with a single endpoint.

In addition to overcoming the problems associated with REST principles, it facilitates teamwork thanks to the documentation automatically generated by GraphiQL. In my experience, documentation between front and back often turns into a waste of time for the team. Small things can have huge impacts.

But, as life is not a bed of roses, there is a "but".

REST can still be a great choice, especially for functional APIs. It's still the most common way to build services, and it remains a perfectly valid one. Exposing functions rather than database classes requires additional code that can complicate the GraphQL schema. Error handling and caching are clearly more difficult to implement because of the single endpoint.

The advantages go hand in hand with the disadvantages.

Think about the purpose of the service first, then choose the architecture you are most comfortable with. Both are great for building APIs.

I hope you are more aware of what GraphQL does and doesn't do.

Have a great day everyone!

#GraphQL