Aviad Mizrachi, Co-Founder and CTO at Frontegg, talks at the API Conference Berlin about best practices on how to authenticate and authorize your APIs, from the design phase to the real time implementation phase.
All right. So, we’ll talk about API access, why it is broken, and how to fix it. So first of all, a little bit about me. So, I’m Aviad. Besides being the co-founder and CTO of Frontegg, I play the tuba, which is pretty nice. Obsessed with football, big fan of football here.
And depending on the code that I am writing or the task that I’m working on, it’s either classical music or metal. So, if we talk a little bit about numbers, there are a few numbers around API calls and API breaches. So, starting with an amazing fact, that 94% of the applications, and this is based on OWASP, were tested for some form of broken access control.
Actually, broken access control just moved from number five to number one on the OWASP Top 10, while broken authentication was there for the last few years and is still in the top 10. And just looking at this amazing number that Twitter published, around 6 billion API calls per day, for Twitter, around 70,000 calls per second, you can just imagine the number of API calls around the world.
That means that your API needs to be secured. And when I talk with my developers about the three questions that every API developer should ask himself, we always talk about, who are you? Okay? So, that’s always… Start with, who are you? When the API is getting called, we want to know who the caller of the API is.
And after who he is, we want to know where this API call belongs. Basically, resolving the context, the tenant context, the user context, we want to make sure that we always know that. And after that, the third question would be, what can you do? What can you do basically resides in the authorization part of the APIs. And that basically makes up the paradigm of what API access is made of, okay?
So, you always start with authentication, and authentication is pretty straightforward, but the tenancy context and the authorization context are the parts where you can fail. And this is what we’re going to discuss today. We’re going to start with API authentication pitfalls.
We’re going to talk a little bit about the multi-tenancy context, we’re going to talk about the user context, how to avoid cross-tenants issues, and then we’re going to discuss the authorization context, how we are protecting it, and how we are doing it the right way. So, we all know what authentication is, right? We want to make sure that when a user is calling our API, when an API is…
whenever a client is calling our API, we want to know that the call is authenticated and who the identity is that is trying to reach our server. So, authentication is pretty straightforward. It’s a process of verifying the identity. So, someone is trying to call our API, we want to make sure that we know him, and that he can actually access our API.
And that’s basically the idea and the essence of authentication. And it comes with several methods. It can be a document, like the passport that we go with when we fly, but in the digital world it’s usually going to be an email and password or a certificate or multifactor authentication, or even a fingerprint that would define who the user is.
And usually, that’s a pretty straightforward scenario. A client tries to access our server, we’re going to send him to authenticate, and then we’re going to get either a session or any other method of proving who the user really is. But we want to make sure that when a user is approaching our server, we know who he is first.
So, let’s touch base on several of these that we just talked about. Starting to fix some stuff on the broken authentication, for example. So, we’re going to talk a little bit about who are you? So, a reminder that who are you basically resides with the API authentication part. So, one of the most common issues with API authentication would be the session management.
So, most of the stuff that we would see around broken authentication would be sessions. So, we see parts of broken authentication with session identifiers that are exposed, sessions that are reused after logout,
and session IDs that are not invalidated. And there are tons of ideas, okay, and tons of samples of how a session ID can be part of the URL. And that’s, like, a big no-no, because when we put the session ID on the URL, basically what happens is that this session goes through all the routers on the way, all the firewalls on the way, and basically means that this session can be hijacked pretty easily, okay?
So, we never want to put sessions on the URLs. We want to make sure that our sessions are stored on the server, with built-in session managers. We want to make sure that they are never running on the URLs. They need to be really, really securely stored, preferably in HTTP-only cookies. And we want to make sure that once we are doing session management on the server side, we always invalidate sessions.
If we are doing session management, we always have a central place where we invalidate sessions after logout, okay, and we use it for timeouts and idle sessions, etc. So, this is a great example of how sessions are managed. Usually, sessions are managed over cookies. So, we set a cookie, and again, when we are talking about web, when we set a cookie, we want to make sure that these sessions and these cookies are HTTP only, so they are not exposed via XSS or that kind of stuff.
They are always HTTP only. And when we are logging out the user, we want to make sure that he’s really logged out from either all of the devices, or when we terminate a session, we want to make sure that this session is really terminated. And the way to do it is through central session management on the backend side.
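As a rough illustration of the points above (an HTTP-only cookie, a central place that owns the sessions, invalidation on logout and on idle timeout), here is a minimal Python sketch. The in-memory store, the cookie name `session_id`, the 30-minute timeout, and the function names are all assumptions made for this example and are not from the talk. In practice one of the existing session libraries would do this work.

```python
import secrets
import time
from http.cookies import SimpleCookie

SESSION_TTL_SECONDS = 1800  # assumed idle timeout, not a value from the talk

# Hypothetical central store; a real deployment would use a shared backend.
_sessions = {}  # session_id -> {"user_id": ..., "last_seen": ...}


def create_session(user_id, now=None):
    """Create a server-side session and return (session_id, Set-Cookie value)."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)
    _sessions[session_id] = {"user_id": user_id, "last_seen": now}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # not readable from page scripts
    cookie["session_id"]["secure"] = True    # only sent over HTTPS
    cookie["session_id"]["samesite"] = "Lax"
    cookie["session_id"]["path"] = "/"
    return session_id, cookie["session_id"].OutputString()


def get_user(session_id, now=None):
    """Return the user of a live session, or None if unknown or idle too long."""
    now = time.time() if now is None else now
    session = _sessions.get(session_id)
    if session is None:
        return None
    if now - session["last_seen"] > SESSION_TTL_SECONDS:
        _sessions.pop(session_id, None)  # idle timeout: invalidate centrally
        return None
    session["last_seen"] = now
    return session["user_id"]


def logout(session_id):
    """Invalidate one session so the same ID cannot be reused after logout."""
    _sessions.pop(session_id, None)


def logout_everywhere(user_id):
    """Invalidate every session of a user, for example on all devices."""
    for sid in [s for s, data in _sessions.items() if data["user_id"] == user_id]:
        del _sessions[sid]
```

The session ID only ever travels in the cookie, never in the URL, and every lookup goes through the one store that logout also writes to.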
And when we talk about it, okay, so we are all developers, we always want to code stuff, right? So, we want to code session management, but do we have to? Right? We don’t want to reinvent the wheel on the session management because there are tons of open-source libraries that will help us do session management, whether we’re using Node.js, whether we are using Python, PHP, Java, .NET, whatever.
Okay? There are tons of third-party libraries that we can use. There are tons of open-source libraries that we can use. We don’t want to code this part, okay? So, once we fix the session management, it’s time to talk a little bit about what else can go wrong on your authentication part.
Let’s dig a little bit on that code, okay? So, there is a code here, I’ll let it sit for a second. So basically, what it does, it goes to johndoe.com, and it tries to log in with all the common passwords we have downloaded right before that.
So that’s a common approach of what we call an automated attack, okay? So, I have a script somewhere which sits and tries to attack our server with password spraying, with DDoS attacks, tries to hijack some kind of session using all kinds of approaches. But usually, it would be a script that sits there and tries to take our server and tries to bypass basically all the protections that we currently have.
And basically, there are several ways we can block automated attacks. So, let’s talk a little bit about the APIs. There are public APIs, usually the ones that we use for authentication: our login, signup, reset password, you know, all the authentication APIs that are public, where the user doesn’t need to be authenticated in order to call them.
And usually for these ones, if you want to prevent automation, we will use mechanisms like a captcha, like DDoS protection, IP-based filtering and that kind of stuff, maybe by country, but that’s for the public APIs. What are we doing when the user is already authenticated, so he has found a way into the system, and then he starts bombarding us with requests and that kind of stuff?
So usually the way to do it is obviously using mechanisms like DDoS protection, blocking IPs, blocking sessions, being able to revoke sessions quite easily. And there are great ways of doing that, okay? If we are talking about rate limit, that’s a great Node.js example of how we are doing rate limits.
So, we have an API, and we want to make sure that once an IP is hitting this API, we give it a window where we don’t accept any more requests, okay? For example, on the Create Account, we don’t want our users here to create too many accounts from a specific IP. So, we block them, and that way, we are protecting our backend using rate limits.
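The slide shown at this point in the talk was a Node.js example and is not reproduced here. As a stand-in, this is a minimal Python sketch of the same idea, a fixed-window rate limit keyed by IP address. The class name, the limit of three account creations per hour, and the in-memory storage are assumptions chosen for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most max_requests per key (for example an IP) per time window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._windows = {}  # key -> [window_start, request_count]

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window = self._windows.get(key)
        if window is None or now - window[0] >= self.window_seconds:
            # First request from this key, or the previous window has ended.
            window = [now, 0]
            self._windows[key] = window
        if window[1] >= self.max_requests:
            return False  # over the limit: the caller should answer HTTP 429
        window[1] += 1
        return True


# Hypothetical limit for the Create Account endpoint: 3 per IP per hour.
create_account_limiter = FixedWindowRateLimiter(max_requests=3, window_seconds=3600)
```

A request handler would call `create_account_limiter.allow(client_ip)` before doing any work and reject the request when it returns `False`. With several server instances, the counters would need to live in a shared store rather than in process memory.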
And that’s a pretty straightforward sample. There are a bunch of samples, much more complex, using artificial intelligence, of how to block requests and how to identify malicious sessions that are trying to bombard us on the server. And another great way of fixing automated attacks is basically log everything, okay?
So, we want to make sure that once a request hits our server, whether it’s an API gateway, whether… We are living in a distributed world of applications. We want to make sure that we log everything.
And part of the stuff that we want to make sure that we log is stuff that’s going to help us understand if someone is attacking us or not. And that includes user agents, that includes printing out the headers and the cookies that are coming our way to know which session it is, where the session originated from, whether it’s an IP address, whether it’s the X-Forwarded-For from all the routers on the way and all the reverse proxies on the way, and basically, what the origin of the request was.
And that will help us basically identify malicious and wrong requests and basically block them right away. And another common way to prevent automated attacks on the authentication side would be to basically implement user lockouts, okay?
So, we have a user. I can try basically to steal the identity of someone whom I know. Basically, I can try to log in using his credentials or to guess his credentials and try to log in. And I don’t need to be a script.
I can do it from my computer. So, the reCAPTCHA will not work here. But still, after several attempts, I expect to be locked. And the reason I expect to be locked is because protection against automation scripts is not enough, I also have to be protected against human attempts, and that basically means starting to delay failed attempts and locking out users for specific times to remove basically this burden of failed logins.
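A small Python sketch of what such a log record might contain. The field names and the `api.access` logger name are hypothetical. One choice here is an addition and not something the talk states: the raw cookie is replaced by a short fingerprint, so the session can still be correlated across log lines without writing the secret itself to the log.

```python
import hashlib
import json
import logging

logger = logging.getLogger("api.access")  # hypothetical logger name


def _fingerprint(value):
    """Short, stable identifier for a secret value (assumption, see lead-in)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def build_access_record(method, path, headers, remote_addr):
    """Collect the request details that help decide whether we are under attack."""
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    return {
        "method": method,
        "path": path,
        "remote_addr": remote_addr,
        # Every hop reported by proxies on the way, original client first.
        "forwarded_for": [ip.strip() for ip in forwarded.split(",") if ip.strip()],
        "user_agent": lowered.get("user-agent", ""),
        "origin": lowered.get("origin", ""),
        "cookie_fingerprint": _fingerprint(lowered["cookie"]) if "cookie" in lowered else None,
    }


def log_request(method, path, headers, remote_addr):
    """Write one structured line per request and return the record."""
    record = build_access_record(method, path, headers, remote_addr)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

Note that `X-Forwarded-For` is supplied by the client and by intermediate proxies, so it is useful for investigation but should only be trusted for decisions when it was set by a proxy you control.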
And here again, we want to make sure that we log, and we write all failures, and we notify administrators when something, which is called credential stuffing, is happening. And we want to block all brute force attacks on us.
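The lockout idea can be sketched in a few lines of Python. The thresholds (five failures, fifteen minutes), the class name, and the in-memory state are assumptions for illustration. The logging and administrator notification mentioned above would hook in where `record_failure` returns `True`.

```python
import time


class LoginThrottle:
    """Lock an account for a while after too many failed login attempts."""

    def __init__(self, max_failures=5, lockout_seconds=900):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self._state = {}  # username -> (failure_count, locked_until)

    def is_locked(self, username, now=None):
        now = time.time() if now is None else now
        _, locked_until = self._state.get(username, (0, 0.0))
        return now < locked_until

    def record_failure(self, username, now=None):
        """Count a failed attempt; return True if the account is now locked."""
        now = time.time() if now is None else now
        count, locked_until = self._state.get(username, (0, 0.0))
        if locked_until and now >= locked_until:
            count, locked_until = 0, 0.0  # earlier lockout expired, start over
        count += 1
        if count >= self.max_failures:
            locked_until = now + self.lockout_seconds
        self._state[username] = (count, locked_until)
        return now < locked_until

    def record_success(self, username):
        """A successful login clears the failure history."""
        self._state.pop(username, None)
```

A login handler would check `is_locked` before verifying the password, so that attempts made during the lockout are rejected no matter whether they come from a script or from a person at a keyboard.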
Whether it’s human-based or whether it’s automation-based, we want to make sure that we log them and we block them once they happen. So, we talked a little bit about identity. And there are two main ways of doing identity management in the world of APIs, right? We all know both of them.
We know that there’s session-based and there’s JWT-based, and they are a bit different because when we do session-based authentication, that means that the client is hitting our server with email and password or whatever, okay? We generate a session ID, we store it on the server side, and when the client is approaching us to get a little bit of data, he comes with the session cookie, we validate the session, and basically, we bring back the data based on this session.
Where JWTs are the same, but we don’t store anything on the backend side, right? So, we generate a JWT that’s a client token, we send the JWT back to the client, and the JWT will be used to verify the signature and to verify the authenticity of the request.
But we don’t validate anything which is stored on our side. And basically, that means that once we are building a distributed application, we are having some issues, right? Because if we are building a distributed application using session management, that means that whatever request needs to go to the resource server, if the resource server and the auth server are separated, that means that the resource server will have to communicate with the auth server for every request to validate the session.
And that means another round trip. That means that they probably need to have another layer of authenticity between them in order to validate the session. And that means that we are going to have problems of scaling comparing to JWT, where the resource server can just validate the JWT using any one of the open-source libraries, okay, and validate the token signature and expiry, right?
So, with JWTs basically, you can shoot them wherever you want, the public key is wide open, and basically, the resource server can just validate the token signature on its own without basically communicating with the auth server. And that means that JWTs obviously have some benefits over session-based, where we can integrate pretty easily, right?
If we are doing JWT, that means that we can integrate with other servers, we can basically provide them with our public key, and they can validate the JWT on their own. And that means that we support scale. There is no single point of failure to validate sessions on every one of our integrations or any one of our servers.
Each server, each resource, each microservice can basically validate the JWT on its own. And that means that on the client side, even, on the web application, on the mobile device, once we get the JWT, we can validate the expiration, okay? We don’t need to go to the server to ask whether this session is valid. JWTs are basically Base64-encoded, signed information that can be checked just for expiration on the client side.
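To make the two checks concrete (signature plus expiry on the resource server, expiry only on the client), here is a Python sketch built on the standard library. One substitution is made deliberately: it signs with HS256, a shared secret, because the standard library has no asymmetric signatures, while the talk describes public-key validation. With RS256 or ES256 the flow is the same, except that the resource server only needs the public key. The function names are hypothetical, and a maintained JWT library is the better choice in real code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims, secret):
    """Auth server side: issue a compact HS256 token (stand-in for RS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url_encode(json.dumps(p, separators=(",", ":")).encode("utf-8"))
             for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(token, secret, now=None):
    """Resource server side: return the claims if signature and expiry hold."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError):
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":  # never accept an algorithm we did not choose
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        return None
    return claims


def is_expired_unverified(token, now=None):
    """Client side: read exp without verifying anything; a hint, not a proof."""
    now = time.time() if now is None else now
    try:
        claims = json.loads(_b64url_decode(token.split(".")[1]))
        return now >= claims["exp"]
    except (ValueError, TypeError, KeyError, IndexError):
        return True
```

No call to the auth server happens in `verify_token`, which is the scaling benefit described above. The client-side check only saves a round trip for tokens that are obviously stale, since the client cannot establish authenticity.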
However, if we want to implement immediate logout, we want to have some sort of session management, and that requires us to use session-based authentication.
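One common way to get immediate logout while still issuing JWTs is to keep a small piece of server-side state next to them. The sketch below is a hypothetical denylist keyed by the token's `jti` claim. It assumes the claims were already verified, that tokens carry `jti` and `exp`, and that the store is shared by every service that needs to honor the logout.

```python
import time


class TokenDenylist:
    """Remember logged-out tokens until they would have expired anyway."""

    def __init__(self):
        self._revoked = {}  # jti -> exp (seconds since the epoch)

    def revoke(self, claims):
        """Call on logout with the verified claims of the token being ended."""
        self._revoked[claims["jti"]] = claims["exp"]

    def is_revoked(self, claims, now=None):
        now = time.time() if now is None else now
        self._purge(now)
        return claims.get("jti") in self._revoked

    def _purge(self, now):
        # An expired token is rejected by the normal expiry check, so its
        # denylist entry can be dropped and the list stays small.
        for jti in [j for j, exp in self._revoked.items() if exp <= now]:
            del self._revoked[jti]
```

The lookup reintroduces a little of the central state that JWTs were meant to avoid, which is the trade-off: short token lifetimes keep the list small, and only the logout path has to write to it.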
And in some of the cases… basically, what we see in the modern world is that JWTs are the way to go because of the microservices world, because, you know, modern applications require OAuth and that kind of stuff. And in a lot of integrations, JWT works. However, it falls short when we want to implement something around logout and that kind of stuff.
So, in most of the cases we are using JWT. However, we do have some kind of basic session management on top of it to validate a logout and that kind of stuff, or refresh tokens. One thing I want to mention here is, and that would be a little bit more to the authentication world and a little bit less on the API world.
However, when we define APIs and when we develop APIs, we have the responsibility to take care of our users. So, the world of authentication has evolved over the last couple of years, okay? It really evolved. And still, our users have not evolved so much. So, we see some amazing statistics about poor passwords, okay?
Most of our users admit that they use the same password for all of their accounts. That’s 53%. That’s an amazing number, okay? We have 81% of company data breaches which were related to poor passwords. And even Microsoft, right, they let us know that most of the Microsoft 365 admins do not enable MFA.
And that’s an amazing number. That’s the data of their users, which they pretty much don’t care about protecting. So, when we develop our API, whether we are building it or we’re using third parties, we want to make sure that we are doing stuff to support passwordless authentication, which is way more secure than password authentication.
We want to make sure that we help our users, and we educate them to use MFA everywhere, okay? We want to use single sign-on whenever possible. And we see, like, a great feature, obviously, in GitHub where when we go to the super user spaces, changing settings, changing administrators, we want to make sure that we provide better security by requiring our users to reauthenticate, okay?
And that’s something that our API needs to support. We need to know that on the authentication side, we are forcing MFA, and we need to know that once we are going to the sensitive features, settings-related, user management, that kind of stuff, we want to give our users the feeling that they are secured, and we want to make sure that we are reauthenticating them before they access these features.
So, we talked a little bit about authentication. Any questions so far? Feel free to interrupt me. Good. So, we talked a little bit about the API authentication side. So, we now know that once a user or a client hits our API, they are fully authenticated.
And that’s great. But how about the API context? So, we have a user, he authenticated through the authorization server, and now he’s trying to get any data from our resource server. And that’s pretty straightforward, right?
We do it, like, 10 times a day. So, when the user is approaching our resource server, he comes with, let’s… We just mentioned JWT. So, he comes with JWT-encoded data, and there we know the user, and we know what user type it is, etc., etc. However, once this user is trying to hit our resource server, we want to make sure that he comes with a matching context.
And one of the most common attacks that we see in OWASP, for example, is trying to bypass the access control. I can modify the URL, I can change some states in the HTML, I can change some states in the API request, okay? And basically, that results in permitting some users, okay, to view someone else’s account, okay?
And that’s, like, the worst nightmare every API developer has, to wake up in the middle of the night getting a call from someone which starts with these two words, “We’re hacked.” And basically, that means that once we are hacked, that means that every user can see any other data, okay?
And the multi-tenancy is completely broken, okay? So, let’s take a little bit of an example. So, I have a user authenticated. This is a JWT. In this use case, the user is myself, and I’m logged in because I’m building a multi-tenant application. I’m logged in, and I have my tenant ID as part of the claims on the JWT.
So now I’m trying to access this API and getting users of another tenant, right? So, I’m passing the tenant information on my URL. And basically, that’s a path [inaudible].
And that’s something that we see happening, like, a lot of times. So, normally, what I would expect is to be stopped here and to get, like, a forbidden error, right? And to be stopped because I cannot do that. And the way usually we try to do this stuff, and that’s, like, the general guidelines on how to block this kind of request, would be to proxy…
Usually, on the microservices infrastructure, we have an API gateway, whether it’s nginx or any other one, that is validating the JWT. So, on the JWT, what we’ll do, we’re going to make sure that we extract the claims for the tenant, for the user, etc., and we’re going to reverse proxy them to the microservice.
However, we need to be really careful when we do that because we want to make sure that once the request is hitting our nginx, we are removing any incoming headers with the same names as the ones we are going to reverse proxy, because otherwise, we have a risk of header tampering. That means that I’m just sending the tenant ID header, and I can bypass this mechanism. So that’s something that we need to take care of when we implement this.
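In code, the rule is: drop whatever the client sent under the context header names, then set those headers only from the verified JWT claims. The Python sketch below shows that step in isolation. The header names `X-Tenant-Id`, `X-User-Id`, and `X-User-Roles` and the claim names are assumptions, and in an nginx setup the same thing would be expressed in the gateway configuration.

```python
# Hypothetical names of the headers the gateway forwards to microservices.
TRUSTED_CONTEXT_HEADERS = {"x-tenant-id", "x-user-id", "x-user-roles"}


def build_upstream_headers(incoming, claims):
    """Build the headers sent to the microservice from a verified token.

    incoming: headers as received from the client (untrusted).
    claims:   claims of a JWT whose signature and expiry were already checked.
    """
    # 1. Remove any client-supplied copy of the context headers, whatever
    #    their letter case, so a caller cannot inject a tenant of their choice.
    upstream = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in TRUSTED_CONTEXT_HEADERS
    }
    # 2. Set the context from the token only.
    upstream["X-Tenant-Id"] = claims["tenant_id"]
    upstream["X-User-Id"] = claims["sub"]
    upstream["X-User-Roles"] = ",".join(claims.get("roles", []))
    return upstream
```

The microservices behind the gateway can then trust these headers, provided they are reachable only through the gateway.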
And another good scenario would be, let’s try to avoid route params as much as we can. If the user is authenticated and we want to get the users of the tenant that he’s authenticated to, we don’t need this to be part of route params.
We have this from the context, okay? And if we have to use route params because this user is part of multiple tenants and we have hierarchies, etc., we want to make sure that we are using a guard, and this guard will basically validate that this user really belongs to the tenant that he’s trying to get the data from.
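A guard of this kind can be very small. In the Python sketch below, the claim names `tenant_id` and `tenant_ids`, the `Forbidden` exception, and the in-memory `USERS` list are all hypothetical. The point is only the order of operations: default to the tenant from the context, and when a route param is present, validate it against the token before touching any data.

```python
class Forbidden(Exception):
    """Raised when the caller asks for a tenant they do not belong to."""


# Stand-in for a database table, for illustration only.
USERS = [
    {"id": "u1", "tenant_id": "t1"},
    {"id": "u2", "tenant_id": "t1"},
    {"id": "u3", "tenant_id": "t2"},
]


def require_tenant_access(claims, requested_tenant_id):
    """Guard: the requested tenant must be one the verified token lists."""
    allowed = set(claims.get("tenant_ids", []))
    if "tenant_id" in claims:
        allowed.add(claims["tenant_id"])
    if requested_tenant_id not in allowed:
        raise Forbidden(f"no access to tenant {requested_tenant_id!r}")
    return requested_tenant_id


def list_users(claims, requested_tenant_id=None):
    """Return users of the caller's tenant, or of a requested tenant if allowed."""
    if requested_tenant_id is None:
        tenant_id = claims["tenant_id"]  # preferred: take it from the context
    else:
        tenant_id = require_tenant_access(claims, requested_tenant_id)
    return [user for user in USERS if user["tenant_id"] == tenant_id]
```

A web framework would translate `Forbidden` into an HTTP 403 response.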
And here we have a good GraphQL sample of hitting the server and resolving the context. So here we are resolving the context from the authorization header. That’s a JWT, okay? And we decoded the JWT, and we got the user from the JWT, and now we are using this user to basically validate the query, okay, based on the context and based on the tenant ID, which comes from the context of the user.
And that’s, like, a good practice to use on every one of our resolvers or mutations that we are using on GraphQL. And if we are doing it on REST, it’s a little bit more work. But still, we need to put guards, whether you are using Node.js, or Java or .NET, or even Go, to make sure that we are protecting ourselves, because validating context on tenants is a really slippery slope.
But those are the most common pitfalls that we would see trying to build a multi-tenant product. So, we discussed a little bit about the API context, and those are, like, as I mentioned, the most common pitfalls that we would see.
So, we dealt with the API authentication, we dealt with the context management, but we still have one more broken layer to fix, and that would be the authorization. So, the user is authenticated, we got the matching context, but how are we making sure that this user basically can get the data that he is trying to get?
So, there are a lot of… If you look… You know, I was browsing Twitter yesterday for elevation of privilege. Just search Google for elevation of privilege. There are tons of, tons of, tons of CVEs and articles about what elevation of privilege is and how it’s being done.
But basically, it comes down to two main items. I’m acting as a user without being logged in. That was covered on the authentication side. If I’m logged in as a user, how can you know that basically I’m getting only the information and doing only the actions that I’m able to do as a user, when I’m not really an admin, okay?
So, role-based access is pretty straightforward, right? But there are tons of, tons of, tons of stuff around it out there that we see in almost… most of the attacks that we see with elevation of privilege.
And that’s only on the role-based access. What happens if I try to access non-privileged entities? For example, I’m trying to access a GitHub repository of another company. How am I being blocked? Right? On my team, okay, we are all on the same organization. There are some repositories that are for one team, there are certain repositories for another team.
I have a subscription plan, I’m in the early-stage plan, and I’m trying to get features of the growth plan, okay, or hidden features which are closed with a feature flag. All of these will fall under the non-privileged entities attacks that we see pretty much on a daily basis.
And there are all kinds of techniques to basically elevate privileges. We can do access token manipulation, we can try using non-authenticated access for all kinds of APIs, trying to attack the APIs one by one to make sure that all of them are validating the JWTs. We can do some account manipulation, right, to try to change the account from a non-authenticated or authenticated API using query params, using path params, etc.
So, this is where everything comes into place with what I call the authorization pyramid. So, if we take a look at the most common stuff we would deal with around authorization on our APIs and our product, first of all, the most straightforward stuff we will do is around RBAC, right?
We are going to introduce… These APIs are for admins, these APIs are for viewers, these APIs are for operators. And we’ll deal around it. And then our product manager comes and says, “Okay. I want to introduce some subscription,” and says, “Okay. These APIs are for the growth plan, these APIs are for the early stage,” etc. And then we’ll introduce some feature flagging mechanism.
Then we’ll introduce some entity-based authorization so these APIs can be resolved only on the context of this GitHub repository, and this API needs to be resolved only on the context of this GitHub repository if the user has access to this GitHub repository. And that means that we have a lot of stuff that we want to do before we are building this stuff.
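The layers of the pyramid can be pictured as a chain of checks where every layer has to pass. The Python sketch below is only an illustration: the `Caller` and `Requirement` types, the plan and flag names, and the order of the checks are assumptions, not a design from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:
    roles: set          # for example {"viewer"} or {"admin"}
    plan: str           # for example "early-stage" or "growth"
    feature_flags: set  # flags switched on for this account
    entity_ids: set     # entities (such as repositories) the caller may access


@dataclass
class Requirement:
    role: str
    plans: set
    feature_flag: Optional[str] = None


def authorize(caller, requirement, entity_id):
    """Return (allowed, failed_layer); every layer must pass."""
    checks = [
        ("role", requirement.role in caller.roles),
        ("plan", caller.plan in requirement.plans),
        ("feature_flag", requirement.feature_flag is None
         or requirement.feature_flag in caller.feature_flags),
        ("entity", entity_id in caller.entity_ids),
    ]
    for layer, passed in checks:
        if not passed:
            return False, layer
    return True, None
```

Returning the failed layer is handy for logging, although the response to the client would normally be the same forbidden error in every case.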
So, a few guidelines. We want to make sure that only what is explicitly a public API, a public resource, is open, okay? All the other ones need to be denied by default, okay? And we want to implement some access control mechanisms on top of this, okay, whether it’s ACLs, whether it’s other mechanisms.
And every API that we implement needs to have some kind of validation over the record ownership. I’m trying to read the repository in GitHub. GitHub makes sure that I can actually read this repository, this repository belongs to me. And basically, on top of that, we’re adding some business limit requirements that basically help us understand that which user can access what under which circumstances.
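These guidelines can be sketched as one decision function: public routes are listed explicitly, anything not listed is denied, and a protected route needs both the permission and ownership of the record. The route table, the permission names, and the `owner_tenant_id` field in this Python sketch are hypothetical.

```python
# Hypothetical route tables; everything not listed here is denied.
PUBLIC_ROUTES = {
    ("POST", "/login"),
    ("POST", "/signup"),
    ("POST", "/reset-password"),
}
ROUTE_PERMISSIONS = {
    ("GET", "/repositories/{id}"): "repo:read",
    ("DELETE", "/repositories/{id}"): "repo:delete",
}


def is_allowed(method, route, caller, record=None):
    """Deny by default, then check permission, then check record ownership."""
    if (method, route) in PUBLIC_ROUTES:
        return True
    if caller is None:
        return False  # not authenticated
    required = ROUTE_PERMISSIONS.get((method, route))
    if required is None:
        return False  # unknown route: denied by default
    if required not in caller["permissions"]:
        return False
    # Record ownership: the record must belong to the caller's tenant.
    return record is not None and record["owner_tenant_id"] == caller["tenant_id"]
```

Because the default answer is `False`, a new endpoint that nobody remembered to register is closed and not accidentally open.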
And the way we did it so far is, right, I just showed you. We put the data on the JWT as a claim on the JWT, we enforce it on the server side, we can decode and validate it on the frontend side, and that’s pretty straightforward.
However, when we take it one step further from the basic RBAC, role-based access control mechanisms, we start dealing with entities, we start dealing with hierarchies, we start dealing with feature flags. And when we try to put them on JWTs, okay, we start hitting questions like this.
Our APIs are blocked because there is a max size of HTTP headers, which is enforced in all kinds of places, whether it’s Node.js, whether it’s nginx, whether it’s firewalls on the way. And basically, if we put everything on the JWT, our application will fail, and our APIs will start collapsing.
So, the new way to do this stuff is to use a mechanism of policies as code, okay? So, I’m not going deeper into this. I really strongly suggest that you read… These are the two main actors in this area. There is the Open Policy Agent (OPA), which is a great open-source project, which implements a great mechanism called Rego.
Rego is the way to define the policies, and you can define which user under which circumstances can see which entity under which scopes. And the other way to do it is Oso. Oso is a great open-source library as well, which lives as part of your code, which helps you validate policy as code as well. And then we have a client, which is approaching our API gateway.
We just saw that on the API gateway, we are validating the authentication of the users, and we are passing the reverse proxy context back to the resource server. And now the resource server, when we get the get method with the query param, we want to make sure that once we go to the database, before that we are validating if this user can actually see this resource.
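To show where the check sits, here is a plain-Python stand-in for a policy decision made before the database read. It does not use OPA, Rego, or Oso and does not reflect their APIs. The policy list, the attribute names, and `fetch_repository` are hypothetical, and with a real policy engine `is_permitted` would delegate to that engine.

```python
# Policies expressed as data plus a condition, evaluated in one place.
POLICIES = [
    {
        "action": "read",
        "resource_type": "repository",
        "allow_if": lambda user, resource: resource["id"] in user["repository_ids"],
    },
    {
        "action": "write",
        "resource_type": "repository",
        "allow_if": lambda user, resource: "maintainer" in user["roles"]
        and resource["id"] in user["repository_ids"],
    },
]


def is_permitted(user, action, resource):
    """Allow only if some policy matches the action and resource and passes."""
    return any(
        policy["action"] == action
        and policy["resource_type"] == resource["type"]
        and policy["allow_if"](user, resource)
        for policy in POLICIES
    )


def fetch_repository(user, repo_id, database):
    """Ask the policy first; go to the database only when access is granted."""
    resource = {"type": "repository", "id": repo_id}
    if not is_permitted(user, "read", resource):
        raise PermissionError(f"read denied on repository {repo_id!r}")
    return database[repo_id]
```

Keeping the rules in one list, separate from the handlers, is the part that carries over to a policy engine: the handler asks a yes or no question and does not encode the rule itself.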
So, this user has read permission on one resource but doesn’t have read permission on another resource. And the way to validate this would be with policy as code. So when we take a look, because we are running out of time, so when we take a look at all of these, we see that we start with the API authentication and we do JWT tokens and we do rate limits, and we really want to use anti-automation, and that’s, like, some of the most common stuff that we would see at the start of any project or any API, basic API controls.
But that’s not enough to validate the authentication because we need to always take care of the user context, the multi-tenant context, and we do it with path control, we do it with guards, we do it with reverse proxies and context-aware APIs. And once we are doing context-aware APIs, we go and reach the authorization context.
And the authorization context is validated, whether we are using RBAC, role-based access control, we are using claims on the JWT to validate it. But if we are using something which is far more complex than just basic JWT and role validation, we want to make sure that we are enforcing the record ownership.
Whether it’s using OPA, whether it’s with the feature flags mechanism, whether it’s with the subscription mechanism, we want to make sure that this user can actually do whatever the user is intending to do. And that is done using all the tools that I just showed you. So, that was pretty much a really brief overview of how to fix API access control.
And I’ll open for questions now if you have some.