[March 2021 update]: This write-up was chosen as the first place winner of the 2020 GCP VRP prize, and LiveOverflow made an amazing video explaining how the vulnerability was found.
By using an internal (dogfood) version of the Google Cloud Deployment Manager, I was able to issue requests to some Google internal endpoints through Google's Global Service Load Balancer, which could have led to RCE.
This could be achieved through a request to the Deployment Manager to create a Type Provider that includes an undocumented field called googleOptions.
This begins an async operation, in which the Deployment Manager attempts to retrieve a descriptor document from the specified descriptor URL.
If it fails, it might still provide information in the error message, such as the response from the internal server. If it succeeds, it would allow an attacker to issue complex internal requests.
Examples: App Engine Admin API (Internal test version), Issue Tracker Corp API.
Note that the issue is not limited to requests to APIs; it just works best on them. An example of a non-API endpoint is the Google Accounts and ID Administration ("GAIA") backend test endpoint. The descriptorUrl does not matter there, since the request is expected to fail because the endpoint is not an API.
Google paid $31,337 as a reward for the bug report.
[March 2021 update]: Google paid an additional $133,337 prize as part of the 2020 GCP VRP prize, thus a total of $164,674 was paid for this report + write-up.
Deployment Manager is a Google Cloud service that provides a way to handle infrastructure resources' creation, deletion, and modification, programmatically (Infrastructure as code).
Relevant Deployment Manager concepts are:
- Type: Describes the properties of a specific kind of infrastructure resource (For example: VMs, issue tickets, user permissions). There are several pre-defined Types available in Deployment Manager (Called base types)
- Type Provider: Provides a service's RESTful API endpoint, with its descriptor document, for Deployment Manager to manage Types within that service (For example: An API to manage VM instances)
- Resource: Represents an instance of a single infrastructure resource, provided by a Type (For example: A VM instance)
- Templates: Reusable Python or Jinja2 files to programmatically configure Resources
- Deployment: A collection of Resources that are deployed and managed together
- Operation: Whenever a creation, modification, or deletion action is performed in the Deployment Manager, an Operation is returned, which can be polled to check for completion or error
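The Operation polling pattern from the last bullet can be sketched as follows. This is a minimal sketch rather than Deployment Manager's client code: `fetch_operation` is a hypothetical callable standing in for a GET on the Operation resource, and the `status` and `error` keys are assumed to follow the shape of the public API (a `status` that ends at `DONE`, plus an `error` object on failure).

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    fetch_operation: Callable[[], Dict[str, Any]],  # hypothetical: returns the Operation as a dict
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll until the Operation reports DONE; return it, or raise if it carries an error."""
    for _ in range(max_polls):
        op = fetch_operation()
        if op.get("status") == "DONE":  # assumed terminal status value
            if "error" in op:
                # The error payload is the interesting part when an Operation fails.
                raise RuntimeError(f"operation failed: {op['error']}")
            return op
        time.sleep(interval_s)
    raise TimeoutError("operation did not finish in time")
```

In practice `fetch_operation` would wrap an authenticated HTTP call; here it is left abstract so the loop itself stays easy to follow.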
The main way to interact with the Deployment Manager is through its REST APIs, of which there are two documented versions: v2 (Generally available) and v2beta (In public beta) (Read more about Google products' launch stages).
A key difference between the two versions is that Type Providers are only available in v2beta.
Google Cloud Deployment Manager is a bit hard to understand at first glance. If you are interested in it, I recommend playing around with it, especially through the v2beta REST API: read the docs, and try creating Deployments and Type Providers to get the hang of it.
I tried to link useful resources throughout this write-up, hoping to make it easier to understand.
My first approach to researching the Deployment Manager was to look for hidden or internal Types, since some Google services (Such as Google App Engine Flexible) use the Deployment Manager internally (You can see it in your project's logs when deploying an app), but I found none.
Then, I looked at the Jinja2 and Python templates of the Deployments.
Through some trial and error, I was able to create Deployments, with specially crafted templates, that would return data as a Python exception on their Operations.
This way, I was able to inspect the Python libraries, read the Python code, and list/read files, but the templates' interpreting script runs on an isolated container with zero privileges, not even network connectivity.
After those attempts, I tried creating Type Providers pointing to internal Google Corp APIs, such as issuetracker.corp.googleapis.com, but the Operations always failed with an error saying it did not receive a valid response for the descriptor document, and showing the HTML for the login portal to which issuetracker.corp.googleapis.com redirects when accessed externally.
And specifying any private IP address failed with an error saying it was not a valid address (Attempts to bypass it, with domains and redirections pointing to private IPs, gave the same result).
These failed attempts were quite demotivating, so I did not continue researching the Deployment Manager for a while (Remember I did not do all this research all at once on a single day, it was a very slow process).
One day, I decided to look into the Deployment Manager API methods, by enabling it on the Google Cloud Console, going to the metrics page, and looking at the Filters section, where there is a drop-down list titled Methods with all of them, including undocumented ones - Methods usually include the API version in their names.
I noticed there were two more API versions besides v2 and v2beta (The documented ones), called alpha and dogfood.
And I could call methods on those versions, just by replacing v2 or v2beta with either alpha or dogfood in every API call.
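As a rough illustration of that version swap, the sketch below builds the same call for each version string. The host name, the project ID and the `types` collection are assumptions chosen for illustration; only the version names and the general path layout come from this write-up.

```python
# Assumed API host; the write-up only describes the path layout.
BASE = "https://www.googleapis.com/deploymentmanager"

# The two documented versions plus the two found on the metrics page.
VERSIONS = ["v2", "v2beta", "alpha", "dogfood"]


def dm_url(version: str, project: str, collection: str) -> str:
    """Build a Deployment Manager REST URL for the given API version."""
    return f"{BASE}/{version}/projects/{project}/global/{collection}"


if __name__ == "__main__":
    for version in VERSIONS:
        # "my-project" is a placeholder project ID.
        print(dm_url(version, "my-project", "types"))
```

Nothing else about the request changes: the same credentials and the same path are used, with only the version segment replaced.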
I played around a bit with the alpha version, but I did not find anything interesting in it.
The dogfood version was a bit more interesting though, especially because I have noticed the word dogfood being used for internal testing in Google services.
When I listed the base types on that version, most of them returned an extra field in their definitions: googleOptions.
When I listed my own Type Providers, they also included this extra field, and by specifying the $outputDefaults system parameter in my query, I could see which fields googleOptions contained:
Boolean. Regardless of what value I specified, the Deployment Manager API always set it to false on my Type Providers. Effect unknown.
Enum. I was able to find a single valid value: CREATE_OR_ACQUIRE. Effect unknown.
Boolean. Whenever I set it to true, the Type Provider was always successfully created, regardless of values in any other field, but attempting to create Deployments using it always failed with an error saying the descriptor document could not be retrieved.
Enum. The valid values were UNKNOWN, USER and GOOGLE. No effects were observed by setting it to any of these values, but I always set it to GOOGLE during my research.
transport: Enum. The valid values I found at first were: UNKNOWN_TRANSPORT_TYPE and HARPOON. No effects were observed by setting it to any of these values.
credentialType: Enum. The valid values I found at first were: UNKNOWN_CREDENTIAL_TYPE and OAUTH. No effects were observed by setting it to any of these values.
gslbTarget: String. Either empty or something like blade:<TARGET> or gslb:<TARGET>. No effects were observed by setting it to any value.
descriptorUrlServerSpec: String, either the same as gslbTarget or empty. No effects were observed by setting it to any value.
It surely looks like this could be used to achieve SSRF to internal servers!
But whatever values I tried on gslbTarget and descriptorUrlServerSpec, they did not seem to have any effect.
I then tried to brute force valid credentialType values, and found a new one: GAIAMINT.
When testing Deployments with a Type Provider using that value, I also tested what happened if I set the Type Provider to use an OAuth 2.0 access token as its authentication mechanism.
I am not sure how to decode that, but it looks like it has some protobuf data inside some other binary format, and some strings can be retrieved: anonymous, email@example.com (The email of the service account Deployment Manager uses for tokens on my project), cloud-dm and cloudgaia::vjgv73:9898.
This looks like it is intended for internal use, and some googlers confirmed it is intended for authentication between internal Google systems, so it is probably not possible to use it externally.
But besides this oddity, I was unable to brute force any other valid values for credentialType, nor any value for transport.
At this point I also tried adding staging_ to the beginning of the API version, since I noticed the Google Compute Engine API does that for the Staging environment (Fact mentioned in a few places, like in this GitHub PR), and it worked!
But the Staging environment seemed to work exactly the same way as the Production one.
After several failed attempts to achieve anything significant, I stopped researching Deployment Manager for a couple weeks.
Breakthrough: Exploiting Proto over HTTP
Protobufs are used mainly for gRPC, a remote procedure call (RPC) system developed by Google, and supported by many Google APIs.
Unfortunately, the Deployment Manager API does not support gRPC, but it does support a relatively-unknown feature: Proto over HTTP.
- URL paths stay the same (/deploymentmanager/<VERSION>/projects/<PROJECT>/global/...)
- The Content-Type header needs to be set to application/x-protobuf
- In Production, it fails with the error message: Proto over HTTP is not allowed for service
- It works in Staging!
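A request following the points above could be prepared roughly like this. It is a sketch under assumptions: the host name, the project and Type Provider names, and the bearer-token header are placeholders of mine, while the path layout, the `staging_` version prefix and the `Content-Type` value come from this write-up. The request is only built, never sent.

```python
import urllib.request

HOST = "https://www.googleapis.com"  # assumed host, not confirmed by the write-up


def proto_over_http_request(
    version: str, project: str, type_provider: str, access_token: str
) -> urllib.request.Request:
    """Prepare (but do not send) a Proto over HTTP GET for one Type Provider."""
    url = (
        f"{HOST}/deploymentmanager/staging_{version}"  # staging_ prefix selects Staging
        f"/projects/{project}/global/typeProviders/{type_provider}"
    )
    headers = {
        "Content-Type": "application/x-protobuf",   # switches the API to protobuf
        "Authorization": f"Bearer {access_token}",  # assumed OAuth 2.0 bearer auth
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = proto_over_http_request("dogfood", "my-project", "my-provider", "TOKEN")
    print(req.get_method(), req.full_url)
```

The response body would then be raw protobuf bytes instead of JSON.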
This gave me unnamed proto field numbers, and the values assigned to them.
Comparing the values from the retrieved proto and the values in the JSON API, I quickly matched each field number to its field name, and reverse engineered the Type Providers proto message definition.
Quick example of all of this:
- I create a Type Provider through the JSON API:
- I get that same Type Provider through the JSON API:
- I get that same Type Provider through the Proto over HTTP API:
- I decode the response with protoc:
- I figure out which number corresponds to each field (For example, 1=name, 2=id, 3=insertTime,...)
- I construct an approximation of the original proto message definition with that information
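The decoding and matching steps can be sketched without protoc. The snippet below is a minimal schema-less decoder for the protobuf wire format, similar in spirit to `protoc --decode_raw`, followed by the number-to-name mapping. The sample bytes are made up for illustration, and only the first three field names come from the list above.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, bytes]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> List[Tuple[int, Value]]:
    """Return (field_number, value) pairs without needing a .proto schema."""
    fields: List[Tuple[int, Value]] = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


# Names matched by comparing against the JSON API response (numbers from the list above).
FIELD_NAMES = {1: "name", 2: "id", 3: "insertTime"}


def label(fields: List[Tuple[int, Value]]) -> Dict[str, Value]:
    """Replace known field numbers with names; keep unknown ones as field_<n>."""
    return {FIELD_NAMES.get(n, f"field_{n}"): v for n, v in fields}


if __name__ == "__main__":
    sample = b"\x0a\x02tp\x10\x96\x01"  # made-up bytes: field 1 = "tp", field 2 = 150
    print(label(decode_raw(sample)))
```

Length-delimited values may themselves be nested messages, so running `decode_raw` again on such a value is how sub-messages like googleOptions could be explored.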
- GSLB - It directs requests from the Deployment Manager to the internal Google endpoints specified in gslbTarget and descriptorUrlServerSpec
- ENDUSERCREDS, TYPE_CREDENTIAL - They seem to act the same way as OAUTH and UNKNOWN_CREDENTIAL_TYPE
With the newly discovered GSLB value for transport, I can craft Type Providers such that the Deployment Manager directs requests to internal Google endpoints... As long as I know where to point gslbTarget to.
Here is an example for creating a Type Provider for Google App Engine Admin API - Test environment (Which since my 2018 GAE RCE, has been blocked externally by a 429 error).
I got blade:apphosting-admin by listing Types on the dogfood version, the appengine.v1.version Type had gslbTarget set to this value.
I added -nightly at the end because, before the GAE Test API got blocked externally in 2018, I had noticed the string nightly a lot in it.
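For illustration, a request body along these lines could be assembled as below. This is a sketch, not the exact request I sent: the Type Provider name and the descriptor URL are placeholders, and only the googleOptions field names, the GSLB transport value and the `blade:apphosting-admin-nightly` target come from this write-up.

```python
import json
from typing import Any, Dict


def gslb_type_provider(name: str, descriptor_url: str, gslb_target: str) -> Dict[str, Any]:
    """Build a Type Provider body that routes requests through GSLB."""
    return {
        "name": name,
        "descriptorUrl": descriptor_url,
        "googleOptions": {
            "transport": "GSLB",
            "gslbTarget": gslb_target,
            # Set to the same value as gslbTarget, as done throughout the research.
            "descriptorUrlServerSpec": gslb_target,
        },
    }


if __name__ == "__main__":
    body = gslb_type_provider(
        name="gae-test-provider",                       # placeholder name
        descriptor_url="https://example.invalid/rest",  # placeholder descriptor URL
        gslb_target="blade:apphosting-admin-nightly",
    )
    print(json.dumps(body, indent=2))
```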
This Type Provider worked flawlessly, and I successfully created a Deployment that used it to launch a new app into GAE Test to check if my 2018 bug was properly fixed (It was).
If I specified some invalid gslbTarget (And I always set descriptorUrlServerSpec to the same value as gslbTarget), the Operation for creating a Type Provider would fail, either with an error message saying it could not connect to the GSLB endpoint, the error the internal endpoint returned (Often 404 Not Found), or that the response was not a valid descriptor document (For example, some endpoints returned a normal HTML) along with the response data.
One endpoint even returned an error page with a Java stack trace and a message along the lines of: Debugging information, only visible to internal IPs!
Therefore, I could retrieve some internal information this way.
If I specified some valid gslbTarget, like blade:corp-issuetracker-api for issuetracker.corp.googleapis.com (I got the GSLB name from some of my past research), I would be able to perform calls to the API!
Even though I had no idea what the format of Issue Tracker's resources would be, this could easily be overcome by calling listTypes on the new Type Provider.
These were interesting issues, but I was a bit doubtful of their impact, especially since requests were being made with the credentials of the Deployment Manager service account for my project, which would probably be restricted in which endpoints it was allowed to talk to.
While researching this, I had told some googlers that I had found a way to perform requests to GSLB endpoints, and they told me to write it down on a VRP grant ticket, so that the SRE team could have a heads up of what I was up to, in case they detected my requests.
They also explained one potential issue with requests to GSLB endpoints:
If service A makes a request with service B on behalf of user C, the authorization of user C is checked. If there are no credentials for C, then the authorization of A is checked instead.
This was really interesting, since I had noticed that the service account credentials used by Deployment Manager were delegated by firstname.lastname@example.org (I could see the delegator's ID in the Cloud Console logs), and I assume it means that Google prod account has, at least, permissions to delegate tokens for some service accounts.
I would just have to find a way to do so, and remove the service account's credentials so the identity of the Deployment Manager would be used instead.
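The fallback rule the googlers described can be modeled in a few lines. This is a toy model of the rule as I understood it, not Google's actual authorization logic, and every principal name in it is hypothetical.

```python
from typing import Optional, Set


def authorize(caller_service: str, end_user: Optional[str], allowed: Set[str]) -> bool:
    """Check the end user's identity if one is attached; otherwise check the calling service."""
    principal = end_user if end_user is not None else caller_service
    return principal in allowed


if __name__ == "__main__":
    # Hypothetical access list of some internal endpoint.
    allowed = {"deployment-manager-service"}
    # With the project's service account attached, that account is checked.
    print(authorize("deployment-manager-service", "project-service-account", allowed))
    # With no end-user credentials, the service's own identity is checked instead.
    print(authorize("deployment-manager-service", None, allowed))
```

The second call is the interesting case: dropping the end-user credentials shifts the check onto the more privileged calling service.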
By this time, it was night in Uruguay, so I just wrote down my research in the grant ticket and stopped researching for the day.
Next morning, my dogs woke me up at about 6 AM, and I noticed notifications of updates on the grant ticket, including one that arrived just as I was reading them:
Eduardo then quickly submitted a VRP report for me, triaged it, escalated it to P0, and issued a Nice catch!
It took less than 5 minutes from report submission to Nice catch!, maybe the fastest RCE VRP report ever :).
Later that day, I asked Eduardo a few questions, and he told me this bug was now being treated as an incident, but only because RCE bugs are always treated as such.
Because of this, they asked me to stop further research into it, and send them the details of my actions and findings.
I asked about the potential way this issue could have been exploited, and my understanding is:
- Privilege escalation may be achieved through the identity of the Deployment Manager service (email@example.com), so it might have access to internal services a normal service account would not have access to
- It is not known if there are attack vectors that would allow an attacker to achieve a shell into Google's internal systems, but the privileges could be high enough
Because of this probable maximum impact, Google treated this as RCE, and issued a $31,337 reward (Their current standard amount for RCE).
Thanks so much to the Google VRP!
It was a very interesting bug to research, and I would love to see what other issues could be found in Google Cloud Deployment Manager.
Extra notes
The issue has now been fixed. The fix seems to just be that now gslbTarget and descriptorUrlServerSpec are ignored when specified on the create, patch, or update Type Provider operations.
The dogfood version might still be accessible for a while on the API, but that does not mean it is a security issue by itself. There could be some hidden security hole in it though ;).
Also, after reporting my findings to Google, and even after finishing the first few drafts of this write-up, I had the idea of checking whether the discovery document for the dogfood version of the Staging Deployment Manager API could be accessed publicly.
Lo and behold, it can: https://staging-deploymentmanager.sandbox.googleapis.com/$discovery/rest?version=dogfood (Copy on GitHub, just in case it stops working in the future).
The discovery document includes the googleOptions field, and provides a little bit more insight into what its fields do, but not nearly enough, so even if I had noticed the document before, I would have probably had to perform the same steps I performed during my research.
- May 7th, 2020: Issue found and mentioned in a VRP grant ticket
- May 8th, 2020: Googler checks the issue, submits RCE report and quickly escalates it
- May 19th, 2020: Reward of $31,337.00 issued
- May 20th, 2020: Issue confirmed as fixed
- March 2021: Prize of $133,337 issued for the vulnerability write-up