---
title: Use Internal Data For File Paths
impact: CRITICAL
impactDescription: prevents path traversal attacks
tags: file-path, path-traversal, lfi, input-validation, security, python, pyspark
---

## Use Internal Data For File Paths

Using user-controlled input to build file paths can lead to Path Traversal vulnerabilities, allowing attackers to read or write sensitive files on the system.

**Incorrect (unvalidated path):**

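```python
from flask import request  # assumes a Flask view

def download():
    # Vulnerable: user input goes straight into the path
    filename = request.args.get('filename')
    with open(f"/app/uploads/{filename}", "r") as f:
        return f.read()
# Attacker input: ../../../etc/passwd
```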

**Correct (validation and mapping):**

```python
import os

from flask import request  # assumes a Flask view

ALLOWED_FILES = {
    "report": "report_2024.pdf",
    "stats": "stats_2024.csv"
}


def download():
    # Use an allow-list or ID-based lookup
    file_id = request.args.get('id')
    actual_filename = ALLOWED_FILES.get(file_id)
    if not actual_filename:
        raise ValueError("Invalid file ID")

    # Safe path join and normalization check. Compare against the base
    # directory plus a separator so a sibling such as "/app/uploads_old"
    # does not pass the prefix test.
    base_dir = "/app/uploads"
    safe_path = os.path.normpath(os.path.join(base_dir, actual_filename))
    if not safe_path.startswith(base_dir + os.sep):
        raise PermissionError("Path traversal attempted")

    # Binary mode, because the allow-list includes a PDF
    with open(safe_path, "rb") as f:
        return f.read()
```

**PySpark Context:**
When reading from S3/HDFS using user-provided paths, ensure the path starts with the expected prefix.
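
One way to apply the prefix check before handing a path to a Spark reader, as an added example that is not from the original page. The scheme, bucket, prefix and function names are assumptions chosen for illustration; the check also rejects `..` segments and glob characters, which a plain string-prefix test would let through.

```python
from urllib.parse import unquote, urlsplit

# Assumed values for illustration; they are not from the original page.
ALLOWED_SCHEME = "s3a"
ALLOWED_BUCKET = "analytics-data"
ALLOWED_PREFIX = "/exports/"  # trailing slash stops "/exports_private" matching
GLOB_CHARS = set("*?[]{}")    # Spark/Hadoop readers expand glob patterns


def validate_read_path(user_path: str) -> str:
    """Return the URI if it stays under the allowed prefix, else raise ValueError."""
    if GLOB_CHARS & set(user_path):
        raise ValueError("glob characters are not allowed")
    parts = urlsplit(user_path)
    if parts.scheme != ALLOWED_SCHEME or parts.netloc != ALLOWED_BUCKET:
        raise ValueError("scheme or bucket is not allowed")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    # Decode first so "%2e%2e" is treated the same as ".."
    path = unquote(parts.path)
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError("relative path segments are not allowed")
    if not path.startswith(ALLOWED_PREFIX):
        raise ValueError("path is outside the allowed prefix")
    return f"{ALLOWED_SCHEME}://{ALLOWED_BUCKET}{parts.path}"


def is_allowed(user_path: str) -> bool:
    """Boolean wrapper around validate_read_path for logging or tests."""
    try:
        validate_read_path(user_path)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # Constructed sample inputs
    samples = [
        "s3a://analytics-data/exports/2024/stats.parquet",
        "s3a://analytics-data/exports/../secrets/keys.parquet",
        "s3a://analytics-data/exports_private/x.parquet",
        "s3a://other-bucket/exports/x.parquet",
    ]
    for sample in samples:
        print(is_allowed(sample), sample)
    # In a job: spark.read.parquet(validate_read_path(user_path))
```
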

**Tools:** Bandit (B108, B110), SonarQube, Semgrep
