The journey

Homelab

From a full Google Drive to a production-grade self-hosted stack. Each step was forced by a real problem. Nothing was planned upfront.

9.6 TB · vs 100 GB on Google

24 · services running

£437/yr · subscriptions cancelled

~4 yrs · to break even

The maths works eventually. But that's not really why you do it.

You do it because a broken WireGuard tunnel at 11pm is more interesting than Netflix. Because understanding your own attack surface matters to you. Because "it just works" isn't satisfying when you don't know why it works.

If that doesn't sound like you — close the tab and pay Google. Genuinely. It's the right call for most people.

Skip the journey? Here's where I ended up.

24 services across 7 categories

Routing & Proxy

Traefik WireGuard nginx (VPS) Tailscale

DNS & Security

Pi-hole CrowdSec Authentik

Media

Plex Immich

Automation

Home Assistant Zigbee2MQTT

Observability

Prometheus Grafana Loki Promtail cAdvisor Node Exporter SNMP Exporter

Web

gread.uk rosiealsopphotography peragaadventures.com

Management

Portainer Uptime Kuma Homepage

9.6 TB storage · vs 100 GB on Google

Photo library with ML search · replaces Google Photos

Media server, anywhere · cancelled streaming subs

Home automation, no cloud · replaced 5 apps

Pi-hole DNS filtering · ~20% queries blocked

Websites for others · £0 recurring

Full observability · metrics, logs, alerts

Single sign-on · one login, all services

Hindsight

Set up logs from day one, not just metrics.

Host something for someone else early. It changes your reliability mindset.

Buy a UPS before you lose data, not after.

The Trigger

beginner

Problem

Google Drive running out of space. Next tier meant paying more for storage we were outgrowing. Streaming services pushing ads on paid plans.

Solution

Self-host. Take control of storage and media.

The cost calculus shifts faster than you expect once you start adding up subscriptions.

cost privacy storage

More context

Google Drive hit its 100 GB limit. The next tier was £24.99/year for 200 GB, and the photo library was growing faster than we could trim it. Rosie had the same problem compounded across her personal Google account and GSuite. Streaming services were getting worse, with ads appearing on already-paid subscriptions.

The cost calculus shifted. Self-hosting went from “someday” to “this weekend.”

Hardware

beginner

Problem

Need a device that can store files, run containers, and not require Linux expertise to get started.

Solution

Synology DS923+: turnkey OS, package ecosystem, Docker support.

SHR handles mixed drive sizes. NVMe cache is nice-to-have, not essential. Free RAM from an old laptop is the best kind of RAM.

storage hardware

More context

Synology DS923+ with SHR (Synology Hybrid RAID) across three drives of different sizes: 2x 8 TB Toshiba MG08ADA800E and 1x 2 TB (used, free). SHR creates multiple md arrays internally to maximise usable space, giving 8.71 TB total on Volume 1 (btrfs, encrypted).
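
The mixed-size layout is easier to see as arithmetic. The sketch below is a rough model of a one-drive-redundancy hybrid array, not Synology's actual algorithm: it slices the drives into tiers at each distinct size and gives up one drive's worth of each tier to redundancy. The function name `shr1_usable_tb` is made up for illustration, and the result is raw capacity before any filesystem or system overhead.

```python
def shr1_usable_tb(drive_sizes_tb):
    """Estimate usable capacity of a one-drive-redundancy hybrid array.

    Drives are cut into tiers at each distinct size. A tier that spans two
    or more drives loses one drive's share to redundancy; a tier that exists
    on only one drive cannot be protected and is left unused.
    """
    sizes = sorted(drive_sizes_tb)
    usable = 0.0
    previous = 0.0
    for index, size in enumerate(sizes):
        tier_height = size - previous          # capacity this tier adds per drive
        drives_in_tier = len(sizes) - index    # drives large enough to reach it
        if tier_height > 0 and drives_in_tier >= 2:
            usable += tier_height * (drives_in_tier - 1)
        previous = size
    return usable


raw_tb = shr1_usable_tb([8, 8, 2])     # decimal terabytes, as drives are sold
raw_tib = raw_tb * 1e12 / 2**40        # binary units, as many tools report
print(f"{raw_tb:.1f} TB = {raw_tib:.2f} TiB before filesystem overhead")
```

For 2x 8 TB + 1x 2 TB this prints 10.0 TB, or 9.09 TiB. The 8.71 TB the volume actually reports is lower; I'm assuming the difference is filesystem and system reservation overhead, which this model ignores.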

Two M.2 NVMe drives in RAID1 for read/write cache. 32 GB RAM swapped in from an old laptop.

A second volume (1 TB SSD) was added later to handle media and Docker containers quietly.

Replacing Google

beginner

Problem

Need cloud sync, local backups, and file sharing without recurring cloud costs.

Solution

Synology Drive for sync, Time Machine for Mac backups, SMB for LAN, Hyper Backup for offsite.

Encryption key management is critical. Keep recovery keys in multiple safe places before you need them.

storage privacy cost

More context

The immediate wins, things that work out of the box on DSM:

Synology Drive handles file sync. Google Drive stays for Docs, Sheets, and Mail, but bulk storage moved off it.
Time Machine as a LAN backup target for MacBooks.
SMB shares for LAN file access from any device.
Btrfs snapshots with hourly/daily retention on shared folders.
Offsite backup via Hyper Backup to a used 2 TB HDD in a UGREEN enclosure, kept in a shed. Encrypted.

Media Server

beginner

Problem

Want to watch media on any device in the house.

Solution

Plex via Synology Package Center. LAN-only at this stage.

The DS923+ uses an AMD Ryzen R1600 with no Intel Quick Sync, so no hardware transcoding. Remote access via relay or QuickConnect was essentially unusable.

media

More context

Plex installed via Synology Package Center. Media playable on phones and TV over local network.

The DS923+ has an AMD Ryzen R1600 processor which lacks Intel Quick Sync, the hardware transcoding engine Plex relies on. Without it, Plex can only direct-play original files or attempt software transcoding (far too slow on a NAS CPU for real-time playback).

This mattered because both Plex’s built-in relay and Synology’s QuickConnect reduce stream quality to save bandwidth, which triggers Plex to transcode. No hardware transcoding meant those remote access options were effectively broken, either unwatchably slow or stuck buffering. On LAN it was fine because clients could direct-play at full quality.

This constraint made the later WireGuard tunnel critical: with enough raw bandwidth, clients direct-play the original files remotely, with no transcoding needed at all.

Home Automation

comfortable

Problem

Five different manufacturer apps to control lights, heating, air quality, and media. Cloud-dependent, fragmented, unreliable.

Solution

Home Assistant OS as a VM with Zigbee2MQTT. One dashboard, no cloud dependencies.

Battery Zigbee devices don't route, but mains-powered ones like IKEA plugs do. An IKEA plug in the hallway fixed the mesh.

home-automation privacy

More context

Home Assistant OS installed as a VM on Synology VMM. Zigbee2MQTT with a Dongle Plus MG24 (~£30) connecting everything. No cloud required.

Smart TRVs on every radiator (5 rooms) with weekly schedules. Sonoff SNZB-06P human presence sensor in the study (USB-powered). Dyson fan for air quality monitoring. Govee lights, IKEA remote, IKEA plug as a Zigbee router, leak sensors in bathroom and kitchen. LG CX TV and Chromecast integration.

Five apps (Govee, Dyson, LG ThinQ, Google Home, TRV manufacturer) became one dashboard.

Living with a NAS

beginner

Problem

The NAS lives in the living room next to the router. HDD seek noise during media playback was audible over the TV.

Solution

Added a 1 TB SSD as a separate volume for media and Docker data. SSDs are silent.

Not everything needs RAID. Media is re-downloadable. Separating data by recovery importance saves money, noise, and complexity.

storage hardware

More context

The NAS stays in the living room; it needs a wired ethernet connection to the router. Moving it wasn’t an option.

The fix was separating what lives on spinning disks from what doesn’t need to. A 1 TB SSD became Volume 2 (ext4, encrypted) for media and Docker container runtime. The SHR array on the HDDs holds everything that needs RAID protection and snapshots.

The deeper lesson: not all data has the same recovery cost. Photos and documents need redundancy. Movies don’t, because you can re-download them. Building volumes around that distinction is cheaper and quieter than putting everything on the same drives.

External Access: Cloudflare Tunnels

networking

Problem

CGNAT means no inbound connections. Static IPv4 would cost ~£60/month vs £22/month ISP.

Solution

Cloudflare Tunnels: outbound-only, punches through CGNAT.

CF Tunnels work fine for dashboards and APIs but hit a wall for media: 100 MB body limit and video streaming ToS.

networking vpn cost

More context

Cloudflare Tunnels were the first attempt at external access. They work for lightweight use cases like admin dashboards and small file transfers. But two limitations made them a dead end for media and file hosting:

100 MB HTTP request body limit on free/Pro plans. Large uploads fail.
Terms of Service violation: Cloudflare prohibits proxying video/streaming on standard plans.

Before giving up on tunnels for media I tried a side route: skip the tunnel on the streaming hostname and serve it directly over IPv6 with an AAAA record, while keeping everything else routed through the tunnel. The DNS rules said no. A tunnel route needs a CNAME pointing at <uuid>.cfargotunnel.com, and RFC 1034 forbids a CNAME from sharing a hostname with any other record. So the same name couldn’t be both “tunnel for v4 clients” and “direct for v6 clients”; it was one or the other unless I ran split-DNS. That’s when the VPS + WireGuard idea took over.
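
The CNAME rule is simple enough to check mechanically before touching a DNS zone. Here is a minimal sketch, assuming records are plain (hostname, type, value) tuples; the hostnames use example.com and the IPv6 address is from the documentation range, so none of it is my real zone. It also ignores the DNSSEC record types that are allowed to sit next to a CNAME.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Return hostnames where a CNAME shares the name with any other record.

    `records` is a list of (hostname, record_type, value) tuples.
    DNSSEC exceptions (RRSIG, NSEC) are deliberately not modelled.
    """
    types_by_name = defaultdict(list)
    for name, rtype, _value in records:
        types_by_name[name.lower().rstrip(".")].append(rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone: tunnel CNAME plus a direct AAAA on the same name.
wanted = [
    ("media.example.com", "CNAME", "<uuid>.cfargotunnel.com"),
    ("media.example.com", "AAAA", "2001:db8::10"),
    ("dash.example.com", "CNAME", "<uuid>.cfargotunnel.com"),
]
print(cname_conflicts(wanted))  # ['media.example.com']
```

The streaming hostname is flagged because it wants both records; the dashboard hostname, with only the tunnel CNAME, is fine.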

How it all connects

graph TB
  subgraph ext["Internet"]
    User(["External User"])
    VPS["Hetzner VPS: nginx TCP proxy"]
  end

  subgraph lan["Home Network"]
    Router["Router"]

    subgraph nas["Synology NAS (DS923+)"]
      WireGuard["WireGuard"]
      Traefik["Traefik :443"]
      CrowdSec["CrowdSec"]
      PiHole["Pi-hole :53"]

      subgraph containers["Containers"]
        Authentik["Authentik"]
        Grafana["Grafana"]
        Prometheus["Prometheus"]
        Portainer["Portainer"]
        Uptime["Uptime Kuma"]
        Immich["Immich"]
      end
    end

    subgraph havm["Home Assistant VM"]
      HA["Home Assistant"]
    end

    Client(["LAN Client"])
  end

  User -- HTTPS --> VPS
  VPS -- WireGuard tunnel --> WireGuard
  WireGuard --> Traefik
  Traefik -- stream --> CrowdSec

  Client -- DNS --> PiHole
  Client -- HTTPS --> Traefik

  Router -- DNS --> PiHole

  Traefik --> Authentik
  Traefik --> Grafana
  Traefik --> Prometheus
  Traefik --> Portainer
  Traefik --> Uptime
  Traefik --> Immich
  Traefik --> HA

WireGuard + Hetzner VPS

linux

Problem

Cloudflare Tunnel limitations blocking media hosting. Need real external access without paying for a static IP.

Solution

Hetzner CX22 VPS as a WireGuard endpoint and TCP proxy. TLS terminates on the NAS, not the VPS.

WireGuard userspace on Synology works but needs tuning (MTU, UDP buffers). Rate limiting prevents TCP congestion collapse.

vpn networking cost

More context

A Hetzner CX22 (£3.19/month) runs nginx as a dumb TCP proxy. Traffic flows through a WireGuard tunnel to the NAS where Traefik terminates TLS. The VPS never sees plaintext.

No request body limits. No ToS restrictions. You control the whole path.
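
On the MTU tuning: the values I ended up with aren't recorded here, but the arithmetic behind picking one is fixed by the packet format. A small sketch, with `tunnel_mtu` as an illustrative helper name: every tunnelled packet carries an outer IP header, a UDP header, and 32 bytes of WireGuard framing, so the inner MTU has to be the path MTU minus those.

```python
WIREGUARD_OVERHEAD = 32                  # 16-byte data header + 16-byte auth tag
UDP_HEADER = 8
IP_HEADER = {"ipv4": 20, "ipv6": 40}     # outer header, without options


def tunnel_mtu(path_mtu, outer="ipv4"):
    """Largest inner packet that fits in one outer packet without fragmenting."""
    return path_mtu - IP_HEADER[outer] - UDP_HEADER - WIREGUARD_OVERHEAD


print(tunnel_mtu(1500, "ipv4"))   # 1440
print(tunnel_mtu(1500, "ipv6"))   # 1420
print(tunnel_mtu(1492, "ipv4"))   # 1432, e.g. a PPPoE link
```

The 1420 figure is why that number shows up as a common WireGuard default: it is safe whether the outer transport is IPv4 or IPv6 on a standard 1500-byte path. If the real path MTU is lower, the tunnel MTU has to drop with it.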

Security Hardening

devops

Problem

Internal services (Grafana, Portainer, Pi-hole) were reachable from the public internet via the VPS TCP proxy.

Solution

Three-layer defence: VPS filter, Traefik IP allowlist, strict SNI matching.

DNS isn't the only way to reach a service. Defence-in-depth means each layer catches what the previous misses.

security networking

More context

Discovered that *.internal.gread.uk services were publicly reachable; the VPS nginx forwarded any TCP connection blindly. Built a three-layer fix:

VPS nginx SNI filter: drops connections to internal hostnames at TCP level
Traefik IP allowlist: blocks non-LAN IPs on internal routes
sniStrict: true: rejects mismatched TLS SNI

Plus CrowdSec for IP reputation, Synology Firewall for default-deny, and weekly automated regression tests via GitHub Actions.
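
To show how the three layers overlap, here is a toy model of the decision path in Python. It is not the nginx or Traefik configuration, just the logic. The LAN range, the `grafana.internal.gread.uk` hostname, and the set of configured hostnames are assumptions for illustration; only the `.internal.gread.uk` suffix comes from the setup above.

```python
import ipaddress

INTERNAL_SUFFIX = ".internal.gread.uk"
LAN_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]      # assumed LAN range
CONFIGURED_HOSTS = {"gread.uk", "grafana.internal.gread.uk"}  # hypothetical


def vps_forwards(sni):
    """Layer 1: the VPS proxy drops connections aimed at internal hostnames."""
    return not sni.lower().endswith(INTERNAL_SUFFIX)


def sni_strict_ok(sni):
    """Layer 3: strict SNI rejects a missing name or one with no certificate."""
    return sni.lower() in CONFIGURED_HOSTS


def allowlist_ok(host, client_ip):
    """Layer 2: internal routes only accept clients inside the LAN ranges."""
    if not host.lower().endswith(INTERNAL_SUFFIX):
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in LAN_NETWORKS)


def admitted(sni, client_ip, via_vps):
    """Walk a connection through the layers and name the one that stops it."""
    if via_vps and not vps_forwards(sni):
        return False, "vps-sni-filter"
    if not sni_strict_ok(sni):
        return False, "sni-strict"
    if not allowlist_ok(sni, client_ip):
        return False, "ip-allowlist"
    return True, "ok"


print(admitted("grafana.internal.gread.uk", "203.0.113.7", via_vps=True))
print(admitted("grafana.internal.gread.uk", "203.0.113.7", via_vps=False))
print(admitted("", "203.0.113.7", via_vps=True))
```

The second call is the interesting one: if something gets past the VPS filter (or never went through it), the allowlist still refuses a non-LAN address. The third shows a handshake with no SNI slipping past layer 1 and being caught by strict SNI.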

What 100 GB became

Google One 100 GB £1.59/mo

Google One 2 TB £7.99/mo

Google One 10 TB £480/yr

Self-hosted 9.6 TB

The equivalent Google One tier (10 TB) would cost £480/yr, which alone pays off the NAS in under 2 years.
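
The Google One prices above mix monthly and yearly billing, so a quick normalisation helps. This sketch only uses the three listed prices; it leaves the self-hosted side out because the hardware cost isn't stated here. The helper names are illustrative.

```python
def yearly_from_monthly(monthly_gbp):
    """Convert a monthly price to a yearly one."""
    return monthly_gbp * 12


def per_tb_per_year(yearly_gbp, capacity_tb):
    """Yearly price divided by plan capacity in terabytes."""
    return yearly_gbp / capacity_tb


plans = [
    ("Google One 100 GB", 0.1, yearly_from_monthly(1.59)),
    ("Google One 2 TB", 2.0, yearly_from_monthly(7.99)),
    ("Google One 10 TB", 10.0, 480.00),
]

for name, capacity_tb, yearly in plans:
    rate = per_tb_per_year(yearly, capacity_tb)
    print(f"{name}: £{yearly:.2f}/yr, £{rate:.2f} per TB per year")
```

The 100 GB tier works out to about £190.80 per TB per year, while the 2 TB and 10 TB tiers land at roughly £48 per TB. The small plan is cheap in absolute terms and expensive per unit, which is why outgrowing it changes the calculation.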

Authentik SSO

devops

Problem

Per-service authentication, with different logins for Grafana, Immich, Pi-hole, Portainer. Annoying, especially remotely.

Solution

Authentik as centralised provider with Google OAuth upstream. Log in once, access everything.

Authentik's config has many moving parts. Session expiry needs tuning per-service. Always keep a break-glass path.

security containers

More context

Centralised identity provider. Google OAuth upstream, so no new credentials. Native OIDC for Grafana, Immich, Portainer. Traefik forwardAuth for Pi-hole, Homepage, Traefik dashboard.

If Authentik goes down, forwardAuth services become inaccessible externally. Pi-hole keeps its own password for LAN break-glass. Emergency recovery key stored offline.

Hosting for Others

devops

Problem

Rosie paying £140/year for Squarespace. Elia wanted a site but the hosting costs didn't make sense for a new venture.

Solution

Self-hosted Astro static sites behind nginx + Traefik. Cost: £0.

Hosting for others changes your reliability mindset overnight. When it's just for you, it doesn't matter. When others depend on it, everything matters.

cost containers networking

More context

The maturity inflection point. Rosie’s photography portfolio was costing £140/year on Squarespace for a static site with minimal traffic. Elia wanted an adventure tourism site but the hosting costs didn’t make sense for a brand-new venture.

Both now self-hosted as Astro static sites behind nginx + Traefik. Adding a new site is a compose file + Traefik labels.

This forced proper monitoring, proper deployment, and proper security. The production mindset came from having real users, not from discipline alone.

Infrastructure as Code

devops

Problem

SSH in, edit a file, restart a container, forget what changed. A month later, something breaks and you can't remember whether the config on the NAS matches what you intended.

Solution

Everything in a git repo. A post-merge hook deploys configs, substitutes secrets, and rebuilds sites. git pull is the deployment mechanism.

The moment you can see a diff of what changed and when, debugging goes from 'what did I do?' to 'this commit broke it.' Dependabot and GitHub Actions handle the rest.

containers

More context

The homelab repo is the source of truth. Every config file (Traefik routes, Prometheus scrape targets, Grafana dashboards, Loki pipeline, CrowdSec scenarios) lives in git.

The post-merge hook on the NAS is where the real work happens. It’s not just git pull && docker compose up. It:

Copies configs to /volume2/docker/ (the NAS paths that each container mounts)
Injects secrets from a .secrets file at deploy time, so credentials never touch the repo
Uses checksum comparison to decide whether to rebuild Astro static sites: if the source hasn’t changed, the build is skipped
Manages container state: starts new services, restarts changed ones, leaves untouched containers alone
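
The checksum step is the part most worth copying. The real hook isn't reproduced here; this is a minimal Python sketch of the same idea, assuming a stamp file kept outside the source tree. The function names and the stamp file are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_checksum(source_dir):
    """Hash every file's relative path and contents in a stable order."""
    digest = hashlib.sha256()
    root = Path(source_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(str(path.relative_to(root)).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(source_dir, stamp_file):
    """True when the source tree differs from the last recorded build."""
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != tree_checksum(source_dir)


def record_build(source_dir, stamp_file):
    """Store the current checksum; call this only after a successful build."""
    Path(stamp_file).write_text(tree_checksum(source_dir))
```

Keeping `record_build` separate from `needs_rebuild` matters: if the stamp were written before the build ran, a failed build would be skipped on the next pull.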

git pull on the NAS is the entire deployment process. No manual SSH, no “I think I changed that file”, no config drift.

Pre-commit hooks enforce formatting. Dependabot opens PRs for container image updates. GitHub Actions run weekly security regression tests against the live stack. It’s not Kubernetes; it doesn’t need to be. It’s version-controlled, auditable, and recoverable from any commit.

UPS

beginner

Problem

A power cut could corrupt volumes, kill running Docker containers, and lose unsaved database state.

Solution

CyberPower CP900EPFCLCD-UK UPS protecting the NAS, fibre ONT, and router. DSM manages safe shutdown via USB.

At ~55W load the UPS gives ~30 minutes runtime. Full shutdown and auto-recovery tested and working. All services come back.

hardware reliability

More context

A sudden power loss can corrupt btrfs metadata, break Docker containers mid-write, and damage databases. The CyberPower CP900EPFCLCD-UK (900VA / 540W, pure sine wave) protects the entire network path: NAS, fibre ONT, and WiFi router.

Connected to the NAS via USB. DSM detects it and manages the shutdown sequence:

Power fails, UPS switches to battery instantly (no interruption)
After a configured timeout, DSM enters safe mode
Docker containers stop, VM shuts down, volumes unmount cleanly (~3 min)
NAS sends shutdown signal to the UPS, cutting WiFi and network to preserve battery during prolonged outages
When mains power restores, UPS powers on, NAS auto-boots, all services recover

Immich is the one exception: its encrypted volume mount requires manual intervention on each reboot. This is intentional: the photo library stays locked until someone physically approves access.

Nominal draw is ~55-60W. Battery life expectancy: 3-5 years (replacement part: RBP0051).

Load Testing

devops

Problem

Before sharing the site publicly, no idea where the limits were or what the actual failure mode looked like under real traffic.

Solution

Graduated load testing: inside the LAN first, then through the VPS tunnel, then fully external, with live service probes throughout to make degradation measurable, not guesswork.

The ceiling is TLS termination on the R1600 (~440 RPS), not Traefik routing — switching proxy buys single-digit percent, not a different ceiling. CrowdSec is nearly free per request; the cost is log ingestion. And the counterintuitive part: a healthy process can still leave every route dark for twelve minutes while the kernel drains connection state.

devops networking observability

More context

The actual finding

Alive ≠ reachable

14:40 attack · 14:51 CPU peaks · 14:57 recovered

Traefik never crashed — 29-hour process uptime, 206 MB RSS, nowhere near OOM. It simply couldn't drain its accept queue faster than monitoring probes refilled it. Process alive; every route dark.
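
Why a healthy process can stay dark comes down to two rates. This is a toy model with made-up numbers, not measurements from the test: a backlog only shrinks at the difference between how fast connections are drained and how fast new ones (here, monitoring probes) arrive.

```python
def seconds_to_drain(backlog, drained_per_sec, refilled_per_sec):
    """Time for a backlog to clear, or None if it never will.

    All three inputs are hypothetical; the model ignores timeouts and retries.
    """
    net_rate = drained_per_sec - refilled_per_sec
    if net_rate <= 0:
        return None                  # refill keeps pace, so the queue never empties
    return backlog / net_rate


print(seconds_to_drain(1000, drained_per_sec=12, refilled_per_sec=10))  # 500.0
print(seconds_to_drain(1000, drained_per_sec=10, refilled_per_sec=10))  # None
```

With the drain rate only slightly ahead of the refill rate, recovery takes minutes even though nothing is broken. When the two are equal the process stays up and the routes stay dark indefinitely.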

What I'd do differently

Since early 2026, a £19/month AI subscription can carry most technical people through this entire setup, as long as you question what it tells you. I didn't always, and ended up exposing my internal services to the internet.

Skip Cloudflare Tunnels for media hosting: the 100 MB body limit and video ToS make it a dead end. Fine for dashboards though.

SHR was the right call. Mixed drive sizes (2x 8 TB + 1x 2 TB free) just work.

Set up log aggregation from day one. Prometheus gives you 'what', Loki gives you 'why'.

Start hosting something for someone else early. It changes your reliability mindset overnight.

Set a fallback DNS on your router. If Pi-hole goes down, you want the network to still work.

Buy a UPS before you lose data, not after.

Homelab

The Trigger

Hardware

Replacing Google

Media Server

Home Automation

Living with a NAS

External Access: Cloudflare Tunnels

Pi-hole

Split-Horizon DNS

TLS / Traefik

How it all connects

WireGuard + Hetzner VPS

Does it pay for itself?

Tailscale

Immich

Observability: Metrics

Observability: Logs

Security Hardening

What 100 GB became

Authentik SSO

Hosting for Others

Infrastructure as Code

UPS

Load Testing

What I'd do differently

Related reading
