Catch Recruit (Catch)

Platform Engineer - Sandton

  • Salary: Undisclosed
  • Permanent Intermediate position
  • Sandown
  • Posted 29 Aug 2025 by Catch Recruit (Catch)
  • Expires in 34 days
  • Job 2620906

About the position

Key Responsibilities

Reliability & Operations

  • Own uptime, performance, and monitoring for all production applications.
  • Manage Heroku pipelines, CI/CD, review apps, and production environments.
  • Operate Celery workers and queues, monitor health, and handle missed task check-ins.
  • Define and track service level objectives (SLOs) for availability, latency, and task success rate.
  • Maintain runbooks and a centralised wiki for incident response, and lead post-mortems.
  • Run periodic disaster recovery drills and coordinate penetration tests.

Platform Engineering

  • Keep environments current (Heroku stacks, Postgres/Redis versions, DO/AWS base images).
  • Manage daily backups and ensure restore tests and disaster recovery runbooks are in place.
  • Standardise infrastructure (Terraform or scripts for DO/AWS; [URL Removed] for Heroku).
  • Manage Cloudflare for DNS, edge security, and performance optimisation.
  • Tune performance (DB indices, query optimisation, cache usage, Celery queue design).
  • Optimise infrastructure costs across Heroku, DigitalOcean, and AWS.

Developer Experience & CI/CD

  • Maintain CI pipelines with type checking, linting, and security scanning.
  • Enforce test coverage and automate deploy checks (smoke tests, migration health, error budgets).
  • Support developers with tooling for local/staging environments and build self-service dashboards (e.g., Celery queue status).
  • Collaborate with developers to streamline workflows and educate on secure coding practices.

Security & Compliance

  • Own vulnerability management and dependency patching cadence.
  • Manage access reviews, secrets, MFA/SSO, and enforce least-privilege IAM policies.
  • Implement encryption for data at rest and in transit (e.g., S3 server-side encryption).
  • Contribute evidence and responses for security questionnaires and SOC 2 audits.
  • Maintain a "security pack" with architecture, sub-processors, and DR/backup processes.

Monitoring & Alerting

  • Configure Sentry ownership rules, Cron Monitors, and release health.
  • Centralise metrics/logs (Heroku metrics, Papertrail, Sentry, APM, Prometheus/New Relic).
  • Set up alerts on golden signals (latency, errors, traffic, saturation) and avoid alert fatigue.
  • Conduct capacity planning and track resource usage trends.

Vendor & External Services

  • Evaluate and manage vendor relationships (e.g., Mailgun, Twilio) to ensure vendors meet their service level agreements (SLAs) and performance expectations.
  • Assess new tools/services to enhance platform capabilities (e.g., observability, security).
  • Track costs, security posture, and integration quality for all third-party services.

Must-Have

  • Cloud infrastructure management: 3+ years operating production apps on Heroku, AWS, DigitalOcean, or similar.
  • CI/CD pipelines: Hands-on experience with GitHub Actions, Heroku CI, or equivalent; solid Git fundamentals.
  • Monitoring & incident response: Experience with Sentry, Papertrail (or similar), logs, and uptime/performance dashboards.
  • Security fundamentals: Understanding of IAM, encryption in transit/at rest, MFA/SSO, and secure configuration practices.
  • Disaster recovery & backups: Experience implementing and operating automated backups, restore testing, and writing/maintaining incident runbooks.
  • Communication & collaboration: Ability to document processes clearly and work closely with developers in a small team.

Strong Plus

  • Infrastructure as Code & automation: Experience with Terraform, Docker, or equivalent tooling.
  • Asynchronous workloads: Familiarity with Celery, Redis, or other task queues and message brokers.
  • Scaling & cost optimisation: Capacity planning, performance tuning, and managing infra spend.
  • Compliance frameworks: Exposure to SOC 2, GDPR, or supporting client security questionnaires.
  • Incident management: Participation in on-call rotations, leading post-mortems, or serving as incident commander.

Nice-to-Have

  • Proficiency in Python; familiarity with Django/Flask.
  • Experience with DNS/CDN/edge security (e.g., Cloudflare).
  • Observability platforms (Prometheus, Grafana, New Relic).
  • Static analysis and code quality tools (mypy, Bandit, SonarQube).
  • Prior exposure to multi-tenant SaaS environments.
  • Certifications (AWS Certified DevOps Engineer

Desired Skills:

  • Heroku
  • AWS
  • DigitalOcean
  • GitHub
  • Sentry
  • Papertrail
  • encryption in transit
  • MFA/SSO
  • maintaining incident runbooks
  • Terraform
  • Docker
  • Celery
  • SOC 2
  • GDPR
  • Python
  • Django/Flask
  • SaaS

Desired Work Experience:

  • 2 to 5 years

About the agency

Are you ready to take the next step in your career? Catch recruits across industries, connecting talented professionals like you with top-tier companies looking for exceptional candidates. Our personalised approach ensures that we understand your unique skills, aspirations, and career goals, so that we can match you with opportunities that help you thrive. With our extensive network and industry expertise, we are committed to guiding you through every step of the job search process, from crafting your CV to acing your interviews. Send us your CV and apply on our website or via Career Junction to unlock your potential and achieve the career of your dreams. If you don’t find a suitable role on our job portals, send us your application anyway; we might have something suitable. Let’s connect and start your journey to success! Catch – Finding your talent!