DevOps and CI/CD - Complete Mastery
Introduction: Breaking Down Silos
DevOps bridges the gap between development and operations. Traditionally:
- Developers: Write code, want fast releases
- Operations: Maintain systems, want stability
- Result: Conflict, slow deployments, silos
DevOps enables:
- Faster deployments - Multiple times per day
- Higher reliability - Automated testing catches issues early
- Better collaboration - Shared responsibility
- Continuous improvement - Metrics-driven decisions
This guide covers the complete DevOps lifecycle.
1. Version Control with Git
Git Fundamentals
Git: Distributed version control system
- Track changes over time
- Collaborate with teams
- Rollback to previous versions
- Branching for parallel work
# Initialize repository
git init # Create .git folder
git clone https://url.git # Copy existing repo
# Basic workflow
git add . # Stage changes
git commit -m "Add feature" # Record changes
git push origin main # Push to remote
git pull origin main # Get latest changes
# Branching (feature isolation)
git branch feature/login # Create branch
git checkout feature/login # Switch branch
git branch -a # List all branches
# Modern shortcut
git switch -c feature/login # Create and switch
# Merging
git checkout main
git merge feature/login # Merge feature into main
# History
git log # View commits
git log --oneline # Compact view
git log -p # Show changes in each commit
git diff HEAD~1 # Compare to previous commit
Branches and Workflows
Git Flow (Complex projects):
├─ main (production)
│ └─ release/1.0
│ └─ feature/user-auth (branch)
│ └─ feature/payments (branch)
│
└─ develop (integration)
GitHub Flow (Modern, simple):
├─ main (always production-ready)
│ └─ feature/feature-name (PR)
│ └─ bugfix/issue-123 (PR)
│ └─ experiment/new-idea (PR)
Trunk-Based Development (Continuous deployment):
└─ main (constant small commits, feature flags)
├─ commit: Add feature behind flag
├─ commit: Enable feature for 10% users
└─ commit: Enable for 100% users
Pull Request Workflow
1. Create feature branch
git switch -c feature/delete-users
2. Make changes
git add .
git commit -m "Add delete button"
3. Push branch
git push origin feature/delete-users
4. Create Pull Request on GitHub
- Describe changes
- Reference issues
- Request reviewers
5. Code review
- Teammates review code
- Request changes if needed
- Approve when ready
6. Merge
- Confirm all checks pass
- Merge and delete branch
7. Deploy
- GitHub Actions auto-deploys
- Rollback if needed
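The branch-and-merge mechanics behind steps 1-6 can be rehearsed end to end in a throwaway local repository (no remote or CI involved; the GitHub-side PR steps are omitted, and file names here are illustrative):

```shell
# Rehearse the feature-branch cycle in a scratch repo (git >= 2.28 for `init -b`).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > app.txt
git add . && git commit -qm "Initial commit"

git switch -q -c feature/delete-users            # step 1: create feature branch
echo "delete button" >> app.txt
git add . && git commit -qm "Add delete button"  # step 2: make changes

git switch -q main                               # step 6: merge (GitHub does this for a real PR)
git merge --no-ff -m "Merge feature/delete-users" feature/delete-users
git branch -d feature/delete-users               # delete the merged branch
git log --oneline
```

The --no-ff merge preserves a merge commit, mirroring what GitHub's default "Merge pull request" button produces.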
Best Practices:
- Small PRs (200 lines max)
- Descriptive titles and descriptions
- Link to issues
- Test locally before pushing
- Respond to feedback quickly
2. Continuous Integration (CI)
CI Principles
Continuous Integration:
1. Developers commit frequently (multiple times/day)
2. Automated build on every commit
3. Automated tests run
4. Notify team of failures
5. Fix broken builds quickly
Benefits:
- Catch bugs early
- Reduce integration problems
- Enable faster releases
- Improve code quality
- Team visibility
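A scaled-down version of this loop can run locally as a pre-push gate. The script below fakes each stage with a `true` placeholder; in a real project you would substitute commands like `npm ci`, `npm run lint`, and `npm test`:

```shell
#!/usr/bin/env bash
# Minimal local CI gate: run stages in order, stop at the first failure.
set -u

run_stage() {
  local name="$1"; shift
  echo "--- ${name} ---"
  if "$@"; then
    echo "PASS: ${name}"
  else
    echo "FAIL: ${name}"
    exit 1    # a red stage fails the whole gate, like a red CI pipeline
  fi
}

# Placeholder commands -- replace with your project's real build/lint/test steps
run_stage "build" true
run_stage "lint" true
run_stage "unit-tests" true
echo "All stages passed"
```

Wiring this script into a git pre-push hook gives the same fast feedback as CI, a few minutes earlier.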
Pipeline:
Code Commit → Trigger → Build → Lint → Unit Tests → Integration Tests
↓
Fail → Notify developers
↓
Pass → Artifact → CD Stage
GitHub Actions CI
# .github/workflows/ci.yml
name: Continuous Integration
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [16, 18, 20]
steps:
# GitHub-hosted runners come with common build tools preinstalled
- uses: actions/checkout@v3
- name: Setup Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run linter
run: npm run lint
- name: Run tests
run: npm test
- name: Calculate coverage
run: npm run coverage
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage/coverage.json
GitLab CI/CD Advanced
# .gitlab-ci.yml
stages:
- test
- build
- deploy
variables:
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: ""
test:
stage: test
image: node:18
script:
- npm install
- npm run lint
- npm test
- npm run coverage
coverage: '/Coverage: \d+\.\d+%/'
artifacts:
paths:
- coverage/
expire_in: 30 days
only:
- merge_requests
- main
build:
stage: build
image: docker:latest
services:
- docker:dind
script:
- docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
- docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
- docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA $CI_REGISTRY_IMAGE:latest
- docker push $CI_REGISTRY_IMAGE:latest
only:
- main
deploy_staging:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl set image deployment/app-staging app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n staging
- kubectl rollout status deployment/app-staging -n staging
environment:
name: staging
url: https://staging.example.com
kubernetes:
namespace: staging
only:
- main
deploy_production:
stage: deploy
image: bitnami/kubectl:latest
script:
- kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n production
- kubectl rollout status deployment/app -n production
environment:
name: production
url: https://example.com
kubernetes:
namespace: production
only:
- tags
when: manual # Require manual approval
3. Containerization with Docker
Docker Fundamentals
Container: Lightweight OS-level virtualization
- Includes app + runtime + dependencies
- Isolated from host and other containers
- Consistent across environments (dev = prod)
- Fast startup (seconds vs minutes for VMs)
Image: Template (read-only)
Container: Instance of image (running)
Relationship:
Image : Container = Class : Object
Dockerfile
# Multi-stage build
FROM node:18 as builder
WORKDIR /app
# Copy manifests and install all dependencies (the build step needs devDependencies)
COPY package*.json ./
RUN npm ci
# Build application
COPY . .
RUN npm run build
# Drop devDependencies so the production stage copies a lean node_modules
RUN npm prune --omit=dev
# Production stage (smaller image)
FROM node:18-alpine
WORKDIR /app
# Copy only what we need from the builder stage
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
# Non-root user (security); Alpine has no useradd, so use BusyBox adduser
RUN adduser -D appuser
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=3s CMD node healthcheck.js
# Expose port
EXPOSE 3000
# Environment variable
ENV NODE_ENV=production
# Command to run
CMD ["node", "dist/server.js"]
Docker Commands
# Build image
docker build -t myapp:1.0 .
docker build -t myapp:1.0 -f Dockerfile.prod .
# Run container
docker run -p 3000:3000 --name myapp myapp:1.0
docker run -d --name myapp -p 3000:3000 -e NODE_ENV=production myapp:1.0
# Interactive mode
docker run -it myapp:1.0 /bin/sh
# View running containers
docker ps
docker ps -a # All, including stopped
# Container logs
docker logs myapp
docker logs -f myapp # Follow logs (tail -f)
# Execute command in container
docker exec myapp npm test
# Stop and remove
docker stop myapp
docker rm myapp
docker rmi myapp:1.0 # Remove image
# Docker registry (push/pull)
docker tag myapp:1.0 registry.example.com/myapp:1.0
docker push registry.example.com/myapp:1.0
docker pull registry.example.com/myapp:1.0
Docker Compose
# docker-compose.yml - Multi-container orchestration
version: '3.8'
services:
app:
build:
context: .
dockerfile: Dockerfile
ports:
- "3000:3000"
environment:
DATABASE_URL: postgres://db:5432/myapp
REDIS_URL: redis://redis:6379
depends_on:
- db
- redis
volumes:
- ./src:/app/src # Mount for development
networks:
- backend
db:
image: postgres:15
environment:
POSTGRES_DB: myapp
POSTGRES_USER: user
POSTGRES_PASSWORD: password
volumes:
- db_data:/var/lib/postgresql/data
networks:
- backend
healthcheck:
test: ["CMD-SHELL", "pg_isready -U user"]
interval: 5s
timeout: 3s
retries: 5
redis:
image: redis:7-alpine
networks:
- backend
volumes:
db_data:
networks:
backend:
driver: bridge
# Docker Compose commands
docker-compose up # Start services
docker-compose up -d # Detached mode
docker-compose logs -f # Follow logs
docker-compose down # Stop and remove
docker-compose exec app npm test # Run command in service
docker-compose ps # View running services
4. Kubernetes Orchestration
Kubernetes Architecture
Master Node (Control Plane):
├─ API Server: Handle requests
├─ Scheduler: Assign pods to nodes
├─ Controller Manager: Manage state
└─ etcd: Store cluster state
Worker Node:
├─ kubelet: Node agent
├─ Container Runtime: Run containers
└─ kube-proxy: Network
Pod: Smallest unit (usually 1 container)
Deployment: Manage replicas of pods
Service: Expose pods to network
Ingress: Route external traffic
Kubernetes Manifests
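Of the four objects above, the manifest below covers Deployment, Service, ConfigMap, and Secret. An Ingress to route external traffic might look like the following sketch (the hostname and ingress class are placeholder assumptions, not part of the manifest below):

```yaml
# ingress.yaml - minimal sketch; host and ingressClassName are placeholders
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service  # the Service defined below
                port:
                  number: 80
```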
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: production
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
serviceAccountName: myapp
securityContext:
runAsNonRoot: true
runAsUser: 1000
containers:
- name: app
image: registry.example.com/myapp:v1.2.3
imagePullPolicy: Always
ports:
- containerPort: 3000
name: http
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-credentials
key: url
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 10
periodSeconds: 5
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
volumes:
- name: config
configMap:
name: myapp-config
---
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
type: LoadBalancer
selector:
app: myapp
ports:
- name: http
port: 80
targetPort: 3000
---
apiVersion: v1
kind: ConfigMap
metadata:
name: myapp-config
data:
LOG_LEVEL: "info"
MAX_CONNECTIONS: "100"
---
apiVersion: v1
kind: Secret
metadata:
name: db-credentials
type: Opaque
stringData:
url: "postgres://user:pass@db:5432/app"
Kubernetes Commands
# Cluster info
kubectl cluster-info
kubectl get nodes
kubectl top nodes # Resource usage
# Deployments
kubectl apply -f deployment.yaml
kubectl get deployments
kubectl describe deployment myapp
kubectl scale deployment myapp --replicas=5
kubectl rollout history deployment/myapp
kubectl rollout undo deployment/myapp # Rollback
kubectl set image deployment/myapp app=myapp:v2 # Update image
# Pods
kubectl get pods -A # All namespaces
kubectl logs pod-name -f # Follow logs
kubectl exec -it pod-name -- /bin/bash # Shell access
kubectl delete pod pod-name
# Services
kubectl get services
kubectl port-forward svc/myapp-service 3000:80
# Debugging
kubectl describe pod pod-name # Detailed info
kubectl get events # Cluster events
kubectl logs pod-name --previous # Crashed pod logs
5. Infrastructure as Code (IaC)
Terraform (Cloud Infrastructure)
# terraform/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
provider "aws" {
region = var.aws_region
}
# VPC
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
tags = {
Name = "${var.environment}-vpc"
}
}
# ECS Cluster
resource "aws_ecs_cluster" "main" {
name = "${var.environment}-cluster"
setting {
name = "containerInsights"
value = "enabled"
}
}
# AMI lookup (defines the data.aws_ami.ubuntu referenced below)
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
}
}
# EC2 Instance
resource "aws_instance" "web" {
ami = data.aws_ami.ubuntu.id
instance_type = "t3.micro"
vpc_security_group_ids = [aws_security_group.web.id]
tags = {
Name = "${var.environment}-web"
}
}
# Security Group
resource "aws_security_group" "web" {
name = "${var.environment}-web-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Generated DB password (requires the hashicorp/random provider)
resource "random_password" "db" {
length = 24
special = false
}
# RDS Database
resource "aws_db_instance" "main" {
identifier = "${var.environment}-db"
engine = "postgres"
engine_version = "15.3"
instance_class = "db.t3.micro"
db_name = var.db_name
username = var.db_user
password = random_password.db.result
allocated_storage = 20
storage_type = "gp3"
skip_final_snapshot = var.environment != "production"
}
# Output values
output "instance_ip" {
value = aws_instance.web.public_ip
}
output "rds_endpoint" {
value = aws_db_instance.main.endpoint
}
Ansible (Configuration Management)
# playbook.yml
---
- name: Deploy application
hosts: web_servers
become: true
vars:
app_user: appuser
app_dir: /opt/myapp
app_port: 3000
tasks:
- name: Update package manager
apt:
update_cache: yes
when: ansible_os_family == "Debian"
- name: Install dependencies
apt:
name:
- nodejs
- npm
- git
state: present
- name: Create application user
user:
name: "{{ app_user }}"
state: present
shell: /bin/bash
- name: Clone repository
git:
repo: "{{ git_repo }}"
dest: "{{ app_dir }}"
version: main
become_user: "{{ app_user }}"
- name: Install npm dependencies
npm:
path: "{{ app_dir }}"
become_user: "{{ app_user }}"
- name: Build application
command: npm run build
args:
chdir: "{{ app_dir }}"
become_user: "{{ app_user }}"
- name: Create systemd service
template:
src: app.service.j2
dest: /etc/systemd/system/myapp.service
notify: restart app
- name: Start application
systemd:
name: myapp
state: started
enabled: yes
daemon_reload: yes
handlers:
- name: restart app
systemd:
name: myapp
state: restarted
6. Monitoring and Observability
Logging with ELK Stack
Elasticsearch: Store and search logs
Logstash: Process and transform logs
Kibana: Visualize and analyze
Flow: Application → Filebeat → Logstash → Elasticsearch → Kibana
# docker-compose.yml - ELK Stack
version: '3.8'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.10.0
environment:
- discovery.type=single-node
- xpack.security.enabled=false
ports:
- "9200:9200"
volumes:
- elastic_data:/usr/share/elasticsearch/data
logstash:
image: docker.elastic.co/logstash/logstash:8.10.0
volumes:
- ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
ports:
- "5000:5000"
depends_on:
- elasticsearch
kibana:
image: docker.elastic.co/kibana/kibana:8.10.0
ports:
- "5601:5601"
environment:
- ELASTICSEARCH_HOSTS=http://elasticsearch:9200
depends_on:
- elasticsearch
volumes:
elastic_data
Prometheus + Grafana (Metrics)
# prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'application'
static_configs:
- targets: ['localhost:3000']
- job_name: 'node'
static_configs:
- targets: ['localhost:9100']
// Node.js with Prometheus metrics
const express = require('express');
const promClient = require('prom-client');
const app = express();
// Request duration histogram (bucket boundaries in ms)
const httpRequestDuration = new promClient.Histogram({
name: 'http_request_duration_ms',
help: 'Duration of HTTP requests in ms',
labelNames: ['method', 'route', 'status_code'],
buckets: [10, 50, 100, 300, 500, 1000]
});
// Request middleware
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
// req.route is undefined for unmatched routes, so fall back to req.path
httpRequestDuration
.labels(req.method, req.route ? req.route.path : req.path, res.statusCode)
.observe(duration);
});
next();
});
// Metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', promClient.register.contentType);
res.end(await promClient.register.metrics());
});
app.get('/', (req, res) => {
res.json({ message: 'Hello' });
});
app.listen(3000);
7. GitOps
GitOps Principles
GitOps: Git as single source of truth
Traditional:
Code → CI → Deploy Script → kubectl apply
GitOps:
Code → Git → ArgoCD watches → Auto-applies
Benefits:
- Version control for all changes
- Easy rollbacks (git revert)
- Clear audit trail
- PR for all infrastructure changes
- Self-healing (constant reconciliation)
ArgoCD Setup
# argocd-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: myapp
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/myorg/myapp-config
targetRevision: main
path: kubernetes/
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true # Remove k8s resources not in git
selfHeal: true # Revert out-of-band (manual) cluster changes
syncOptions:
- CreateNamespace=true
# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get admin password
kubectl get secret -n argocd argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
8. Best Practices
Pipeline Safety
Never Deploy Directly:
✗ ssh app@prod "cd /app && git pull && npm start"
Always Use CI/CD:
✓ git push → GitHub Actions → Review → Docker build → Push → k8s apply
Deploy Safety:
1. Blue-Green Deployment: V1 (blue) + V2 (green), switch traffic
2. Canary Deployment: 5% → 25% → 50% → 100% users on new version
3. Rolling Update: Gradually replace pods
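Canary percentages (and the feature-flag ramps used in trunk-based development) are usually driven by deterministic bucketing, so a given user stays in the same cohort as the rollout widens. A minimal sketch, with hash and threshold choices purely illustrative:

```shell
# Hash a stable user id into a bucket 0-99; a user is in the canary when
# their bucket is below the rollout percentage. Ramping 5 -> 25 -> 100
# only ever adds users, never flip-flops them between versions.
bucket_for() {
  local sum
  sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)  # cksum is POSIX
  echo $(( sum % 100 ))
}

in_canary() {
  local user_id="$1" percent="$2"
  [ "$(bucket_for "$user_id")" -lt "$percent" ]
}

for user in alice bob carol; do
  if in_canary "$user" 25; then
    echo "$user: canary (v2)"
  else
    echo "$user: stable (v1)"
  fi
done
```

In practice the same idea lives inside a service mesh, ingress controller, or feature-flag service rather than a shell script, but the invariant is identical: stable hash, monotonic threshold.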
Rollback Strategy:
- Keep previous versions in registry
- git revert rolls back code changes
- kubectl rollout undo rolls back the Kubernetes deployment
Security Best Practices
1. Secrets Management:
- Never commit secrets to git
- Use environment variables
- Use Kubernetes Secrets
- Rotate secrets regularly
2. Registry Security:
- Use private registries
- Scan images for vulnerabilities
- Sign images
- Use specific tags (never 'latest' in prod)
3. RBAC (Role-Based Access Control):
- Principle of least privilege
- Service accounts per app
- Limit API permissions
4. Network Security:
- Use network policies
- Firewalls
- TLS for all communication
Key Takeaways
- Git Discipline - Small commits, meaningful messages, PRs for review
- CI is mandatory - Automate testing and builds
- Containers everywhere - Consistency and reliability
- Kubernetes at scale - Manage complexity of many containers
- Infrastructure as code - Reproducible, versioned infrastructure
- Monitoring essential - Can't improve what you don't measure
- Automation first - Manual processes don't scale
- Security throughout - Not an afterthought
- Fast feedback loops - Deploy frequently
- ChatOps culture - Team communication drives decisions