Retrieve core files

Set core file path

This article on Kubernetes and core file management, and this other one, provide some pointers on managing core files.

First, on each Kubernetes node, set the core file path, referred to below as <core-path> (/tmp/coredump in the example below), and make it writable by every container user that may produce core files:

COREPATH=/tmp/coredump
sudo mkdir -p "$COREPATH"
sudo chmod 777 "$COREPATH"
# $COREPATH is expanded by the current shell, before sudo runs
echo "$COREPATH/core.%e.%p.%h.%t" | sudo tee /proc/sys/kernel/core_pattern
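To make the pattern easier to read, here is a minimal sketch that expands the four specifiers used above by hand: %e is the executable name (the kernel "comm" name, limited to 15 characters), %p the PID as seen inside the process's PID namespace, %h the hostname and %t the dump time in seconds since the epoch. The helper name expand_core_pattern is hypothetical and only illustrates the naming scheme; the kernel does the real expansion.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: mimic how the kernel expands
# the %e, %p, %h and %t specifiers of core_pattern.
# Assumption: the values contain no '|', '&' or '\' characters (sed syntax).
expand_core_pattern() {
    pattern=$1 exe=$2 pid=$3 host=$4 ts=$5
    # %e uses the process "comm" name, which holds at most 15 characters
    comm=$(printf '%.15s' "$exe")
    printf '%s\n' "$pattern" | sed \
        -e "s|%e|$comm|g" \
        -e "s|%p|$pid|g" \
        -e "s|%h|$host|g" \
        -e "s|%t|$ts|g"
}
```

For example, expand_core_pattern '/tmp/coredump/core.%e.%p.%h.%t' qserv-replica-master-http 9 qserv-dev-repl-ctl-0 1597318703 prints a path ending in core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703, with the executable name truncated.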

Note

For kind users on a development workstation, the core_pattern command above must be run on the host; the setting is then propagated to the k8s node and to the pods.
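One way to confirm the propagation is to read /proc/sys/kernel/core_pattern wherever it matters (host, node or pod) and compare it with the expected directory. The sketch below assumes a hypothetical helper named check_core_pattern; the optional second argument exists only so the check can be pointed at another file.

```shell
#!/bin/sh
# Hypothetical helper: check that the active core pattern writes into the
# expected directory.
# Usage: check_core_pattern EXPECTED_DIR [PATTERN_FILE]
check_core_pattern() {
    expected_dir=$1
    pattern_file=${2:-/proc/sys/kernel/core_pattern}
    pattern=$(cat "$pattern_file") || return 2
    case $pattern in
        "$expected_dir"/*) echo "ok: $pattern" ;;
        *) echo "mismatch: $pattern"; return 1 ;;
    esac
}
```

On the host, check_core_pattern /tmp/coredump is enough. For a kind node or a pod, the same file can be read with docker exec <kind-node> cat /proc/sys/kernel/core_pattern or kubectl exec <pod> -- cat /proc/sys/kernel/core_pattern.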

Store core files in a persistent storage

Install Qserv operator and then install a Qserv instance dedicated to development:

kubectl apply -k https://github.com/lsst/qserv-operator/overlays/dev

The core path can be customized. First, clone qserv-operator locally:

git clone https://github.com/lsst/qserv-operator

Then edit the corepath parameter in the file qserv-operator/overlays/dev/qserv.yaml:

apiVersion: qserv.lsst.org/v1alpha1
kind: Qserv
metadata:
  name: qserv
spec:
  devel:
    corepath: "<core-path>"
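The edit can also be scripted. The following sketch assumes that the corepath key appears exactly once in the file and that the new value contains no '|', '&' or '\' characters; set_corepath is a hypothetical helper name, not part of qserv-operator.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the value of the "corepath:" key in place,
# keeping its original indentation.
# Usage: set_corepath FILE CORE_PATH
set_corepath() {
    file=$1 core_path=$2
    tmp=$(mktemp) || return 1
    if sed "s|^\([[:space:]]*corepath:\).*|\1 \"$core_path\"|" "$file" > "$tmp"; then
        # Write back through the original file to keep its permissions
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

A typical call would be set_corepath qserv-operator/overlays/dev/qserv.yaml /tmp/coredump.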

Finally, install Qserv in the current namespace:

kubectl apply -k qserv-operator/overlays/dev

Retrieve core files

Core files will be available on the node running the pods, inside the <core-path> directory.
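To know which node to look at, the node name can be read from the pod specification. A minimal sketch, assuming kubectl is configured for the target cluster; pod_node is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper: print the name of the node that runs a given pod.
# Usage: pod_node POD_NAME
pod_node() {
    kubectl get pod "$1" -o jsonpath='{.spec.nodeName}'
}
```

For instance, pod_node qserv-dev-repl-ctl-0 prints the node on which the core files of the replication controller are written.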

Note

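For kind users, run docker cp <kind-node>:<core-path> . to copy the core files to the workstation. If <core-path> is under /tmp, docker cp does not work because /tmp is managed by tmpfs in kind; stream the directory with tar instead, as in the demo.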
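When the core directory sits on a tmpfs mount inside the kind node, as is the case for /tmp, streaming the directory through tar is an alternative to docker cp. The sketch below wraps that pipeline; fetch_cores is a hypothetical helper name, and it assumes tar is available both in the node container and on the workstation.

```shell
#!/bin/sh
# Hypothetical helper: copy the content of a directory from a kind node
# to the workstation by streaming a tar archive over docker exec.
# Usage: fetch_cores KIND_NODE CORE_PATH [DEST_DIR]
fetch_cores() {
    node=$1 core_path=$2 dest=${3:-.}
    mkdir -p "$dest" || return 1
    docker exec "$node" tar -C "$core_path" -cf - . | tar -C "$dest" -xf -
}
```

For example, fetch_cores kind-control-plane /tmp/coredump ./cores copies every core file of the node into ./cores.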

Demo

This demo relies on a Kubernetes cluster based on kind and the qserv-operator:

# Create a directory to store core files on the k8s node (kind-specific)
docker exec -it -- kind-control-plane sh -c "mkdir -p /tmp/coredump && chmod 777 /tmp/coredump"

# Install Qserv
kubectl apply -k qserv-operator/overlays/dev

# Check Qserv is running
kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP            NODE
qserv-dev-czar-0                  3/3     Running   0          11s     10.244.0.54   kind-control-plane
qserv-dev-repl-ctl-0              1/1     Running   0          11s     10.244.0.46   kind-control-plane
qserv-dev-repl-db-0               1/1     Running   0          11s     10.244.0.56   kind-control-plane
qserv-dev-worker-0                5/5     Running   0          11s     10.244.0.53   kind-control-plane
qserv-dev-worker-1                5/5     Running   0          11s     10.244.0.51   kind-control-plane
qserv-dev-worker-2                5/5     Running   0          11s     10.244.0.55   kind-control-plane
qserv-dev-xrootd-redirector-0     2/2     Running   0          11s     10.244.0.44   kind-control-plane
qserv-dev-xrootd-redirector-1     2/2     Running   0          10s     10.244.0.45   kind-control-plane
qserv-dev-xrootd-redirector-2     2/2     Running   0          10s     10.244.0.47   kind-control-plane
qserv-dev-xrootd-redirector-3     2/2     Running   0          10s     10.244.0.48   kind-control-plane
qserv-operator-5467b89db4-hbwgc   1/1     Running   0          149m    10.244.0.5    kind-control-plane

# Kill replication controller
kubectl exec -it qserv-dev-repl-ctl-0 -- bash
bash-4.2$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
qserv        1     0  0 11:30 ?        00:00:00 /bin/sh /config-start/start.sh
qserv        9     1  0 11:30 ?        00:00:00 qserv-replica-master-http --worker-evict-timeout=3600 --health-probe-interval=120 --replication-interval=1200 --config=mysql://qsreplica:@qserv-dev-repl-db:3306/qservReplica --qserv-db-password=CHANGEME
qserv      100     0  0 11:38 pts/0    00:00:00 bash
qserv      112   100  0 11:38 pts/0    00:00:00 ps -ef
bash-4.2$ kill -s SIGSEGV  9
bash-4.2$ command terminated with exit code 137

# List core files on the node (kind-specific)
docker exec -it kind-control-plane ls /tmp/coredump
core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703

# Retrieve core files locally (docker cp does not work because /tmp is managed by tmpfs in kind)
docker exec kind-control-plane tar -C /tmp/coredump -cf - . | tar -xf -
ls
core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703
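Since the file name follows the core.%e.%p.%h.%t pattern, it can be split back into its fields to find out which process and which pod produced a given core file. The sketch below is a hypothetical helper (parse_core_name) that parses from the right; it assumes that neither the hostname nor the PID contains a dot.

```shell
#!/bin/sh
# Hypothetical helper: split a core file name built from the
# core.%e.%p.%h.%t pattern into its fields.
# Assumption: the hostname contains no '.' character.
parse_core_name() {
    name=${1##*/}                      # drop any leading directory
    rest=${name#core.}                 # drop the "core." prefix
    ts=${rest##*.};   rest=${rest%.*}  # last field: dump time (epoch seconds)
    host=${rest##*.}; rest=${rest%.*}  # then the hostname (pod name)
    pid=${rest##*.};  exe=${rest%.*}   # then the PID; the remainder is %e
    printf 'exe=%s pid=%s host=%s time=%s\n' "$exe" "$pid" "$host" "$ts"
}
```

Applied to the file retrieved above, parse_core_name core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703 prints exe=qserv-replica-m pid=9 host=qserv-dev-repl-ctl-0 time=1597318703.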