Retrieve core files¶
Set core file path¶
This article on Kubernetes and core file management, and this other one, provide some pointers on core file management.
First, on your Kubernetes nodes, set the core file path, referred to as <core-path> (/tmp/coredump in the example below), and make it writable by every container user that may create core files:
COREPATH=/tmp/coredump
sudo mkdir -p "$COREPATH"
sudo chmod 777 "$COREPATH"
# Expand $COREPATH in the current shell: it is not defined inside "sudo sh -c '...'"
echo "$COREPATH/core.%e.%p.%h.%t" | sudo tee /proc/sys/kernel/core_pattern
Note
For kind users on a development workstation, the above command must be run on the host; the setting is then propagated to the k8s node and to the pods.
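Before triggering a crash, it can help to confirm where the kernel will actually write core files. The sketch below is a hypothetical helper (core_dir_from_pattern is not part of Qserv or Kubernetes) that reads a core_pattern value and prints the directory it points to. It assumes an absolute path pattern; when the pattern starts with |, the kernel pipes the core to a program instead, and no file will appear in <core-path>.

```shell
# Hypothetical helper: print the directory that core_pattern writes into.
# Usage: core_dir_from_pattern [pattern-file]
core_dir_from_pattern() {
  pattern_file=${1:-/proc/sys/kernel/core_pattern}
  pattern=$(cat "$pattern_file") || return 1
  case $pattern in
    '|'*)
      # A leading pipe means the core is sent to a program, not to a file.
      echo "core files are piped to a program: $pattern" >&2
      return 1 ;;
    /*)
      dirname "$pattern" ;;
    *)
      # No leading slash: the path is relative to the crashing process.
      echo "relative pattern, resolved from the crashing process's working directory: $pattern" >&2
      return 1 ;;
  esac
}
```

Called without argument on a node, core_dir_from_pattern reads /proc/sys/kernel/core_pattern and should print <core-path> once the setup above has been applied.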
Store core files in persistent storage¶
Install the Qserv operator, then install a Qserv instance dedicated to development:
kubectl apply -k https://github.com/lsst/qserv-operator/overlays/dev
The core path can also be customized. First, clone qserv-operator locally:
git clone https://github.com/lsst/qserv-operator
Then edit the corepath parameter in the file qserv-operator/overlays/dev/qserv.yaml:
apiVersion: qserv.lsst.org/v1alpha1
kind: Qserv
metadata:
  name: qserv
spec:
  devel:
    corepath: "<core-path>"
Finally, install Qserv in the current namespace:
kubectl apply -k qserv-operator/overlays/dev
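The manual edit of qserv.yaml described above can also be scripted, for example in a CI job. The following is a minimal sketch and not part of qserv-operator: set_corepath is a hypothetical helper that rewrites the corepath: line with sed. It assumes the key appears only once in the file and that the path contains no |, & or backslash character, since these are special in the sed expression used here.

```shell
# Hypothetical helper: set spec.devel.corepath in a Qserv manifest.
# Usage: set_corepath <manifest> <core-path>
set_corepath() {
  manifest=$1
  core_path=$2
  # Keep the original indentation and key, replace only the value.
  # -i.bak edits in place and leaves a backup copy next to the manifest.
  sed -i.bak "s|^\([[:space:]]*corepath:\).*|\1 \"$core_path\"|" "$manifest"
}
```

For instance, set_corepath qserv-operator/overlays/dev/qserv.yaml /tmp/coredump would be run before the kubectl apply -k step.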
Retrieve core files¶
Core files will be available on the node running the pods, inside the <core-path> directory.
Note
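For kind users, use the docker cp <kind-node>:<core-path> . command to copy core files to the workstation. If <core-path> is located under /tmp, docker cp does not work because kind mounts /tmp as tmpfs; stream the files through tar instead, as done in the demo below.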
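For kind clusters, the retrieval step can be wrapped in a small function. This is a sketch under assumptions: fetch_cores is a hypothetical name, the kind node is reachable as a Docker container, and tar is available inside the node image. It streams a tar archive out of the node rather than relying on docker cp, so it also works when <core-path> sits on a tmpfs mount.

```shell
# Hypothetical helper: copy every file under <core-path> on a kind node
# into a local directory (default: current directory).
# Usage: fetch_cores <kind-node> <core-path> [dest-dir]
fetch_cores() {
  node=$1
  core_path=$2
  dest=${3:-.}
  mkdir -p "$dest" || return 1
  # Create the archive inside the node container, unpack it locally.
  docker exec "$node" tar -C "$core_path" -cf - . | tar -C "$dest" -xf -
}
```

With the values used on this page, the call would be fetch_cores kind-control-plane /tmp/coredump ./cores.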
Demo¶
This demo relies on a Kubernetes cluster based on kind and on the qserv-operator:
# Create a directory to store core files on the k8s node (kind-specific)
docker exec -it -- kind-control-plane sh -c "mkdir -p /tmp/coredump && chmod 777 /tmp/coredump"
# Install Qserv
kubectl apply -k qserv-operator/overlays/dev
# Check Qserv is running
kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE    IP            NODE
qserv-dev-czar-0                  3/3     Running   0          11s    10.244.0.54   kind-control-plane
qserv-dev-repl-ctl-0              1/1     Running   0          11s    10.244.0.46   kind-control-plane
qserv-dev-repl-db-0               1/1     Running   0          11s    10.244.0.56   kind-control-plane
qserv-dev-worker-0                5/5     Running   0          11s    10.244.0.53   kind-control-plane
qserv-dev-worker-1                5/5     Running   0          11s    10.244.0.51   kind-control-plane
qserv-dev-worker-2                5/5     Running   0          11s    10.244.0.55   kind-control-plane
qserv-dev-xrootd-redirector-0     2/2     Running   0          11s    10.244.0.44   kind-control-plane
qserv-dev-xrootd-redirector-1     2/2     Running   0          10s    10.244.0.45   kind-control-plane
qserv-dev-xrootd-redirector-2     2/2     Running   0          10s    10.244.0.47   kind-control-plane
qserv-dev-xrootd-redirector-3     2/2     Running   0          10s    10.244.0.48   kind-control-plane
qserv-operator-5467b89db4-hbwgc   1/1     Running   0          149m   10.244.0.5    kind-control-plane
# Kill replication controller
kubectl exec -it qserv-dev-repl-ctl-0 -- bash
bash-4.2$ ps -ef
UID PID PPID C STIME TTY TIME CMD
qserv 1 0 0 11:30 ? 00:00:00 /bin/sh /config-start/start.sh
qserv 9 1 0 11:30 ? 00:00:00 qserv-replica-master-http --worker-evict-timeout=3600 --health-probe-interval=120 --replication-interval=1200 --config=mysql://qsreplica:@qserv-dev-repl-db:3306/qservReplica --qserv-db-password=CHANGEME
qserv 100 0 0 11:38 pts/0 00:00:00 bash
qserv 112 100 0 11:38 pts/0 00:00:00 ps -ef
bash-4.2$ kill -s SIGSEGV 9
bash-4.2$ command terminated with exit code 137
# List core files on the node (kind-specific)
docker exec -it kind-control-plane ls /tmp/coredump
core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703
# Retrieve core files locally (docker cp does not work because /tmp is managed by tmpfs in kind)
docker exec kind-control-plane tar -C /tmp/coredump -cf - . | tar -xf -
ls
core.qserv-replica-m.9.qserv-dev-repl-ctl-0.1597318703
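The file name follows the core.%e.%p.%h.%t pattern set earlier: %e is the executable name (the kernel truncates it to 15 characters, which is why qserv-replica-master-http appears as qserv-replica-m), %p the PID of the crashed process, %h the hostname (here the pod name) and %t the dump time in seconds since the Epoch. The sketch below splits such a name back into its fields. parse_core_name is a hypothetical helper, and it assumes the executable name contains no dot.

```shell
# Hypothetical helper: split a core file name built from the
# core.%e.%p.%h.%t pattern into its fields.
# Usage: parse_core_name <core-file>
parse_core_name() {
  name=${1##*/}        # drop any leading directory
  rest=${name#core.}   # drop the fixed "core." prefix
  exe=${rest%%.*}      # %e: up to the first dot
  rest=${rest#*.}
  pid=${rest%%.*}      # %p: up to the next dot
  rest=${rest#*.}
  ts=${rest##*.}       # %t: after the last dot
  host=${rest%.*}      # %h: everything in between (may contain dots)
  printf 'exe=%s pid=%s host=%s time=%s\n' "$exe" "$pid" "$host" "$ts"
}
```

Applied to the file from the demo, it reports PID 9 in pod qserv-dev-repl-ctl-0, matching the process that was killed. The timestamp 1597318703 corresponds to 11:38:23 UTC on 2020-08-13, consistent with the 11:38 start time of the shell shown by ps.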