Eric Li's Blog



Set up a Mesos cluster for Spark workloads in a nutshell

Posted on 2017-05-31 |

Set up the Mesos cluster with Ansible

Dependencies

ansible-galaxy install JasonGiedymin.mesos
ansible-galaxy install AnsibleShipyard.ansible-zookeeper
ansible-galaxy install AnsibleShipyard.ansible-java
ansible-galaxy install geerlingguy.java
ansible-galaxy install JasonGiedymin.marathon
ansible-galaxy install JasonGiedymin.chronos
ansible-galaxy install JasonGiedymin.nodejs
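
If you prefer a single command, the same roles can go into a requirements file; a minimal sketch (the file name requirements.yml is just a convention, any path works):

# write the role list once, then install everything in one go
cat > requirements.yml <<'EOF'
- src: JasonGiedymin.mesos
- src: AnsibleShipyard.ansible-zookeeper
- src: AnsibleShipyard.ansible-java
- src: geerlingguy.java
- src: JasonGiedymin.marathon
- src: JasonGiedymin.chronos
- src: JasonGiedymin.nodejs
EOF
ansible-galaxy install -r requirements.yml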

Prerequisites

playbook mesos_install.yml

- name: Mesos
  hosts: mesos
  gather_facts: yes
  vars:
    - zookeeper_hostnames: "{{ groups.zookeeper_hosts | join(':' + zookeeper_client_port + ',') }}:{{ zookeeper_client_port }}"
  tasks:
    - debug: msg=" zookeeper_hostnames {{ zookeeper_hostnames }}"
- name: Zookeeper
  hosts: zookeeper_hosts
  sudo: yes
  roles:
    - role: AnsibleShipyard.ansible-zookeeper
  tasks:
    - debug: msg="{{ zookeeper_hostnames }}"
- name: Java
  hosts: all
  sudo: yes
  roles:
    - role: geerlingguy.java
- name: mesos_masters
  hosts: mesos-masters
  strategy: debug
  sudo: yes
  gather_facts: yes
  vars:
    chronos_bin_load_options_override_enabled: yes
    chronos_conf_options:
      hostname: "{{ chronos_hostname }}"
      http_port: "{{ chronos_port }}"
      mesos_framework_name: "chronos"
  tasks:
    - debug: msg="{{ chronos_conf_options }}"
  roles:
    - role: JasonGiedymin.mesos
      mesos_install_mode: master-slave
    - role: JasonGiedymin.nodejs
      nodejs_version: 0.10.25
      nodejs_global_packages:
        - express
      nodejs_path: "/usr/"
    - role: JasonGiedymin.chronos
      chronos_version: "2.4.0"
- name: mesos_slaves
  hosts: mesos-slaves
  sudo: yes
  gather_facts: yes
  vars:
    - zookeeper_hostnames: "{{ groups.zookeeper_hosts | join(':' + zookeeper_client_port + ',') }}:{{ zookeeper_client_port }}"
  tasks:
    - debug: msg="{{ zookeeper_hostnames }} "
  roles:
    - role: JasonGiedymin.mesos
      mesos_install_mode: slave
    - role: JasonGiedymin.marathon

group_vars/mesos.yml

# playbook group_vars file "inventories/development/group_vars/mesos.yml"
# Mesos
mesos_version: "1.1.0"
mesos_hostname: "{{ ansible_fqdn }}"
mesos_cluster_name: "Development Cluster"
#binary
mesos_containerizers: "mesos"
#mesos_containerizers: "docker,mesos"
mesos_quorum: '2'
mesos_log_location: '/opt/logs/mesos'
mesos_work_dir: '/opt/mesos'
#share the same zookeeper in cluster
zookeeper_hostnames: "{{ groups.zookeeper_hosts | join(':' + zookeeper_client_port + ',') }}:{{ zookeeper_client_port }}"
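
To make the join filter concrete: assuming zookeeper_client_port is set to the ZooKeeper default "2181" and using the single-node zookeeper_hosts group from the inventory in the next section, the expression expands roughly like this:

# groups.zookeeper_hosts                      -> ['mesos-spark-kube1']
# | join(':' + zookeeper_client_port + ',')   -> "mesos-spark-kube1"
# + ":{{ zookeeper_client_port }}"            -> "mesos-spark-kube1:2181"
# with three ZooKeeper nodes it would yield "zk1:2181,zk2:2181,zk3:2181"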

Inventory

[all]
mesos-spark-kube2 ansible_ssh_host=172.16.170.70
mesos-spark-kube3 ansible_ssh_host=172.16.170.146
mesos-spark-kube4 ansible_ssh_host=172.16.170.147
mesos-spark-kube1  ansible_ssh_host=172.16.169.210
[k8s-cluster:children]
kube-node
kube-master
[kube-node]
mesos-spark-kube2
mesos-spark-kube3
mesos-spark-kube4
[etcd]
mesos-spark-kube1
[zookeeper_hosts]
mesos-spark-kube1
[mesos:children]
mesos-masters
mesos-slaves
[mesos-slaves]
mesos-spark-kube2
mesos-spark-kube3
mesos-spark-kube4
[mesos-masters]
mesos-spark-kube1

Let's go kick off a cluster

Set up the cluster with the playbook to install Mesos

ansible-playbook -i ~/.kargo/inventory/inventory.cfg mesos_install.yml
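
A few optional dry-run checks with standard ansible-playbook flags before touching the nodes:

ansible-playbook -i ~/.kargo/inventory/inventory.cfg mesos_install.yml --syntax-check
ansible-playbook -i ~/.kargo/inventory/inventory.cfg mesos_install.yml --list-hosts
ansible-playbook -i ~/.kargo/inventory/inventory.cfg mesos_install.yml --limit mesos-slaves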

Verify the cluster with slave status

curl http://172.16.169.210:5050/master/state | jq .slaves
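
The state endpoint can be sliced further with jq, for example to count the registered slaves and list their hostnames (field names come from the standard /master/state payload):

curl -s http://172.16.169.210:5050/master/state | jq '.slaves | length'
curl -s http://172.16.169.210:5050/master/state | jq '.slaves[].hostname'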

Verify the cluster with a Mesos job execution

./bin/spark-submit --class org.apache.spark.examples.SparkPi  --master mesos://172.16.169.210:5050  --num-executors 20  --driver-memory 1g --executor-memory 2g --executor-cores 1 --queue thequeue file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000
./bin/spark-shell --master mesos://172.16.169.210:5050  --num-executors 20
./bin/spark-submit --class org.apache.spark.examples.SparkPi  --master mesos://172.16.169.210:5050  --num-executors 16  --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000
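
One thing worth noting: --num-executors and --queue are YARN options and are ignored on Mesos, and the executors need a way to find the Spark binaries. A hedged sketch using spark.executor.uri so agents fetch the distribution themselves (the tarball URL is the same one used for the client download further below):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master mesos://172.16.169.210:5050 \
  --conf spark.executor.uri=http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz \
  --driver-memory 1g --executor-memory 2g --executor-cores 1 \
  file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000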

Going further: client setup

Set up an Ubuntu 16.04 client

apt-get install wget curl unzip python-setuptools python-dev mesos=1.2.0-2.0.1
wget http://repos.mesosphere.com/debian/pool/main/m/mesos/mesos_1.2.0-2.0.1.debian8_amd64.deb
dpkg -i mesos_1.2.0-2.0.1.debian8_amd64.deb
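
A quick sanity check that the package landed and that the native library Spark needs is in place (/usr/lib is where the Mesosphere packages normally put libmesos; adjust if yours differs):

dpkg -l | grep mesos
ls -l /usr/lib/libmesos*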

Set up a Spark client

wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar -xvf spark-2.1.0-bin-hadoop2.7.tgz
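
A minimal conf/spark-env.sh sketch for the client, assuming the Mesosphere .deb above placed libmesos under /usr/lib; SPARK_EXECUTOR_URI saves you from installing Spark on every agent:

# conf/spark-env.sh
export MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz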

Submit a Spark job from the client

./bin/spark-submit --class org.apache.spark.examples.SparkPi  --master mesos://172.16.169.210:5050  --num-executors 20  --driver-memory 1g --executor-memory 2g --executor-cores 1 --queue thequeue file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000

Submit to multiple masters

./bin/spark-submit --class org.apache.spark.examples.SparkPi  --master mesos://zk://172.16.169.210:2181/mesos  --num-executors 2  --driver-memory 1g --executor-memory 2g --executor-cores 1 --queue thequeue file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000
./bin/spark-submit --class org.apache.spark.examples.SparkPi  --master mesos://zk://172.16.169.210:2181/mesos  --num-executors 2  --driver-memory 1g --executor-memory 2g --executor-cores 1 --queue thequeue file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 10000
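
With only one ZooKeeper node the zk:// URL above has a single entry; if the ensemble grew to three nodes, the master URL would simply list them all. A sketch (zk1-zk3 are placeholders, not hosts from this cluster):

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos \
  --driver-memory 1g --executor-memory 2g --executor-cores 1 \
  file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 1000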

Monitoring

Mesos monitoring with the REST API

master: http://172.16.169.210:5050/metrics/snapshot

slave: http://172.16.169.210:5051/metrics/snapshot
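
The snapshot is a flat JSON map, so individual counters are easy to pull out with jq (the metric names below are standard Mesos metrics):

curl -s http://172.16.169.210:5050/metrics/snapshot | jq '."master/slaves_active", ."master/tasks_running"'
curl -s http://172.16.169.210:5051/metrics/snapshot | jq '."slave/tasks_running"'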

Integrate with the Spark dispatcher for backend jobs

Mesos dispatcher

sbin/start-mesos-dispatcher.sh -h 172.16.169.210 --name dispatcher -m mesos://zk://172.16.169.210:2181/mesos
sbin/start-mesos-dispatcher.sh -h 9.30.101.101 --name dispatcher -m mesos://zk://mesos-medium1:2181/mesos
-z zk://172.16.169.210:2181
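
Once the dispatcher is running, backend jobs can be submitted against it in cluster mode. A sketch, assuming the dispatcher listens on its default port 7077 (not overridden above) and that the example jar path exists on the agent that ends up running the driver:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master mesos://172.16.169.210:7077 \
  --deploy-mode cluster \
  --driver-memory 1g --executor-memory 1g --executor-cores 1 \
  file:///root/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 1000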

Cisco AnyConnect failed to establish connectivity

Posted on 2017-03-17 |

Abstract

Issue: Cisco AnyConnect fails to establish connectivity to the VPN server due to hostscan state idle, TOKEN_INVALID, and "unable to init cert verification".
Root cause: "unable to init cert verification" caused by Java certificate validation (Java 1.8.0_121).

Solution:

You have the following actions to restore your Cisco AnyConnect VPN client. In most cases the issue can be resolved through actions No. 4 and No. 5, since it is a Java security validation issue.

  1. rm ~/.anyconnect
  2. Regenerate the P12 certs from your company website and import the certs to your Mac.
    Notice: delete all previous IBM VPN Intermediate CA entries from your keychains.

  3. Double-click the P12 downloaded from the website and select Open to import it using the Keychain Access utility.

  4. Important: delete all IBM Internal Root CA, IBM VPN Intermediate CA, and your private key entries from the System keychain; keep these three only in the login keychain.

  5. Log in from Firefox to https://(your vpn endpoint)/CACHE/stc/2/index.html (e.g. https://sasvpn01.cn.ibm.com/CACHE/stc/2/index.html) to validate that your certs and Java runtime have been set up successfully.

  6. Reinstall Cisco AnyConnect.
  7. If an expired-certs message is prompted, add the URL to the Java exception site list.

Troubleshooting

  1. Open the Java Console from System Preferences to enable debug/trace logging for expired certs.
  2. Cisco AnyConnect logging
    tail -f /var/log/system.log
    find ~/.cisco 
    tail -f ~/.cisco/hostscan/log/cscan.log 
    tail -f ~/.cisco/hostscan/log/libcsd.log 
    tail -f ~/.cisco/hostscan/log/cstub.log
    
  3. Configuration of the VPN client

    ~/.anyconnect

Autoscale workload over Span-VLAN k8s cluster

Posted on 2017-02-07 | In container, network, ha, tech |

Span-VLAN k8s cluster environment

:..........:........................:...............:................:............:........:...........................:
:    id    :        hostname        :  primary_ip   :   backend_ip   : datacenter : action : public vlan/private vlan  :
:..........:........................:...............:................:............:........:...........................:
: 27407789 : hydra-calico-dal09-m01 : 169.46.186.87 : 10.173.49.4    : dal09      :   -    : 1213/1319                 :
: 27407791 : hydra-calico-dal09-w01 : 169.45.171.24 : 10.155.230.151 : dal09      :   -    : 996/968                   :
: 27407793 : hydra-calico-dal09-w02 : 169.45.131.2  : 10.153.86.9    : dal09      :   -    : 959/1211                  :
: 27407873 : hydra-calico-dal10-w03 : 169.47.195.84 : 10.171.90.69   : dal10      :   -    : 1136/1197                 :
: 27407875 : hydra-calico-dal10-w04 : 169.47.195.88 : 10.171.90.94   : dal10      :   -    : 1136/1197                 :
:..........:........................:...............:................:............:........:...........................:
root@hydra-calico-dal09-m01:~# kubectl get nodes --show-labels|grep zone
10.153.86.9 Ready 26d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,beta.kubernetes.io/zone=dal09,kubernetes.io/hostname=10.153.86.9
10.171.90.94 Ready 26d beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,beta.kubernetes.io/zone=dal10,kubernetes.io/hostname=10.171.90.94

10.171.90.94: zone=dal10
10.153.86.9: zone=dal9

deploy an nginx deployment

kubectl run nginx-hpa --image=nginx --requests=cpu=100m,memory=50M --expose --port=80
kubectl get deployment
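
--expose also created a ClusterIP service in front of the deployment; a quick check that both exist:

kubectl get svc nginx-hpa
kubectl get pod -o wide | grep nginx-hpa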

run a test against the nginx server.

kubectl run -i --tty service-test --image=busybox /bin/sh
$ wget -q -O- http://nginx-hpa.default.svc.cluster.local
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>

manually scale out the application across zones dal9 and dal10

$ kubectl scale deployment nginx-hpa --replicas=2
deployment "nginx-hpa" scaled
kubectl get pod -o wide|grep hpa
nginx-hpa-2266641329-901jn 1/1 Running 0 19m 10.171.29.98 10.171.90.94 <= dal10
nginx-hpa-2266641329-zmnrg 1/1 Running 0 8m 10.98.23.204 10.153.86.9 <=dal9

autoscale, set target CPU usage to 30%

kubectl autoscale deployment nginx-hpa --cpu-percent=30 --min=2 --max=10
kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
nginx-hpa Deployment/nginx-hpa 30% 0% 2 10 4s
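
kubectl describe gives a bit more insight into how the controller compares current against target utilization:

kubectl describe hpa nginx-hpa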

raise the workload with 10 concurrent query threads to reach the CPU usage threshold

kubectl run -i --tty load-generator --image=busybox /bin/sh
while true; do wget -q -O- http://nginx-hpa.default.svc.cluster.local &>/dev/null;done &

result: pods scale out from 2 to 3

$ kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
nginx-hpa Deployment/nginx-hpa 30% 67% 2 10 5m
$ kubectl get pod -o wide|grep hpa
nginx-hpa-2266641329-2l98k 1/1 Running 0 14s 10.98.23.206 10.153.86.9 <= **newly created pod**
nginx-hpa-2266641329-901jn 1/1 Running 0 38m 10.171.29.98 10.171.90.94
nginx-hpa-2266641329-zmnrg 1/1 Running 0 27m 10.98.23.204 10.153.86.9

drain the application from availability zone dal9 to zone dal10 online

$kubectl drain 10.153.86.9
node "10.153.86.9" cordoned
pod "nginx-hpa-2266641329-zmnrg" evicted
pod "nginx-hpa-2266641329-2l98k" evicted
node "10.153.86.9" drained
$kubectl get pod -o wide|grep hpa
nginx-hpa-2266641329-5w5jt 1/1 Running 0 1m 10.171.29.104 10.171.90.94
nginx-hpa-2266641329-901jn 1/1 Running 0 47m 10.171.29.98 10.171.90.94
nginx-hpa-2266641329-wzwhm 1/1 Running 0 1m 10.171.29.107 10.171.90.94
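
Once maintenance on dal9 is finished, the node can be put back into the scheduling rotation:

kubectl uncordon 10.153.86.9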

limitation: local data cannot be preserved after draining the app from one availability zone to another.

error: pods with local storage (use --delete-local-data to override): monitoring-grafana-3730655072-g667m, monitoring-influxdb-957705310-s144d
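
If losing that emptyDir data is acceptable, the drain can be forced with the override the error message suggests (plus --ignore-daemonsets if daemonset pods block the drain):

kubectl drain 10.153.86.9 --delete-local-data --ignore-daemonsets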