As organizations grow and mature, so does their data. Eventually, the data may become too ‘big’ or too complicated for traditional analytics tools. This growth and the resulting challenges have led to many well-established artificial intelligence (AI) applications in the marketplace.
The hype about AI technologies is big: everyone is talking about how AI will revolutionize our lives, touch every home, and solve all our problems. But the promise of AI is not always realized. How can your organization make the leap to AI nirvana?
As an enterprise, your infrastructure is likely rather traditional, and your classic service-based solutions are developed and run within a perfectly adjusted workflow that must be maintained. To adopt AI technology, you must find a way to leverage it while keeping it compatible with your internal workflows.
In this post, we’ll show you how Shiny (R) can help you adopt and integrate AI technology.
There are two main programming languages for AI and data analysis: Python and R. Both languages need a wrapping framework (for example, Flask for Python or Shiny for R) for interaction and visualization.
Shiny (R) is a powerful and user-friendly tool, which is why we recommend a continuous integration and continuous deployment (CI/CD) approach for a Shiny (R)-in-Kubernetes cloud solution.
Here is a Shiny (R) example:
This is the big-picture view of a typical Shiny (R) development flow:
The R project is basically a bunch of scripts. We must add these scripts to the Docker Image, and either configure a connection to a database or mount a disk volume with data to analyze.
Next, we run unit tests to ensure that the scripts work properly. We’ve chosen Jenkins to run them because it’s already a part of the technology stack in our project. (We use Jenkins to build and deliver developed services.)
Here’s how this process works:
To implement this scenario, we will:
- Build a Docker Image with Jenkins, Docker, and Kubernetes Control Bundled.
- Build a Shiny (R) Docker Image.
- Deploy Shiny (R) and Jenkins in Kubernetes.
- Configure Jenkins.
Build a Docker Image with Jenkins, Docker, and Kubernetes Control Bundled
We have created a Dockerfile based on the official Jenkins Docker Image (jenkins/jenkins:lts).
Inside the Kubernetes cluster, you are basically in a Docker-inside-Docker situation. Install Docker CE and pass /var/run/docker.sock through from the Kubernetes node (so that Jenkins shares the node’s Docker daemon).
You must also install kubectl to control the Kubernetes cluster. Place the Kubernetes “config” file into the .kube directory under the user’s home directory so that kubectl has access to the cluster without any additional setup.
The resulting Dockerfile for our Jenkins will look like this:
FROM jenkins/jenkins:lts
EXPOSE 8080 50000
USER root

# Install prerequisites for Docker
RUN apt-get update && apt-get install -y sudo iptables libsystemd-journal0 init-system-helpers libapparmor1 libltdl7 libseccomp2 libdevmapper1.02.1 && rm -rf /var/lib/apt/lists/*

ENV DOCKER_VERSION=docker-ce_17.03.0~ce-0~ubuntu-trusty_amd64.deb
ENV KUBERNETES_VERSION=v1.6.6

# Set up Docker
RUN wget https://download.docker.com/linux/ubuntu/dists/trusty/pool/stable/amd64/$DOCKER_VERSION
RUN dpkg -i $DOCKER_VERSION

# Set up Kubernetes
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$KUBERNETES_VERSION/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin/kubectl

# Configure access to the Kubernetes Cluster
# ("~" is not expanded in Dockerfile paths, so root's home directory is spelled out)
COPY install/config /root/.kube/config

ENTRYPOINT ["/bin/tini", "--", "/usr/local/bin/jenkins.sh"]
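Once the container is running, it can help to confirm that both pieces described above are in place: the shared Docker socket and the kubectl config file. The following is only a minimal sketch; the function name `preflight` is hypothetical, and the default paths are assumptions taken from the text (the socket at /var/run/docker.sock and the config under the home directory).

```shell
#!/bin/sh
# Sketch: check that the Jenkins container can reach Docker and Kubernetes.
# "preflight" is a hypothetical helper name; both default paths are
# assumptions based on the text and can be overridden via arguments.
preflight() {
    sock="${1:-/var/run/docker.sock}"
    kubeconfig="${2:-$HOME/.kube/config}"
    ok=0
    # The Docker socket must exist and be a socket, not a regular file.
    if [ ! -S "$sock" ]; then
        echo "missing Docker socket: $sock"
        ok=1
    fi
    # kubectl reads its cluster credentials from this file.
    if [ ! -f "$kubeconfig" ]; then
        echo "missing kubectl config: $kubeconfig"
        ok=1
    fi
    return "$ok"
}
```

Calling `preflight` without arguments inside the Jenkins container prints one line per missing piece and returns a non-zero status if anything is absent.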
Build a Shiny (R) Docker Image
You can speed up the build of new Docker Images by splitting the work into two images: a base Docker Image that contains just Shiny (R), named “shiny-r”, and a deployment Docker Image that inherits from the base one, named “shiny-r-bundle”.
For the base Docker Image, we use CentOS 6.6 because we plan to use the Shiny Server build for CentOS 6. Our next steps are:
- Install R (from yum repository).
- Install database access libraries (unixodbc, freetds).
- Override some options in the .Rprofile file.
- Install the required R packages: shiny, rmarkdown (for reporting), testthat (for unit tests).
- Install the ‘RODBC’ R package (for database access).
- Install an open-source version of the Shiny Server.
The base Dockerfile for Shiny (R) is:
FROM centos:6.6

RUN rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
RUN yum -y install yum-plugin-ovl
RUN yum -y install R
RUN yum -y install unixodbc freetds

# Point R at a CRAN mirror, then install the required packages
RUN echo "options(repos = c(CRAN = \"https://cran.rstudio.com\"))" > ~/.Rprofile
RUN R -e "install.packages('shiny')"
RUN R -e "install.packages('rmarkdown')"
RUN R -e "install.packages('testthat')"
RUN R -e "install.packages('RODBC', type = 'source')"
RUN R -e "sessionInfo()"

# Install the open-source Shiny Server
ENV R_STUDIO_VERSION=shiny-server-184.108.40.2068-rh6-x86_64.rpm
RUN curl https://download3.rstudio.org/centos6.3/x86_64/$R_STUDIO_VERSION > $R_STUDIO_VERSION
RUN yum clean all; yum -y install --nogpgcheck $R_STUDIO_VERSION

EXPOSE 3838
CMD ["/opt/shiny-server/bin/shiny-server"]
For Shiny to work, install the ‘shiny’ and ‘rmarkdown’ packages. ‘testthat’ will be used for unit testing, and ‘RODBC’ will be used for accessing a traditional RDBMS.
To deploy the Shiny (R) project, create the inherited Docker Image, which also runs the unit tests during the build:
FROM some-registry.com/shiny-r:latest

COPY config/* /usr/local/etc/
RUN rm /srv/shiny-server/index.html
RUN rm -rf /srv/shiny-server/sample-apps
COPY install /srv/shiny-server

RUN echo; echo "TESTS:"; \
    for file in $(find /srv/shiny-server -name TEST_RUNNER.R); \
    do \
        echo " --------- RUN TEST: $file -------------"; \
        cd $(dirname "$file"); \
        R --quiet --no-save < $file; \
    done

CMD ["/opt/shiny-server/bin/shiny-server"]
Supply a test runner script (named ‘TEST_RUNNER.R’) for every R project. These scripts are executed during the build of the Docker Image.
Example of “TEST_RUNNER.R”:
library('testthat')
source('server.R')
test_dir('tests', reporter = 'Summary')
This test runner verifies ‘server.R’ by executing all test scripts in the ‘tests’ directory.
Example of an R unit test (‘tests/test_server.R’):
expect_that(Fibonacci(-1), throws_error())
expect_that(Fibonacci(0) == 0, is_true())
expect_that(Fibonacci(1) == 1, is_true())
expect_that(Fibonacci(2) == 1, is_true())
expect_that(Fibonacci(10) == 55, is_true())
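One detail of the test loop in the deployment Dockerfile is worth noting: in a `for` loop whose commands are joined with `;`, only the exit status of the last command run decides whether the `RUN` step fails, so an earlier failing runner would not stop the build. The sketch below is one possible fail-fast variant, not the original setup. It assumes that each runner exits with a non-zero status when its tests fail, and `RUNNER_CMD` is a hypothetical override introduced here only so the loop can be tried without R installed.

```shell
#!/bin/sh
# Sketch: run every TEST_RUNNER.R under a root directory and stop at the
# first runner that exits with a non-zero status.
# RUNNER_CMD is a hypothetical override; by default the R invocation from
# the Dockerfile is used. Paths containing newlines are not handled.
run_test_runners() (
    root="$1"
    find "$root" -name TEST_RUNNER.R | while IFS= read -r file; do
        echo " --------- RUN TEST: $file -------------"
        # Run each script from its own directory, as the Dockerfile does.
        ( cd "$(dirname "$file")" &&
          ${RUNNER_CMD:-R --quiet --no-save} < "$(basename "$file")" ) || exit 1
    done
)
```

In the Dockerfile, the equivalent would be a `RUN` step that calls such a function with /srv/shiny-server as its argument, so that a failing runner aborts the image build.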
Deploy Shiny (R) and Jenkins in Kubernetes
To deploy Shiny (R) and Jenkins into the Kubernetes cluster, we must supply deployment and service definitions:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: shiny-r
  namespace: default
  labels:
    app: shiny-r
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: shiny-r
    spec:
      containers:
      - name: shiny-r
        imagePullPolicy: Always
        image: some-registry.com/shiny-r-bundle:latest
        ports:
        - containerPort: 3838
        resources:
          limits:
            cpu: 100m
            memory: 4Gi
          requests:
            cpu: 100m
            memory: 4Gi
        readinessProbe:
          httpGet:
            path: /
            port: 3838
        volumeMounts:
        - name: shiny-r-storage
          mountPath: /opt/shiny
      volumes:
      - name: shiny-r-storage
---
apiVersion: v1
kind: Service
metadata:
  name: shiny-r-service
  namespace: default
  labels:
    app: shiny-r
spec:
  type: LoadBalancer
  ports:
  - port: 3838
  selector:
    app: shiny-r
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jenkins-ci
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: jenkins-ci
    spec:
      containers:
      - name: jenkins-ci
        imagePullPolicy: Always
        image: some-registry.com/jenkins-ci:latest
        ports:
        - containerPort: 8080
        - containerPort: 50000
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 40
          periodSeconds: 20
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/run
          name: docker-sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run
---
apiVersion: v1
kind: Service
metadata:
  name: jenkins-ci-lb
spec:
  type: LoadBalancer
  ports:
  - name: jenkins
    port: 8080
    targetPort: 8080
  - name: jenkins-agent
    port: 50000
    targetPort: 50000
  selector:
    name: jenkins-ci
- You will need one Jenkins node with very low hardware requirements: two CPU cores and 512 MB of RAM should be sufficient.
- Determine the number of Shiny (R) instances based on the number of active users and high-availability requirements. Shiny (R) nodes should not be starved of computing power: provide as many CPU cores as are available and enough RAM.
- If you run multiple Shiny (R) nodes, you must use “sticky sessions” in your load balancing (for example, by using an Ingress instead of the LoadBalancer service).
Configure Jenkins
Make the following two low-level configurations before starting to work with Jenkins in Kubernetes:
- Set parameter excludeClientIPFromCrumb=true in the file /var/jenkins_home/config.xml to fix a “No valid crumb was included in the request” error.
- Run “docker login” so that the Docker client used by Jenkins is authenticated against the correct Docker Registry.
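The first of these two changes can be scripted. The sketch below is one way to do it, under the assumption that config.xml already contains the element with the value `false`; the function name `set_exclude_client_ip` is hypothetical. Jenkins typically needs a restart or a configuration reload before it picks up edits made directly to the file.

```shell
#!/bin/sh
# Sketch: switch excludeClientIPFromCrumb to true in a Jenkins config.xml.
# Assumes the file already contains
#   <excludeClientIPFromCrumb>false</excludeClientIPFromCrumb>
# "set_exclude_client_ip" is a hypothetical helper name.
set_exclude_client_ip() {
    config_file="$1"
    # Keep a backup next to the original before editing.
    cp "$config_file" "$config_file.bak" || return 1
    sed 's|<excludeClientIPFromCrumb>false</excludeClientIPFromCrumb>|<excludeClientIPFromCrumb>true</excludeClientIPFromCrumb>|' \
        "$config_file.bak" > "$config_file"
}

# Usage, with the path from the text above:
#   set_exclude_client_ip /var/jenkins_home/config.xml
```

The second change is a single interactive command run inside the Jenkins container, for example `docker login some-registry.com`, where the registry name is the placeholder used throughout this post.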
Now we can create the new Jenkins project:
Next, we add the build parameter:
Then we configure access to the source code repository:
And, finally, we add a build step:
cp -r r/* $ROOT_DOCKER/install

#Docker build and publish
DOCKER_NAME="shiny-r-bundle"
DOCKER_IMAGE="some-registry.com/$DOCKER_NAME:$BUILD_NUMBER"
cd $ROOT_DOCKER
docker pull some-registry.com/shiny-r
docker build -t $DOCKER_NAME .
docker tag $DOCKER_NAME $DOCKER_IMAGE
docker push $DOCKER_IMAGE

#Kubernetes redeploy
kubectl set image deployment/shiny-r shiny-r=$DOCKER_IMAGE
In this script, we copy the R script files, rebuild the ‘shiny-r-bundle’ Docker Image, and roll out the new Docker Image in the Kubernetes cluster.
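The build step ends as soon as `kubectl set image` returns, which is before the new pods are actually running. A condensed variant of the step that also waits for the rollout might look like the sketch below. It is an illustration only: the `run` and `redeploy` helpers and the `DRY_RUN` switch are hypothetical additions, and the image is built directly under its full tag instead of being tagged afterwards.

```shell
#!/bin/sh
# Sketch: condensed build step with a rollout check added.
# DRY_RUN is a hypothetical switch (not part of the original job). It
# defaults to 1, so commands are only printed; set DRY_RUN=0 on a host
# that has docker and kubectl configured.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

redeploy() {
    build_number="$1"
    image="some-registry.com/shiny-r-bundle:$build_number"
    run docker build -t "$image" . || return 1
    run docker push "$image" || return 1
    run kubectl set image deployment/shiny-r "shiny-r=$image" || return 1
    # Wait for the rollout to finish before the job reports success.
    run kubectl rollout status deployment/shiny-r
}
```

In a Jenkins job this would be called as `redeploy "$BUILD_NUMBER"` from the directory that holds the Dockerfile. If the rollout does not complete, `kubectl rollout undo deployment/shiny-r` returns the deployment to its previous revision.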
Let’s take a look at the results of our efforts. In this example, we’ve set up git polling for every minute (cron expression ‘* * * * *’), and we have committed some code changes to this repository.
Redeploying in one minute is a great result for a CI/CD pipeline, and it leaves plenty of time for rigorous unit tests. Check out our post on Running Spark with Jupyter Notebook & HDFS on Kubernetes to learn more about running data science workloads on Kubernetes.
Interested in simplifying the management of your Kubernetes cluster? Consider using a Kubernetes management platform such as Kublr.