Splunk, will it Alteryx?

This article is by David Ha and originally appeared on the Alteryx Engine Works Blog here: https://community.alteryx.com/t5/Engine-Works-Blog/Splunk-will-it-Alteryx/ba-p/554043#

 

What is Splunk?

Although Splunk now offers many software products, this blog focuses on Splunk's core product: a technology for collecting and analyzing machine-generated data. Data can be ingested into Splunk directly from IoT devices, applications, logs, performance monitoring tools, and more. This typically takes place through "Forwarders", which collect and forward the data to "Indexers". Indexers receive the data and store it in the back end, where it can be searched and analyzed.

Splunk gives IT, DevOps, and administrators a powerful way to visualize data from their infrastructure and source systems. Once these connections are set up, metrics, logs, and messages are read into Splunk automatically in real time, and reports can be run on a schedule or on demand to gain insight into system behavior. Search results can also be saved as Dashboards and Alerts.

You can even configure Splunk to store data sets such as CSV files and then append additional rows as new records are read from the source system. In the example below, you can see I imported the FuzzyData2.csv Alteryx sample data into Splunk.

screen7.PNG

 

A list of data sources in Splunk.

Splunk also provides a slick visual interface for monitoring the environment, working with saved reports and dashboards, configuring new searches, and more.

screen8.PNG

 

Splunk's Monitoring Console provides great insight into the system.

 

Will it Alteryx?

With some background on Splunk, we can now turn to the question on everyone's mind: can Alteryx integrate with this technology, and if so, how? There are several ways to use Alteryx to analyze Splunk data. Here are some examples:

1. Splunk's REST APIs and the Alteryx Download Tool. Splunk offers REST API access to various Splunk resources, including data inputs, data outputs, searches, and alerts. These APIs can be accessed via the Alteryx Download Tool to pull data into an Alteryx workflow.

2. Splunk's Python library and the Alteryx Python Tool. Splunk offers a Python SDK with libraries to access Splunk resources using Python code. You can work with data, saved searches, new searches, and more. These libraries can be called from Python code in the Alteryx Python Tool and then integrated with other Alteryx workflow building blocks.

3. Splunk's ODBC Driver and the Alteryx Input Tool. Splunk provides an ODBC driver which allows us to read data into Alteryx using the standard Input Tool with a generic ODBC connection. Let's dive into this scenario a bit more...

First, configure the Splunk ODBC Driver to connect to your Splunk environment:

screen6.png

Then, in Alteryx, simply choose the ODBC Generic Connection and reference the ODBC DSN you created above.

screen10.png

If the connection succeeds, you'll see a list of "Tables" that can be imported into Alteryx Designer over the ODBC connection. These "Tables" are actually Saved Searches in Splunk.

screen9.png
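The same DSN can also be inspected outside Designer to confirm which "Tables" the driver exposes. The following is a minimal sketch, not part of the original workflow: it assumes the third-party pyodbc package is installed, and the function names and the DSN name "Splunk ODBC" are hypothetical placeholders for whatever you configured above.

```python
def connect_via_dsn(dsn_name):
    """Open an ODBC connection through a DSN (assumes pyodbc is installed)."""
    import pyodbc  # third-party; imported lazily so the helper below needs no driver

    # autocommit=True avoids starting a transaction on a read-only source
    return pyodbc.connect(f"DSN={dsn_name}", autocommit=True)


def list_odbc_tables(connection):
    """Return the sorted table names the ODBC driver reports for this connection."""
    cursor = connection.cursor()
    try:
        # Cursor.tables() yields one row per table; for the Splunk driver
        # these rows should correspond to Saved Searches
        return sorted(row.table_name for row in cursor.tables())
    finally:
        cursor.close()


if __name__ == "__main__":
    # "Splunk ODBC" is a hypothetical DSN name; use the one you created
    conn = connect_via_dsn("Splunk ODBC")
    for name in list_odbc_tables(conn):
        print(name)
```

If the names printed here match your Saved Searches in Splunk, the DSN is configured correctly and any remaining problem is on the workflow side.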

One thing to be aware of: if you use this method, you will likely see the following error when you run the workflow and read in data:

Error: Input Data (1): Error SQLExtendedFetch: [Splunk][SplunkODBC] (60) Unexpected response from server. Verify the server URL. Error parsing JSON: Text only contains white space(s)

screen3.PNG

This appears to be an issue with the Splunk ODBC Driver. The data is in fact read into Alteryx, but it is inaccessible to tools downstream in the workflow. The trick is to enable the "Cache Data" option on the Input tool and then run the workflow again.

screen4.PNG

The next time the workflow runs, it builds up the data in the cache, and you'll still see the error displayed. Any subsequent runs then use the cached data and succeed.

screen2.PNG

This works regardless of whether we are accessing file-based content in Splunk or data from logs or performance metrics:

screen5.PNG
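For comparison with the ODBC route, the REST API approach (option 1 above) can be sketched in plain Python. This is a rough illustration only: it assumes Splunk's search export endpoint on the management port with basic authentication and JSON output, and the host name, credentials, search string, and function names are hypothetical. The same URL and payload fields could be supplied to the Alteryx Download Tool instead.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(base_url, search, username, password):
    """Build (but do not send) a POST to the search export endpoint."""
    # Splunk expects the query to start with "search" or a generating command ("|")
    if not search.lstrip().startswith(("search ", "|")):
        search = "search " + search
    body = urllib.parse.urlencode(
        {"search": search, "output_mode": "json"}
    ).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=body,
        headers={"Authorization": "Basic " + token},
        method="POST",
    )


def parse_export_lines(text):
    """Parse newline-delimited JSON, keeping only the 'result' payloads."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines rather than failing on empty JSON
        event = json.loads(line)
        if "result" in event:
            rows.append(event["result"])
    return rows


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own environment
    request = build_export_request(
        "https://splunk.example.com:8089", "index=main | head 5", "admin", "changeme"
    )
    with urllib.request.urlopen(request) as response:
        for row in parse_export_lines(response.read().decode("utf-8")):
            print(row)
```

Each parsed row is a dictionary of field names to values, which maps naturally onto a table of records once it reaches an Alteryx workflow. Depending on how the Splunk server's certificate is set up, the HTTPS call may need additional TLS configuration.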

Final Thoughts

Splunk provides a great way to store, search, and monitor machine-generated data from sensors, logs, network traffic, and more. Being able to analyze that data to gain business insights makes it even more valuable. With Alteryx, data from Splunk can be combined with other sources, cleansed, and enriched to provide deeper insight.