Getting Cfengine Community promise logs into Splunk

One of the benefits of the enterprise version of Cfengine is the ability to capture the logs and status of cf-agent runs centrally. Auditors have typically asked IT teams for a snapshot of the environment so they can evaluate what is and is not in compliance. Capturing the Cfengine logs centrally, however, means we can produce these reports on the fly. Auditors don’t need a snapshot anymore; they can see not just the current state, but also when things last changed and when hosts went out of compliance. Until now this data was only available in Cfengine Enterprise. Here is a way to get the data out of Cfengine Community and into Splunk, where it can be reported on as needed.

The first thing we need to do is get the data out of Cfengine. Folks like Neil Watson over at Evolve Thinking (https://github.com/evolvethinking/delta_reporting) have created libraries that leverage Cfengine’s reports policies to do this for you. However, this only works if you use the Evolve Thinking libraries, and a lot of engineers using Cfengine do not. They’ve built their own libraries and promises, which makes migrating difficult. Instead, I propose patching the Cfengine binaries directly so that they write logs as promises are validated. Here is the patch I’m using. Feel free to contact me with any fixes (I can think of several I want to make myself).


diff -ur cfengine-3.8.0/libpromises/eval_context.c cfengine-3.8.0-new/libpromises/eval_context.c
--- cfengine-3.8.0/libpromises/eval_context.c 2015-11-27 18:14:19.000000000 -0500
+++ cfengine-3.8.0-new/libpromises/eval_context.c 2015-12-20 10:05:39.095929361 -0500
@@ -24,6 +24,11 @@

#include <eval_context.h>

+#include <sys/file.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
#include <files_names.h>
#include <logic_expressions.h>
#include <syntax.h>
@@ -2526,6 +2531,61 @@
WriterClose(w);
}

+void log_to_promise_summary( const char *handle, const char *msg,
+ PromiseResult status ) {
+
+ FILE* f;
+ int lockres;
+ char *result=malloc(16);
+
+ /* Create a status string for each of the possible status types. The string
+ * goes in the output we generate */
+ switch (status) {
+ case PROMISE_RESULT_SKIPPED:
+ strcpy( result, "skipped" );
+ break;
+ case PROMISE_RESULT_NOOP:
+ strcpy( result, "NOOP" );
+ break;
+ case PROMISE_RESULT_CHANGE:
+ strcpy( result, "repaired" );
+ break;
+ case PROMISE_RESULT_WARN:
+ strcpy( result, "warn" );
+ break;
+ case PROMISE_RESULT_FAIL:
+ strcpy( result, "failed" );
+ break;
+ case PROMISE_RESULT_DENIED:
+ strcpy( result, "denied" );
+ break;
+ case PROMISE_RESULT_TIMEOUT:
+ strcpy( result, "timeout" );
+ break;
+ case PROMISE_RESULT_INTERRUPTED:
+ strcpy( result, "interrupted" );
+ break;
+ default:
+ strcpy( result, "unknown" );
+ break;
+ }
+
+ /* Open the log file and write a status line.
+ * NOTE: We lock the file to prevent other cf-agent runs from overwriting
+ * and corrupting the log file accidentally
+ * TODO: Make the file name a config variable
+ * TODO: No longer open and close the file for EVERY promise. Find a way
+ * to open once (maybe use the cf-agent GenerateReport stub function
+ * that enterprise uses) */
+ f = fopen( "/var/cfengine/cfps.log", "a" );
+ lockres = flock( fileno( f ), LOCK_EX );
+ fprintf ( f, "%d cfPS handle=\"%s\" status=\"%s\" message=\"%s\"\n", (int)time( NULL ), handle, result, msg );
+ lockres = flock( fileno( f ), LOCK_UN );
+ fclose( f );
+
+ free( result );
+}
+
void cfPS(EvalContext *ctx, LogLevel level, PromiseResult status, const Promise *pp, Attributes attr, const char *fmt, ...)
{
/*
@@ -2559,6 +2619,9 @@
Log(level, "%s", msg);
va_end(ap);

+ /* Log to /var/cfengine/cfps.log */
+ log_to_promise_summary( PromiseGetHandle( pp ), msg, status );
+
/* Now complete the exits status classes and auditing */

ClassAuditLog(ctx, pp, attr, status);

The handle field uniquely identifies the promise that was run. I’ve tested the patch against versions 3.7.2 and 3.8.0, and it works on both. Creating a Cfengine package from this is outside the scope of this post (there are documents at https://github.com/cfengine/buildscripts). The process is a bit ugly, so a future post will provide an updated Red Hat spec file that makes building the community edition easier.
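The patch assumes that the log file can always be opened and that every promise has a handle. A more defensive variant of the write path might look like the sketch below. It is a standalone illustration, not part of the patch: the names cfps_sanitize, cfps_format_line, and cfps_append are hypothetical, the status is passed in as a plain string rather than a PromiseResult, and replacing embedded quotes and newlines is an assumption about what keeps the key="value" pairs easy to extract later.

```c
#define _DEFAULT_SOURCE          /* for flock() and fileno() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/file.h>

/* Hypothetical helper: copy src into dst, turning double quotes into single
 * quotes and line breaks into spaces so one event stays on one line. */
static void cfps_sanitize(char *dst, size_t dstlen, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && dstlen > 0);
    if (src == NULL) {
        src = "";
    }
    for (; *src != '\0' && i + 1 < dstlen; src++) {
        char c = *src;
        if (c == '"') {
            c = '\'';
        } else if (c == '\n' || c == '\r') {
            c = ' ';
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
}

/* Hypothetical helper: build one log line in the same layout the patch uses.
 * A missing or empty handle is written as "(no handle)". */
static int cfps_format_line(char *buf, size_t buflen, long when,
                            const char *handle, const char *status,
                            const char *msg)
{
    char safe_handle[256];
    char safe_msg[1024];

    cfps_sanitize(safe_handle, sizeof(safe_handle),
                  (handle != NULL && *handle != '\0') ? handle : "(no handle)");
    cfps_sanitize(safe_msg, sizeof(safe_msg), msg);

    return snprintf(buf, buflen,
                    "%ld cfPS handle=\"%s\" status=\"%s\" message=\"%s\"\n",
                    when, safe_handle,
                    status != NULL ? status : "unknown", safe_msg);
}

/* Append one line to path while holding an exclusive lock.
 * Returns 0 on success and -1 on any failure, without aborting. */
static int cfps_append(const char *path, const char *handle,
                       const char *status, const char *msg)
{
    char line[2048];
    FILE *f;
    int rc = 0;

    cfps_format_line(line, sizeof(line), (long) time(NULL),
                     handle, status, msg);

    f = fopen(path, "a");
    if (f == NULL) {
        return -1;               /* an unwritable log should not crash the agent */
    }
    if (flock(fileno(f), LOCK_EX) != 0) {
        fclose(f);
        return -1;
    }
    if (fputs(line, f) == EOF) {
        rc = -1;
    }
    if (fflush(f) != 0) {        /* flush while the lock is still held */
        rc = -1;
    }
    flock(fileno(f), LOCK_UN);
    if (fclose(f) != 0) {
        rc = -1;
    }
    return rc;
}
```

Inside cfPS the call would then become something like cfps_append("/var/cfengine/cfps.log", PromiseGetHandle(pp), result, msg), with a failed return ignored so that a logging problem never interrupts the agent run.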

Once you have it running, the next step is to get the data into Splunk. You’ll need a monitor input for this (a stanza in inputs.conf). I put my logs in their own index; I don’t plan on searching them alongside syslog or other data types, so this is faster for me.


[monitor:///var/cfengine/cfps.log]
index=cfengine
sourcetype=cfps
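Before pointing Splunk at the file, it can help to confirm that the lines have the shape the searches will rely on. The following is a small sketch that pulls the timestamp, handle, and status out of one line in the format the patch writes. The names struct cfps_record and cfps_parse_line are hypothetical, and the parser deliberately ignores the message field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type holding the fields the Splunk searches use. */
struct cfps_record {
    long when;           /* epoch seconds from the start of the line */
    char handle[256];
    char status[16];
};

/* Parse one cfps.log line. Returns 1 when the timestamp, handle, and status
 * were all found, and 0 otherwise (including an empty handle or status). */
static int cfps_parse_line(const char *line, struct cfps_record *rec)
{
    int n;

    assert(line != NULL && rec != NULL);
    memset(rec, 0, sizeof(*rec));

    n = sscanf(line, "%ld cfPS handle=\"%255[^\"]\" status=\"%15[^\"]\"",
               &rec->when, rec->handle, rec->status);
    return n == 3;
}
```

Running each line of /var/cfengine/cfps.log through cfps_parse_line and counting the rejects gives a quick check that the handle and status fields will be there for Splunk to extract.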

Once the data is in Splunk, we can create reports, dashboards, and alerts.

This will give you the current promise status of all your hosts:

index=cfengine | stats latest(status) as status by host, handle | stats count by status

This makes for a good alert: show all hosts that have not tried remediation in the past hour.

index=cfengine | stats latest(_time) as newtime by host | eval current=if(newtime>=(now()-3600),"YES","NO") | search current=NO | convert ctime(newtime) as "Last Reported In" | table host "Last Reported In"

What about the current status of a filtered set of handles?

index=cfengine sourcetype=cfps handle=ntp_* | table host handle status _time

Building on that, auditors may want to know the last time things had issues or were repaired. Simply change the handle to anything you want (maybe NTP compliance?).

index=cfengine sourcetype=cfps handle=ntp_* NOT status="NOOP" | table host handle _time

There’s more we can do with this, and as time moves on I’ll probably provide more. If you have any suggestions, feel free to post a comment or send a message via the contact link.