Asterisk CDR logging to Splunk as JSON

Asterisk writes its call logs as CSV files by default, and those CSV files are incredibly hard to parse in Splunk with search-time extractions. It would be nice to get the data into Splunk in a form that extracts more easily… let's do it in JSON.

Asterisk is an open source PBX. The call logs it stores are called CDRs (call detail records) in Asterisk parlance and are written as CSV files. The data in the files is comma-separated, each field is wrapped in double quotes, and any double quotes inside a field are escaped by doubling them. There are no headers in the file, so field names have to be supplied manually. A sample line:

"","1234567890","1234567890","home","""Savarese"" <1234567890>","SIP/scott_office-00000000","SIP/broadvoice_bos-00000001","Dial","SIP/1234567890@broadvoice_bos,60,t","2015-08-24 23:28:03","2015-08-24 23:28:11","2015-08-24 23:28:18",15,7,"ANSWERED","DOCUMENTATION","1440458883.0",""

Initially I thought about using a transforms.conf block like this:

DELIMS = ","
FIELDS = a,b,c,d,e
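
For completeness, that transform only takes effect at search time once it has a stanza name and a props.conf entry pointing a sourcetype at it. A minimal sketch, where the stanza name asterisk-cdr-fields and the sourcetype asterisk_cdr are both hypothetical:

# transforms.conf
[asterisk-cdr-fields]
DELIMS = ","
FIELDS = a,b,c,d,e

# props.conf
[asterisk_cdr]
REPORT-cdr-fields = asterisk-cdr-fields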

The problem there is that the quoting would still be present after the fields are parsed, and the doubled inner quotes would be a complete mess! I don't like this CSV format at all.

Asterisk does let you specify exactly which fields it writes out; the sample cdr_custom.conf defines the default Master.csv format like this:

Master.csv => ${CSV_QUOTE(${CDR(clid)})},${CSV_QUOTE(${CDR(src)})},${CSV_QUOTE(${CDR(dst)})},${CSV_QUOTE(${CDR(dcontext)})},${CSV_QUOTE(${CDR(channel)})},${CSV_QUOTE(${CDR(dstchannel)})},${CSV_QUOTE(${CDR(lastapp)})},${CSV_QUOTE(${CDR(lastdata)})},${CSV_QUOTE(${CDR(start)})},${CSV_QUOTE(${CDR(answer)})},${CSV_QUOTE(${CDR(end)})},${CSV_QUOTE(${CDR(duration,f)})},${CSV_QUOTE(${CDR(billsec,f)})},${CSV_QUOTE(${CDR(disposition)})},${CSV_QUOTE(${CDR(amaflags)})},${CSV_QUOTE(${CDR(accountcode)})},${CSV_QUOTE(${CDR(uniqueid)})},${CSV_QUOTE(${CDR(userfield)})},${CDR(sequence)}

So, looking at the format of the cdr_custom.conf file, each entry is just a string template that gets written to a file, and that string is built from variables I can change to output anything I want. So why not have it emit raw JSON?

Master.json => {"clid":${QUOTE(${CDR(clid)})}, "src":${QUOTE(${CDR(src)})}, "dst":${QUOTE(${CDR(dst)})}, "dcontext":${QUOTE(${CDR(dcontext)})}, "channel":${QUOTE(${CDR(channel)})}, "dstchannel":${QUOTE(${CDR(dstchannel)})}, "lastapp":${QUOTE(${CDR(lastapp)})}, "lastdata":${QUOTE(${CDR(lastdata)})}, "start":${QUOTE(${CDR(start)})}, "answer":${QUOTE(${CDR(answer)})}, "end":${QUOTE(${CDR(end)})}, "duration":${QUOTE(${CDR(duration,f)})}, "billsec":${QUOTE(${CDR(billsec,f)})}, "disposition":${QUOTE(${CDR(disposition)})}, "amaflags":${QUOTE(${CDR(amaflags)})}, "accountcode":${QUOTE(${CDR(accountcode)})}, "uniqueid":${QUOTE(${CDR(uniqueid)})}, "userfield":${QUOTE(${CDR(userfield)})}, "sequence":${QUOTE(${CDR(sequence)})}}
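
For context, this template goes under the [mappings] section of cdr_custom.conf, same as the stock Master.csv entry. A sketch of the file layout, assuming the sample configuration's structure:

; cdr_custom.conf
[mappings]
; the Master.json => {...} template above goes here, all on one line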

The big difference is that CSV_QUOTE becomes plain QUOTE, which escapes inner double quotes with a backslash instead of doubling them. Both functions are Asterisk built-ins and are covered in the Asterisk documentation. Since the output is JSON, I also added the field names and the opening and closing braces.
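
To make the difference concrete, here is how the caller ID from the sample line above would come out of each function (illustrative):

CSV_QUOTE: """Savarese"" <1234567890>"
QUOTE:     "\"Savarese\" <1234567890>"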

To read the file, all I did was use a generic Splunk inputs.conf monitor stanza on my universal forwarder and ship the logs in. I didn't need a props.conf stanza for this input; Splunk figured out it was JSON and did the right thing without any extra configuration.
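
A minimal sketch of that monitor stanza, assuming cdr_custom writes to the default /var/log/asterisk/cdr-custom/ directory; the sourcetype and index names here are placeholders:

[monitor:///var/log/asterisk/cdr-custom/Master.json]
disabled = false
sourcetype = asterisk_cdr_json
index = main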

Keep one thing in mind… voice calls are latency-sensitive, so we want to make sure Asterisk isn't overwhelmed. These JSON lines are slightly longer than the CSV lines and will take longer to write on a busy server. Writing flat files is incredibly fast and the extra bytes don't look like much, but if scalability is a concern this might not be the right solution for you, and you might be better off writing to one of Asterisk's other CDR backends. And of course, test before deploying.