Multivalue and array functions - Splunk Documentation (2024)

For an overview about the stats and charting functions, seeOverview of SPL2 stats and chart functions.

dataset()

The dataset function aggregates events into arrays of SPL2 field-value objects.See object in Built-in data types.

Usage

You can use this function in the SELECT clause in the from command and with the stats command.

There are three supported syntaxes for the dataset() function:

SyntaxData returned
dataset()The function syntax returns all of the fields in the events that match your search criteria. Use with or without a BY clause.
dataset<field-list>The function syntax returns only the specified fields in each event that match your search criteria. The list of fields must be a comma-separated list. Use with or without a BY clause.
dataset_nogroupby( [field-list] )Use only with a BY clause, such as the GROUPBY clause in the from command or the BY clause with the stats command.


This function syntax removes the group by field from the arrays that are generated. You can return all of the fields in the events or only the specified fields that match your search criteria. When used with the GROUPBY clause, include the group by field in the SELECT clause.

Different output based on the BY clause used

When you specify a BY clause field, the results are organized by that field. The values in the group by field are included in the array.

However, the output you see depends on whether you use the GROUPBY clause with the from command or the BY clause with the stats command:

  • The GROUPBY clause in the from command returns only one field that contains the arrays, unless you specifically add the group by field to the SELECT clause.
  • The BY clause in the stats command returns two fields. One field contains the values from the BY clause field and another field contains the arrays.

For an illustration of this behavior, see the examples below that include a BY clause.

Examples

1. Return all fields and values in a single array

You can create a dataset array from all of the fields and values in the search results.

Consider this set of data:

_timedepartmentusername
13 Apr 2023 13:02:45.000 PMEngineeringClaudia Garcia
13 Apr 2023 10:52:41.000 AMITVanya Patel
13 Apr 2023 06:23:48.000 AMPersonnelAlex Martin

Use the dataset function to create an array from all of the fields and values using the following search:

...| stats dataset()

The results look something like this:

dataset
[{_time: 13 Apr 2023 13:02:45.000 PM, department: Engineering, username: "Claudia Garcia"}, {_time: 13 Apr 2023 10:52:41.000 AM, department: IT, username: "Vanya Patel"}, {_time: 13 Apr 2023 06:23:48.000 AM, department: Personnel, username: "Alex Martin"}]

If you don't specify any fields with the dataset function, all of the fields are included in a single dataset array. A single dataset array is also returned if you specify a wildcard with the dataset function, for example: dataset(*).

2. Return specific fields in an array

You can specify multiple fields to populate the dataset arrays.

Consider this set of data:

_timehostactiondepartmentuidusername
13 Apr 2023 13:02:45.000 PMmailsv2Failed passwordEngineering1066Claudia Garcia
13 Apr 2023 10:52:41.000 AMmailsv1Failed passwordIT1815Vanya Patel
13 Apr 2023 06:23:48.000 AMmailsv3Session closedPersonnel1916Alex Martin
13 Apr 2023 06:23:48.000 AMmailsv3Failed passwordEngineering1862Wei Zhang
12 Apr 2023 20:18:36.000 PMmailsv1Session closedEngineering1690Rutherford Sullivan

The following search creates a dataset array with only the values from the department and username fields.

| FROM main SELECT dataset(department, username)

The results look something like this:

dataset(department, username)
[{department: Engineering, username: "Claudia Garcia"}, {department: IT, username: "Vanya Patel"}, {department: Personnel, username: "Alex Martin"}, {department: Engineering, username: "Wei Zhang"},{department: Engineering, username: "Rutherford Sullivan"}]

3. Use a BY clause to create separate arrays

You can use a BY clause to organize the search results into separate dataset arrays.

Consider this set of data, which is the same set of data in the previous example:

_timehostactiondepartmentuidusername
13 Apr 2023 13:02:45.000 PMmailsv2Failed passwordEngineering1066Claudia Garcia
13 Apr 2023 10:52:41.000 AMmailsv1Failed passwordIT1815Vanya Patel
13 Apr 2023 06:23:48.000 AMmailsv3Session closedPersonnel1916Alex Martin
13 Apr 2023 06:23:48.000 AMmailsv3Failed passwordEngineering1862Wei Zhang
12 Apr 2023 20:18:36.000 PMmailsv1Session closedEngineering1690Rutherford Sullivan

The following search organizes the dataset arrays by department. Each dataset array contains the user ID and username. The AS keyword is used to rename the output field employees. Otherwise the output field would be named dataset(uid, username).

| FROM main GROUP BY department SELECT dataset(uid, username) AS employees

Because this search uses the from command, the GROUP BY clause is used before the SELECT clause. If you use this function with the stats command, you would specify the BY clause.

The results look something like this:

employees
[{uid: 1066, username: "Claudia Garcia"}, {uid: 1690, username: "Rutherford Sullivan"}, {uid: 1862, username: "Wei Zhang"}]
[{uid: 1815 , username: "Vanya Patel"}]
[{uid: 1916, username: "Alex Martin"}]

Notice that the GROUP BY field is not returned in the output. To return the GROUP BY field, you must include that field in the SELECT clause.

4. BY clause output differences

The previous example shows the output using the from command GROUPBY clause when you specify a set of field names with the dataset function.

The following search uses the from command GROUPBY clause and returns all of the fields:

| FROM main GROUP BY department SELECT dataset() AS employees

The results look like this:

employees
[{host: mailsv2, action: Failed password, department: Engineering, uid: 1066, username: "Claudia Garcia"}, {host: mailsv1, action: Session closed, department: Engineering, uid: 1690, username: "Rutherford Sullivan"}, {host: mailsv3, action: Failed password, department: Engineering, uid: 1862, username: "Wei Zhang"}]
[{host: mailsv1, action: Failed password, department: IT, uid: 1815 , username: "Vanya Patel"}]
[{host: mailsv3, action: Session closed, department: Personnel, uid: 1916, username: "Alex Martin"}]

Here is a similar search using the stats command:

| FROM main | stats dataset() BY department AS employees

The results look like this:

departmentemployees
Engineering[{host: mailsv2, action: Failed password, department: Engineering, uid: 1066, username: "Claudia Garcia"}, {host: mailsv1, action: Session closed, department: Engineering, uid: 1690, username: "Rutherford Sullivan"}, {host: mailsv3, action: Failed password, department: Engineering, uid: 1862, username: "Wei Zhang"}]
IT[{host: mailsv1, action: Failed password, department: IT, uid: 1815 , username: "Vanya Patel"}]
Personnel[{host: mailsv3, action: Session closed, department: Personnel, uid: 1916, username: "Alex Martin"}]

The stats command returns two fields, the BY clause field department and the employees field.

Notice that the group by field, department, is included in the arrays with both the GROUP BY clause in the from command and the BY clause in the stats command. You can remove the group by field from the arrays by using the dataset_nogrouby() syntax, described in the next example.

5. Using the dataset_nogroupby() syntax

The previous example shows the output using a BY clause in the search, with both the from and the stats commands. The group by field appears in the arrays. To remove the group by field from the arrays, use the dataset_nogroupby() syntax.

The following search uses the from command with the dataset_nogroupby() syntax:

| FROM main GROUP BY department SELECT dataset_nogroupby() AS employees

The results look like this:

employees
[{host: mailsv2, action: Failed password, uid: 1066, username: "Claudia Garcia"}, {host: mailsv1, action: Session closed, uid: 1690, username: "Rutherford Sullivan"}, {host: mailsv3, action: Failed password, uid: 1862, username: "Wei Zhang"}]
[{host: mailsv1, action: Failed password, uid: 1815 , username: "Vanya Patel"}]
[{host: mailsv3, action: Session closed, uid: 1916, username: "Alex Martin"}]

The search returns a set of arrays, but there is no way to determine what the group arrays belong to.

However, when you include the group by field in the SELECT clause or use the stats command with dataset_nogroupby() syntax the group by information is returned n a separate field.

For example, the following two searches return the same results:

| FROM main GROUP BY department SELECT department, dataset_nogroupby() AS employees

| FROM main | stats dataset_nogroupby() BY department AS employees

The results look like this:

departmentemployees
Engineering[{host: mailsv2, action: Failed password, uid: 1066, username: "Claudia Garcia"}, {host: mailsv1, action: Session closed, uid: 1690, username: "Rutherford Sullivan"}, {host: mailsv3, action: Failed password, uid: 1862, username: "Wei Zhang"}]
IT[{host: mailsv1, action: Failed password, uid: 1815 , username: "Vanya Patel"}]
Personnel[{host: mailsv3, action: Session closed, uid: 1916, username: "Alex Martin"}]

list(<value>)

The list function returns a multivalue entry from the values in a field. The order of the values reflects the order of the events.

Usage

You can use this function with the stats, streamstats, and timechart commands.

  • If more than 100 values are in the field, only the first 100 are returned.
  • This function processes field values as strings.

Example

To illustrate what the list function does, let's start by generating a few simple results.

  1. Use the from and streamstats commands to generate a set of 11 results that are simply timestamps and a count of the results, which are used as row numbers. Empty arrays with the from command generate the timestamps.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber

    There are 11 results. Th first few results look something like this:

    _timerowNumber
    2019-10-10T07:19:02.000-07:001
    2019-10-10T07:19:02.000-07:002
    2019-10-10T07:19:02.000-07:003
    2019-10-10T07:19:02.000-07:004
    2019-10-10T07:19:02.000-07:005

    Notice that each result appears on a separate row, with a line between each row.

  2. Add the stats command with the list function to the search. The numbers are returned in ascending order in a single, multivalue result.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber | stats list(rowNumber) AS numbers

    The results look something like this:

    numbers
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11

    Notice that this is a single result with multiple values. There are no lines between each value.

  3. Compare this result with the results returned by the values function.

pivot(<key>,<value>)

The pivot function aggregates the values in a field and returns the results as an object. See object in the list of built-in data types.

Usage

The <key> argument can be a single field or a string template, which can reference multiple fields.

The <value> argument must be an aggregate, such as count() or sum().

You can use this function with the SELECT clause in the from command, or with the stats command.

Examples

Using the pivot function

Consider the following data that's in the main dataset:
statushost
200www1
200www2
200www1
400www2
400www1
400www3
403www2
404www2
Use the pivot function to create an object that summarizes the HTTP status values for each separate host. The search looks like this:

FROM main GROUP BY host SELECT host, pivot(status, count())

The results look something like this:
hostpivot(status,count())
www1{"200":2,"400":1}
www2{"200":1,"400":1,"403":1,"404":1}
www3{"400":1}
You can use the pivot function with the stats command to accomplish the same results:

FROM main | stats pivot(status,count()) as pivotStatus by host

In this search the output field, pivot(status,count()), is renamed using the AS clause.
This search returns the same results as the previous search and looks something like this:
hostpivotStatus
www1{"200":2,"400":1}
www2{"200":1,"400":1,"403":1,"404":1}
www3{"400":1}

Creating nested objects with the pivot function

You can use nested pivot functions to create nested objects.
Consider the following data:
statushostaction
200www1purchase
200www2purchase
200www1addtocart
200www2purchase
400www1addtocart
400www1purchase
403www2changequantity
404www1view
404www3purchase
404www2changequantity
404www1purchase
You can use one pivot function as the <value> argument in another pivot function to create nested objects:

FROM main GROUP BY status SELECT status, pivot(host, pivot(action, count())) AS nestedPivot

The results look something like this:
statusnestedPivot
200{"www1":{"addtocart":1,"purchase":1},"www2":{"purchase":2}}
400{"www1":{"addtocart":1,"purchase":1}}
403{"www2":{"changequantity":1}}
404{"www1":{"purchase":1,"view":1},"www2":{"changequantity":1},"www3":{"purchase":1}}

Using a string template with the pivot function

You can use a string template in the <value> argument of the pivot function.
In this example, the string template contains two template expressions, ${name} and ${city}, which are field names. The entire string template is enclosed in double quotation marks:

SELECT pivot("${name} in ${city}", count()) AS mylist FROM main

Suppose these are the values in the name and city fields:
namecitydate
ClaudiaLondon2020-04-13
AlexBerlin2020-04-19
WeiSydney2020-04-21
ClaudiaLondon2020-04-30
The results look something like this:
mylist
{"Alex in Berlin":1,"Claudia in London":2,"Wei in Sydney":1}
You can add the flatten command to the search:

SELECT pivot("${name} in ${city}", count()) AS mylist FROM main | flatten mylist

The results look something like this:
Alex in BerlinClaudia in LondonWei in Sydneymylist
121{"Alex in Berlin":1,"Claudia in London":2,"Wei in Sydney":1}

unpivot(<key>,<value>)

The unpivot function transposes search results into an array of one or more objects.

Usage

Use the <key> and <value> parameters to label the values in the objects. These parameters must be strings.

You can use this function in the SELECT clause in the from command and with the stats command.

Examples

1. Unpivot columnar search results

The following search results show the sum of the values in the bytes field for each host:

hostSumOfBytes
www192726
www2113377
www3115699
www4105869

The following search uses the unpivot function to transpose the results into an array:

... | stats unpivot("field", "value") by host

The results look like this:

hostunpivot("field", "value")
www1[{"field":"SumOfBytes","value":92726}]
www2[{"field":"SumOfBytes","value":113377}]
www3[{"field":"SumOfBytes","value":115699}]
www4[{"field":"SumOfBytes","value":105869}]

If you don't specify a BY clause, the results look like this:

unpivot("field", "value")
[{"field":"SumOfBytes","value":92726},{"field":"host","value":"www1"},{"field":"SumOfBytes","value":113377},{"field":"host","value":"www2"},{"field":"SumOfBytes","value":115699},{"field":"host","value":"www3"},{"field":"SumOfBytes","value":105869},{"field":"host","value":"www4"}]

2. Unpivot arrays

The following search results, in the $pivot view, contain a set of arrays with a count of the HTTP status codes organized by action:

actionCountByActions
addtocart{"200":28,"400":3,"408":1,"505":2}
changequantity{"200":4}
purchase{"200":37,"404":1,"503":1,"505":1}
remove{"200":8}
view{"200":21,"400":1,"408":1,"505":2}

The following search uses the unpivot function to transpose the $pivot view results:

FROM $pivot GROUPBY action SELECT action, unpivot("Stats", "Values") AS AggregationSummary

The results look like this:

actionAggregationSummary
addtocart[{"Stats":"CountByActions","Values":{"200":28,"400":3,"408":1,"505":2}}]
changequantity[{"Stats":"CountByActions","Values":{"200":4}}]
purchase[{"Stats":"CountByActions","Values":{"200":37,"404":1,"503":1,"505":1}}]
remove[{"Stats":"CountByActions","Values":{"200":8}}]
view[{"Stats":"CountByActions","Values":{"200":21,"400":1,"408":1,"505":2}}]

values(<value>)

The values function returns a list of the distinct values in a field as a multivalue entry.

Usage

You can use this function with the stats, streamstats, and timechart commands.

  • By default there is no limit to the number of values returned.
  • This function processes field values as strings.

The order of the values is lexicographical.

Lexicographical order sorts items based on the values used to encode the items in computer memory. In Splunk software, this is almost always UTF-8 encoding, which is a superset of ASCII.

  • Numbers are sorted before letters. Numbers are sorted based on the first digit. For example, the numbers 10, 9, 70, 100 are sorted lexicographically as 10, 100, 70, 9.
  • Uppercase letters are sorted before lowercase letters.
  • Symbols are not standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.

Example

To illustrate what the values function does, let's start by generating a few simple results.

  1. Use the from and streamstats commands to generate a set of 11 results that are simply timestamps and a count of the results, which are used as row numbers. Empty arrays with the from command generate the timestamps.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber

    There are 11 results. Th first few results look something like this:

    _timerowNumber
    2019-10-10T07:19:02.000-07:001
    2019-10-10T07:19:02.000-07:002
    2019-10-10T07:19:02.000-07:003
    2019-10-10T07:19:02.000-07:004
    2019-10-10T07:19:02.000-07:005

    Notice that each result appears on a separate row, with a line between each row.

  2. Add the stats command with thevalues function the the search. The results are returned in lexicographical order.

    | from [{},{},{},{},{},{},{},{},{},{},{}] | streamstats count AS rowNumber | stats values(rowNumber) AS numbers

    The results look something like this:

    numbers
    1
    10
    11
    2
    3
    4
    5
    6
    7
    8
    9

    Notice that this is a single result with multiple values. There are no lines between each value.

  3. Compare these results with the results returned by the list function.

See also

Function information
Overview of SPL2 stats and chart functions
Quick Reference for SPL2 Stats and Charting Functions
Naming function arguments in the SPL2 Search Manual
Evaluation functions for JSON and multivalue data
JSON functions
Multivalue eval functions
Multivalue and array functions - Splunk Documentation (2024)

References

Top Articles
Latest Posts
Article information

Author: Arline Emard IV

Last Updated:

Views: 5798

Rating: 4.1 / 5 (52 voted)

Reviews: 83% of readers found this page helpful

Author information

Name: Arline Emard IV

Birthday: 1996-07-10

Address: 8912 Hintz Shore, West Louie, AZ 69363-0747

Phone: +13454700762376

Job: Administration Technician

Hobby: Paintball, Horseback riding, Cycling, Running, Macrame, Playing musical instruments, Soapmaking

Introduction: My name is Arline Emard IV, I am a cheerful, gorgeous, colorful, joyous, excited, super, inquisitive person who loves writing and wants to share my knowledge and understanding with you.