Aggregations

Query API supports different types of aggregations:

  • aggregate — Single-number aggregation (such as overall NPS) that returns a single value.

  • aggregateList — Returns a list of single-number aggregations on the records of a single dataset.

  • aggregateRank — Calculates the rank of one dimension/metric relative to all others.

  • aggregateRankList — Returns a list of aggregation ranks, given by either multiple metrics applied to the same dataset or multiple datasets, each with its own metric. These scenarios are mutually exclusive.

  • aggregateTable — Returns a table where each cell includes the aggregation for the given row and column.

  • aggregateTableList — Returns a list of aggregation tables given by rows, columns and metrics applied to a given dataset.

In general, an aggregation has a definition, which determines the operation, and a second part that specifies the data extraction or data retrieval. The definition changes based on the type of aggregation needed.

Sample query

query{
  survey_count:aggregate(
    definition:{
      data:{                 # defines where the data is taken from
        source: FEEDBACK     # FEEDBACK or INVITATIONS
        filter:{             # filters applied to the data before the aggregation
          fieldIds:"e_responsedate", lt:"2019-02-01", gt:"2018-07-01"
        }
      }
      metric:{               # definition of the metric applied on the filtered data
        count:{              # calculation
          field:{            # field the calculation will be done on
            id:"a_surveyid"
          }
        }
      }
    }
  )
}

Sample response

{
  "data": {
    "survey_count": 345
  }
}

List of aggregations

An aggregateList operation is useful for requesting several aggregate metrics with a single dataset definition.

The following example brings back an average and a count calculation in a single operation.

Sample query

query {
  aggregateList(
    definition: {
      data: {
        source: FEEDBACK
      }
      metrics: [
        { average: { field: { id: "q_bp_cc_ltr_scale" } } }
        { count: null }
      ]
    }
  )
}

Sample response

{
  "data": {
    "aggregateList": [
      5.745756020529017,
      112894
    ]
  },
  "errors": null,
  "_links": null,
  "_allowed": [
    "POST",
    "GET"
  ]
}

Ranked aggregations

An aggregateRank operation is useful when an account has a limited scope (such as a call center agent who is granted access only to her specific unit) and you want to rank that scope relative to some larger dataset.

The following example uses an account with access to a single unit and ranks that unit across all units available in the data view with ID "27".

Sample query

query {
  aggregateRank(definition: {
    fieldKey: "e_unitid",
    metric: { average: { field: { id: "e_ltr" } } },
    minimumSampleSize: 15,
    myData: { source: FEEDBACK },
    rankedData: { dataView: { id: "27" } }
  }) {
    rank
    totalElements
    metricValue
  }
}

Sample response

{
  "data": {
    "aggregateRank": {
      "rank": 263,
      "totalElements": 831,
      "metricValue": 8.658536585365853
    }
  }
}

Ranked aggregations like this are useful for contextualizing where your limited data view fits relative to a larger data view. If your account has full access to the data anyway, you may need to apply filters to myData to achieve the same result.
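
For example, with full access you could approximate a limited scope by filtering myData down to a single unit before ranking it against the data view. The sketch below assumes that myData accepts the same filter attribute as the data blocks shown earlier; the equals operator and the unit ID "123" are hypothetical, since only range operators (gt, lt, gte, lte) appear in this section.

query {
  aggregateRank(definition: {
    fieldKey: "e_unitid",
    metric: { average: { field: { id: "e_ltr" } } },
    minimumSampleSize: 15,
    # Assumption: myData takes a filter like the data blocks above; "equals" and "123" are placeholders.
    myData: { source: FEEDBACK, filter: { fieldIds: "e_unitid", equals: "123" } },
    rankedData: { dataView: { id: "27" } }
  }) {
    rank
    totalElements
    metricValue
  }
}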

List of aggregation ranks

An aggregateRankList operation is useful for requesting several rank metrics, given by either multiple metrics applied to the same dataset or multiple datasets, each with its own metric.

The following example brings back an average and a count calculation applied to the same dataset in a single operation.

Sample query

query {
  aggregateRankList(definition: {
    fieldKey: "e_unitid",
    metrics: [
        { average: { field: { id: "q_bp_cc_ltr_scale" } } }
        { count: null }
      ],
    minimumSampleSize: 15,
    myData: { source: FEEDBACK },
    rankedData: { dataView: { id: "28" } }
  }) {
    rank
    totalElements
    metricValue
  }
}

Sample response

{
  "data": {
    "aggregateRankList": [
      {
        "rank": 134,
        "totalElements": 200,
        "metricValue": 5.615384615384615
      },
      {
        "rank": 256,
        "totalElements": 594,
        "metricValue": 73
      }
    ]
  },
  "errors": null,
  "_links": null,
  "_allowed": [
    "POST",
    "GET"
  ]
}

Getting an aggregate table

aggregateTable is a generic operation that returns any aggregation that can be shaped as a table. The definition has different attributes to use depending on the type of aggregation. Not all of the attributes are needed, and some attributes accomplish the same thing, which provides flexibility to model the aggregation in different ways.

Because the definition is generic, the combinations of attributes are validated at run time, and the appropriate ones are used when the query executes.

Sample query

Get a table of the count of invitations, and the average likelihood to recommend per gender, split by quarters.

query {
  aggregateTable(definition:{
    data:{
      source: INVITATIONS
      filter:{       # filters to narrow down the dataset before applying the aggregation
        fieldIds:"e_responsedate", lt:"2019-02-01", gt:"2018-07-01"
      }
    }
    columns:{ # defines how the columns are generated
      segment:{field:{ id: "e_personalgender"}}
          # Using "segment", the columns are created from the options of the field.
          # "Slices" could be used here instead to define the columns manually, but at
          # run time only one option is available: either "slices" or "segment" should be used.
    }
    rows:{
      slices:{items:[     # Manual row definition is done with slices. As in the column
                          # definition, "segment" could also be used, but only one of
                          # the two should be used at a time.
        {key:"0"          # index of the slice
          label:"Q1"      # label for the slice
          filter:{fieldIds:"e_responsedate", gt:"2018-01-01", lte:"2018-03-31"} # data filter that defines the grouping criteria
        },
        {key:"1"
          label:"Q2"
          filter:{fieldIds:"e_responsedate", gt:"2018-04-01", lte:"2018-06-30"}
        },
        {key:"2"
          label:"Q3"
          filter:{fieldIds:"e_responsedate", gt:"2018-07-01", lte:"2018-09-30"}
        },
        {key:"3"
          label:"Q4"
          filter:{fieldIds:"e_responsedate", gt:"2018-10-01", lte:"2018-12-31"}
        }
      ]}
    }
    metrics:[ # multiple metrics can be defined
      {count:{field:{id:"a_surveyid"}}},
      {average:{field:{id:"q_ltr_score"}}}
    ]
  }
  ){
    # Here starts the data retrieval part
    columns{ # returns keys and labels for the columns. With this query, these are the e_personalgender options.
      key
      label
    }
    columnKeys
    columnLabels
    rows{ # returns keys and labels for the rows. With this query, these are the rows defined manually as slices.
      key
      label
    }
    rowLabels # same as rows.label
    rowKeys   # same as rows.key
    metricsTable # returns a 3D array; the first index is the metric, in definition order (zero-based). Each position holds the matrix with the aggregations for that metric.
    metricTable(index:1) # only returns the matrix with the aggregations for "average"
  }
}
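
Based on the comments in the query above, the response is shaped roughly as follows. The gender option keys, labels, and cell values are placeholders that depend on your data; only the structure is illustrative.

{
  "data": {
    "aggregateTable": {
      "columns": [ { "key": "<gender option key>", "label": "<gender option label>" }, ... ],
      "columnKeys": [ "<gender option key>", ... ],
      "columnLabels": [ "<gender option label>", ... ],
      "rows": [
        { "key": "0", "label": "Q1" },
        { "key": "1", "label": "Q2" },
        { "key": "2", "label": "Q3" },
        { "key": "3", "label": "Q4" }
      ],
      "rowLabels": [ "Q1", "Q2", "Q3", "Q4" ],
      "rowKeys": [ "0", "1", "2", "3" ],
      "metricsTable": [ <matrix with the "count" aggregations>, <matrix with the "average" aggregations> ],
      "metricTable": <matrix with the "average" aggregations only>
    }
  }
}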

Getting a list of aggregate tables

An aggregateTableList operation is useful for requesting several aggregations that can be shaped as tables, given by either multiple definitions applied to the same dataset or multiple datasets, each with its own definition.

In the example below, the first definition brings back a filtered set of surveys completed on or after January 1, 2018 and before July 1, 2018, for each value of e_bp_cc_gender_alt (the gender of the person who receives the survey), and applies the bp_count custom calculation to field q_bp_cc_ltr_scale (likelihood to recommend). The second definition counts the number of invitations in the same time period.

Sample query

query {
  aggregateTableList(
    definitions: [ 
    {
      data: { source: FEEDBACK filter: { fieldIds: "e_responsedate" gte: "2018-01-01" lt: "2018-07-01" } }
      rows: {
        segment: {field: {id: "e_bp_cc_gender_alt"}}
      }
      metrics: {
        customCalculation: {
          name: "bp_count"
          field: { id: "q_bp_cc_ltr_scale" }
        }
      }
    }
    {
      data: { source: INVITATIONS filter: { fieldIds: "e_creationdate" gte: "2018-01-01" lt: "2018-07-01" } }
      metrics: {
        count: null
      }
    }]
  ) {
    metricsTable
  }
}

Sample response

{
  "data": {
    "aggregateTableList": [
      {
        "metricsTable": [
          [
            [
              696
            ]
          ],
          [
            [
              675
            ]
          ]
        ]
      },
      {
        "metricsTable": [
          [
            [
              1090
            ]
          ]
        ]
      }
    ]
  },
  "errors": null,
  "_links": null,
  "_allowed": [
    "POST",
    "GET"
  ]
}

Using R-Fields in Query API

R-Fields are cross-record aggregations with calculations defined in the module. Query API aggregates data using that defined calculation.

To query R-Fields, use the custom metric (available for R-Fields only).

This metric may be used in any metric object, for all types of aggregations.

Sample query - Simple aggregate operation

query aggregateNps {
  aggregate(definition: {
    data: {
      source: FEEDBACK,
      filter: {
        fieldIds: "e_responsedate",
        gte: "2017-08-01"
      }
    }
    metric: {
      custom: {
        field: { id: "r_bp_branch_net_promoter_score" }
      }
    }
  })
}

Sample response

{
  "data": {
    "aggregate": 42.671335667833915
  },
  "errors": null,
  "_links": null,
  "_allowed": [
    "POST",
    "GET"
  ]
}

Sample query - aggregateTable operation

query aggregateNpsTable{
    aggregateTable(
        definition:{
            data: {
                source: FEEDBACK
            },
            columnGroups: [
                { slices: { items: [
                    { key:"MAY", label:"May", filter: { fieldIds: "e_responsedate", gte: "2018-05-25" lt: "2018-06-01" } },
                    { key:"JUNE", label:"June", filter: { fieldIds: "e_responsedate", gte: "2018-06-01" lt: "2018-07-01" } }
                ] } }
            ]
            metrics: [
                { custom: {field: {id: "r_bp_branch_net_promoter_score"} } }
            ]
        }) {
        columns { key label }
        totalColumns
        metricTable(index:0)
        countsTable
    }
}

Sample response

{
  "data": {
    "aggregateTable": {
      "columns": [
        {
          "key": "MAY",
          "label": "May"
        },
        {
          "key": "JUNE",
          "label": "June"
        }
      ],
      "totalColumns": 2,
      "metricTable": [
        [
          35,
          14.379084967320262
        ]
      ],
      "countsTable": [
        [
          198,
          2823
        ]
      ]
    }
  },
  "errors": null,
  "_links": null,
  "_allowed": [
    "POST",
    "GET"
  ]
}

Available metrics for aggregations

As mentioned above, aggregations have a definition which determines the calculation. Below are the calculations, or metrics, available for aggregation nodes:

  • average — Average of the values of a field.

  • sum — Sum of the values of a field.

  • count — Count of the total number of records containing values for a field; if no field is provided, it is a plain count of the total number of records.

  • customCalculation — Applies a custom calculation defined in the setup (referenced by its label) to the values provided by a given field.

  • custom — Applies an R-Field defined in the setup to the dataset.

  • countUnique — Count of distinct values for a given field.

  • bucketCount — Returns the total number of records matching a given field filter condition. Value hiding compares the minimum sample size with the total count of surveys that have any value for that field.

  • percentage — Calculates the cell record percentage relative to one axis subtotal. For example, if the table has Gender on rows and Age on columns, and you need to calculate the row percentage of the (Male, Young) cell, it performs the division:

    • numerator — The count of surveys with Male and Young.

    • denominator — The count of surveys with Male gender and a non-null Age.

      Note: This metric is only available for aggregateTable.
  • topicPercentage — Percentage (0-100) of surveys matching the corresponding TA topic / Number of surveys which do not specifically match it. Notice this metric is meant to be used in combination with filtering/splits by topic. In phrase-level matching mode, if one of the co-occurring topics (acting as the context topic) is defined in the query global filters, then this metric acts as the co-occurrence between that topic and each topic from the rows/columns splits, so topicPercentage(contextTopic, splitTopic) is:

    topicPercentage = Number of surveys with contextTopic and splitTopic co-occurring / Number of surveys mentioning splitTopic
  • globalRecordsTopicPercentage — An equivalent metric to topicPercentage that differs when one of the data filters is wrapped by topicFilter: those filters are ignored when computing the number of surveys for the denominator. It is intended to be used in phrase-level filtering, when a context topic is wrapped by topicFilter, and it calculates globalRecordsTopicPercentage(contextTopic, splitTopic) as:

    globalRecordsTopicPercentage = Number of surveys with contextTopic and splitTopic co-occurring / Number of surveys which do not specifically match them
  • topicImpact — A measurement of the impact of a given topic presence on a metric.

    The formula for impact is:

    topicMetricCount * ( topicMetric - metric ) / ( topicMetricCount - metricCount )

    The impact formula has 4 parameters:

    • Score on surveys matching the corresponding TA topic.

    • Score samples count on surveys matching the corresponding TA topic.

    • Score on surveys which do not specifically match the topic.

    • Score samples count on surveys which do not specifically match the topic.

    When both numerator and denominator are zero the impact is “0” by definition.

    Notice this metric is meant to be used in combination with filtering/splits by topic.

  • scaledTopicImpact — Same as the topicImpact metric but multiplied by 100.

  • sentimentCount — Number of surveys with positive/negative sentiment. If filtering/splitting by topic is present, only comments matching it are counted.

  • sentimentPercentage — Percentage of surveys with positive/negative sentiment / Number of surveys. If filtering/splitting by topic is present, only comments matching it will be counted.

  • sentimentNetPercentage — Net percentage calculated as:

    sentimentPercentage: [POSITIVE, STRONGLY_POSITIVE] - sentimentPercentage: [NEGATIVE, STRONGLY_NEGATIVE]
  • regressionBeta — Beta calculation of a set of independent fields and a single dependent field.

  • regressionCorrelation — Regression correlation of a set of independent fields and a single dependent field.

  • segmentAverage — Average for the values of a given metric on a given segment.

  • segmentPercentile — Percentile for the values of a given metric on a given segment.

  • filtered — Wrapper that allows you to filter the data before applying a metric (see the sketch after this list).

  • sumOfProduct — Sum of products of two fields. This is useful for computing regressions.
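
As a reference for how these metrics plug into a query, the sketch below uses the filtered wrapper around an average inside a plain aggregate operation. The exact attribute names inside filtered (a filter plus a wrapped metric) are an assumption based on the patterns used elsewhere in this section, not a confirmed shape.

query {
  recent_average: aggregate(definition: {
    data: { source: FEEDBACK }
    metric: {
      # Assumption: "filtered" wraps a data filter plus an inner metric;
      # the "filter" and "metric" attribute names here are illustrative.
      filtered: {
        filter: { fieldIds: "e_responsedate", gte: "2019-01-01" }
        metric: { average: { field: { id: "q_bp_cc_ltr_scale" } } }
      }
    }
  })
}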