JIRA - Time spent in column
Recently I’ve been working on a small script that gathers information about Jira issues.
We wanted to know the time that a given ticket spends in each column on our kanban board. This can be useful for many reasons: for instance, you can measure the time an issue spent in the Open
status to get a ballpark estimate of the average time a ticket is waiting to be assigned. Or you can measure the average time in In Progress
to get an idea of velocity in general. You can of course measure any custom status (column) as well.
There is a plugin called the Time in Status for Jira Cloud, which accomplished this purpose. However, if you just want something free, you can hand-roll your custom reporting script like this.
# Requirements
- Access to Jira so that you can generate an access token and query boards
- You need to generate a Jira API access token following the instructions here.
- You need to export it like so:
export JIRA_TOKEN=<your Jira token here>
- And your email:
export JIRA_USER=<your email here>
- You need to export it like so:
sqlite3
(> 3.25
, for theLAG
window function)python3
# Details
For simplicity, I’ve used a SQLite database with only one table.
Please forgive the denormalised schema, but I think it’s best to store everything in one table for this simple use case.
In schema.sql
:
CREATE TABLE IF NOT EXISTS changelog (
issue_key TEXT NOT NULL,
from_column TEXT NOT NULL,
to_column TEXT NOT NULL,
timestamp TEXT NOT NULL,
issue_created TEXT NOT NULL
)
You can run this migration with sqlite3
$ sqlite3 changelog.db
$ sqlite> .read schema.sql
Create a python file to define the functions for getting the required details:
# time_in_column.py
import os
import sys
from datetime import datetime
import sqlite3
import requests
from requests.auth import HTTPBasicAuth
USER = os.getenv('JIRA_USER')
TOKEN = os.getenv('JIRA_TOKEN')
# This usually looks like"<company>.atlassian.net/rest/"
JIRA_API_BASE = '<your jira api base URL>'
DB_CONN = sqlite3.connect('changelog.db')
DATE_FORMAT = '%Y-%m-%dT%H:%M:%S.%f%z'
You can get all issues on a given board with this function:
def get_issues_from_board(board_id):
endpoint = JIRA_API_BASE + f'agile/1.0/board/{board_id}/issue'
res = requests.get(endpoint, auth=HTTPBasicAuth(USER, TOKEN))
# Could do some error handling here
issues = res.json().get('issues')
if not issues:
print("No issues found")
return []
for issue in issues:
yield issue
You can get the entire changelog of a given issue like so:
def get_issue_changelog(issue):
issue_id = issue.get('id')
key = issue.get('key')
issue_created = issue.get('fields').get('created')
issue_created = datetime.strptime(issue_created, DATE_FORMAT)
issue_changelog_endpoint = (
JIRA_API_BASE + f'api/3/issue/{issue_id}?expand=changelog'
)
res = requests.get(issue_changelog_endpoint, auth=HTTPBasicAuth(USER, TOKEN)).json()
changelog = res.get('changelog')
histories = get_histories(changelog)
for item in histories:
from_column, to_column, timestamp = item
yield (key, from_column, to_column, timestamp, issue_created)
The get_histories
function used above serves to get the status transitions of an issue. It looks like this:
def get_histories(changelog):
histories = changelog.get('histories')
histories = sorted(histories, key=lambda item: item.get('created'))
transitions = []
for history in histories:
# Timestamp of the state transition
created_at = history.get('created')
created_at = datetime.strptime(created_at, DATE_FORMAT)
for item in history.get('items'):
if item.get('field') == 'status':
to_column = item.get('toString')
from_column = item.get('fromString')
transitions.append((from_column, to_column, created_at))
return transitions
Finally, you can insert them into the changelog
table:
def insert_history_to_db(key, from_column, to_column, timestamp, issue_created):
c = DB_CONN.cursor()
c.execute("INSERT INTO changelog VALUES (?, ?, ?, ?, ?)", (key, from_column, to_column, timestamp, issue_created))
DB_CONN.commit()
Finally, to enable running the scrip from the command line:
if __name__ == '__main__':
_id = sys.argv[1]
if not _id:
print("Board ID missing")
sys.exit()
for issue in get_issues_from_board(_id):
key, from_column, to_column, timestamp, issue_created = get_issue_changelog(issue)
insert_history_to_db(key, from_column, to_column, timestamp, issue_created)
DB_CONN.close()
You can run the script with $ python time_in_column.py <kanban board ID>
.
# Interpreting results: enter LAG()
The changelog
table represents the state transitions of each issue. How can you use this data for “time spent in column” reports?
If we can somehow get the difference between the timestamps of state transitions of the same issue, then we know that that is the time spent in the given column.
SQL has a handy LAG
function for this: it’s a window function that can do calculations on rows based on a given physical offset (which is usually 1). This means that it enables spreadsheet-like calculations based on the row above or below the current one, while partitioning the data into chunks that make sense.
This might be common knowledge, but I didn’t know about this function until a few months ago. Even Julia Evans has tweeted about its usage!
The expression looks like this:
LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )
For me to understand what I need to partition by, I ask myself: “What is the entity I want to compare multiple rows for?”. Basically, “Where should the calculation restart”?
In this case, this is the issue_key
: since I want to calculate the differences of the timestamp column for the same issue_key
.
The query looks like this:
SELECT
issue_key AS key,
from_column AS status,
timestamp AS transition_timestamp,
JULIANDAY(timestamp) - LAG(
JULIANDAY(timestamp)
) OVER (PARTITION BY issue_key ORDER BY datetime(timestamp) ASC) AS time_in_column
FROM changelog ORDER BY issue_key, DATETIME(timestamp) ASC;
This gives us each ticket’s lifetime in each column.
KEY STATUS TRANSITION TIMESTAMP TIME IN COLUMN (DAYS)
--------------------------------------------------------------------------------------
XY-8 Ready for Development 2020-02-11 08:54:24.244000-06:00
XY-8 Ready 2020-02-11 11:12:12.890000-06:00 0.095701921265
XY-8 Open 2020-02-11 11:17:07.636000-06:00 0.003411412239
XY-8 In progress 2020-02-12 10:30:25.317000-06:00 0.967565752100
XY-8 Closed 2020-02-14 07:52:24.254000-06:00 1.890265474561
XY-8 In progress 2020-02-25 04:46:18.715000-06:00 10.87076922459
XY-9 Ready for Development 2020-02-10 11:06:21.060000-06:00
XY-9 Blocked 2020-02-11 10:25:12.282000-06:00 0.971426180563
XY-9 Ready 2020-02-11 11:12:13.138000-06:00 0.032648796215
XY-9 Open 2020-02-14 05:42:06.763000-06:00 2.770759548991
XY-9 In progress 2020-02-15 01:31:51.244000-06:00 0.826209270861
As you can see, a new “partition” starts with each issue key, and the calculation restarts.
# Conclusion
You can use this approach to get a general idea about the time spent in each column in a Jira Kanban board.
However, remember that
“When a measure becomes a target, it ceases to be a good measure.”
and to always take metrics (especially ones involving human beings) with a grain of salt.