How data is indexed

The Charts & Graphs for Bitbucket app uses indexes to keep statistics for file types and repository contributors. These indexes help show reports faster and are stored in tables in the Bitbucket database.

To keep statistics up to date, the app listens for repository reference change events. These events are triggered when users push commits, create or delete branches, merge pull requests, and so on.

Which branches are indexed?

For performance reasons, only the default branch (typically master) is indexed for file type information. Author information is gathered for all branches. Note that changing the default branch in the repository configuration doesn't trigger a re-index. In that case, you can trigger indexing manually.
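The branch rule above can be sketched as a small decision function. This is only an illustration, not the app's actual code: the function name, the statistic labels, and the assumption that the changed reference is a branch in refs/heads/... form are all hypothetical.

```python
def stats_to_update(changed_ref: str, default_branch: str = "master") -> set:
    """Return which statistics a change to `changed_ref` would refresh.

    Hypothetical helper: the labels "authors" and "file_types" are
    illustrative names, not identifiers used by the app.
    """
    # Author information is gathered for every branch.
    stats = {"authors"}
    # File type information is indexed only for the default branch.
    if changed_ref == "refs/heads/" + default_branch:
        stats.add("file_types")
    return stats


if __name__ == "__main__":
    print(stats_to_update("refs/heads/master"))
    print(stats_to_update("refs/heads/feature/login"))
```

Note that the function takes the default branch as an input on every call. It has no way to notice that the default branch was changed in the repository configuration, which mirrors why a manual re-index is needed after such a change.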

Triggering indexing via the User Interface

If the repository hasn't been indexed yet (for example, the app has just been installed), indexing starts when a user opens the repository browse page or a repository contributors report. Some reports, such as Project Contributors or User Contributions, do not start indexing because they cannot be associated with a particular repository. This has one major consequence: if there is no repository activity (for example, if the app is running on a test instance), then the Project Contributors report won't show any data.

Indexing Threads

The app uses 2 threads for indexing. We keep this number small to avoid performance issues. If events are triggered while both threads are busy, those events are dropped. Each indexing run accounts for all new changes in the repository, not only those associated with the triggering event, so nothing is lost when events are dropped.
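The following toy model illustrates why dropping events is safe when each run catches up with the whole repository. It is a minimal sketch under stated assumptions, not the app's implementation: the Indexer class, its fields, and the use of a commit counter in place of a real repository are all hypothetical.

```python
import threading


class Indexer:
    """Toy model: at most `workers` indexing runs at once, extra events are dropped."""

    def __init__(self, workers=2):
        self._slots = threading.BoundedSemaphore(workers)
        self._lock = threading.Lock()
        self.head = 0      # commits in the (single, simulated) repository
        self.indexed = 0   # commits already covered by the index
        self.dropped = 0   # events discarded because all workers were busy

    def push(self):
        """Simulate one new commit arriving in the repository."""
        with self._lock:
            self.head += 1

    def on_event(self, gate=None):
        """Handle a ref change event; return the worker thread, or None if dropped."""
        if not self._slots.acquire(blocking=False):
            with self._lock:
                self.dropped += 1
            return None
        worker = threading.Thread(target=self._index, args=(gate,))
        worker.start()
        return worker

    def _index(self, gate):
        try:
            if gate is not None:
                gate.wait()  # simulate a slow indexing run
            with self._lock:
                # Catch up with everything in the repository,
                # not just the change that triggered this run.
                self.indexed = self.head
        finally:
            self._slots.release()


def demo():
    idx = Indexer()
    gate = threading.Event()
    workers = []
    for _ in range(3):
        idx.push()
        workers.append(idx.on_event(gate))  # the third event finds both slots busy
    gate.set()
    for worker in workers:
        if worker is not None:
            worker.join()
    return idx.dropped, idx.indexed, idx.head


if __name__ == "__main__":
    print(demo())
```

In this run the third event is dropped, yet the index still ends up covering all three commits, because both running workers read the full repository state when they finish.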

Indexing API

There is also a REST API to re-create the index for a repository. It clears all statistics for the repository and then rebuilds the index from scratch. Details of the API can be found here.