Organizing for larger data sets

Hi all,

We have a large Jira instance with 1000+ users and use eazyBI to provide standard reports for the organization where Jira simply cannot provide answers anymore. Our accounts reach ~300,000 issues each and ~10 GB database size. I have already talked to support, and our long-term strategy must be to reduce these sizes, e.g. by using template accounts.

After thinking about this for a bit, I have concluded that this will only be a small problem for reports that start at a certain “layer” of our task hierarchy (e.g. team reports).

But for every report on an upper layer (e.g. regarding a Product), I will have the problem that some of those reports rely on aggregated information from the lower layers. For example, at the Epic level, some reports need the last sprint that a Story is currently planned for in order to predict the finish date. So to my understanding, I would need to import the Story level as well - which increases the account size again.

Further, there is currently no way to aggregate such information during the import, which I ultimately see as a prerequisite for splitting accounts.

Is my thinking correct up to this point, or did I miss anything that could help us? Is consultancy available to act as a discussion partner on these issues?

Best regards,
Tobias

Hi @thv

Planning accounts is crucial for big Jira instances. There is no technical limit on how many issues can be imported per account; however, the recommended number of issues varies with the intended usage of the account (the complexity of its reports).

Here’s a quick overview of how the accounts should be organized in eazyBI:

General accounts
Import all projects, or all projects with a particular category, and add a JQL query filter to limit the data by time (e.g. created >= -52w). Limit the custom fields selected for import into the account. Avoid importing issue change history if possible. Use the default stored measures in reports, and use precalculated custom fields with JavaScript instead of complex MDX formulas (a sketch follows below).
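As a minimal sketch of such a field (the field name and bucketing logic here are hypothetical, chosen only for illustration): it would be defined in the eazyBI advanced settings and computed once per issue during import, so reports can group by a plain stored dimension instead of evaluating a CASE-style MDX formula at query time.

```
# Hypothetical sketch: bucket issues by priority at import time.
[jira.customfield_prioritygroup]
name = "Priority group"
data_type = "string"
dimension = true
javascript_code = '''
if (issue.fields.priority) {
  var priorityName = issue.fields.priority.name;
  issue.fields.customfield_prioritygroup =
    (priorityName == "Blocker" || priorityName == "Critical") ? "Urgent" : "Normal";
}
'''
```

The field can then be selected for import as a regular dimension, and reports read the stored value directly.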

The data in such an account could be used for a general overview and for tracking general KPI progress in the company. You can compare projects and teams, but you do not get specific details. Detailed reports - on issues, assignees, etc. - could lead to timeouts.

Accounts for project teams or current issues
These accounts cover a smaller, relevant data set. We suggest planning separate accounts over current data or for a particular team or project. More details (custom fields) and issue change history can be imported into these accounts, and reports can work at the detailed issue level.
Limit each account to one project or a small set of projects, and add a JQL query filter to import only the relevant issues.
These accounts work faster with a small set of issues, and more complex MDX formulas can be executed without problems.
However, the teams will likely want to share common reports and measures; therefore, think of template accounts to provide a common set of reports and measures.

Template accounts
Create template accounts with a verified set of reports and measures. Use naming conventions and glossaries, and choose the naming so that template reports and measures do not interfere with custom reports and measures in the team accounts. See our documentation on how to enable and use template accounts.

Also, please check out the Training videos in the Administration section on how to plan your eazyBI accounts and how to scale larger instances: Training videos on specific topics

If you would like to receive individual consulting, you can find your nearest partner here and reach out to them: eazyBI Partners – eazyBI Training, Implementation and Support

Let me know if you have any additional questions on this!
Best regards,
Nauris / eazyBI support

Same here: we have very similar use cases, and the numbers of issues, projects, and users are just bigger.
The template account and general account design won’t work to solve this issue because:

  1. The general account still needs to have all the rollup data and the issue linkage data between all those separate Jira projects. Even with JQL queries and a limited set of Jira fields, the number of issues in the general account still becomes gigantic, as we have to include the rollup data coming from all issues, and we cannot avoid the complex MDX formulas. Performance is terrible.
  2. Accounts for project teams: we can set up template accounts. However, with the need to create hundreds of separate eazyBI accounts for the individual project teams, it is very difficult to scale up. Think about 500+ accounts × 20+ dashboards: it is very hard to maintain them all and keep them updated from the templates.

In short, this is a scalability issue. The approach works well within the small scope of individual teams, but it does not scale up to large program rollups. Any further recommendations?

Dear @Jade and @nauris.malitis - thank you for your responses.

In principle, everything you said, Nauris, is what we think we need to do, so thank you for confirming that we are on the same page. Unfortunately, with this strategy we will run into the problems I pointed out, and I think you have not yet given me a solution for those problems.

@Jade - could you give me some concrete numbers on your account sizes? Are you running on a separate server or on the Jira server? I ask because we recently discovered that a private eazyBI installation could perhaps be run on a separate server, where we can dedicate all CPU and RAM to eazyBI. It would be nice to have some comparison numbers available to see whether we can buy ourselves time and solve the problem with “more hardware”.

@nauris.malitis - do you have plans to make it easier to aggregate data during the import, so that we can split accounts? Do you have experience with improving query performance (e.g. are there differences between database vendors? Do you recommend having the database and eazyBI on the same server to avoid network traffic?)

Best regards,
Tobias

Hi @Jade and @thv

Thank you for the additional details!
You are raising valid points. However, the issue here is not the sheer size of your data by itself; it is running complex MDX queries against large datasets. The same would hold true when executing complicated SQL queries on a large dataset in any DBMS - most likely, a timeout would occur.

Our recommendation here would be to move the complex MDX calculations out of the reports and precalculate them during the import phase, by defining new fields (or adjusting existing ones) in the advanced settings and using custom JavaScript code to sort/arrange the data. A minimal sketch of such a field is shown below.
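For example (a hypothetical sketch - the field name and formula are only for illustration, assuming the standard created/resolutiondate issue fields from the Jira REST API):

```
# Hypothetical sketch: precalculate the days from creation to resolution,
# so reports use a stored measure instead of computing dates in MDX.
[jira.customfield_resolutiondays]
name = "Days to resolution"
data_type = "decimal"
measure = true
javascript_code = '''
if (issue.fields.created && issue.fields.resolutiondate) {
  // Both timestamps come from the Jira REST API in ISO format.
  var created  = Date.parse(issue.fields.created);
  var resolved = Date.parse(issue.fields.resolutiondate);
  issue.fields.customfield_resolutiondays =
    (resolved - created) / (1000 * 60 * 60 * 24);
}
'''
```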

Here is a list of different JavaScript calculated custom field examples (you can also search for examples here, in the Community): JavaScript calculated custom fields
You can then import these custom fields as Dimensions, Measures, and Properties in your report and use these precalculated values instead of executing complex MDX queries every time the report is opened.

To see which reports take the most time to load (have the most complex queries), you can enable the (usage statistics) account that will accumulate all the data about the eazyBI report metrics: Usage statistics.

This will give you an insight into which reports are most used by your users and which take the longest time to load and need to be optimized. If you have trouble optimizing individual reports, you can search for suggestions here in the Community or reach out to us at support@eazybi.com, and we will do as much as we can to help you!

@thv

  1. Apply JavaScript precalculation of data during the import process to ease the load on the reports.
  2. We are using the MySQL and PostgreSQL DBMS for our instances.
  3. The database and the app will most likely be on different servers; our recommendation would be to have them on the same local network.

Best regards,
Nauris

Hello @nauris.malitis,

Thank you for your response! As you recommended, we already precalculate a lot of measures to make our reports work. But we cannot do aggregations in those precalculations (like the MIN/MAX of some child issue field - a concrete example: the maximum sprint end date across all Stories of an Epic). That has been my whole point from the beginning. Problems usually arise as soon as the Issue dimension is brought into a report, and especially with the Descendants function. And since this is a systematic problem in the software, not a peculiarity of our data modeling, the ability to precalculate such aggregation metrics in this dimension could be of great benefit for all your users.

Best regards,
Tobias

Complex MDX queries are just one side of the story.
The problem is that when the data size goes up, even a basic measure - for example, the number of issues per filter selection with a one- or two-dimension row breakdown - is already slow.
Then, on top of this large data set, any further operation makes it even slower.

Hi guys!

@thv
There are several ways to use the Descendants function when you need to access properties from lower-level members. For example, in your use case, you could iterate through the Sprint dimension instead of the Issue dimension, as in this solution: Roll up sprint start and end dates from stories to epic on gantt chart - #2 by janis.plume

Also, when using the Issue dimension in the report, it is important to enable the “Nonempty” option in the Rows section and use at least one real measure in the report (not just properties). Additionally, you can check out this video by Daina on all the small details that can be improved in a report (these details are increasingly important as account size increases): Help, My Reports Are Too Slow!

As the MDX queries in eazyBI give the user a lot of freedom to get the same result in different ways, it is often important to check whether the optimal formula/approach is being used. In these cases, we invite you to reach out to us at support@eazybi.com so we can understand and possibly fix the slow reports. When reaching out, you can export and attach your report definition/screenshots/log files to the email so we can better assess the current situation.

@Jade
I understand the frustration with slow reports, and it is indeed the case that the larger the data set gets, the more time it takes to process and return results for the queries in a report.
In these cases, we would like to encourage you to use the (usage statistics) feature to pinpoint the slow reports and reach out to us with the details. If it is an account-level problem, we can help you revise your source import options; if it is an instance-level problem, you can send us the log files so we can check what is happening in the background. Sometimes insufficient memory, the lack of a dedicated node, or some other issue can be the bottleneck in the setup.

Looking forward to helping out in each of your individual cases!
Best regards,
Nauris