Long parse time - large dataset


#1

Hello,

I’ve been evaluating the Gantt solution with a large dataset of over 7000 tasks, hundreds of links and a 3-year timeline. I have most of the performance options enabled and interacting with the page is fine, but the parse step takes over 1.8 seconds. Most of that time appears to be spent in _buildtree and parseinner.

Looking at your own large-dataset example, which loads 30,000 tasks in under a second, I’m wondering why my parsing takes so long. Any advice would be appreciated, as load performance is a big issue we’re trying to resolve.


#2

Hi @_Matt !
Performance depends heavily on the configuration and extensions you use, as well as on the structure of your dataset.
Could you please share an example so we can profile it locally?


#3

Sanitized sample data from api:

{	
	"data":[
		{"id": "task 1", "text":"<sanitized>", "start_date":"2019-05-30 00:00:00", "end_date":"2019-05-31 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 2", "text":"<sanitized>", "start_date":"2019-05-29 00:00:00", "end_date":"2019-05-30 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 3", "text":"<sanitized>", "start_date":"2019-05-28 00:00:00", "end_date":"2019-05-29 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 4", "text":"<sanitized>", "start_date":"2019-05-24 00:00:00", "end_date":"2019-05-25 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 5", "text":"<sanitized>", "start_date":"2019-04-17 00:00:00", "end_date":"2019-04-18 00:00:00", "status":"done", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 6", "text":"<sanitized>", "start_date":"2019-04-22 00:00:00", "end_date":"2019-04-23 00:00:00", "status":"in progress", "assignee":"user2", "duration": 1, "parent": "task 7", "type": "task"},
		{"id": "task 7", "text":"<sanitized>", "start_date":"2019-05-14 00:00", "end_date":"2019-05-17 00:00", "status":"open", "assignee":"user1", "duration": 3, "parent": "0", "type": "project"},
		...
	],
	"links":[
		{"id": "5cbe2b7f003eec88d714c614", "source": "task 4", "target": "task 3", "type": 0},
		{"id": "5cbe2b7f003eec88d714c644", "source": "task x", "target": "task y", "type": 0},
		{"id": "5cbe2b7f003eec88d714c64a", "source": "task q", "target": "task z", "type": 0},
		{"id": "5cbe2b7f003eec88d714c64c", "source": "task v", "target": "task 9", "type": 0},
		{"id": "5cbe2b7f003eec88d714c66c", "source": "task 1", "target": "task 8", "type": 0},
		...
	]
}
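If the real dataset can't be shared, a synthetic one with a similar shape can serve as a starting point for local profiling. The sketch below is a hypothetical helper (`makeSyntheticData` is not part of dhtmlxGantt): it produces mostly flat tasks with a `start_date` and an hour-based `duration`, plus chained links. The task and link counts, the 2017 start date and the spreading pattern are assumptions, not derived from the real data.

```javascript
// Hypothetical helper: builds a synthetic dataset shaped like the sample above
// (a flat list of tasks plus simple links). All numbers are illustrative.
function makeSyntheticData(taskCount, linkCount, spanDays) {
  const DAY = 24 * 60 * 60 * 1000;
  const start = Date.UTC(2017, 0, 1); // assumed timeline start
  const pad = (n) => String(n).padStart(2, "0");
  // Format as "%Y-%m-%d %H:%i" to match gantt.config.xml_date in the post
  const fmt = (ms) => {
    const d = new Date(ms);
    return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())} 00:00`;
  };

  const data = [];
  for (let i = 1; i <= taskCount; i++) {
    // Spread tasks deterministically over the whole span
    const offset = (i * 7919) % spanDays;
    data.push({
      id: `task ${i}`,
      start_date: fmt(start + offset * DAY),
      duration: 8, // 8 hours, assuming duration_unit = "hour"
      type: "task",
      // no parent: a flat list, like most rows in the sample
    });
  }

  const links = [];
  for (let i = 1; i <= linkCount && i < taskCount; i++) {
    // Finish-to-start style link between consecutive tasks (type 0, as in the sample)
    links.push({ id: `link ${i}`, source: `task ${i}`, target: `task ${i + 1}`, type: 0 });
  }
  return { data, links };
}
```

For example, `makeSyntheticData(7000, 500, 3 * 365)` roughly mirrors the size described in the first post, and the result can be passed to `gantt.parse()`.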

Extensions:

/ext/dhtmlxgantt_smart_rendering.js
/ext/dhtmlxgantt_multiselect.js
/ext/dhtmlxgantt_tooltip.js
/ext/dhtmlxgantt_marker.js
/ext/dhtmlxgantt_undo.js
/ext/dhtmlxgantt_keyboard_navigation.js

Config options:

gantt.config.min_column_width = 18;
gantt.config.row_height = 22;
gantt.config.sort = true;
gantt.config.static_background = true;
gantt.config.smart_scales = true;
gantt.config.branch_loading = false;
gantt.config.xml_date = "%Y-%m-%d %H:%i";
gantt.config.work_time = true;
gantt.config.duration_unit = "hour";
gantt.config.duration_step = 8; // a number, not the string '8'
gantt.config.multiselect = true;
gantt.config.scale_unit = "month";
gantt.config.step = 1;
gantt.config.date_scale = "%F, %Y";
gantt.config.scale_height = 36;
gantt.config.order_branch = true;
gantt.config.order_branch_free = true;
gantt.config.order_branch = "marker"; // the later assignment wins: order_branch ends up as "marker"
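To compare a figure like the 1.8 seconds above with a local run, it can help to time only the parse call rather than the whole page load. The helper below is a hypothetical sketch (`timeIt` is not a dhtmlxGantt API); it uses `performance.now()` where available and falls back to `Date.now()`.

```javascript
// Time a single synchronous call and log the result.
// Uses performance.now() where available (browsers, Node 16+), else Date.now().
function timeIt(label, fn) {
  const now =
    typeof performance !== "undefined" && typeof performance.now === "function"
      ? () => performance.now()
      : () => Date.now();
  const t0 = now();
  const result = fn();
  const ms = now() - t0;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { result, ms };
}
```

In the page this would be used as `timeIt("parse", () => gantt.parse(data))`, with the browser's performance profiler showing the breakdown into functions such as _buildtree.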


#4

Hi @_Matt!

Thank you very much for the details.
But could you please provide a more complete dataset?
Ideally, the same 7000 tasks you have, so I can test it locally and reproduce the delay you are seeing.

I don’t need any private info: you can clear all properties except id, start_date, end_date, duration, parent, progress and type for tasks, and id, source, target and type for links. But I do need a dataset with the right number of tasks and links, the way they are distributed over the time range, and the hierarchy levels.

You can copy the data from your gantt using this snippet:
https://snippet.dhtmlx.com/8ab23b07b

  1. copy this code
  2. open your page with the gantt containing 7000 tasks
  3. execute this code in the browser console
    It will serialize the gantt data, taking only the id/start/end/duration/parent/progress/type fields from tasks, and copy the data to the clipboard.
    Then open Notepad, paste the text with Ctrl+V, save it to a file and send the file to me. You can either attach it to the post or send me a PM.
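The linked snippet is the authoritative version; the sketch below only approximates what the steps describe, for illustration. It assumes `gantt.serialize()` returns an object of the form `{ data: [...], links: [...] }` and keeps only the fields listed above. `stripForProfiling` and `pick` are hypothetical names.

```javascript
// Fields kept for profiling, as listed in the post above.
const TASK_FIELDS = ["id", "start_date", "end_date", "duration", "parent", "progress", "type"];
const LINK_FIELDS = ["id", "source", "target", "type"];

// Copy only the listed keys that are present on the object (0 and "" are kept).
function pick(obj, keys) {
  const out = {};
  for (const k of keys) {
    if (obj[k] !== undefined) out[k] = obj[k];
  }
  return out;
}

// `serialized` is expected to look like { data: [...], links: [...] }.
function stripForProfiling(serialized) {
  return {
    data: serialized.data.map((t) => pick(t, TASK_FIELDS)),
    links: serialized.links.map((l) => pick(l, LINK_FIELDS)),
  };
}
```

In a browser console this could be used as `copy(JSON.stringify(stripForProfiling(gantt.serialize())))`, where `copy()` is the Chrome/Firefox DevTools console utility that puts a string on the clipboard.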

Maybe we’ll be able to locate a bottleneck specific to your configuration and project structure and optimize it. Otherwise, we have nowhere to start.
Btw, which build of Gantt do you use? If it’s older than 6.1.3, please try the latest package; there have been some performance improvements for hour and minute duration units.
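To check whether a page is on a pre-6.1.3 build, a numeric comparison of dotted version strings is enough. This is a hypothetical helper; reading the loaded build from `gantt.version` is an assumption about the build in use, and non-numeric parts (such as a "-beta" suffix) are treated as 0.

```javascript
// Compare dotted version strings numerically, e.g. "6.1.1" vs "6.1.3".
// Returns -1 if a < b, 0 if equal, 1 if a > b. Missing parts count as 0.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = pa[i] || 0; // NaN and undefined both become 0
    const y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}
```

A plain string comparison would get cases like "10.0.0" vs "9.9.9" wrong, which is why each part is compared as a number.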


#5

Sent private message with data set. Appreciate all the help. I am currently on 6.1.1, I will give 6.1.3 a shot and see what happens.


#6

Hi @_Matt !

Thank you for the test data.
I’ve run it locally and the parse takes from 0.6s to 1.2s, depending on the machine.
That is lower than the number you are seeing. If you’re using a dhtmlxGantt version earlier than 6.1.3, the latest build should speed things up a little.

Other than that, I’m afraid I can’t see any room for immediate improvement.
We’ve run a couple of profiles and so far see no obvious bottleneck that we could address quickly. The execution breakdown looks as expected overall: the parse time consists of building an internal tree structure and processing working times and time ranges.

Time range and duration calculations contribute a lot to the overall time: your project uses ‘hour’ as the duration unit and spans multiple years, so there are many calculations to do. Usually the ‘day’ unit is used, which requires far fewer calculations; this is part of the reason why the demos on our website run relatively faster with the same amount of data.
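A back-of-the-envelope model makes the hour-versus-day point concrete. The sketch below is a simplified model, not dhtmlxGantt's internal algorithm: it walks a date range one unit at a time and counts the steps, with a Mon-Fri check standing in for a work calendar (an assumption). `walk` is a hypothetical name.

```javascript
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;

// Walk from startMs to endMs in steps of stepMs, counting every step visited
// and how many of those steps fall on a weekday (Mon-Fri, assumed calendar).
function walk(startMs, endMs, stepMs) {
  let steps = 0;
  let weekdaySteps = 0;
  for (let t = startMs; t < endMs; t += stepMs) {
    steps++;
    const dow = new Date(t).getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dow !== 0 && dow !== 6) weekdaySteps++;
  }
  return { steps, weekdaySteps };
}
```

Over 2017-01-01 to 2020-01-01 (UTC), the day walk visits 1,095 steps and the hour walk 26,280, a 24× difference before any per-task work is counted.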

Building the tree hierarchy for your project also takes a bit longer than expected. Our implementation appears to be better optimized for nested tree structures and is slower with flat lists of tasks, which seems to be the case for your project.
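For reference, a flat list can in principle be indexed by parent in a single pass. The sketch below is illustrative only: it is not dhtmlxGantt's _buildtree, `buildChildrenIndex` is a hypothetical name, and treating a missing parent as the root id "0" is an assumption based on `"parent": "0"` in the sample data.

```javascript
const ROOT = "0"; // assumed root id, matching "parent": "0" in the sample

// Build a parent -> [child ids] Map from a flat task list in one O(n) pass.
function buildChildrenIndex(tasks) {
  const children = new Map();
  for (const task of tasks) {
    // Tasks without a parent (null/undefined) hang off the root.
    const parent = task.parent == null ? ROOT : String(task.parent);
    if (!children.has(parent)) children.set(parent, []);
    children.get(parent).push(task.id);
  }
  return children;
}
```

Child order follows the input order, so any sorting (such as gantt.config.sort) would be a separate step.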

There is definitely room for optimization on our end; I’ve created a ticket in our internal tracker to investigate it. However, I’m afraid it won’t be done promptly.

Hopefully, updating Gantt to the latest build will reduce the load time enough; other than that, I don’t have any suggestions.