Long parse time - large dataset

Hello,

I’ve been evaluating the Gantt solution with a large data set of over 7000 tasks with hundreds of links and a 3 year timeline. I have most of the performance options enabled and the use of the page is fine, but the parse step is showing as taking over 1.8 seconds. It looks like most of the time is in _buildtree and parseinner.

Looking at your own example of large datasets where you are loading 30,000 tasks in under a second, I’m wondering why my own parsing is taking so long. Any advice would be appreciated as load performance is a big issue we’re trying to resolve.

Hi @_Matt !
Performance depends heavily on the configuration and extensions you use, as well as on the structure of your dataset.
Could you please share an example so we can profile it locally?

Sanitized sample data from api:

{	
	"data":[
		{"id": "task 1", "text":"<sanitized>", "start_date":"2019-05-30 00:00:00", "end_date":"2019-05-31 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 2", "text":"<sanitized>", "start_date":"2019-05-29 00:00:00", "end_date":"2019-05-30 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 3", "text":"<sanitized>", "start_date":"2019-05-28 00:00:00", "end_date":"2019-05-29 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 4", "text":"<sanitized>", "start_date":"2019-05-24 00:00:00", "end_date":"2019-05-25 00:00:00", "status":"open", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 5", "text":"<sanitized>", "start_date":"2019-04-17 00:00:00", "end_date":"2019-04-18 00:00:00", "status":"done", "assignee":"user3", "duration": 0, "type": "task"},
		{"id": "task 6", "text":"<sanitized>", "start_date":"2019-04-22 00:00:00", "end_date":"2019-04-23 00:00:00", "status":"in progress", "assignee":"user2", "duration": 1, "parent": "task 7", "type": "task"},
		{"id": "task 7", "text":"<sanitized>", "start_date":"2019-05-14 00:00", "end_date":"2019-05-17 00:00", "status":"open", "assignee":"user1", "duration": 3, "parent": "0", "type": "project"},
		...
	],
	"links":[
		{"id": "5cbe2b7f003eec88d714c614", "source": "task 4", "target": "task 3", "type": 0},
		{"id": "5cbe2b7f003eec88d714c644", "source": "task x", "target": "task y", "type": 0},
		{"id": "5cbe2b7f003eec88d714c64a", "source": "task q", "target": "task z", "type": 0},
		{"id": "5cbe2b7f003eec88d714c64c", "source": "task v", "target": "task 9", "type": 0},
		{"id": "5cbe2b7f003eec88d714c66c", "source": "task 1", "target": "task 8", "type": 0},
		...
	]
}

Extensions:

/ext/dhtmlxgantt_smart_rendering.js
/ext/dhtmlxgantt_multiselect.js
/ext/dhtmlxgantt_tooltip.js
/ext/dhtmlxgantt_marker.js
/ext/dhtmlxgantt_undo.js
/ext/dhtmlxgantt_keyboard_navigation.js

Config options:

gantt.config.min_column_width = 18;
gantt.config.row_height = 22;
gantt.config.sort = true;
gantt.config.static_background = true;
gantt.config.smart_scales = true;
gantt.config.branch_loading = false;
gantt.config.xml_date="%Y-%m-%d %H:%i";
gantt.config.work_time = true;
gantt.config.duration_unit = 'hour';
gantt.config.duration_step = '8';
gantt.config.multiselect = true;
gantt.config.scale_unit = "month";
gantt.config.step = 1;
gantt.config.date_scale = "%F, %Y";
gantt.config.scale_height = 36;
gantt.config.order_branch = true;
gantt.config.order_branch_free = true;
gantt.config.order_branch = "marker";

Hi @_Matt!

Thank you very much for the details.
But could you please provide a more complete dataset?
Ideally, the same 7000 tasks you have, so I can test it locally and reproduce the same delay.

I don’t need any private info, you can clear all properties except for id, start_date, end_date, duration, parent, progress, type for tasks and id, source, target, type for links. But I need a dataset with the same number of tasks and links, the same distribution across the time range, and the same hierarchy of levels.

You can copy the data from your gantt using this snippet:
https://snippet.dhtmlx.com/8ab23b07b

  1. copy this code
  2. open your page with gantt containing 7000 tasks
  3. execute this code in the browser console
    It will serialize the gantt data, taking only id/start/end/duration/parent/progress/type fields from tasks, and will copy the data to the clipboard.
    Then you can open the notepad, ctrl + v text there, save to the file and send the file to me. You can either attach it to the post or send me a PM.
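For readers who cannot open the snippet, here is a minimal sketch of what such a stripping step could look like. It is not the code behind the link; `stripDataset` and `pick` are hypothetical helper names, and the field lists are taken from the reply above.

```javascript
// Fields to keep, taken from the lists in the reply above.
var TASK_FIELDS = ["id", "start_date", "end_date", "duration", "parent", "progress", "type"];
var LINK_FIELDS = ["id", "source", "target", "type"];

// Copy only the whitelisted fields that are actually present on the item.
function pick(item, fields) {
    var result = {};
    fields.forEach(function (name) {
        if (item[name] !== undefined) {
            result[name] = item[name];
        }
    });
    return result;
}

// `dataset` is expected to have the { data: [...], links: [...] } shape
// shown in the sample earlier in the thread.
function stripDataset(dataset) {
    return {
        data: (dataset.data || []).map(function (task) { return pick(task, TASK_FIELDS); }),
        links: (dataset.links || []).map(function (link) { return pick(link, LINK_FIELDS); })
    };
}

// Possible usage in the browser console on the page with the gantt:
// var json = JSON.stringify(stripDataset(gantt.serialize()));
// copy(json); // `copy` is a DevTools console helper, not part of the page's JavaScript
```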

Maybe we’ll be able to locate some bottleneck that is specific for your configuration and project structure and make an optimization. Otherwise, we don’t have a place to start with.
Btw, can you tell me which build of Gantt you use? If it’s earlier than 6.1.3, please try the latest package. There have been some performance improvements for hour and minute duration units.

Sent private message with data set. Appreciate all the help. I am currently on 6.1.1, I will give 6.1.3 a shot and see what happens.

Hi @_Matt !

Thank you for the test data.
I’ve run it locally and the parse takes from 0.6s to 1.2s, depending on the machine I try it on.
That seems to be lower than the number you have. If you’re using a dhtmlxGantt version earlier than 6.1.3, then switching to the latest build should speed things up a little.

Other than that, I’m afraid I can’t see any room for an immediate improvement.
We’ve run a couple of profiles and so far can see no obvious bottleneck that we could address quickly. The code execution breakdown looks overall as it should - the parse time consists of building an internal tree structure and processing working times and time ranges.

Time range and duration calculations contribute a lot to the overall time - your project uses ‘hour’ as the duration unit and spans multiple years, so there are a lot of calculations to be done. Usually the ‘day’ unit is used, which requires far fewer calculations and is part of the reason why the demos on our website work relatively faster with the same amount of data.
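To give a rough sense of the difference, here is a small arithmetic sketch. It is only an illustration of how many units a 3 year range contains, not a description of how the library calculates durations; `countUnits` is a hypothetical helper and the dates are made up.

```javascript
// Count how many duration units of a given length fit into a date range.
function countUnits(start, end, unitMs) {
    return Math.round((end - start) / unitMs);
}

var HOUR_MS = 60 * 60 * 1000;
var DAY_MS = 24 * HOUR_MS;

// A 3 year span in UTC, to keep daylight saving shifts out of the arithmetic.
var start = Date.UTC(2019, 0, 1);
var end = Date.UTC(2022, 0, 1);

var days = countUnits(start, end, DAY_MS);   // 1096 (2020 is a leap year)
var hours = countUnits(start, end, HOUR_MS); // 26304
```

With `work_time` enabled, each of those units may have to be checked against the working calendar, so the hour-based setup has about 24 times more units to consider over the same range.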

Building the tree hierarchy for your project also seems to take a bit more time than expected. Our implementation appears to be more optimized for nested tree structure and works slower for flat lists of tasks, which seems to be the case for your project.
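As an illustration of what the tree-building step has to do, here is a generic sketch that groups a flat task list by parent id. This is not dhtmlxGantt's internal implementation; `buildChildrenIndex` is a hypothetical name, and the sample tasks only mimic the shape of the data posted above.

```javascript
// Group task ids by parent id in a single pass.
// Tasks without a parent are placed under rootId.
function buildChildrenIndex(tasks, rootId) {
    var children = new Map();
    tasks.forEach(function (task) {
        var parent = task.parent === undefined ? rootId : String(task.parent);
        if (!children.has(parent)) {
            children.set(parent, []);
        }
        children.get(parent).push(task.id);
    });
    return children;
}

var index = buildChildrenIndex([
    { id: "task 5" },
    { id: "task 6", parent: "task 7" },
    { id: "task 7", parent: "0" }
], "0");
// index.get("0") holds "task 5" and "task 7"; index.get("task 7") holds "task 6"
```

In a flat project almost every task ends up in the root's child list, so one branch holds thousands of items instead of many small branches.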

There is definitely room for optimization on our end, and I’ve created a ticket in our internal tracker to investigate it. However, I’m afraid it won’t be done promptly.

Hopefully, updating gantt to the latest build will reduce the load time enough, other than that I don’t have any suggestions.

Dear Guldmi,
We are using dhtmlxGantt version 7.1.7. When we have a schedule with a huge number of tasks, auto-scheduling takes a very long time (around 7-10 mins). Our expectation is that it should finish within a couple of seconds. Please find attached the list of tasks and the configuration for one of our schedules for your reference.
Kindly help us to resolve this issue as it is troubling us very much.

gantt_config_7000lines.txt (2.1 MB)
Saurabh

Hello Saurabh,
I added your configuration and data to the snippet tool, but I cannot reproduce the issue:
https://files.dhtmlx.com/30d/f385806ed055bee681365d48b8561577/vokoscreen-2023-01-13_11-45-03.mp4

When I run it locally with the 7.1.7 version, I still cannot reproduce the issue:
https://files.dhtmlx.com/30d/2e4d72825140a071f74dbc4d99c7a2b0/vokoscreen-2023-01-13_11-51-52.mp4
https://files.dhtmlx.com/30d/656fc16c04769f01463a690d21aec8a6/7.1.7.zip

It is hard to suggest what can be wrong in your case. Please send me a ready demo so that I can reproduce the issue locally.

Hi Ramil,
We tested our application at our end and found that the auto-scheduling itself probably happens quickly, but updating the activity dates on the server takes a very long time: we could see almost 5000 requests being fired, one for each activity affected by the change in the schedule. Is there a way to speed up the entire process of auto-scheduling and saving the new dates to the server?
Regards
Saurabh

Hello Saurabh,
When I try to send so many requests to the server in the snippet tool, I get the Insufficient resources error. Probably, it is not a good idea to send all the requests to the server at once because if something goes wrong, the request won’t reach the server. And Gantt won’t try to send it again.
It would be better to send a single request. Unfortunately, right now, it works only in the POST transaction mode if you set true as the second parameter:
https://docs.dhtmlx.com/api__dataprocessor_settransactionmode.html
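A minimal sketch of that mode, assuming the URL-based data processor is used instead of custom handler functions. `enableBatchedPost` is a hypothetical wrapper, the URL passed to it is a placeholder, and the server must be able to accept all changed records in one POST body.

```javascript
// Create a URL-based data processor and switch it to POST mode with
// the second argument set to true, so that changes are sent together
// instead of one request per record.
function enableBatchedPost(gantt, url) {
    var dp = new gantt.dataProcessor(url);
    dp.init(gantt);
    dp.setTransactionMode("POST", true);
    return dp;
}

// Possible usage on a page where dhtmlxGantt is loaded:
// var dp = enableBatchedPost(gantt, "/api/gantt"); // placeholder URL
```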

The only workaround would be to let Gantt auto-schedule tasks and then manually send all the changes to the server using the Data Processor or the built-in Ajax module:
https://docs.dhtmlx.com/gantt/api__gantt_ajax_other.html

Here is a simple example in the snippet:
https://snippet.dhtmlx.com/5/1562237d0

Here is the HTML page:
https://files.dhtmlx.com/30d/7296124e051c4640a0841d1266e1c894/auto_scheduling_demo.html

Dear Ramil,

We are using a custom button to run the auto-schedule function. As soon as we press the button, the auto-schedule is carried out and requests to update various activities affected are fired. We have no control over firing of these requests from the browser.
We are using the following code as advised by you in your documentation for updating the activities at the server end.

var dp = gantt.createDataProcessor({
    task: {
        create: function(data) {
        },
        update: function(data, id) {
            counter++;
            if(counter == 1){
                $('#counter_outer').show();
            }
            server = site_url + "tasks/edit_task";
            return gantt.ajax.post(
                server + "/" + id,
                data,
                function(result){
                    counter--;
                    //loader--;
                    /*if(loader == 0){
                        if(auto_schedule == 1){
                            auto_schedule = 0;
                        }
                    }
                    */
                    if(counter == 0){
                        $('#counter_outer').hide();
                    }

                    var response = result.xmlDoc;
                    if(response.responseText == 'logout'){
                        window.location.reload();
                    }
                }
            );
        },
        delete: function(id) {
        }
    },
    link: {
        create: function(data) {
            server = site_url + "tasks/add_link";
            return gantt.ajax.post(
                server,
                data
            );
        },
        update: function(data, id) {
            server = site_url + "tasks/edit_link";
            return gantt.ajax.post(
                server + "/" + id,
                data
            );
        },
        delete: function(id) {
            server = site_url + "tasks/delete_link";
            return gantt.ajax.post(
                server + "/" + id
            );
        }
    }
});
Please guide us on how we can change this behaviour so that a single request is fired to update the affected activities on the server after the Auto-schedule button is pressed. We were not able to fully understand your last reply.
Regards
Saurabh Arora

Hello Saurabh,
Right now, there is no built-in way to send a single request after auto-scheduling. You need to implement a custom solution.
After you click on the autoSchedule button, you can’t control which events fire. But you can control what data is sent to the server regardless of that.

In the update function of the created Data Processor, you can return false, then Gantt won’t send the data to the server. When you click on the button to auto-schedule tasks, you can toggle a variable that will tell Gantt to not send the changes to the server. After auto-scheduling is done, Gantt will fire the onAfterAutoSchedule event:
https://docs.dhtmlx.com/gantt/api__gantt_onafterautoschedule_event.html
There, you toggle the variable again to allow sending the changes to the server. At this point, Gantt won’t try to do that as it finished working with the tasks. So, you can obtain changed tasks from the updatedTasks argument of the event handler. And then you can manually send the changes via the AJAX module.
Here is an example of how it can be implemented:
http://snippet.dhtmlx.com/5/0dea21272
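For readers who cannot open the snippet, the steps above can be sketched roughly as follows. This is not the code behind the link. `setupBatchedAutoSchedule` and `sendBatch` are hypothetical names, and the sketch raises the flag in the `onBeforeAutoSchedule` event instead of in the button's click handler, which is a substitution for the toggle described above.

```javascript
// Wires batching into a gantt instance. `sendBatch` is a hypothetical
// callback that receives an array of task objects and posts them in one request.
function setupBatchedAutoSchedule(gantt, sendBatch) {
    var autoScheduling = false;

    // Raise the flag before auto-scheduling starts changing tasks.
    gantt.attachEvent("onBeforeAutoSchedule", function () {
        autoScheduling = true;
        return true; // returning false would cancel auto-scheduling
    });

    // updatedTasks is the array of ids of the tasks changed by auto-scheduling.
    gantt.attachEvent("onAfterAutoSchedule", function (taskId, updatedTasks) {
        autoScheduling = false;
        if (updatedTasks.length) {
            sendBatch(updatedTasks.map(function (id) { return gantt.getTask(id); }));
        }
    });

    // Call this inside the data processor's task.update handler:
    // if (shouldSkipUpdate()) return false;
    return function shouldSkipUpdate() {
        return autoScheduling;
    };
}
```

In the data processor from the earlier post, the `update` function would start with `if (shouldSkipUpdate()) return false;`, and `sendBatch` could call `gantt.ajax.post` with a batch endpoint such as `site_url + "tasks/edit_tasks_batch"`. That endpoint is an assumption; it has to be implemented on the server.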