react-digraph: Path to 5,000 Nodes



About 2 years ago I was hired to Uber on a team developing internal tools. One of these tools is a website that uses an Uber-built open source project called react-digraph. Back then, react-digraph could only handle a couple hundred nodes, but we had graphs that reached 1000 nodes plus connected edges. Performing simple tasks like selecting a new node, or creating a new node often too 30 seconds or more on these graphs. We needed something better.

Originally react-digraph, contrary to the name, didn’t use much of React. It was a simple component wrapper around a single very large file. This file used D3.js for much of the SVG rendering, which isn’t bad, but D3 required a knowledge set that was different than standard React. We decided to limit the use of D3 as much as possible and instead render nodes and edges with React instead. React would generate the node SVGs and place them on the canvas.

Versions

react-digraph v3.x is discussed in the intro above as the “original version”. It used synchronous rendering and was merely a wrapper around D3 functions.

Version 4.x is essentially the same as v3.x in terms of code, but it added asynchronous rendering, primarily using setTimeouts.

Version 5.x refactored the entire codebase to use React in a more meaningful method. It also added asynchronous react-dom.render() calls.

Asynchronicity

The original react-digraph (3.x) code was synchronous and blocking. When the loop for rendering nodes ran, it would block the browser and cause a poor user experience. Normally with small graphs this didn’t cause a problem, but once the number of nodes reached 500-1000 the user’s browser would stop working for seconds at a time. Memory use also increased significantly. We needed a way to render each node asynchronously and in a non-blocking manner.

Discovery (v3.x to 4.x)

Each function call takes about 0.50-1.0ms, roughly. Added together, 500 nodes takes roughly 250ms for the renderNode, and an additional 250ms for renderTextNode (at the 0.5ms calculation). At the 1ms calculation that’s about 1 second just to paint the canvas. This doesn’t take into account garbage collection “hiccups” due to the renderNodes function being called for every mouse event, including mouseenter on a node.

Here you can see the end of the renderNode function calls and the completed renderNodes time. Please be aware that adding console.time() to the code does add a slight delay.

Without console.time() in the renderNode function:

The rendering time looks to be linear to the number of nodes. There are spikes and dips in various places because of the inner workings of the browser, but those could be settled by taking more snapshots of each number of nodes and finding the mean of those times.

At 5000 nodes the rendering time blocks the UI significantly and it becomes unusable in real-world scenarios.

User Interaction Comparisons

Selecting a node before the asynchronous update

Wait for it, the node will change color…
  • 773 nodes, 1288 edges
  • Synchronous rendering of all nodes and lines
  • Selection takes roughly 20 seconds

Selecting a node after the asynchronous update

Still a little slow due to synchronous edge rendering, but much faster than before.
  • 773 nodes, 1288 edges
  • Deferred rendering of nodes (edges still synchronous)
  • Selection takes 4 seconds
    • Due to edges rendering synchronously, each selection takes longer than a graph with no edges.

Selecting a node when there are only nodes rendered

  • 4000 nodes in list, 0 edges
  • Renders only those nodes which might show in the view
  • Deferred rendering of nodes
  • No lines to delay node rendering
  • Graph immediately usable
  • Selection nearly instantaneous

Rendering Time

After the update, rendering time is constant at <10ms to complete the renderNodes function. Selection is also constant, assuming garbage collection isn’t happening at the moment and other blocking concerns are not present.

Memory Comparison

Memory usage includes data and layout from the code, however only the react-digraph implementation is changing in this example.

Memory Footprint is improved (tested against 4000 nodes).

  • Before Update: 614MB initially, then 525MB after rendering is complete 
    • Single node selection spikes to 731MB (Memory column shows 1.6GB)
    • Multiple selection spikes to 3.0GB after selecting 3 nodes, two times a piece.
  • After Update: 321MB initially, then settles at 286MB after a minute
    • Single node selection uses 314MB
    • Multiple selection uses 315MB

Version 5.x Improvements

Version 5 was a significant rewrite of the entire rendering system for react-digraph. In this version we fully embraced React and wrote a component for every type of object in the graph. We also expanded the asynchronous rendering for both nodes and edges.

Analysis

The analysis will investigate the actions only within the react-digraph codebase. This will inform us about the rendering speed of react-digraph without concerning ourselves with the UI, the NodeJS middle-tier, or the backend code.

The graph size will be either 200 nodes, 800 nodes, or 1000 nodes, depending on the render speed. Each section will specify the sample size when it is changed.

Action: Zoom

A single zoom event is really several iterative calls to the same canvas resize function. A single instance of this function call is shown below, each instance of which can be multiplied by the total number of function calls it took to complete the zoom operation. The number of function calls is based on the user scrolling their mouse or zooming using their touchpad.

The zoom action was tested with 1000 nodes and connected edges.

Upon first calling the zoom handler during a scroll event, we see that the function call takes a significant amount of time (denoted by “Event (wheel)”). Selecting this event, we can see the aggregated time below.

This can be broken into a few issues:

  1. Something initiates a long-running Major Garbage Collection. Digging deeper, this is caused by assigning node data to variables within the linkNodesAndEdges function. While this makes variables nice to work with, when the function is complete the garbage collector will need to clean up the variables.

A way to avoid unnecessary garbage collection is to assign data to variables outside of the for and forEach loops. This reuses memory references and holds off garbage collection until the loops are completely finished.

Another way to speed up this function is to use a for loop rather than a forEach. For loops tend to be faster in Javascript, while Array.forEach is more of a syntactic sugar around the for loop. The reason why forEach tends to be slower is because it typically needs to create a new anonymous function during each iteration, then perform the necessary garbage collection.

  1. We see below that linkNodesAndEdges takes 116.04ms, a majority of the time. This particular function loops through all nodes and connects the edges as incoming and outgoing arrays to make rendering edges faster when moving nodes. 
    1. As mentioned above, we can modify the way we handle garbage collection by reusing variables, and we can improve the loop by using a for loop. 
    2. Another method to improve this time is requiring node data to include incoming and outgoing edges rather than a separate edge array. This will make the linkNodesAndEdges function obsolete.

In addition to the initial setup call described above, we can see a number of additional calls being made in the timeline below. These are the result of iterative zoom events.

During these events we can see a more performant linkNodesAndEdges call, which is able to run in 2.69ms. This is likely due to the Javascript compiler having optimized the loop after the first run, however that is conjecture at this point without more analysis.

We can also see an interesting result of the zoom operation in the next image: Layout from the react-dom package is the largest contributor to the timeline while the Javascript functions for the zoom event only took 9.23ms total.

In the chart below “Layout” is mentioned as “Rendering”, which takes 112.9 ms.

The chart indicates that scripting took 9.2ms to call the zoom functions for 1000 nodes and 1000 edges. This is acceptably fast for the script. 

We now see the vast majority of the rest of the time used by Rendering and Painting. 

Painting is an operation of the browser, so it can’t be changed significantly. It may be possible to limit painting by reducing the number of nodes being rendered. Painting is also affected by adding new DOM nodes to the browser, removing DOM nodes, and certain CSS modifications. Zooming in on many nodes, effectively changing the size of the elements, will cascade the paint operation across all of the nodes on the screen and also possibly in the non-visible sections of the canvas.

Rendering is primarily caused by the React library. In the timeline view this is noted as “Layout” and it points to a specific file – react-dom.development.js?a4ae:2313. What Layout means in Chrome’s dev tools is which piece of Javascript code is causing the most layout invalidations within the browser. Diving into the code, we can see that the set.call()function is taking the most time within the trackValueOnNode function. While we may not be able to change the trackValueOnNode function, we can identify why it’s being called so much and what can be done to limit that function.

Action: Create Node

Create node is a fairly complex operation. It typically requires react-digraph to handle the click event, then use the parent website’s callback, which then may call the backend API to create a node. After some time the backend responds with the entire graph, UI deserializes the data into the client-side datastore, then it’s transformed again to be fed into react-digraph. 

Once react-digraph has the new array of nodes, it must loop through them to determine which ones have been added, removed, as well as new or removed edges. Then it must render those nodes in a non-blocking fashion.

Below, the handleSvgClicked() function analyzes the type of click event to determine if the shift key is pressed, then calls the onCreateNode() callback from the parent website. This function adds a new node, whether through an API call, or by inserting a new node in the nodes array, then it calls this.setState() to save the new nodes array into the component state and cause a re-render.

A good portion of the time used in this click event, which took 65.24ms, is used by the linkNodesAndEdges() function. This function took 48.44ms and contained a major garbage collection routine.

For a 200 node graph, we can see that creating a single node takes 2.4 seconds. This is a great area for improvement. The new code uses the same asynchronous rendering method using setTimeout as the 4.x code, however the new code attempts to use ReactDOM.render().

After the new state has been set, the graph’s componentDidUpdate() function asynchronously calls addNewNodes(), addNewEdges(), removeOldNodes(), and removeOldEdges() functions. We’ll focus on the addNewNodes() function.

The addNewNodes() function takes 11.39ms to complete but calls asyncRenderNode() several times. We can see many small function calls below this, these are setTimeouts and clearTimeout calls. They are relatively innocuous by themselves, but they allude to an excessive number of timeouts being set.

We can see when we zoom into the timeline after the asyncRenderNode() function that syncRenderNode and syncRenderEdges are taking the most time. For some reason those functions are being called for each and every node, even though the data has not changed.

The Bottom-Up callstack indicates that much of the node rendering time is used up by the edge rendering.

It’s possible to reduce the edge rendering time by blocking rendering when edge properties haven’t changed by using the React shouldComponentUpdate() function. Ideally edges would know within their props what the x and y coordinates of their connected nodes are, alternatively we could gain that information from the existing data. At this time shouldComponentUpdate() is not being used and instead we’re relying on React’s default object comparison, which was thought to be good enough.

The total runtime of a single edge rendering operation is roughly 8.28ms. For 200 nodes this calculates to 1.6 seconds, which matches our initial report of 1.8 seconds.

Action: Selecting Node

Selecting a node consists of the browser handling a click event, then passing that event to a function listener. In the graph below we can see that the browser’s handling of a mousedown event is relatively short (2.82ms).

After the mousedown event is a mouseup event. This event activates the drag and selection event listeners.

The image above shows that rendering connected edges for the node takes up a majority of the time.

After this section is an addNewNodes() call. This function only takes 9.58ms

We can also see in the timeline above that there are a number of smaller events at the bottom. These are mostly minor garbage collection events and are extremely quick, however it indicates that we are looping through all of the nodes and possibly performing unnecessary work.

Line 840 of this addNewNodes() example shows that we are comparing Javascript references for nodes, however we should probably be comparing {x, y} coordinates and other relevant information instead. Line 846 is showing that new nodes will undergo the same exact function call within an else condition. It seems as though more logic should be added to prevent unnecessary rendering.

After the addNewNodes() call, a number of timers fire, however they seem to be firing for every node instead of only the modified nodes. While this is a problem, the function that takes the most time is again the syncRenderEdge() function. As above, perhaps adding some logic to a shouldComponentUpdate() function could help.

The total time of one syncRenderEdge() call is 7.04ms. For 200 nodes that’s 1.4 seconds. This roughly correlates to the 1.8 seconds reported in the total timeline.

Action: Move Node

A move node action is where the user selects and holds a node, then moves their mouse to drag it around the canvas. Since it’s only modifying one node and its connected edges, it should render relatively quickly. We can see below that the total rendering time for a single movement is 12.92ms. This equates to a framerate of 77fps (estimated) for 200 nodes. It jumps to 40.31ms for 800 nodes, which is 24fps.

Most interestingly, the edges seem to render before the node updates its position when using 800 nodes. This is an annoyance to the user since the edges will follow the mouse while the node lags behind.

During the node movement we can see a few Major Garbage collection events. These indicate that there are some invalid ARIA properties, however these events would not be seen in production since they are only for development purposes.

We also find that rendering the edges takes up a majority of the time, and, since connected edge rendering is synchronous from node rendering, it can block node rendering.

Action: Delete Node

The delete node event is handled by a listener on the graph. The listener intercepts the delete keypress and sends it to the parent website. It’s the parent website’s duty to remove the selected node from the nodes array. Once the nodes array is modified then react-digraph will find all added and removed nodes. The result is similar to the node selection report above in the “Action: Select Node” section. For 200 nodes, the delete action takes 8.08ms.

Many node render events happen after a delete, and again edge rendering takes up a majority of the time.

Resulting Improvements

After the performance analysis above, some improvements were implemented based on the results.

Zoom Action – 1000 nodes

Before:
handleZoom took 121.06ms to complete.

Of that time, linkNodesAndEdges took 116.04ms, primarily due to garbage collection.

After:
handleZoom takes 5.25ms

Of that 5.25ms, linkNodesAndEdges now takes 0.45ms

We also see no garbage collection taking place, meaning the number of variables to clean up is insignificant and can be offset.

Create Node Action – 200 Nodes

The Create Node action uses the addNewNodes function. In the report above it was found that this function was executing asyncRenderNode on every object, rather than only those which are new or modified. The code has been changed.

Before:

After: 
For a 200 node graph, a create node action can complete in 7.9ms.

1000 Nodes:
For a 1000 node graph, a create node action now takes 14.7ms.

Delete Node Action: 200 Nodes

Deleting a node should be a singular event. The node should be removed, and the edges should be deleted, and nothing else needs to be rendered. The old code seemed to have been rendering all nodes after a delete, however that is now changed to only render the modified nodes.

Please note that the handleDelete action will be roughly the same, it’s the excessive renders afterward that blocked rendering.

Before:

Many node render events happen after a delete, and again edge rendering takes up a majority of the time.

After:

The handleDelete function takes the same amount of time, roughly. In the example below we see 11.61ms, compared to 8.08ms before, but that is likely due to Minor Garbage Collection routines and is not indicative of every execution.

The changes show that the excess rendering is no longer occurring. Below you can see a few renders happening in purple, a layout in green, and a DOM Garbage Collection near the end. These are usually due to removing DOM elements and are generally the browser performing asynchronous tasks.

Select Node Action:

Before:

After:

document.querySelector can be slow

The calculateOffset() function determines the offset that should be used when drawing an edge. It does this by selecting the elements from the page using document.querySelector() calls. It was found that when rendering an edge, this call takes up a significant portion of time.

Before:

Below we can see that a single edge calculation costs 3.66ms, or 3.6 seconds for 1000 edges (1000 nodes with 1 edge coming out of each node).

We use 2 querySelector() calls within this function. The first takes 1.27ms to complete, while the other takes 1.38ms.

The primary querySelector() call is:

const trgNode = document.querySelector(`#node-${trg[nodeKey]} use.node`);

You can see that we use an element ID (`#node-${trg[nodeKey]}`), so it’s possible to refactor this query selector to use document.getElementById() instead.

After:

The new code uses document.getElementById() to find the parent node container, then it uses querySelector() from that element, which reduces the scope of the query selector from the entire DOM to just the node container’s elements.

const nodeContainer = document.getElementById(`node-${trg[nodeKey]}`);
const trgNode = nodeContainer.querySelector(`use.node`);

The calculateOffset() function time is reduced to 1.32ms, rather than 3.53ms. For 1000 nodes and edges that’s 1.32 seconds to complete rendering of all edges.

Surprisingly, with one minor change to the query selector, we reduced the querySelector time to 0.26ms for the first one, and 0.14ms for the second selector. That’s 20% of the previous time for the first call, and 10% of the second call.

Yielding Loop

Another improvement we’ve made is to use a yielding loop for adding new nodes. A yielding loop is similar to a debounce function in which it processes a chunk of the loop, then uses a setTimeout(nextChunk, 0) function call. The setTimeout() allows other actions to occur until those have completed, then the Javascript event loop starts the next iteration of the loop by unblocking the timeout. Philip Roberts gives a great talk on just what the timeout and event loop is.

We found that with very large graphs (1000 nodes in testing), the for loop that was used to execute the asynchronous rendering process was blocking the UI. This would block for several seconds until much of the rendering was complete.

To avoid this, we now break up rendering new nodes into 50 node chunks. This allows some nodes to render while not blocking the mouse entirely. The downside is that in some graphs the nodes won’t be visible until some time after the first nodes have been rendered.

requestAnimationFrame()

One minor change that was used to speed up node and edge rendering on large graphs (1000 nodes), especially when we moved a node and needed the edges to follow smoothly, was to use requestAnimationFrame() instead of setTimeout(). An example of the difference is shown here.

The change to the code is minimal, and there’s no longer a need to specify a timeout because requestAnimationFrame() will process the next iteration every 16.6ms, yielding a 60fps refresh rate.

The image below shows fairly smooth node movement, and edge rendering is improved, however there is still some glitching with calculating the intersection of nodes and edges during movement. This is a minor inconvenience that only occurs with extremely large graphs. As soon as the node is dropped the edges will recalculate their final positions.

Memory Comparison

Memory consumption from v3 to v4 of react-digraph was improved dramatically. That improvement can be seen in the chart below.

With the updates to v5.1.3, the memory consumption wouldn’t have changed significantly from v4, but the performance improvements in this document seem to have lowered the memory use even more.

In the following chart we render the graph with different number of nodes, from 8 to 1000, let the rendering process complete so that the processor is no longer being used, then notate the memory consumption of the browser tab. We can see a slight memory reduction.

Note that “Perf” refers to the changes made to increase performance vs 5.1.3

Leave a Reply

Your email address will not be published. Required fields are marked *