# Allocating very large numbers of Java objects

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

I have a question regarding ways to reduce the overhead related to object instantiation. In most applications that I've written, I've never had to worry about this. But I am currently working on an application that involves, literally, millions of objects.

I am writing Java code for processing large data sets of land surface elevation measurements. The logic is based on a triangular mesh (a Delauany triangulation). A typical dataset size may contain 5 million data points, though I've recently started working with one that contains 12 million. The mesh itself consists of two kinds of geometric primitives, vertices (one per elevation sample) and edges (which are allocated in pairs, on average, 3 couplets per elevation sample). I've defined Java classes for both.

I've spent a lot of time researching algorithms and working toward efficient code. Runtimes average about 1 second per million vertices. I've paid a lot of attention to designing experiments that measure timing for specific processes (which is often frustrating… a modern computer is a very noisy test environment). It turns out that about a third of the runtime time is taken up by instantiating objects of the edge class. If possible, I'd like to do something to reduce this overhead.

So I was wondering if there was a way to expedite the construction of multiple objects. I've recently become aware of the Java Unsafe API (ironically in an article expressing concern about Oracle removing it from Java 9). And that's got me thinking that there might be some way around my issue using an exotic API or maybe just plain old reflection.

Has anyone had experience in this area? If so, could you point me in the direction of techniques that would be worth researching? Is there anything I should be wary about?

Thanks in advance for your help.

I am writing Java code for processing large data sets of land surface elevation measurements. The logic is based on a triangular mesh (a Delauany triangulation). A typical dataset size may contain 5 million data points, though I've recently started working with one that contains 12 million. The mesh itself consists of two kinds of geometric primitives, vertices (one per elevation sample) and edges (which are allocated in pairs, on average, 3 couplets per elevation sample). I've defined Java classes for both.

I've spent a lot of time researching algorithms and working toward efficient code. Runtimes average about 1 second per million vertices. I've paid a lot of attention to designing experiments that measure timing for specific processes (which is often frustrating… a modern computer is a very noisy test environment). It turns out that about a third of the runtime time is taken up by instantiating objects of the edge class. If possible, I'd like to do something to reduce this overhead.

So I was wondering if there was a way to expedite the construction of multiple objects. I've recently become aware of the Java Unsafe API (ironically in an article expressing concern about Oracle removing it from Java 9). And that's got me thinking that there might be some way around my issue using an exotic API or maybe just plain old reflection.

Has anyone had experience in this area? If so, could you point me in the direction of techniques that would be worth researching? Is there anything I should be wary about?

Thanks in advance for your help.

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

Oh no, my requirements are worse

For n data sample points, it's n * (3 edge pairs) or 6*n objects... 12 million sample points would require 72 million edge objects. So your suggestion about handling the problem in smaller pieces is well taken.

Of course I can partition the data sets into smaller pieces and process them one at a time. This would conserve memory (which is a problem on my aging laptop), but would add overhead in terms of moving data on and off disk. I can certainly live with that because, after all, everyone does. I was just looking for ways to improve my system.

Incidentally, I think it might clarify things a little if I give a little more information about my data. The sample points are collected by an airborne laser system known as Lidar which collects surface elevations in a basically unstructured pattern. So the spacing of the vertices in the triangular mesh is irregular. And, in fact, the technique I am using is sometimes referred to as a Triangulated Irregular Network (TIN). See Pictures of TINs

A key part of analyzing the data is to link vertices to their neighbors. And since the spacing is irregular, you never quite know how far away a neighbor is going to be. Also, partitioning the data isn't quite as convenient as it would be for a regular grid. So having everything in memory makes it easier to search for neighbors. Is it absolutely necessary? No. Practical? Only if you've got massive amounts of memory... And even that has limits. My sample set consists of 1700 data files, each containing about 5 million points each.

Anyway, the speed of object construction is just one aspect of the problem. But I'm looking for every angle I can exploit.

For n data sample points, it's n * (3 edge pairs) or 6*n objects... 12 million sample points would require 72 million edge objects. So your suggestion about handling the problem in smaller pieces is well taken.

Of course I can partition the data sets into smaller pieces and process them one at a time. This would conserve memory (which is a problem on my aging laptop), but would add overhead in terms of moving data on and off disk. I can certainly live with that because, after all, everyone does. I was just looking for ways to improve my system.

Incidentally, I think it might clarify things a little if I give a little more information about my data. The sample points are collected by an airborne laser system known as Lidar which collects surface elevations in a basically unstructured pattern. So the spacing of the vertices in the triangular mesh is irregular. And, in fact, the technique I am using is sometimes referred to as a Triangulated Irregular Network (TIN). See Pictures of TINs

A key part of analyzing the data is to link vertices to their neighbors. And since the spacing is irregular, you never quite know how far away a neighbor is going to be. Also, partitioning the data isn't quite as convenient as it would be for a regular grid. So having everything in memory makes it easier to search for neighbors. Is it absolutely necessary? No. Practical? Only if you've got massive amounts of memory... And even that has limits. My sample set consists of 1700 data files, each containing about 5 million points each.

Anyway, the speed of object construction is just one aspect of the problem. But I'm looking for every angle I can exploit.

William Brogden

Author and all-around good cowpoke

Rancher

Rancher

Posts: 13078

6

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

Hi Bill,

In implementing my code, I added a lot of instrumentation to the logic, so I have a pretty good idea about the numbers.

There a three main algorithms for building a 2D Triangulated Irregular Network (TIN). I use the insertion algorithm: pick three vertices for an initial triangle and then build the tin by adding one vertex at a time. When the vertex falls inside a triangle, you subdivide the triangle. When the vertex falls outside the existing triangular network, you extend the TIN (but I still call it "insertion"). There is an optimality requirement called the Delaunay criterion which specifies certain rules for how the triangles can be arranged. Often when you insert a new vertex, some of the existing edges will have to be removed and connected to other vertices. If you don't do that, you end up with a visually unbalanced, poorly laid out network (the Delaunay Triangulation is one of those intriguing cases where aesthetics and theory lead to the same results).

In an optimal mesh of infinite dimension, the average number of edges connected to each vertex approaches 6. So each vertex connects to 6 triangles. Edges and vertices are shared, of course, so for N vertices, there are 2*N triangles and 3*N edges.

There are two phases to the insertion: "Location", in which you find out which triangle contains the insertion vertex (if any) and "Insertion" in which you add the vertex to the TIN. Location is a read-only operation. For the Lidar data sets I have, there is an average of 3.56 edge interactions during location. During insertion, an average of 18 edges are inspected, three new ones are added, with approximately 1/3 all existing edges being adjusted to provide optimality (for efficiency, edges are never discarded, just reused... in my vertex class, the coordinates are declared final; but because I reuse edges, few edge elements are final).

Gary

Hi Bill,

In implementing my code, I added a lot of instrumentation to the logic, so I have a pretty good idea about the numbers.

There a three main algorithms for building a 2D Triangulated Irregular Network (TIN). I use the insertion algorithm: pick three vertices for an initial triangle and then build the tin by adding one vertex at a time. When the vertex falls inside a triangle, you subdivide the triangle. When the vertex falls outside the existing triangular network, you extend the TIN (but I still call it "insertion"). There is an optimality requirement called the Delaunay criterion which specifies certain rules for how the triangles can be arranged. Often when you insert a new vertex, some of the existing edges will have to be removed and connected to other vertices. If you don't do that, you end up with a visually unbalanced, poorly laid out network (the Delaunay Triangulation is one of those intriguing cases where aesthetics and theory lead to the same results).

In an optimal mesh of infinite dimension, the average number of edges connected to each vertex approaches 6. So each vertex connects to 6 triangles. Edges and vertices are shared, of course, so for N vertices, there are 2*N triangles and 3*N edges.

There are two phases to the insertion: "Location", in which you find out which triangle contains the insertion vertex (if any) and "Insertion" in which you add the vertex to the TIN. Location is a read-only operation. For the Lidar data sets I have, there is an average of 3.56 edge interactions during location. During insertion, an average of 18 edges are inspected, three new ones are added, with approximately 1/3 all existing edges being adjusted to provide optimality (for efficiency, edges are never discarded, just reused... in my vertex class, the coordinates are declared final; but because I reuse edges, few edge elements are final).

Gary

Campbell Ritchie

Marshal

Posts: 52602

119

posted 1 year ago

Well, anything involving millions of objects will take time. Is 1″ per million anything to worry about?

Please explain more about reusing edges.

You have nodes which have co-ordinates, presumably three

Then you must have some way of recording nodes. How do you do that? Do you have a List<Node>? Do you have some kind of sorting to make it easy to find nodes with similar lat and long, because you will want edges between nearby nodes. What is the structure of an edge? Does it simply have references to nodes, thisEnd and thatEnd or similar? Or does it have other attributes, e.g. length direction and gradient? Do you know anything about the time complexity of the insertion algorithm? Is there a delay in finding the nearest nodes to add to a particular triangle?

Please explain more about reusing edges.

You have nodes which have co-ordinates, presumably three

`double`s for lat., long. and altitude. So you can make the nodes immutable; once a node is located it will never move. I would have thought you can probably create nodes very quickly; if you are reading data from a disc, then object creation will probably be faster than that.Then you must have some way of recording nodes. How do you do that? Do you have a List<Node>? Do you have some kind of sorting to make it easy to find nodes with similar lat and long, because you will want edges between nearby nodes. What is the structure of an edge? Does it simply have references to nodes, thisEnd and thatEnd or similar? Or does it have other attributes, e.g. length direction and gradient? Do you know anything about the time complexity of the insertion algorithm? Is there a delay in finding the nearest nodes to add to a particular triangle?

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

No, I'd have to say that the overall run time doesn't bother me that much. I was just hoping to find a cool trick about object allocation. It does annoy me that such a large percentage of my run time is something I can't do anything about by writing better code.

In terms of your questions about the way the code is structured... Someday, if I ever get this stuff in a worthy state, I plan on open sourcing it. In the meantime, I'll try to describe it as clearly as I can.

The nodes (vertices) use doubles for x/y coordinates to support the level of precision needed for geographic applications, but floats are more than adequate for elevations. There's also an int used to relate the vertex back to the source lidar data. All are declared final. I've investigated the way Java lays out the object in memory and know that, right now, it's adding 4 bytes of memory padding to each instance. I may use that for something in the future.

The only information the edge class carries is connections between vertices and other edges. Clearly, you can traverse an edge in two directions: from A to B, and from B to A. Let's call edge AB the "dual" of BA. So I have an instance for each direction. The instance carries a reference one vertex, a reference to its dual, a reference linking to its next edge and one to its prior edge. So when I want to get the two vertices for a particular edge, I take one from each half of the pair. The reference to the dual is declared final, but everything else can be adjusted as edges are connected or disconnected from other edges and vertices. It's sort of like a 2D linked list. This configuration was described as a "Quad Edge" data structure by Guibas and Stolfi in their 1985 paper (see Quad Edge for useful pictures). I experimented with representing the edge as a single object. This reduced memory space but entailed messy additional runtime processing for logic to determine which direction an edge would be traversed at a particular phase of the logic.

The TIN is effectively a container class for nodes, so I don't maintain a separate list of nodes. It does accept a List<Vertex> as input and I do have an API that extracts one by traversing the edges. It's moderately expensive, but I don't really have a need for it anyway (right now). Triangles are not physically maintained in memory, but can be deduced by traversing the edges. I did define a Triangle class and have an API for building a list of triangles, but again don't really have a need for it right now.

The time complexity of the algorithm is, well, complex. In theory, I think the Location phase could be basically O(n^1.5) for random data and o(n) when subsequent input vertices tend to be located close together in space. Lidar samples tend to be closer to o(n). I implemented a sort based on the Hilbert Curve but don't use it very often. The Insertion phase looks to me like it should be o(n) except for pathological cases... 1 million points in a straight line with one more point offset by just enough to build triangles would have a time complexity of O(n^2). I've performed a linear regression on measured run times and it appears that the time-complexity overall is close to linear, but has a small quadratic term that would be relevant for samples of 100 million or larger.

In terms of your questions about the way the code is structured... Someday, if I ever get this stuff in a worthy state, I plan on open sourcing it. In the meantime, I'll try to describe it as clearly as I can.

The nodes (vertices) use doubles for x/y coordinates to support the level of precision needed for geographic applications, but floats are more than adequate for elevations. There's also an int used to relate the vertex back to the source lidar data. All are declared final. I've investigated the way Java lays out the object in memory and know that, right now, it's adding 4 bytes of memory padding to each instance. I may use that for something in the future.

The only information the edge class carries is connections between vertices and other edges. Clearly, you can traverse an edge in two directions: from A to B, and from B to A. Let's call edge AB the "dual" of BA. So I have an instance for each direction. The instance carries a reference one vertex, a reference to its dual, a reference linking to its next edge and one to its prior edge. So when I want to get the two vertices for a particular edge, I take one from each half of the pair. The reference to the dual is declared final, but everything else can be adjusted as edges are connected or disconnected from other edges and vertices. It's sort of like a 2D linked list. This configuration was described as a "Quad Edge" data structure by Guibas and Stolfi in their 1985 paper (see Quad Edge for useful pictures). I experimented with representing the edge as a single object. This reduced memory space but entailed messy additional runtime processing for logic to determine which direction an edge would be traversed at a particular phase of the logic.

The TIN is effectively a container class for nodes, so I don't maintain a separate list of nodes. It does accept a List<Vertex> as input and I do have an API that extracts one by traversing the edges. It's moderately expensive, but I don't really have a need for it anyway (right now). Triangles are not physically maintained in memory, but can be deduced by traversing the edges. I did define a Triangle class and have an API for building a list of triangles, but again don't really have a need for it right now.

The time complexity of the algorithm is, well, complex. In theory, I think the Location phase could be basically O(n^1.5) for random data and o(n) when subsequent input vertices tend to be located close together in space. Lidar samples tend to be closer to o(n). I implemented a sort based on the Hilbert Curve but don't use it very often. The Insertion phase looks to me like it should be o(n) except for pathological cases... 1 million points in a straight line with one more point offset by just enough to build triangles would have a time complexity of O(n^2). I've performed a linear regression on measured run times and it appears that the time-complexity overall is close to linear, but has a small quadratic term that would be relevant for samples of 100 million or larger.

William Brogden

Author and all-around good cowpoke

Rancher

Rancher

Posts: 13078

6

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

- 1

Bill,

The Location phase uses a technique called a Lawson's walk. Pick a triangle in the TIN, and then jump to the next triangle in the general direction of the target coordinates. Repeat and, in fairly short order, you'll land in the triangle that contains the target point. Lawson proved that as long as the triangulation is Delaunay optimal, the search will always terminate. I've included a link to a picture from a site, www.voronoi.com, that has lots of useful information on both the Delaunay Triangulation and the Voronoi Diagram. For my purposes, Lidar has a nice feature in that the Lidar device is essentially a laser scanner, so each subsequent point is not far away from the one that preceded it. So if the search starts at a triangle where the previous vertex was inserted, it's usually only a few steps away to reach the next. I have a sample of data from a one kilometer square area in the vicinity of Bear Mountain in Connecticut where the search takes an average of 3.56 steps per insertion. This also works when you're looking up points to interpolate a grid. On the down side, if the data is randomly spaced, you can't take advantage of the sequential-proximity trick, and the number of steps in the walk is on the order of SQRT(N). Of course, if the data is random, sorting it using the Hilbert Curve technique I mentioned above, results in a high degree of sequential proximity. For the Bear Mountain sample, which is pretty good to start with, sorting it reduces the average number of steps in the walk to 3.16. With Lidar data, I generally don't bother with the sort before inserting points.

Source: The Voronoi Web Site

The Location phase uses a technique called a Lawson's walk. Pick a triangle in the TIN, and then jump to the next triangle in the general direction of the target coordinates. Repeat and, in fairly short order, you'll land in the triangle that contains the target point. Lawson proved that as long as the triangulation is Delaunay optimal, the search will always terminate. I've included a link to a picture from a site, www.voronoi.com, that has lots of useful information on both the Delaunay Triangulation and the Voronoi Diagram. For my purposes, Lidar has a nice feature in that the Lidar device is essentially a laser scanner, so each subsequent point is not far away from the one that preceded it. So if the search starts at a triangle where the previous vertex was inserted, it's usually only a few steps away to reach the next. I have a sample of data from a one kilometer square area in the vicinity of Bear Mountain in Connecticut where the search takes an average of 3.56 steps per insertion. This also works when you're looking up points to interpolate a grid. On the down side, if the data is randomly spaced, you can't take advantage of the sequential-proximity trick, and the number of steps in the walk is on the order of SQRT(N). Of course, if the data is random, sorting it using the Hilbert Curve technique I mentioned above, results in a high degree of sequential proximity. For the Bear Mountain sample, which is pretty good to start with, sorting it reduces the average number of steps in the walk to 3.16. With Lidar data, I generally don't bother with the sort before inserting points.

Source: The Voronoi Web Site

Campbell Ritchie

Marshal

Posts: 52602

119

posted 1 year ago

If you are running at 3.56×N and sorting would reduce that to 3.16×N, you might get no particular improvement by going to all the trouble of sorting the nodes.

A node is 2 doubles and a float and 4 bytes more; that looks like 24B. If you have a List with 10000000 of them, you are looking at slightly under 240MiB. Plus 6× that for the edges. So you would need an 8GB RAM machine and use 25% of that for your heap space. At 1 million use 10% of that.

I suspect you have already optimised as much as you can. I do not think it takes very long to create your million objects.

I agree with Bill Brogden about how much effort you have put into this thread.

A node is 2 doubles and a float and 4 bytes more; that looks like 24B. If you have a List with 10000000 of them, you are looking at slightly under 240MiB. Plus 6× that for the edges. So you would need an 8GB RAM machine and use 25% of that for your heap space. At 1 million use 10% of that.

I suspect you have already optimised as much as you can. I do not think it takes very long to create your million objects.

I agree with Bill Brogden about how much effort you have put into this thread.

Gary W. Lucas

Ranch Hand

Posts: 65

6

posted 1 year ago

Bill & Campbell,

Thanks for your interest in my project. It was good to have an opportunity to talk about the algorithms and performance issues.

There are a couple of good Java implementations of TIN algorithms out there, but as far as I know none of them has addressed the scalability problems that come with millions-of-points data sets (though there are some truly excellent solutions in C/C++). So I'm hoping that we I get mine a bit more polished, I can find a home for it on the web.

Gary

Thanks for your interest in my project. It was good to have an opportunity to talk about the algorithms and performance issues.

There are a couple of good Java implementations of TIN algorithms out there, but as far as I know none of them has addressed the scalability problems that come with millions-of-points data sets (though there are some truly excellent solutions in C/C++). So I'm hoping that we I get mine a bit more polished, I can find a home for it on the web.

Gary