Access violation when using Java Unsafe API

 
Ranch Hand
Posts: 133
12
I have an application that requires the allocation of many millions of small objects. Through testing and instrumentation, I've discovered that the time spent constructing the objects is substantially longer than the time spent in algorithmic processing. So I've been experimenting with the Java Unsafe API to create objects through direct memory manipulation. Although I recognize that Unsafe is, well, unsafe and may go away in Java 9, this is still an interesting investigation for me because of the unusual nature of my problem (which I described in an earlier post, "Allocating large numbers of Java objects"). If I could get this to work, it would mean that my code would have to take care of some of its own memory management, but that's a small price to pay to improve the run-time performance of my application.

The good news is that I am seeing a 20-fold improvement in the time to allocate objects. The bad news is that the objects seem to be unusable. Although I can set values for data primitives within the objects (ints, floats, etc.), when I try to store object references, the JVM crashes with a fatal EXCEPTION_ACCESS_VIOLATION.

Does anyone have suggestions on how to make this work or, failing that, why it doesn't work? I thought it might have something to do with memory barriers or the GC mark-and-sweep support, but haven't found an answer on the web.

Also, I am running this under Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode windows-amd64 compressed oops)


A sample block of code follows.
 
Ranch Hand
Posts: 954
4
I am not sure, but I read somewhere that we should not play with the Unsafe API, as doing so can modify underlying JVM characteristics. From Java 9 onwards, I think this will come as a public API.
 
Bartender
Posts: 15741
368
I can't offer you any insights in the unsafe code department, except that you shouldn't use it.

The regular way of dealing with this problem is to reduce the need to create so many small objects in the first place. What problem are you trying to solve?
 
Gary W. Lucas
Ranch Hand
Posts: 133
12
Stephan and Tushar,
First, thanks for your replies. The problem I am trying to solve was discussed more fully in the earlier thread that I cited, but I never get tired of talking about it :-)

I am writing Java code for processing large data sets of land surface elevation measurements. The logic is based on a triangular mesh (a Delaunay triangulation). A typical data set may contain 5 million data points, though I've recently started working with one that contains 12 million. The mesh itself consists of two kinds of geometric primitives, vertices and edges, both of which have corresponding Java classes. There is a proof that shows that in a Delaunay triangulation over a large number of points, the number of edges approaches 3 times the number of vertices. There may be a way to represent the data with fewer objects, though I haven't been able to think of one. The design of these classes is as lean as I can make it, and both have light-weight constructors. In my existing implementations, I do implement an object pool for the edges so that I can partition the data into smaller subsets and avoid the cost of construction by reusing objects. But I would really like to refine the process if I could.
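
The post does not say which proof is meant. One standard argument, sketched here under the assumption that the n points are in general position with h of them on the convex hull, uses Euler's formula for planar graphs. The symbols n, h, E and T (number of triangles) are introduced only for this sketch.

```latex
\begin{align*}
V - E + F &= 2, \qquad V = n, \quad F = T + 1
  && \text{(Euler's formula; the extra face is the outer region)} \\
3T &= 2E - h
  && \text{(each interior edge borders two triangles, each hull edge one)} \\
E &= 3n - h - 3, \qquad T = 2n - h - 2
  && \text{(solving the two equations above)} \\
\frac{E}{n} &\to 3 \quad \text{as } n \to \infty
  && \text{(provided } h \text{ grows more slowly than } n\text{)}
\end{align*}
```

For a data set of 5 million points this gives a little under 15 million edges, which is where most of the object count comes from.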

In terms of using Unsafe, I am mindful of the problems it might introduce in the future. In particular, there are issues between the memory formats for classes in different JVMs and the worrisome fact that an Unsafe approach means using memory that is not under the management of the Java garbage collector. But the problem I am dealing with right now is that, for a triangulation with a large number of points, the overhead of object construction actually exceeds the time spent on constructing the mesh (which is what I consider the real work of the application).

Also, as a bit more explanation on some of the code I attached, the memory layout for standard objects was deduced by using the dump() method on plain-old-java-objects and arrays. The layout will vary with different JVMs. For the one I am using, the first eight bytes of Java objects are used by Java itself for object management. The next 4 bytes give a reference to the class definition (even in a 64-bit JVM, references can be 4 bytes thanks to Compressed OOPs). In an array, the next 4 bytes give the size of an array. Here is some of the output from the Java code I included in the original post (slightly edited):
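
As a rough illustration of the layout just described (this is not the dump output from the original post), the header sizes can be turned into a small size estimate. It is a minimal sketch: the class and method names are made up for the example, the 8-byte alignment is an assumption based on HotSpot's default, and padding between individual fields is ignored.

```java
// Rough size estimate for the layout described above. All constants are
// assumptions for a 64-bit HotSpot JVM with compressed oops; other JVMs
// and settings will differ.
final class ObjectLayoutSketch {
    static final int MARK_BYTES = 8;          // first eight bytes, used by the JVM
    static final int CLASS_REF_BYTES = 4;     // compressed reference to the class
    static final int ARRAY_LENGTH_BYTES = 4;  // arrays only: element count
    static final int ALIGNMENT = 8;           // assumed default object alignment

    // Rounds a byte count up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of a plain object whose instance fields occupy fieldBytes.
    static long objectSize(long fieldBytes) {
        return align(MARK_BYTES + CLASS_REF_BYTES + fieldBytes);
    }

    // Estimated size of an array with the given length and element width.
    static long arraySize(long length, int elementBytes) {
        return align(MARK_BYTES + CLASS_REF_BYTES + ARRAY_LENGTH_BYTES
                + length * elementBytes);
    }

    public static void main(String[] args) {
        System.out.println(objectSize(12));   // e.g. three float fields
        System.out.println(arraySize(10, 4)); // e.g. an int[10]
    }
}
```

Under these assumptions the fixed overhead is 12 bytes per object and 16 bytes per array before any data is stored, which adds up quickly across millions of small objects.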

 
Stephan van Hulst
Bartender
Posts: 15741
368
I would only follow the unsafe path for educational purposes. The dark days when people wrote C code and the Visigoths pillaged Rome are long past.

I'm not going to comment on the unsafe part of the discussion any further, simply because I don't have enough experience with it in Java.

To *really* solve the problem, you should find an efficient data structure that minimizes the number of objects used. That means you need a class that represents a swath of land (or a big part of the mesh) and stores its points in arrays. It's not very object-oriented, but a strictly object-oriented design simply isn't performant enough for this problem. I imagine something like this:


Sadly, this will still create an array per vertex, but the runtime may be far more efficient at creating arrays than it is at creating objects. Disclaimer: I would normally not write ugly code like this, but I prefer it over unsafe code.
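
The code from this post is not preserved here. The following is only a guess at the general shape of such a structure, not Stephan's original: one object holds a whole block of vertices in parallel arrays, with one small `int[]` of neighbour indices per vertex. The class name `VertexBlockSketch` and all of its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: a block of vertices stored in parallel arrays
// instead of one object per vertex. Names are illustrative only.
final class VertexBlockSketch {
    private static final int[] EMPTY = new int[0];

    private final double[] x;
    private final double[] y;
    private final double[] z;          // elevation
    private final int[][] neighbors;   // one int[] per vertex
    private int count;

    VertexBlockSketch(int capacity) {
        x = new double[capacity];
        y = new double[capacity];
        z = new double[capacity];
        neighbors = new int[capacity][];
    }

    // Stores a vertex and returns its index within this block.
    int addVertex(double vx, double vy, double vz) {
        if (count == x.length) {
            throw new IllegalStateException("block is full");
        }
        x[count] = vx;
        y[count] = vy;
        z[count] = vz;
        neighbors[count] = EMPTY;
        return count++;
    }

    // Records that vertex a is connected to vertex b (one direction only).
    void connect(int a, int b) {
        if (a < 0 || a >= count || b < 0 || b >= count) {
            throw new IndexOutOfBoundsException("unknown vertex index");
        }
        int[] old = neighbors[a];
        int[] grown = Arrays.copyOf(old, old.length + 1);
        grown[old.length] = b;
        neighbors[a] = grown;
    }

    double elevation(int i) { return z[i]; }
    int degree(int i) { return neighbors[i].length; }
    int neighbor(int i, int k) { return neighbors[i][k]; }
}
```

A caller would work with vertex indices rather than vertex objects, for example `int a = block.addVertex(0, 0, 10.5);` followed by `block.connect(a, b);`. Growing the neighbour array by one element per call keeps the sketch short; a real implementation would likely grow it in larger steps.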
 
Sheriff
Posts: 22849
132

Stephan van Hulst wrote: The dark days when people wrote C code and the Visigoths pillaged Rome are long past.


I actually think that C is bigger than you think. It's still used quite a bit on Linux. I dare say that together with C++, Python and Perl, C is in the top 4 programming languages for the OS. I even worked on a little Linux program in C myself this week.
 
Marshal
Posts: 80616
468
You are right, Rob. If you look on Tiobe, they show that Java® and C have been rivals for the top programming language for most of the millennium. Only C++ comes close as 3rd.
 
Rob Spoor
Sheriff
Posts: 22849
132
Dear god, VB is still on the list? VB.NET at 7 (from 13) is bad enough, but plain old VB at 13? It's the one programming language I've worked with and really disliked. (Of course I haven't tried LOLCODE or brainf**k.)
 
Bartender
Posts: 689
17
Unfortunately I don't have anything to help you with your problem now, but I believe the Java maintainers are aware of the general problem you're facing and are exploring a possible solution for a future release of Java. They are exploring the possibility of 'Value Types', which will look like classes in source code but handle like primitives in byte code.

Take a look at this post from the OpenJDK community. I'm not aware of exactly when (or if) this will make it into Java, but it certainly seems like a very sensible idea that will probably greatly increase performance in situations like yours.

It's worth keeping an eye on in any case.
 
Gary W. Lucas
Ranch Hand
Posts: 133
12
Stephan, you're not wrong. I gave some thought to your suggestion and I think that if I'm going to be successful at all in dealing with such a large number of vertices, I'm going to have to sacrifice a strict object-oriented approach and work with arrays. It's a bit of a shame because the triangulated mesh problem gives rise to some elegant class designs. Anyway, the structure of the mesh is actually better represented by its edges than its vertices (in my experience, of course). Each edge has primary links to 4 other edges (based on the Quad Edge structure, which was popularized by Guibas and Stolfi back in the '80s). So the trick is to represent how edges connect to other edges. I'm thinking of a big table of integers... about as un-object-oriented as you can get. On the other hand, it will reduce my memory size from 232 bytes per vertex to 124 and reduce the number of objects that have to be tracked by the GC by a factor of 1024 (I'll discuss the details only if somebody is actively interested). One thing that I do have to clarify is that each vertex connects, on average, to 6 other vertices, but that sometimes they connect to many more and sometimes to as few as three. So the list of connections has to be variable size whether it is treated as an explicit object (an ArrayList, etc.) or implied by a data table.
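
The details of the table are not given in the thread, so the following is only one way such a "big table of integers" might look. It assumes that edges are identified by an `int` index, that each edge stores its four links as indices of other edges, and that the table is split into pages of 1024 edges so the GC tracks one array per page (a guess at where the factor of 1024 comes from). `EdgeTableSketch`, `NO_LINK` and the slot numbering are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of an integer table for edge-to-edge links.
// Page size, slot count and the -1 marker are assumptions for illustration.
final class EdgeTableSketch {
    static final int EDGES_PER_PAGE = 1024; // one int[] per 1024 edges
    static final int LINKS_PER_EDGE = 4;    // four primary links per edge
    static final int NO_LINK = -1;          // marks a link that is not set

    private int[][] pages = new int[0][];
    private int edgeCount;

    // Allocates a new edge with all links unset and returns its index.
    int allocateEdge() {
        int page = edgeCount / EDGES_PER_PAGE;
        if (page == pages.length) {
            int[] fresh = new int[EDGES_PER_PAGE * LINKS_PER_EDGE];
            Arrays.fill(fresh, NO_LINK);
            pages = Arrays.copyOf(pages, page + 1);
            pages[page] = fresh;
        }
        return edgeCount++;
    }

    // Sets link number slot (0..3) of the given edge to point at target.
    void setLink(int edge, int slot, int target) {
        int off = offset(edge, slot);
        pages[edge / EDGES_PER_PAGE][off] = target;
    }

    // Returns link number slot (0..3) of the given edge, or NO_LINK.
    int getLink(int edge, int slot) {
        int off = offset(edge, slot);
        return pages[edge / EDGES_PER_PAGE][off];
    }

    int pageCount() {
        return pages.length;
    }

    // Validates the arguments and returns the position within the page.
    private int offset(int edge, int slot) {
        if (edge < 0 || edge >= edgeCount || slot < 0 || slot >= LINKS_PER_EDGE) {
            throw new IndexOutOfBoundsException("edge " + edge + ", slot " + slot);
        }
        return (edge % EDGES_PER_PAGE) * LINKS_PER_EDGE + slot;
    }
}
```

With this layout, allocating 1025 edges creates only two arrays, and each link costs 4 bytes with no per-edge object header. Vertex indices per edge could be stored in extra slots of the same page, at the cost of a larger stride.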

Mike, I've seen that paper about value types and I think it's going to be a very valuable and way overdue improvement to Java (I'd have done that one first, before Lambdas, but that's just me). One thing that I've got mixed feelings about is that the proposed value types are immutable. I suppose there are technical reasons for making them that way, but I think a lightweight, reusable data container with associated methods would be a pretty handy thing to have around, especially for folks doing high-performance processing on large data sets.

Finally, I am pretty much ready to concede that the Unsafe approach is a dead end... Still, if anyone knows a bit more about the internals of Java and can explain why it doesn't work, I'd really be grateful to find out.

Gary

 
Stephan van Hulst
Bartender
Posts: 15741
368

Rob Spoor wrote: I actually think that C is bigger than you think. It's still used quite a bit on Linux. I dare say that together with C++, Python and Perl, C is in the top 4 programming languages for the OS. I even worked on a little Linux program in C myself this week.



I knew I was going to get flak for that remark. ;)

I like to call fidgeting with memory handles C code, regardless of whether it's actually written in C. Still, it *does* surprise me that C is still that high up there.
 
Stephan van Hulst
Bartender
Posts: 15741
368
Gary, looking forward to seeing whether you'll solve your problem.
 