Joe Areeda

Ranch Hand
since Apr 15, 2011

Recent posts by Joe Areeda

The operations you need are available in the Set interface. Classes such as TreeSet and HashSet implement its addAll and removeAll methods.

One approach would be to build two sets of cnums from the lists, something like



Note I didn't compile or test it, so consider it pseudo-code
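
To illustrate the addAll/removeAll idea, here is a minimal sketch. It assumes the cnums are integers held in two plain lists; the class and method names (CnumSets, onlyInFirst, inEither) are invented for this example and do not come from the original question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CnumSets {
    /** Returns the cnums that appear in listA but not in listB. */
    static Set<Integer> onlyInFirst(List<Integer> listA, List<Integer> listB) {
        Set<Integer> result = new TreeSet<>(listA); // copy, so the input list is untouched
        result.removeAll(new TreeSet<>(listB));     // set difference: A minus B
        return result;
    }

    /** Returns every cnum that appears in either list, without duplicates. */
    static Set<Integer> inEither(List<Integer> listA, List<Integer> listB) {
        Set<Integer> result = new TreeSet<>(listA);
        result.addAll(listB);                       // set union
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4);
        List<Integer> b = Arrays.asList(3, 4, 5);
        System.out.println(onlyInFirst(a, b)); // [1, 2]
        System.out.println(inEither(a, b));    // [1, 2, 3, 4, 5]
    }
}
```

A TreeSet keeps the result sorted; a HashSet would work the same way if ordering does not matter.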

Joe

7 years ago
Hi Dwayne,

You seem to have thought about this quite a bit, so I don't want to offer advice so much as discuss alternatives.

I've used SQLite and looked at Derby. If I were to go for a single-user database, I think SQLite has the advantage of more users and developers. I don't think your choice of NetBeans has much effect on the database.

Maintaining synchronization among many single-user databases is a hard problem. If I were to design a system like yours, I would be more inclined to implement a central server based on Java EE with a multiuser database such as MySQL, MariaDB, or PostgreSQL. Users would connect via a browser, so it wouldn't matter (much) what kind of computer, tablet, or smartphone they used.

There are several technologies involved in producing a website like that, but I would say the complexity is comparable to that of a well-written Swing application. Or are you considering the NetBeans Platform for your app?

Joe
8 years ago

Tim Holloway wrote:Well, if there's a gazillion dollars involved, that means probably at least one MajorVendor and when you tell them you're violating J2EE specs, they're likely to say you voided the warranty.


I wish there were a major vendor that imposed standards, but this is a multi-university collaboration supported by grants. It's a lot looser than that. Most of the code is written and maintained by physicists. That presents its own challenges.

Tim Holloway wrote:It's probably actually easier/cleaner to set up and run a master poller thread from a ServletContextListener than it is to code that sort of stuff in servlet code anyway. The general problem set isn't that uncommon, regardless of whether the scope or protocols are.

Just to be clear, this code won't be in the actual servlet. It will be in the class library that wraps the network protocol API.

Tim Holloway wrote:Something of this nature can be built in large part using off-the-shelf components and common design patterns and implemented as a JavaBean. Meaning that the Master Poller could be made to run stand-alone or under some other sort of service instead of being wholly dependent on servlet architecture. It's also easier to test and tune stuff built that way.


I will think more about this but I'm having trouble wrapping my mind around the Master Poller concept. If I understand what you mean it's a task of some sort that keeps track of all the data that's available on all the servers.
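
One possible reading of the master poller idea, as a minimal sketch: a single background task asks every server what it has at a fixed interval and keeps the answers in a shared map, so request-handling code only does a lookup. Everything named here (MasterPoller, the query function, the server names) is hypothetical; the real query would go through the proprietary protocol library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class MasterPoller {
    // Latest known answer per server; safe to read from many request threads.
    private final Map<String, String> latest = new ConcurrentHashMap<>();
    private final Iterable<String> servers;
    // Hypothetical query: server name -> summary of what data it holds.
    private final Function<String, String> query;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    MasterPoller(Iterable<String> servers, Function<String, String> query) {
        this.servers = servers;
        this.query = query;
    }

    /** Asks every server once and refreshes the cache. */
    void pollOnce() {
        for (String server : servers) {
            try {
                latest.put(server, query.apply(server));
            } catch (RuntimeException e) {
                latest.remove(server); // unreachable server: drop the stale entry
            }
        }
    }

    /** Starts polling in the background, waiting periodSeconds between rounds. */
    void start(long periodSeconds) {
        timer.scheduleWithFixedDelay(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }

    /** Cheap lookup for request code; returns null if nothing is known. */
    String lookup(String server) {
        return latest.get(server);
    }
}
```

Because the class has no servlet dependency, it could be started from a ServletContextListener or run stand-alone, which is the property Tim describes.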

I did try something like that at the beginning of the project, but doing it on the client side is very slow. I have proposed a design that leverages the server's data structures to create a MySQL database. The idea is that each server would maintain a database of what it knows about and replicate it to my project, located centrally. I'm not sure if and when this project will be funded.

Best,
Joe

9 years ago
Tom,
I really appreciate the discussion. I rarely get such intelligent comments when talking to myself about this kind of stuff.

Tim Holloway wrote:Well, I did say results would be unpredictable. I worked with people who had system that would do that, although more like about 3 times a week. They never had time to do a post-mortem/repair but always had time to reboot. The problem with "once or twice over 3 years" is that very suddenly things can flip to "once every 4 or 5 minutes". I've had that happen to me. For example an OS update almost earned me a 3AM all-expenses paid trip to Chicago because someone's data compression library had been doing something "clever" for years.


I'm a big fan of post-mortems and am learning to do them better. My suspicion is that the holdup is in the network transfer libraries. It's a proprietary protocol. I've been working closely with that team, identifying and fixing problems. I've also seen rare problems turn frequent after some unrelated change.

Tim Holloway wrote:OK, going back to the head of the thread and cross-correlating. It appears that you have a collection of content servers and you have been polling them for available content. Polling was done on-request per-user and not in conjunction with a master (shared) content list. This sounds like something that an outfit such as Netflix might do, although I'd expect that Netflix is also doing things that allow stuff like migrating content nearer to consumers.

First question would be whether or not you would be better served by having a master (non-servlet) thread handle the polling at regular intervals and leaving the servlets the less-intensive task of checking against the accumulated results (which could be cached locally and/or stored in a local database).

Second question would be whether it's feasible to avoid polling in part or in whole by making the content servers post content changes as they happen. For example, as a callback process.


Actually, I've been pushing for something like that without much luck. Part of it is we're near the end of a 5 yr gazillion dollar upgrade to the instruments and there are higher priority things before we go live in the fall. (details at http://ligo.org)

This app was the first that took responsibility for finding the data; before that you had to know which server to contact or which cluster to log into. Data is continuously acquired in real time and needs to be copied before it is accessible, and there can be gaps. Just saying it's not an easy problem.

Best,
Joe
9 years ago
I do appreciate your comments Tom but I don't think you quite get what I'm doing.

1. The threads will not be asynchronous to the servlet. A single method called by the servlet can either run through a list of (i/o wait intensive) network connections testing for data availability or test them all at once, wait for all of them to complete then return. Either way, if something bad happens in this method the servlet will hang and eat up one of the threads from the pool. It's been running for about 3 years and that has happened once or twice. To my chagrin, I have no idea why.

2. This app has some navigation pages and some analysis functions. I agree with your timing suggestions on the navigation, and they all have sub-second response, even the database-intensive ones.

The analysis functions require network data transfers and put a decent load on the server for calculation and plotting. These are almost completely done in external programs written by me and others in various languages. Most of them are published so users can run them on their workstations or on one of the clusters. The long response times are almost all in the external programs, and most of that time is spent in data transfers.

People use the app because a) the interface is easy to use, b) they don't have to install anything and c) they can use their phone or tablet if off site. In many cases the app is faster than running locally because the data center Internet connection is better.

I know this is atypical and I've struggled with the rule of thumb that every web page should load in less than a second. I collaborate with almost all the users of the app and their comments, suggestions and complaints almost never mention response time negatively.

Best,
Joe
9 years ago
Hi Tom,

Thanks for the clear description of what I would call a best practice. I'm not sure about this yet and will proceed with caution in my test.

I think the best way to look at this is from a cost-benefit perspective.

On one hand, the worst case is that, despite my best efforts, a sub-thread never exits. In the single-thread model the same failure would block the servlet's thread.

In the best case, when no data is available, we return in a couple of seconds instead of tens of seconds.

The system testing of this app is fairly complete. If this gets implemented against best practice we will beat it to death.

Best,
Joe
9 years ago
Thanks Stefan,

Stefan Evans wrote:>Only the single method is multithreaded, the servlet itself is one thread controlled by the container (per instance of course).

The "per instance of course" comment worries me a little.
As I understand it, Servlets are multithreaded by default. An instance of a servlet can have multiple threads running through it at one time, which is why you don't use instance variables in them.
Does this mean you have implemented SingleThreadModel on your servlets?
Perhaps you mean one thread per request?


I am currently reading Java Web Services: Up and Running by Martin Kalin. He says "A web server such as Tomcat can [create] arbitrarily many instances of a servlet". But you are right, it is one thread per request; I was unclear.

Stefan Evans wrote:
In terms of your plan, I'm wondering how you envision this working.
You poll all of the servers asking "what have you got"

Do you wait for all the responses to come back before continuing?
What if one of them fails to respond?
What about errors/exceptions?


The plan is either to wait until they all respond or, when one succeeds, to kill the others. There is a timeout on each request. Errors and exceptions at this level are treated the same as data not being available.
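
The "first success wins, kill the others" variant maps fairly directly onto ExecutorService.invokeAny. The following is a minimal sketch under assumptions: each check is a hypothetical Callable that returns the server name when data is available and throws otherwise, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FirstAvailable {
    /**
     * Runs all checks in parallel and returns the result of the first one
     * that succeeds. A check signals "no data" by throwing an exception.
     */
    static Optional<String> firstServerWithData(List<Callable<String>> checks, long timeoutMs) {
        if (checks.isEmpty()) {
            return Optional.empty();
        }
        ExecutorService pool = Executors.newFixedThreadPool(checks.size());
        try {
            // invokeAny returns one successful result and cancels the remaining tasks
            return Optional.of(pool.invokeAny(checks, timeoutMs, TimeUnit.MILLISECONDS));
        } catch (ExecutionException | TimeoutException e) {
            // every check failed, or none finished in time: same as "data not available"
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        } finally {
            pool.shutdownNow(); // interrupt anything still running
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> checks = new ArrayList<>();
        checks.add(() -> { Thread.sleep(2000); return "slow-server"; });
        checks.add(() -> { throw new java.io.IOException("no data"); });
        checks.add(() -> { Thread.sleep(50); return "fast-server"; });
        // The fast server answers first; the slow check is cancelled.
        System.out.println(firstServerWithData(checks, 5000));
    }
}
```

Cancellation here relies on thread interruption, so a blocking call that ignores interrupts would keep running until its own per-request timeout fires.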

Stefan Evans wrote:
What you have described sounds justifiable to me.
It might be better implemented as a separate service that your servlets invoke. That keeps the puritans happy by wrapping the threads in another layer.
The servlet shouldn't have to care that the service is multithreaded as long as it gets the information it asks for.
You might consider using the Servlet 3.0 async calls if you aren't already.

my 2 cents in any case



Thanks again. I am not using the async calls, I'll have to read up on them.

Best,
Joe
9 years ago
After posting this, I see I forgot to mention the alternative.

This app already calls many external programs to do the heavy lifting, most of them in languages other than Java.

Rather than a simple multi-threaded method this could be done with an external executable, however, I don't see the advantages beyond satisfying the "don't play with threads in a servlet" requirement/recommendation. The disadvantage is it's a lot more work and if it's done in Java (easy) vs. C++ (harder but not unreasonable) there's the JVM start up time and memory requirements.

Joe
9 years ago
I'm working on improving the performance of several servlets.

What we have is more like an enterprise app with a web interface than a web site as some operations take minutes. Our users are more than happy with that kind of response because they understand what is being done and the alternatives are comparable in speed but not as easy to access.

I've described the app before and the Ranch has been very helpful. The short version is: we have multiple backend servers spread across the country with a 2PB data store, soon to grow at about a PB/year. This app figures out which server is "best" for a request, transfers the data, then analyzes and plots it, offers it for download, or plays it as audio.

The simplest operation I would like to parallelize is data discovery. Right now we have a local database that keeps track of which backend servers have which channels but not what data is available for those channels. So the process is guess which backend is best, try it, if it fails try the next best...

The backend servers have just implemented a "check if data is available without transferring anything" operation. This process takes maybe 2 seconds, which BTW is a big improvement.

What I'd like to do is to ask all of them at once "what do you have for me". The process is mostly I/O wait time so server load (our app) is not an issue. Only the single method is multithreaded, the servlet itself is one thread controlled by the container (per instance of course).
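
Asking every backend at once and waiting for all the answers could look roughly like the sketch below, which uses ExecutorService.invokeAll with an overall timeout. It is only an illustration: the availability checks are hypothetical Callables keyed by server name, and the class and method names are invented here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class DataDiscovery {
    /** Runs every check in parallel and returns the servers that reported data. */
    static List<String> serversWithData(Map<String, Callable<Boolean>> checks, long timeoutMs) {
        List<String> available = new ArrayList<>();
        if (checks.isEmpty()) {
            return available;
        }
        List<String> names = new ArrayList<>(checks.keySet());
        List<Callable<Boolean>> tasks = new ArrayList<>();
        for (String name : names) {
            tasks.add(checks.get(name));
        }
        ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
        try {
            // Blocks until all tasks finish or the timeout expires; late tasks are cancelled.
            List<Future<Boolean>> futures = pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
            for (int i = 0; i < futures.size(); i++) {
                Future<Boolean> f = futures.get(i); // same order as the task list
                try {
                    if (!f.isCancelled() && f.get()) {
                        available.add(names.get(i));
                    }
                } catch (ExecutionException e) {
                    // a failed check counts as "no data" on that server
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return available;
    }

    public static void main(String[] args) {
        Map<String, Callable<Boolean>> checks = new LinkedHashMap<>();
        checks.put("siteA", () -> true);
        checks.put("siteB", () -> { throw new java.io.IOException("unreachable"); });
        checks.put("siteC", () -> false);
        System.out.println(serversWithData(checks, 3000)); // [siteA]
    }
}
```

The calling thread simply blocks inside serversWithData, so from the servlet's point of view this is still one synchronous method call.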

I've read that multithreaded servlets are a big no-no but it's not clear to me whether this scenario could cause problems or violates any of the best practices.

I appreciate any comments and all suggestions.

Best,
Joe
9 years ago

Stephan van Hulst wrote:I don't believe -cp and -jar work very well in conjunction. Anyway, consider shipping the application with a shortcut fit for the OS (or shotgun all the shortcuts). For windows, the target should be java -cp myLittlePlot.jar;%MYLIB_HOME%\mylib.jar my.little.plot.Main.



Thanks Stephan!

I think that's about as good as we can do.

Best,
Joe
9 years ago

Stephan van Hulst wrote:I really don't understand what kind of architecture would lead these applications to be able to use the API that your library implements, while omitting a way to configure the plugin locations.

If these applications were compiled by you, but you don't include the library in each individual distribution to link against, this is BAD.

Setting a global classpath is WORSE.



Configuring the plugin locations is an interesting concept and perhaps what gets set on installation is not a global classpath but a different environment variable that is only used by this library.
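
As a rough sketch of that idea, the library location could be read from a dedicated environment variable and handed to a class loader, leaving CLASSPATH alone. The variable name MYLIB_HOME and jar name mylib.jar are assumptions borrowed from the example elsewhere in this thread, and LibraryLocator is an invented class.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

class LibraryLocator {
    static final String HOME_VARIABLE = "MYLIB_HOME"; // hypothetical variable set by the installer
    static final String JAR_NAME = "mylib.jar";       // hypothetical jar name

    /** Resolves the library jar from the given environment, if the variable is set. */
    static Optional<Path> jarLocation(Map<String, String> env) {
        String home = env.get(HOME_VARIABLE);
        if (home == null || home.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(home, JAR_NAME));
    }

    /** Builds a class loader that can see the library jar in addition to the app's classes. */
    static URLClassLoader loaderFor(Path jar) throws MalformedURLException {
        URL[] urls = { jar.toUri().toURL() };
        return new URLClassLoader(urls, LibraryLocator.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        Optional<Path> jar = jarLocation(System.getenv());
        if (!jar.isPresent()) {
            System.err.println(HOME_VARIABLE + " is not set");
            return;
        }
        try (URLClassLoader loader = loaderFor(jar.get())) {
            System.out.println("Library classes would be loaded from " + jar.get());
        }
    }
}
```

The JNI shared object would still need to be found separately, for example by calling System.load with an absolute path built from the same directory.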

A little more background. The library is a network data service that has multiple servers around the world and serves pieces of a 2PB store of science data to users for analysis and display.

The applications that use it are written by scientists and programmers. The science guys are mostly Pythonic or Matlabian. I'm probably the biggest Java user, although I do have C++, Python and Matlab projects to maintain.

For the Python stuff, users can download a .py file and run it on any system with the proper libraries installed.

For Java it's not just java -jar myLittlePlot.jar but a wrapper script that figures out what OS you're running on, works out how to find the libraries, then runs java -cp ${libLoc} -jar myLittlePlot.jar.

It's that wrapper script that I want to replace.
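
For comparison, the wrapper's logic could be expressed in Java itself. The sketch below only illustrates the OS detection and command construction: the Windows and Mac directories and the jar name mylib.jar are made-up placeholders, and only /usr/lib64/java comes from this thread. It builds a -cp command with an explicit main class because the java launcher ignores -cp when -jar is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PlotLauncher {
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Maps an os.name value to the directory holding the library jar. */
    static String libraryDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac") || os.contains("darwin")) {
            return "/opt/local/share/java"; // hypothetical Mac location
        }
        if (isWindows(osName)) {
            return "C:\\mylib";             // hypothetical Windows location
        }
        return "/usr/lib64/java";           // Linux location mentioned in this thread
    }

    /** Builds the launch command; the OS name is a parameter so it can be tested. */
    static List<String> command(String osName, String appJar, String mainClass) {
        String pathSeparator = isWindows(osName) ? ";" : ":";
        String fileSeparator = isWindows(osName) ? "\\" : "/";
        String libJar = libraryDir(osName) + fileSeparator + "mylib.jar";
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(appJar + pathSeparator + libJar);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(System.getProperty("os.name"),
                "myLittlePlot.jar", "my.little.plot.Main");
        System.out.println(String.join(" ", cmd));
        // new ProcessBuilder(cmd).inheritIO().start() would actually run it
    }
}
```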

Joe
9 years ago
Hi Stephan,

I am open to suggestions, but honestly I have not come up with a good solution; everything I can think of sucks one way or the other. Excuse me while I describe the problem and muddle my way through the options we've considered so far.

One group I work in maintains a cross-platform client/server application with a C library supplying the API. There is a C++ wrapper that was written to facilitate SWIG bindings for Python, Java, Matlab and Octave. The utilities and libraries are packaged for Scientific Linux 6 and 7, Debian 7 & 8, Ubuntu 12.04 LTS and 14.04 LTS, MacOs 9 & 10 and Windows 7.

For C/C++, Python and Octave the packages are all installed in a standard location like /usr/lib64, /usr/lib64/python2.x/site-packages, or /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7. C/C++ apps that use the libraries are built and packaged for each OS.

Python and Octave applications are independent of API libraries and just find them automatically.

The Java class library has several classes and a JNI shared object (.so, .dylib, .dll).

One thought is to package the classes in a jar and install the shared object in the standard location. If I include the jar in my app we have the version mismatch problem. So we can add code to check versions when we load the library. That was my favorite.
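
A version check at load time might look like the following sketch. It assumes, purely for illustration, that both the jar and the shared object report a dotted version string and that a differing major number means incompatibility; how the native version is obtained (for example through a JNI call) is left out, and the class name is invented.

```java
class LibraryVersionCheck {
    /** Extracts the leading (major) number from a dotted version such as "2.10.3". */
    static int major(String version) {
        return Integer.parseInt(version.trim().split("\\.")[0]);
    }

    /** Assumed rule: versions are compatible when their major numbers match. */
    static boolean compatible(String jarVersion, String nativeVersion) {
        return major(jarVersion) == major(nativeVersion);
    }

    /** Fails fast with a readable message instead of a confusing link error later. */
    static void requireCompatible(String jarVersion, String nativeVersion) {
        if (!compatible(jarVersion, nativeVersion)) {
            throw new IllegalStateException("Library version mismatch: jar is "
                    + jarVersion + " but native library is " + nativeVersion);
        }
    }

    public static void main(String[] args) {
        requireCompatible("2.4.1", "2.7.0"); // same major version, passes
        System.out.println(compatible("2.4.1", "3.0.0")); // false
    }
}
```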

Another way to do it is to install the jar and native libraries someplace standard but the only sort of standard places are all dependent on the jre installation. Hence the global classpath hack. The only problem I see with this one is new libraries that are not backwards compatible. That is a very very rare occurrence. The blow back is significant and many people just refuse to upgrade meaning the developers have to maintain 2 versions.

One person suggested we put all the shared objects into one jar file but that is a packaging nightmare as the packages are maintained by different people (OS specialists) and many are on nightly build and test systems.

It is also worth mentioning that the applications that use these libraries are separate; they are developed completely independently of this project. For example, I maintain a Matlab app that uses these libraries. In their current form they are installed in /usr/lib64/java on Linux as .class and .so files. I know, I can't think of a worse way to do it. What the Matlab app has to do is figure out which OS it's running on, find the installation and set up the Java path on startup.

I've been searching the web for the way others solve this without luck. If you know of a cross platform library with native code that addresses this problem, I'd love to study how they do it.

Best,
Joe
9 years ago

Campbell Ritchie wrote:Please explain more. Setting a system CLASSPATH usually does more harm than good.


CRAP!

Now I have to rethink the whole thing.

My problem is how to distribute a class library that a lot of things depend on in an environment that is, let's just say, not Java-centric.

Most languages, C/C++, Python (their favorite), and Octave (fringe at best), just work.

Java and Matlab (based on Java) require unique and fragile code to find the libraries.

Please help me find a better option than CLASSPATH.

Best,
Joe
9 years ago
Thanks Stefan,

It took a while to research your suggestion.

I don't think it's optimal for our needs. Many of our systems have multiple Java versions installed plus Matlab (to my endless frustration) installs its own. We test this stuff with 1.6-1.8 and so far no problems.

Right now the best option I can come up with is to have the installer set the CLASSPATH variable system wide, which will work for everything but Matlab.

Best,
Joe
9 years ago