• Post Reply Bookmark Topic Watch Topic
  • New Topic

Complicated data copy problem  RSS feed

 
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Oh no! Disaster has struck some of ACME's redundant data centers. The administrators have managed to restore backups, but some data sets are still missing from some data centers. Fortunately, every data set can be found at least once in one or more of the data centers. However, before ACME can resume normal operations, it needs to ensure that each data center has a copy of every data set.
Your goal is to help ACME resume normal operations by writing a program to synchronize data sets between data centers using as few copies as possible.

The first line of input will contain an integer between 0 and 999999 inclusive, representing the number of data centers.
Following that will be one line of input for each data center. Each line will contain an integer from 0 to 299 representing the number of data sets at that data center, followed by a space and space-separated list of data set ids currently present at that data center. Data set ids are each an integer between 1 and 999999, inclusive. Each line will be at most 999 characters long.
Data set ids are not necessarily consecutive. The list of data sets will not be in any particular order.

Output:
The program must output an optimal data set copy strategy to ensure that every data center has a copy of every data set. Output one line for every copy instruction.
A copy instruction is of the form <data-set-id> <from> <to>, where <data-set-id> is the data set id, <from> is the index of the data center the data set will be copied from (1 = the first data center), and <to> is the index of the data center to copy the data set to.
When there are no more copy instructions, the program must output the word "done" on a line by itself.
There is often more than one correct output with minimal number of operations for a given input, and any output that satisfies the requirements is valid.

Example 1:
----------
Input:
4
3 1 3 4
3 1 2 3
2 1 3
3 1 4 2
One Possible Correct Output:
2 2 1
4 1 2
2 2 3
4 4 3
3 1 4
done

Example 2:
----------
Input:
2
2 1 2
2 2 1
Output:
done
 
Rancher
Posts: 3742
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You forgot to ask a question.
You might want to read HowToAskQuestionsOnJavaRanch, especially the NotACodeMill link.
 
Neeta Bharadwaj
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I dont know how to start with this problem. Can anyone suggest an algorithm?
 
Joanne Neal
Rancher
Posts: 3742
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I must be missing something but I don't understand how there can be an optimal number of copies - each problem will always require a fixed number of copies as far as I can see.
In your example, data centres 1, 2 and 4 are missing one data set and data centre 3 is missing 2 data sets, so five copies will be needed and those copies can be from any data center that has the required data set.
Maybe you can show us an example where there are an optimal and a non-optimal solution to the problem, but as far as I can tell at the moment the algorithm will be to find a data centre that has the required data set and copy it to the data centre that doesn't have it.
 
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Neeta Bharadwaj wrote:I dont know how to start with this problem. Can anyone suggest an algorithm?

As with Joanne, I don't know about a specific algorithm, but there are datasets (Sets, Maps, etc) that can help you out. But the first thing you need to do is to create some classes - at the very least, a Dataset class and a DataCentre one - and Datasets should only be equal() if their IDs are equal.

Once you have objects, you can read them into a Set (in the case of Datasets) or a List (for your DataCentres) and, if each DataCentre also contains a List of the Datasets it contains at the start, you can use the Set.removeAll() method to work out which ones it's missing (or alternatively, write a method to do it yourself). Just be warned: removeAll() updates Set contents, so you'll need to copy the Set before you run it.

This is also just ONE way to do it - there are many others - but pretty much all of them will work much better if you have proper classes to hold your information, so don't be dealing with this stuff simply as Strings and numbers.

Also: write out what you need to do in English (or your native language) before you write your first line of code.

HIH

Winston
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!