• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Liutauras Vilda
  • Ron McLeod
Sheriffs:
  • Jeanne Boyarsky
  • Devaka Cooray
  • Paul Clapham
Saloon Keepers:
  • Scott Selikoff
  • Tim Holloway
  • Piet Souris
  • Mikalai Zaikin
  • Frits Walraven
Bartenders:
  • Stephan van Hulst
  • Carey Brown

Data structure for facebook

 
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi, I have some questions about the different data structures used by social networks. For example,

1. For user registration, which data structure would be best? Is LinkedList the best choice?  

2. Can I use Hashmap for login?

3. When we enter into our Facebook profile, there is a search option on the top of the profile  "Search Facebook"  by which we search posts, friends, images and so many things. what type of data structure should I use for speedy search? Is it ArrayList or another type ds?

 
lowercase baba
Posts: 13091
67
Chrome Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Caveat :  I know nothing about FB or how they have coded their application.

I'm not sure i understand your first few questions.  "for user registration, which data structure would be best?"  What part of user registration are you talking about?  How do you think you would use a linked list or a hashmap?  Are you talking about having a huge linked list of all users?  or a huge hashmap of them?

Generally, you shouldn't start with an implementation or data structure in mind. You should start off thinking about what you want to do, and THEN look at what might work best.

An arraylist would be terrible for searching the amount of data something like Facebook has.  I'd bet they have terrabytes of data. An array list would be so slow, you'd (almost) never get results.  I'd think you'd need some kind of database that is specialized for large amounts of unstructured data.
 
Saloon Keeper
Posts: 28420
210
Android Eclipse IDE Tomcat Server Redhat Java Linux
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think that for Facebook, one of the requisite components is toxic ooze.
 
Mahadi Hasan
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Fred Rosenberger, in every object-oriented language there are some data structures that perform different actions with the best performance. As you are an experienced java developer, you know that the ArrayList is good for searching with optimized time and memory, but not good for storing, editing, or deleting huge amounts of data as it stores data by indexing.  That's why I asked about the Facebook search option whether Arraylist can be used for quick searches.

Now talk about the LinkedList. You know that it stores data using nodes, not indexing. Therefore, editing, storing, and deleting data can be performed with LinkedList efficiently. It might be used for user registration, creating a new post, updating, or deleting data. That's how I understood (I might be wrong).

Hashmap is used for pair values. If the login requires a username and password only, is it good to use a hashmap only for login operation?

 
Rancher
Posts: 43081
77
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
As Fred said, you're starting this from the wrong end. In-memory data structures are NOT the first thing to consider, but one of the last. User information, posts, images etc. are all stored in a database, or a file system. You wouldn't store them all in memory, certainly you couldn't on the scale FB does. So searching would be done on the database, not in an in-memory data structure. You might cache some of that data -in which case the caching implementation would need to worry about performance- but then you'd need to consider that some data might not be in the cache. So that's a complication, which it doesn't seem you're yet in a position to tackle.

Unless you want to implement something that's completely memory-based -no database at all- in which case all the considerations about how FB might do things are irrelevant. But you haven't given us much information about WHAT you intend to implement, as you were much too focused about HOW you were going to do this - which, as Fred pointed out, is the wrong way to go about things.
 
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Almost no web applications perform those kinds of operations inside their own application code, at least not so much that they really care what underlying data structure is used.

The operations you are talking about are mostly performed by the DBMS. Either:

  • You're prototyping an application without using a database. The application is probably not large enough to care about the exact data structures used.
  • Your application has been released and uses a database. Storage is no longer performed in-memory, so the application doesn't care about the exact data structures used.
  •  
    Tim Holloway
    Saloon Keeper
    Posts: 28420
    210
    Android Eclipse IDE Tomcat Server Redhat Java Linux
    • Likes 1
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    Mahadi Hasan wrote:Fred Rosenberger, in every object-oriented language there are some data structures that perform different actions with the best performance.



    No.

    First of all, data structures don't "perform", they contain. Some contain data more efficiently (memory performance), some allow more efficient access or manipulation of data (CPU performance). Then there's I/O performance and network performance, amd even energy performance, but that's another matter.

    Everyone Knows that QuickSort and HeapSort are the "most efficient" sorts, right? Well, as is often the case, Everyone is wrong. I spent several years supporting a system that contained a sort inside its main dispatcher. The data was mostly, but not entirely aleady in order. That is, ABCDYXEFMNOQ. The partitioning sorts perform at their worst when the data is already organized. The sort that performs best when data is already organized is the bubble sort. But with a lot of data, the bubbling of the Y and X is still sub-optimal, since they have to be moved up one slot at a time per pass. So the most performant sort for this data set was the Shell-Metzner sort, which is a binary bubbler that can quickly toss those extreme values towards the ends and them take relatively few passes to polish it off.

    There is no one-size-fits-all solution to computation, no matter what language or OS you use. When you ZIP files, the ZIP utility program attempts several different compression algorithms to see which one is most performant for that particular data. Because the worst-case result for most compression algorithms can actually result in a larger output than the input was. When ZIP says "stored", it is telling you that no compression was the most efficient way to archive that data.

    We have an entire forum here at the Ranch for Performance. One of the most common things you'll hear us say there is "Don't try to optimize prematurely". There are a number of reasons why this is considered an axiom, from the fact that "optimized" code is often harder to read and maintain to the fact that if you optimize where you "know" the optimization is needed you'll almost certainly be wrong. I have seen it happen all the time and it's unavoidable. You are far better off getting a system up and running and then measuring to find where the inefficiencies are than to try and anticipate them. And of course, if a system operates acceptably "as is", then Management isn't going to want you to waste your time — which is their money — working on something that doesn't need help.

    These days, generally the most inefficient/expensive part of a software product is the developer and they want that part to be performant.

    In terms of storage occupancy in Java, it should also be noted that Java runs in a heavily virtualized environment. It's also heavily self-optimising. That means that if it detects inefficiencies, it may actively replace code or even data organisation based on the measured workload. It's one reason why you will not see tables of how many bytes a Java object occupies. Java is not designed for data structures that are tightly-compacted at compile time.
     
    fred rosenberger
    lowercase baba
    Posts: 13091
    67
    Chrome Java Linux
    • Likes 1
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    Mahadi Hasan wrote:[T]he ArrayList is good for searching with optimized time



    I think it depends on what you mean by "searching".  If you mean "find the item at position 1,248,423", then yes, an array list is very fast.  However, if you mean "find the element with "fred rosenberger" as the user name, then I'm pretty sure it is terrible.
     
    Marshal
    Posts: 80128
    417
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    fred rosenberger wrote:. . . "find the item at position 1,248,423", then yes, an array list is very fast.

    Yes, that operation runs in constant time. But it isn't what I would call searching.

    . . . "find the element with "fred rosenberger" . . . it is terrible.

    Unless you have a sorted list and use binary search, that search runs in linear time. I don't think that counts as terrible, but a hash‑based collection or map will find such an element much faster. As others have said, a DBMS will probably also be much faster. Horses for courses; you choose types of collection for their functionality rather than speed.
    Iterating an array list is probably faster than iterating a linked list; it suggests that in the ArrayList documentation.

    . . . toxic ooze.

    Hahahahahahahahaha!
     
    fred rosenberger
    lowercase baba
    Posts: 13091
    67
    Chrome Java Linux
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    Campbell Ritchie wrote:

    fred rosenberger wrote:. . . "find the item at position 1,248,423", then yes, an array list is very fast.

    Yes, that operation runs in constant time. But it isn't what I would call searching.


    Which is why I said "it depends on your definition of searching. In my mind, searching is more or less "find the thing that...."

    Find the thing that is called fred.
    Find the thing that has the greatest value in the amount field.
    Find the thing that is at position 1,287,646.

    Remember, we're not starting with any data structure, we're analyzing what we want to do.  If I know that I will frequently need to "find an element at position x", then I will use an array because searching/finding a positional element is constant time.
     
    Blood pressure normal? What do I change to get "magnificent"? Maybe this tiny ad?
    Smokeless wood heat with a rocket mass heater
    https://woodheat.net
    reply
      Bookmark Topic Watch Topic
    • New Topic