Should I make an API for this?

 
Cedric Bosch
Ranch Hand
Posts: 99
Hello, I made an Android app for personal use that crawls a website and sends me back the info I need. Basically I have a URL1; on that page I search for other links, then I do an HTTP GET request on each of those links and collect the results in the app. So there are quite a lot of requests going on, and the app takes a long time (2 minutes) to get the results.

I'd like to get all the information I need in a single request, but I'm not sure how to do that, since this is outside what I currently know. What are my options here?

Should I make an API?
How much would it cost to host on the web?
If I host it on localhost (I know this is hardware specific), how many requests can it handle? I'm asking because I might not use this only for personal use, so I need to know how many requests I can serve from localhost. Since, I guess, this is specific to my internet connection, how can I find out how many requests I can handle without response times increasing?

EDIT: After looking around some more, JAX-RS seems to be what I was looking for, so I'm learning that for now.
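
Something like this is what I have in mind, just a minimal sketch: the @Path value, the class name, and the returned JSON are made-up placeholders, and depending on the JAX-RS version the imports may need to be jakarta.ws.rs instead of javax.ws.rs:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical endpoint: the server runs the crawl, and the app
// fetches everything with one GET instead of many.
@Path("/crawl")
public class CrawlResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public String results() {
        // Placeholder: a real implementation would run the crawl
        // (or return cached results) and serialize them to JSON.
        return "{\"status\": \"ok\"}";
    }
}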
 
Stephan van Hulst
Saloon Keeper
Posts: 14515
You can't crawl a web site in a single request, because whatever machine is analyzing the page must still make multiple requests for any links it finds.

A big improvement is to make the requests concurrently: when you analyze a page and find new links, request the pages those links refer to at the same time.
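
For example, with the JDK 11+ java.net.http client (untested sketch; the URLs are made up, and in your app the list would come from parsing URL1):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ConcurrentFetch {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder links; in the real app these come from parsing URL1.
        List<String> links = List.of(
                "https://example.com/page1",
                "https://example.com/page2",
                "https://example.com/page3");

        // Fire all requests at once instead of one after another.
        List<CompletableFuture<String>> futures = links.stream()
                .map(link -> client.sendAsync(
                                HttpRequest.newBuilder(URI.create(link)).build(),
                                HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .toList();

        // join() blocks until each response arrives; the requests
        // themselves were already running concurrently.
        futures.forEach(f -> System.out.println(f.join().length() + " bytes"));
    }
}

Done that way, the total time is roughly that of the slowest single request rather than the sum of all of them.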
 
Cedric Bosch
Ranch Hand
Posts: 99

Stephan van Hulst wrote:You can't crawl a web site in a single request, because whatever machine is analyzing the page must still make multiple requests for any links it finds.

A big improvement is to make the requests concurrently: when you analyze a page and find new links, request the pages those links refer to at the same time.



I thought about spawning a thread for each link. Is that what you are suggesting? In any case, I'm going with JAX-RS, since I wanted to learn how to do that anyway.
 
Stephan van Hulst
Saloon Keeper
Posts: 14515
You don't necessarily have to make a new thread per request: just create a task that retrieves the page for each link, and submit it to a thread pool.

Since these tasks can spawn new tasks, I suggest you use a work-stealing thread pool, which may further increase performance if some pages are very 'deep'.
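
Something like this with the fork/join framework, which is the JDK's work-stealing pool (untested sketch; extractLinks() is a placeholder for whatever link extraction you already do, and the start URL is made up):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class CrawlTask extends RecursiveAction {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    private final String url;
    private final Set<String> visited;          // shared, thread-safe
    private final Map<String, String> results;  // url -> page body

    CrawlTask(String url, Set<String> visited, Map<String, String> results) {
        this.url = url;
        this.visited = visited;
        this.results = results;
    }

    @Override
    protected void compute() {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
            String body = CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
            results.put(url, body);

            // Fork one subtask per unseen link; idle workers steal them.
            invokeAll(extractLinks(body).stream()
                    .filter(visited::add)   // add() returns false for links already seen
                    .map(link -> new CrawlTask(link, visited, results))
                    .toList());
        } catch (Exception e) {
            // A real crawler would log the failed URL and carry on.
        }
    }

    private List<String> extractLinks(String html) {
        return List.of(); // placeholder: plug in your existing link extraction
    }

    public static void main(String[] args) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        Map<String, String> results = new ConcurrentHashMap<>();
        String start = "https://example.com/URL1"; // made-up start page
        visited.add(start);
        new ForkJoinPool().invoke(new CrawlTask(start, visited, results));
        System.out.println("Fetched " + results.size() + " pages");
    }
}

One caveat: the blocking HTTP calls tie up fork/join worker threads, so for a big crawl you'd want a larger pool or a ForkJoinPool.ManagedBlocker.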
 