
Avoiding web crawlers

Hi All,

I am planning to develop a site using JSP and HTML, and I want to protect it against web crawlers (web bots).

These days, with the rapid growth of the internet, anyone can easily copy a site's pages using some kind of crawler software.

So I want to prevent web crawlers from crawling my pages and copying the entire website.

How can I do this? Has anybody overcome this problem?

Please suggest something.
Thanks in advance!

regards,
sai krishna C
Could you tell me the folder structure of your website project.

Originally posted by Andy Matt:
Could you tell me the folder structure of your website project.



For example, it is structured this way:
www.xxx.com/~yyy
/index.jsp
/x.jsp
/y.html
.....
so on...

Please advise.
If your pages aren't password protected and are open to the public, anyone will be able to crawl through them.
Why are you concerned with this?

Originally posted by Ben Souther:
If your pages aren't password protected and are open to the public, anyone will be able to crawl through them.
Why are you concerned with this?



Ben, I am not using any authentication on my site. Think of it like the JavaRanch site:
a user can explore everywhere as a guest (without any password issue) and crawl everywhere using some sort of software.
Most crawlers obey the robots exclusion standard. There is also an HTML meta tag (see same article) that can instruct crawlers to ignore a page or links.
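For reference, the exclusion standard is just a robots.txt file served from the site root; well-behaved crawlers fetch it before crawling anything else. A minimal example (the host name is a placeholder):

```
# robots.txt — served from the site root, e.g. www.example.com/robots.txt
User-agent: *        # applies to all crawlers
Disallow: /          # ask them not to crawl any page
```

The per-page equivalent is the meta tag `<meta name="robots" content="noindex, nofollow">` in a page's `<head>`. Note that both mechanisms are purely advisory; a badly behaved crawler can simply ignore them.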

How can I do this? Has anybody overcome this problem?


I guess a good question at this point would be:
What, exactly, has you concerned?
or
What, exactly, is the problem?

Is it search engines that have you concerned or someone else?

Originally posted by Ben Souther:

I guess a good question at this point would be:
What, exactly, has you concerned?
or
What, exactly, is the problem?

Is it search engines that have you concerned or someone else?



Hi Ben,
There is no problem with search engine crawlers.
The problem is only with third-party software that an intruder (end user) could use to copy all the HTML pages onto his local system.
I hope I've been specific this time!



The only way to totally prevent that is to not put the page publicly online. After all, anyone who can look at a page also can save it to his hard disk.

Originally posted by Ilja Preuss:
The only way to totally prevent that is to not put the page publicly online. After all, anyone who can look at a page also can save it to his hard disk.




Then there is no security for our content.
There should be something to stop this. As software professionals we should not say negative things.



+hp =Everything is possible

Originally posted by saikrishna cinux:



Then there is no security for our content.
There should be something to stop this. As software professionals we should not say negative things.



+hp =Everything is possible



Think about it.
What does a web browser do?
It downloads your material to the user's local machine.
That's what it's supposed to do.
The server is supposed to make that content available.
Hyperlinks are there to show users (often someone clicking on links) what other pages are available for download.

If your content needs to be secured, password protect your site.
Then only people with the necessary credentials can download content from it.
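As a sketch of that suggestion: in a JSP/servlet application, a login requirement can be declared with container-managed security in web.xml. The role name, realm, and login page paths below are placeholders, not anything from this thread:

```xml
<!-- web.xml: require login for every URL in the application -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Whole site</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>member</role-name> <!-- placeholder role -->
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>FORM</auth-method>
  <form-login-config>
    <form-login-page>/login.jsp</form-login-page>
    <form-error-page>/loginError.jsp</form-error-page>
  </form-login-config>
</login-config>
<security-role>
  <role-name>member</role-name>
</security-role>
```

The container then challenges any unauthenticated request before serving content, which stops anonymous crawlers; users (and their credentials) still have to be managed in the container's configured realm.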

Originally posted by Ben Souther:


Think about it.
What does a web browser do?
It downloads your material to the user's local machine.
That's what it's supposed to do.
The server is supposed to make that content available.
Hyperlinks are there to show users (often someone clicking on links) what other pages are available for download.

If your content needs to be secured, password protect your site.
Then only people with the necessary credentials can download content from it.




OK Ben, so far so good, and you are exactly correct on this point.
The web browser copies all the data to the local system (regardless of login passwords).

But there must be some kind of restriction on the end user copying the entire content of the page.

If we can do this, we can bring a revolution and create a benchmark.

What do you say?

Originally posted by saikrishna cinux:
But there must be some kind of restriction on the end user copying the entire content of the page.


Why? What's the difference between someone reading a page, and someone copying the contents of a page? The web is a public medium; if you don't want something disseminated, don't put it online, or add a login for accessing it.

If we can do this thing, we can bring revolution, create a benchmark.


I have no idea what this means.

Originally posted by Ulf Dittmer:

I have no idea what this means.



Hi Ulf,

Congratulations on your 10K posts here; I saw your 9999th post.
OK, you are right: there is no difference between seeing the web page content and copying it to the local system.

But when a user runs web crawler or web spider software, he gets millions of pages onto his local system. A site like the large community "orkut" can easily be crawled, and phone numbers, email IDs, etc. can be mined (extracted); confidential or personal data can be accessed at once and misused.

I hope I've made myself clear this time!

Thanks!

regards,
sai krishna c

If you leave personal or confidential information on a publicly accessible page what do you expect? Of course it gets misappropriated. I'm anyway at a loss to understand the amount of personal detail some people choose to make available about themselves on the web; naive is about the nicest word I can find for this behavior.
You cannot stop automated page downloads because they don't look any different from non-automated ones.

Except...

Downloading a million pages could take a long, long time if your site limited the bandwidth used by any one client. You might add a servlet filter that checked each request against a list of recent request IP addresses, and refused to serve a page if the previous request was less than X seconds ago. I imagine there are commercial products with this sort of capability built in.
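A minimal sketch of that per-IP check (the class and method names here are my own, not from any real product): remember the time of each client's last served request and refuse to serve a page that arrives too soon after it. In a servlet app, this logic would be called from a `javax.servlet.Filter`'s `doFilter` method with `request.getRemoteAddr()` and the current time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers each client IP's last served request time
// and rejects requests that arrive sooner than a configured minimum interval.
class RequestThrottle {
    private final long minIntervalMillis;
    private final Map<String, Long> lastRequest = new ConcurrentHashMap<>();

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true if the request from this IP should be served.
    // Rejected requests do not update the timestamp, so the clock
    // runs from the last request that was actually served.
    boolean allow(String ip, long nowMillis) {
        Long last = lastRequest.get(ip);
        if (last != null && nowMillis - last < minIntervalMillis) {
            return false; // too soon after this IP's previous served request
        }
        lastRequest.put(ip, nowMillis);
        return true;
    }
}
```

A real deployment would also need to evict stale entries (the map grows with every distinct IP) and to send an error status, e.g. `response.sendError(503)`, when `allow` returns false. Keep in mind that many users behind one proxy share a single IP, so the interval has to be chosen generously.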

Originally posted by Ernest Friedman-Hill:
You cannot stop automated page downloads because they don't look any different from non-automated ones.

Except...

Downloading a million pages could take a long, long time if your site limited the bandwidth used by any one client. You might add a servlet filter that checked each request against a list of recent request IP addresses, and refused to serve a page if the previous request was less than X seconds ago. I imagine there are commercial products with this sort of capability built in.




Ernest, may I know the names of some commercial (and/or free) products?

Of course, this is a really good idea!

Great suggestion, boss.

Originally posted by saikrishna cinux:

Ernest, may I know the names of some commercial (and/or free) products?



Read Ernest's post again.


I imagine there are commercial products with this sort of capability built in.



"Imagine" is the keyword there.
You will have to search for such a product yourself.
[ August 31, 2007: Message edited by: Ben Souther ]

Originally posted by Ben Souther:


"Imagine" is the keyword there.
You will have to search for such a product yourself.

[ August 31, 2007: Message edited by: Ben Souther ]



OK, dear Ben.
Anyway, you have a good eye for each and every word in the posts!
Good!!