Skipping all CSS adds

 
Greenhorn
Posts: 29
I had a problem parsing the HTML from a web page, and you helped me a lot.
I found a way to skip the HTML markup and save only the information I want from a web page, but I didn't know about the CSS.

If I want to skip the CSS, what should I do?
For example, when I "clean" the HTML markup out of the information, I get this:

-Πιστέψτε με, μπορεί να φαντάζει σαν η μέρα που θα κρίνει τη μετέπειτα ζωή σας αλλά δεν είναι έτσι, είναι απλώς η μέρα που θα φέρει αλλαγές #8211; τόσο αυτή όσο-

which is OK: all those words are in Greek, perfectly fine. But I hate the "#8211;", and there are more of them in the rest of the file!
(I used UTF-8 so it can read Greek and other languages, because it has to read everything.)

Any ideas how we can fix that? And it is not always #8211;

there are many other, different pieces of CSS like it.

Cheers guys, any help would be great! Thanks.

[I deleted the "&" because I cannot post &+#8211]
 
Sheriff
Posts: 28333
That's not CSS, that's an HTML entity. If you see "&#nnnn;" in an HTML page, that simply represents the Unicode character at code point nnnn. In particular, "&#8211;" is the en dash, "–" (U+2013).
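Since "&#nnnn;" just names a code point, one option is to turn each reference back into the character it stands for rather than throwing it away. Here is a minimal sketch of that idea using only the standard library. The class name `EntityDecoder` and method name `decodeNumericEntities` are made up for illustration, and it only handles numeric references (decimal and hexadecimal), not named ones such as `&amp;`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class EntityDecoder {
    // Matches decimal (&#8211;) and hexadecimal (&#x2013;) numeric references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    public static String decodeNumericEntities(String text) {
        Matcher m = NUMERIC_ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one.
            out.append(text, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Out-of-range number: leave the reference exactly as it was.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the Greek words with an en dash between them.
        System.out.println(decodeNumericEntities("αλλαγές &#8211; τόσο"));
    }
}
```

Named entities would need a lookup table, so for real pages an HTML parsing library that already decodes entities is probably the easier route.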
 
Jim Size
Greenhorn
Posts: 29
Nice, I didn't know that. So if I want to skip all those '&#nnnn;' and their friends, I have to break the text into "words", and if the first two characters of a word are &#, I won't add it, right?
Is there any other solution?
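For the skipping approach, splitting into words is not strictly needed: a regular expression can drop the references wherever they occur, even in the middle of a word. This is only a rough sketch; the class name `EntityStripper` and method name `stripEntities` are invented for the example, and the pattern is an approximation that removes anything shaped like `&#123;`, `&#x1F;` or `&name;`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class EntityStripper {
    // Numeric references (&#8211;, &#x2013;) and name-shaped ones (&amp;, &nbsp;).
    private static final Pattern ANY_ENTITY =
            Pattern.compile("&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    public static String stripEntities(String text) {
        // Replace every match with nothing; surrounding spaces are kept.
        return ANY_ENTITY.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripEntities("αλλαγές &#8211; τόσο"));
    }
}
```

Keep in mind that this throws away real punctuation and symbols (the dash in the example disappears), so it changes the text rather than just cleaning it.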
 
Marshal
Posts: 79979
This might sit better on our HTML forum, so I shall try moving this thread thither.
 