Topic: Wget - Downloading Only Specific HTML Files Using Title Tag Information


SpesMelior

I have searched extensively but have been unable to find a solution, so I would be grateful if anyone can help. What I am trying to achieve is to get Wget to download linked files, but only those that meet a particular criterion: a specific keyword in the title tag, i.e. within <title></title>. Say, for instance, a website had 100 HTML pages on various models of cars, and 10 of these dealt with vintage cars. The title tags of the other 90 pages would read "Currently In Production" + the manufacturer + the car model. The title tags of the 10 vintage pages would read "Vintage" + the manufacturer + the car model.

I downloaded all 100 pages using Wget, but what I want is just the 10 pages dealing with vintage cars. I have tried everything I could think of, but nothing works. I had thought that "--follow-tags=" might do the trick, but I couldn't make that work. I apologise for being so long-winded; I just wanted to explain properly.
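
To make the goal concrete, here is a rough sketch of the filtering I have in mind, done as a separate step over pages that have already been downloaded rather than inside Wget itself. It is only an illustration under my own assumptions: the function names and the idea of scanning a local folder of `.html` files are hypothetical, and it uses nothing but the Python standard library.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <Title> and <TITLE> match too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Return the stripped text of the page's first <title>, or ''."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


def pages_with_title_keyword(directory, keyword):
    """Return the .html files under `directory` whose title contains `keyword`.

    The comparison is case-insensitive. `directory` is assumed to be the
    folder the pages were saved into.
    """
    matches = []
    for path in sorted(Path(directory).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if keyword.lower() in extract_title(text).lower():
            matches.append(path)
    return matches
```

Calling something like `pages_with_title_keyword("downloaded_site", "Vintage")` (the folder name is made up) would list just the vintage pages, but it still means fetching all 100 pages first, which is what I was hoping to avoid.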

It may be that this is outwith the capabilities of Wget, but I thought that, as search engines make use of title tag information, perhaps Wget did likewise. If anyone can help I would be grateful.
Thank you

 

