Author Topic: File leacher  (Read 859 times)


Offline pooky2483

Re: File leacher
« Reply #15 on: December 03, 2012, 07:43:57 pm »
Looks like I'm at a dead end then. Back to manually downloading them...
Ubuntu 12.04 64bit|M3A76-CM|BIOS 2002|AMD Athlon64 X2 5200+|Realtek RTL8168C(P)|8111C(P) PCI-E Gigabit Ethernet NIC|NVIDIA 128MB GeForce6200 Turbocache|3.0GB Single-Channel DDR2 @ 387MHz (5-5-5-18)|PEAK 138508AGPK DVB-T Digital TV Hybrid PCI Card|T~bird|Firefox|MATE

Offline SeZo

Re: File leacher
« Reply #16 on: December 03, 2012, 08:57:52 pm »
Quote
Looks like I'm at a dead end then. Back to manually downloading them...

Wget follows the breadcrumbs (links) in the pages it comes across.

I guess if the pages are loaded by script, then you are outta luck.
However, if you start from a known subsection that contains the links to the PDFs, you might get somewhere.
Code: [Select]
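wget --random-wait --limit-rate=20k -r --no-parent -l10 -A.pdf http://website.com/subfolder/
That should start from the specified URL (ignoring the directories above it) and go up to 10 levels deep. Keep the trailing slash, otherwise wget treats "subfolder" as a file and --no-parent won't confine the crawl to that directory.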
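
For anyone who finds the short flags hard to read, here is a rough sketch of the same command spelled out with wget's long option names. It only prints the command rather than running it, so you can check it first. The function name build_wget_cmd and the BASE_URL variable are made up for illustration, the URL is the placeholder from the post, and --wait=2 is my own addition (an assumption): --random-wait only varies the delay set by --wait, so on its own it has nothing to randomise.

```shell
#!/bin/sh
# Placeholder start URL from the post above, not a real site.
BASE_URL="http://website.com/subfolder/"

# build_wget_cmd URL
# Prints (does not run) a polite recursive PDF download command.
#   --recursive --level=10  follow links, at most 10 levels deep
#   --no-parent             never climb above the start directory
#   --accept=pdf            keep only files ending in pdf
#   --wait=2 --random-wait  pause roughly 1 to 3 seconds between requests (assumed value)
#   --limit-rate=20k        cap the download speed at about 20 KB/s
build_wget_cmd() {
    echo "wget --recursive --level=10 --no-parent --accept=pdf" \
         "--wait=2 --random-wait --limit-rate=20k $1"
}

build_wget_cmd "$BASE_URL"
```

Once the printed line looks right, it can be pasted into a terminal, or piped straight to the shell with `build_wget_cmd "$BASE_URL" | sh`.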

Like Mark said, they might just tell you to sod off and do the downloading like everyone else does.

Offline Mark Greaves (PCNetSpec)

Re: File leacher
« Reply #17 on: December 03, 2012, 09:15:56 pm »
I had a "quick" look at the site in question, and the PDF always seems to get loaded from the same link, which "becomes" a link to the selected PDF .. so I'm guessing it's all scripted.

Nor will it let you "browse" the directory it says the PDF comes from.

That said, playing with wget::gui and trying a few random options seemed to download quite a few directories .. but no PDFs, just a bunch of XML files, even though I'd told it to get PDFs
(maybe I just didn't give it enough time to follow all the links).
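
If those XML files happen to list the PDF locations as plain absolute URLs (pure guesswork, nothing in this thread confirms it), something along these lines could pull the links out of whatever wget saved. Treat it as a sketch: list_pdf_urls is a hypothetical helper name, and the pattern only catches http or https URLs ending in .pdf.

```shell
#!/bin/sh
# list_pdf_urls DIR
# Search every .xml file under DIR and print each distinct
# http(s) URL that ends in .pdf, one per line, sorted.
# Assumes the XML contains absolute URLs (unconfirmed for the site in question).
list_pdf_urls() {
    find "$1" -type f -name '*.xml' \
        -exec grep -ohE 'https?://[^"<> ]+\.pdf' {} + | sort -u
}
```

If that turns anything up, the list could be handed back to wget with something like `list_pdf_urls ./website.com | wget --wait=2 --random-wait --limit-rate=20k --input-file=-`, where ./website.com stands in for the directory the earlier download created.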

I'd also guess there will be gigabytes of PDFs on that site .. which would explain them not wanting thousands of people all downloading the lot in one go .. they'd lock up the site for hours.
WARNING: You are logged into reality as 'root'

logging in as 'insane' is the only safe option.

Offline pooky2483

Re: File leacher
« Reply #18 on: December 11, 2012, 06:07:44 pm »
Just thought it'd be worth a try to get the PDFs in one go, but if it can't be done, never mind.
Thanks anyways for trying.