
Having a hard time getting all the files from the site.


mike.humphrey275139
10-06-2011, 02:47 PM
Hi guys,

I have Site Import 2.1 and am having a hard time getting all the files from the web site.

There is one specific directory I want to get all the files from, but your tool is only pulling down some of them, even though every file in that directory has the same file extension: ".asp".

The last 4 hours of the site import were spent downloading files I had no interest in, and there was no way to choose which directories I did and didn't want downloaded to my PC; the site is pretty big.

I increased the maximum pages from 7000 to 20,000, but the import still ran all the way up to the maximum.

Run time: 14:11:25
Import URL http://archive.catholic.com
Local path: G:/Websites/
Total files 20917

Progress

Levels deep 3797
Pages in queue 0

It downloaded this file:
http://archive.catholic.com/thisrock/1991/9101qq.asp
but not this:
http://archive.catholic.com/thisrock/1991/9101frs.asp

I searched the WAImportLog.txt file for "9101frs" and the search came up:
Not found in the current document?

Any ideas? Why would I upgrade to SI 3.0 if SI 2.1 doesn't work?

Mike, a current customer.
e-mail: mike.humphrey@rcn.com

Jason Byrnes
10-06-2011, 05:21 PM
Site Import works by following the a href links in your site to find linked pages that are part of your domain.


I don't see any links to the /thisrock/1991/9101frs.asp file you are trying to download on your site. The link to that file that I can see uses a full URL pointing to the following domain:
www.catholic.com

<a href="http://www.catholic.com/thisrock/1991/9101frs.asp">Fathers Know Best</a>


But this is not the same domain you have chosen to import:
http://archive.catholic.com

Since the link points to a different domain, it is being ignored.
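
To make the behavior described above concrete, here is a rough Python sketch of a same-domain link filter. It is not Site Import's actual code, just an illustration of the general idea under some assumptions: the page URL (index.asp), the relative link, and its link text are made up for the example, and `LinkCollector` and `split_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_links(page_url, html, import_root):
    """Return (followed, ignored) lists of absolute URLs found in html.

    A link is followed only when its host matches the host of import_root.
    """
    root_host = urlparse(import_root).hostname
    collector = LinkCollector()
    collector.feed(html)
    followed, ignored = [], []
    for href in collector.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).hostname == root_host:
            followed.append(absolute)
        else:
            ignored.append(absolute)
    return followed, ignored


# Hypothetical page on the imported site (the real index page name is assumed).
page = "http://archive.catholic.com/thisrock/1991/index.asp"
html = (
    '<a href="9101qq.asp">Quick Questions</a>'  # assumed relative link
    '<a href="http://www.catholic.com/thisrock/1991/9101frs.asp">Fathers Know Best</a>'
)
followed, ignored = split_links(page, html, "http://archive.catholic.com")
print("followed:", followed)
print("ignored:", ignored)
```

In this sketch the relative link resolves to archive.catholic.com and is followed, while the full URL on www.catholic.com lands in the ignored list, which matches what the import log shows for 9101frs.asp.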