FTP just html files? [earn money]

Filed under: Web/Tech | 10 Comments

If anyone knows how I can download just the .htm[l] files from a server via FTP, I will PayPal you $10. I have a server with probably 20,000 files and I only need the HTML stuff, and all of it. I have a feeling there is a way to do it with the terminal, or maybe even Transmit’s “batch” feature, but I can’t figure it out (I haven’t put much thought into it, but it’s probably not worth my time). If you know how, just comment. The first person with a tip that I use and that works gets the cash.

Oh, and it has to spawn directories. I need EVERYTHING.

Update: The offer has expired. I bit the bullet and downloaded the whole darn thing. Ugh. Thanks for the ideas.


10 Responses to “FTP just html files? [earn money]”

  1. Jake says:

    I would have thought a simple “mget *.htm” might do it, but I don’t think it will do the whole recursive directory thing for you…

  2. Jon Gales says:

    Hmm, that’s what I saw in “man ftp”. There are tons of directories, so that makes it tough. Halfway there! Thanks!

  3. Kevin says:

    ncftpget has recursive and (I think) wildcard options. You could probably use ncftpget -u USER -p PASSWORD -R /LOCALDIR /REMOTEDIR/*.html – I haven’t tried it, but hey, worth a shot.

  4. Jon Gales says:

    Hmm. I don’t seem to have ncftpget. Is it a default install on OS X?

  5. Kevin says:

    Nope, but it doesn’t work anyway. I have it because I have Fink (fink.sourceforge.net). ncftp WILL do a recursive copy of a directory structure, but I don’t think it’ll do a wildcarded recursive copy. You could probably do some post processing after you’ve downloaded everything and delete everything else… or you could come up with a shell script fairly easily to get just html files. Do a recursive ls, take out everything that’s not .html, and then go get each file. For $10, I’d even write it for you. ; )

  6. Jon Gales says:

    Kevin:

    That could work, but part of this is time. I spent 3 hours yesterday and didn’t get it all. I’m telling you, there are a TON of non-html files on this server. I’ve talked with Steven of Panic, and he said Transmit can’t do it. He added it to the feature list though!

    Mike Cohen talked to the guy from Fetch and he said the same thing (well, not without a lot of AppleScript).

    I can’t be the only guy that wants this…

  7. Kevin says:

    It would be pretty trivial if you had shell access on that machine. Do you?

  8. Jon Gales says:

    Yep. I just SSH’d in. As long as I don’t have to download everything, I’m happy :).

  9. Kevin says:

    See if you’ve got tclsh. If so, it would be pretty trivial to write a tcl script that would do a recursive glob on the directories, and copy all the .html files to a new directory that you could then tar and gzip up and download. I’ve got a recursing proc you could use to build the list of files… I’m the Tcl Bandit!! YEEEE-HAW!

  10. Kevin says:

    Or just plain old tcl… that would work too. You are welcome to e-mail me about this so I don’t pollute your comments with this stuff.
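
For anyone who lands here with the same problem: the approach Kevin outlines in comment 5 (do a recursive listing, throw out everything that isn't HTML, then fetch each remaining file) could be sketched roughly like this with Python's standard ftplib. This is only a sketch under a few assumptions: the server has to support the MLSD listing command, which older FTP servers may not; the host name and credentials in the usage comment are placeholders; and the function names are made up for illustration.

```python
import os
import posixpath
from ftplib import FTP

HTML_SUFFIXES = (".html", ".htm")


def is_html(name):
    """True for names ending in .htm or .html, ignoring case."""
    return name.lower().endswith(HTML_SUFFIXES)


def iter_remote_files(ftp, top):
    """Yield the remote path of every file below `top`, recursing into subdirectories.

    Assumes the server answers MLSD, so each entry comes with a "type" fact.
    """
    for name, facts in ftp.mlsd(top):
        if name in (".", ".."):
            continue
        path = posixpath.join(top, name)
        kind = facts.get("type")
        if kind == "dir":
            yield from iter_remote_files(ftp, path)
        elif kind == "file":
            yield path


def fetch_html(ftp, remote_root, local_root):
    """Download only the HTML files, recreating the directory tree locally."""
    fetched = []
    for remote_path in iter_remote_files(ftp, remote_root):
        if not is_html(remote_path):
            continue
        rel = posixpath.relpath(remote_path, remote_root)
        local_path = os.path.join(local_root, *rel.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
        fetched.append(local_path)
    return fetched


# Hypothetical usage (placeholder host, user, password, and directories):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("USER", "PASSWORD")
#     fetch_html(ftp, "/REMOTEDIR", "LOCALDIR")
```

It still has to list every directory, so it won't be instant on a 20,000-file server, but it only transfers the files that match.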

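With shell access, as in comments 7 through 9, the idea is to walk the directories on the server itself, collect just the HTML files into one gzipped tarball, and download that single file. Kevin proposed Tcl for this; the sketch below swaps in Python instead, assuming an interpreter is available on the server. The function name and the paths in the usage comment are illustrative, not anything from the thread.

```python
import os
import tarfile


def archive_html(src_root, archive_path):
    """Pack every .htm/.html file under src_root into a gzipped tarball.

    Paths inside the archive are relative to src_root, so unpacking it
    recreates the original directory structure. Returns the file count.
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in filenames:
                if name.lower().endswith((".html", ".htm")):
                    full = os.path.join(dirpath, name)
                    tar.add(full, arcname=os.path.relpath(full, src_root))
                    count += 1
    return count


# Hypothetical usage on the server (placeholder paths):
# archive_html("/path/to/site", "/tmp/html-only.tar.gz")
```

One download of the resulting archive then replaces thousands of individual FTP transfers.
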