Problem description:

I am looking for a way to download files from different pages and have them stored under a particular folder on a local machine. I am using Python 2.7.

See the field below:

EDIT

here is the html content:

<input type="hidden" name="supplier.orgProfiles(1152444).location.locationPurposes().extendedAttributes(Upload_RFI_Form).value.filename" value="Screenshot.docx">

<a style="display:inline; position:relative;" href="/aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz">Screenshot.docx</a>

One possibility I just tried: with the HTML content above, if I prepend, say, https://xyz.test.com and construct the URL as below:

https://xyz.test.com/aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz

and paste that URL into the browser and hit Enter, it lets me download the file as shown in the screenshot. But now, can we find how many such aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz values are present on the page?
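As a side note, rather than concatenating strings by hand, the standard library's urljoin resolves a relative href like this one against a base URL (the import path differs between Python 2 and 3, so both are tried below):

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2.7

base = "https://xyz.test.com"
href = "/aems/file/filegetrevision.do?fileEntityId=8120070&cs=LU31NT9us5P9Pvkb1BrtdwaCrEraskiCJcY6E2ucP5s.xyz"

# urljoin handles leading/trailing slashes correctly, avoiding "//" in the result.
url = urljoin(base, href)
print(url)
```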

Code I have tried so far. The only remaining pain point is how to download the file using the URL constructed by the script:

for a in soup.find_all('a', {"style": "display:inline; position:relative;"}, href=True):
    href = a['href'].strip()
    href = "https://xyz.test.com" + href  # href already starts with "/"
    print(href)

Please help me here!

Let me know if you need any more information from me; I am happy to share it.

Thanks in advance!

Answer:

As @JohnZwinck suggested, you can use urllib.urlretrieve together with the re module to build a list of links on a given page and download each file. Below is an example.

#!/usr/bin/python

"""
This script would scrape and download files using the anchor links.
"""


#Imports

import os, re, sys
import urllib, urllib2

#Config
base_url = "http://www.google.com/"
destination_directory = "downloads"


def _usage():
    """
    This method simply prints out the Usage information.
    """

    print "USAGE: %s <url>" % sys.argv[0]


def _create_url_list(url):
    """
    This method would create a list of downloads, using the anchor links
    found on the URL passed.
    """

    raw_data = urllib2.urlopen(url).read()
    raw_list = re.findall('<a style="display:inline; position:relative;" href="(.+?)"', raw_data)
    url_list = [base_url + x for x in raw_list]
    return url_list


def _get_file_name(url):
    """
    Return the filename extracted from the passed URL (its last path
    component). Note that for links like filegetrevision.do?fileEntityId=...,
    this includes the query string; the real filename may need to come
    from the anchor text instead.
    """

    return url.split('/')[-1]


def _download_file(url, filename):
    """
    Given a URL and a filename, save the file locally under the
    destination_directory path.
    """

    if not os.path.exists(destination_directory):
        print 'Directory [%s] does not exist, creating directory...' % destination_directory
        os.makedirs(destination_directory)
    try:
        print 'Downloading file [%s]' % filename
        urllib.urlretrieve(url, os.path.join(destination_directory, filename))
    except IOError as e:
        print 'Error downloading file [%s]: %s' % (filename, e)


def _download_all(main_url):
    """
    Given the main page URL, build the list of download links and save
    each file into the destination directory.
    """

    url_list = _create_url_list(main_url)
    for url in url_list:
        _download_file(url, _get_file_name(url))


def main(argv):
    """
    This is the script's launcher method.
    """

    if len(argv) != 1:
        _usage()
        sys.exit(1)
    _download_all(argv[0])
    print 'Finished Downloading.'


if __name__ == '__main__':
    main(sys.argv[1:])

You can change the base_url and the destination_directory according to your needs, save the script as download.py, and then run it from the terminal like this:

python download.py http://www.example.com/?page=1
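To answer the "how many such values are present" part of the question: the same regex used in _create_url_list can simply be counted with len(). The HTML snippet below is a stand-in for the real page source fetched by urllib2.urlopen(url).read():

```python
import re

# Stand-in for the raw page source; the real script reads this over HTTP.
raw_data = '''
<a style="display:inline; position:relative;" href="/aems/file/filegetrevision.do?fileEntityId=8120070&cs=AAA.xyz">One.docx</a>
<a style="display:inline; position:relative;" href="/aems/file/filegetrevision.do?fileEntityId=8120071&cs=BBB.xyz">Two.docx</a>
'''

# Same pattern as in _create_url_list: capture the href of each styled anchor.
raw_list = re.findall('<a style="display:inline; position:relative;" href="(.+?)"', raw_data)
print(len(raw_list))  # number of downloadable links found on the page
```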
Answer:

We can't know what service you got that first image from, but we'll assume it's on a website of some kind, probably one internal to your company.

The easiest thing you can try is to use urllib.urlretrieve to fetch the file from its URL. You may be able to do this if you can right-click the link on that page, copy the URL, and paste it into your code.
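A minimal urlretrieve sketch is below. It uses a file:// URL so it runs without network access; in practice you would pass the constructed https:// link instead. The import path differs between Python 2 and 3:

```python
import os
import tempfile

try:
    from urllib.request import urlretrieve, pathname2url  # Python 3
except ImportError:
    from urllib import urlretrieve, pathname2url          # Python 2.7

# Create a small local file to stand in for the remote document.
src = os.path.join(tempfile.mkdtemp(), "Screenshot.docx")
with open(src, "w") as f:
    f.write("dummy content")

url = "file://" + pathname2url(src)  # in practice: the https:// download link
dest = os.path.join(tempfile.mkdtemp(), "Screenshot.docx")

# urlretrieve fetches the URL and writes it to the given local path.
urlretrieve(url, dest)
print(os.path.exists(dest))
```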

However, that may not work, for example if there is complex authentication required before accessing that page. You might need to write Python code that actually does the login (as if the user were controlling it, typing a password). If you get that far, you should post that as a separate question.
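As a rough sketch of that login step, urllib2 can be combined with a cookie-aware opener so the session survives between the login request and the download. The login URL and form field names below are hypothetical placeholders for whatever your site actually uses:

```python
try:
    # Python 3 module layout
    from urllib.request import build_opener, HTTPCookieProcessor
    from urllib.parse import urlencode
    from http.cookiejar import CookieJar
except ImportError:
    # Python 2.7 module layout
    from urllib2 import build_opener, HTTPCookieProcessor
    from urllib import urlencode
    from cookielib import CookieJar

jar = CookieJar()
# The opener stores any session cookies the server sets at login time.
opener = build_opener(HTTPCookieProcessor(jar))

# Hypothetical form field names; inspect your site's login form for the real ones.
login_data = urlencode({"username": "me", "password": "secret"})

# Then, with the real (internal) URLs:
# opener.open("https://xyz.test.com/login.do", login_data)
# response = opener.open("https://xyz.test.com/aems/file/filegetrevision.do?fileEntityId=8120070&cs=...")
```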
