Introductory example

import requests

response = requests.get('http://httpbin.org/')
print(response.text)

==========Output==========

<!DOCTYPE html>   // page content
<html lang="en">

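Requests

Proxies

In a local test environment, start an SS (Shadowsocks) proxy in global mode, with the local proxy listening on port 1080.

Once the request goes through the proxy, the origin field in the response is the proxy server's IP address.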


import requests

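# Map the URL scheme to the proxy address; only http:// URLs are covered here
proxies = {
    "http": "http://127.0.0.1:1080"
}
resp = requests.get("http://httpbin.org/get", proxies=proxies)
print(resp.json()['origin'])   # IP address the server saw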

==========Output==========

184.170.217.70
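
The proxies mapping above has only an "http" key, so a request to an https:// URL would not be routed through the proxy. The following is a minimal sketch, assuming the local proxy on port 1080 also accepts HTTPS traffic. build_proxies is a hypothetical helper, not part of Requests, and requests.utils.select_proxy is used only to show which entry Requests would pick for a URL, without sending anything.

```python
from requests.utils import select_proxy


def build_proxies(host="127.0.0.1", port=1080):
    """Return a proxies mapping that covers both http:// and https:// URLs.

    Hypothetical helper; assumes the local proxy speaks HTTP on this port.
    """
    proxy_url = f"http://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


http_only = {"http": "http://127.0.0.1:1080"}
both = build_proxies()

# With only an "http" key, an https:// URL matches no proxy entry.
print(select_proxy("https://httpbin.org/get", http_only))  # None
print(select_proxy("https://httpbin.org/get", both))       # http://127.0.0.1:1080
```

Passing proxies=build_proxies() to requests.get would then cover both schemes.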

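Timeouts

Use the timeout parameter (in seconds) to limit how long the request may wait for the server; if the limit is exceeded, an exception is raised.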


import requests
from requests.exceptions import Timeout  # parent class of ConnectTimeout and ReadTimeout

try:
    resp = requests.get("http://httpbin.org/", timeout=0.1)
    print(resp.status_code)
except Timeout:  # catches both connect and read timeouts
    print("Request timed out")

==========Output==========

Request timed out
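
The timeout parameter also accepts a (connect, read) tuple, which sets the two limits separately. The sketch below assumes illustrative values of 3.05 and 10 seconds, and get_status is a hypothetical helper name.

```python
import requests
from requests.exceptions import Timeout


def get_status(url, connect_timeout=3.05, read_timeout=10):
    """Return the HTTP status code, or None if the request timed out.

    Hypothetical helper; the default values are illustrative only.
    """
    try:
        # A (connect, read) tuple sets the two limits separately.
        resp = requests.get(url, timeout=(connect_timeout, read_timeout))
    except Timeout:
        # Timeout is the parent of both ConnectTimeout and ReadTimeout.
        return None
    return resp.status_code
```

Calling get_status("http://httpbin.org/") returns the status code on success and None if either limit is exceeded.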

GET requests

  • Basic request
import requests

response = requests.get('http://httpbin.org/get')
print(response.status_code)
print(response.cookies)
print(response.text)

==========Output==========

200   // status code
<RequestsCookieJar[]>   // cookies
{   // response body
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "www.httpbin.org", 
    "User-Agent": "python-requests/2.19.1"
  }, 
  "origin": "223.214.141.29", 
  "url": "https://www.httpbin.org/get"
}
.....
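  • Request with parameters and headers

Pass the query parameters and headers as dictionaries to the params and headers arguments. Some sites inspect the request headers, so you may need to construct them by hand.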

import requests

data={
    'name':'elissa',
    'code':'N/A'
}

headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

response = requests.get("http://httpbin.org/get",params=data,headers=headers)
print(response.text)

==========Output==========

{
  "args": {
    "code": "N/A", 
    "name": "elissa"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
  }, 
  "origin": "223.214.141.29", 
  "url": "http://httpbin.org/get?name=elissa&code=N%2FA"
}
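
To see how params ends up in the URL without sending anything, the request can be built and prepared by hand. This is a small sketch using requests.Request and its prepare() method. The "tag" parameter in the second request is an illustrative name that does not appear in the example above.

```python
import requests

params = {"name": "elissa", "code": "N/A"}

# Build the request without sending it, to inspect the final URL.
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
print(prepared.url)  # http://httpbin.org/get?name=elissa&code=N%2FA

# A list value is sent as a repeated key ("tag" is an illustrative name).
multi = requests.Request("GET", "http://httpbin.org/get", params={"tag": ["a", "b"]}).prepare()
print(multi.url)  # http://httpbin.org/get?tag=a&tag=b
```

The "/" in "N/A" is percent-encoded as %2F, which matches the url field in the output above.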

POST requests

  • Basic request

import requests

response = requests.post("https://www.httpbin.org/post")
print(response.status_code)
print(response.headers)
print(response.cookies)
print(response.history)
print(response.url)
print(response.text)

==========Output==========

200   // status code (int)
{'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', ......,'Access-Control-Allow-Credentials': 'true'}   // response headers (dict-like)
<RequestsCookieJar[]>   // cookies
[]   // redirect history (list)
https://www.httpbin.org/post   // final URL (str)
{  // response body (str)
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
    ......
  "json": null, 
  "origin": "223.214.141.29", 
  "url": "https://www.httpbin.org/post"
}
  • Request with form data and headers

Pass the form fields and headers as dictionaries to the data and headers arguments.


import requests

data = {
    'name':'elissa',
    'type':'post'
}

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}

response = requests.post("https://www.httpbin.org/post",headers=headers,data=data)
print(response.text)

==========Output==========

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "name": "elissa", 
    "type": "post"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "21", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "www.httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
  }, 
  "json": null, 
  "origin": "223.214.141.29", 
  "url": "https://www.httpbin.org/post"
}
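
The form example above uses data=. Requests also accepts a json= argument, which is not covered in the original walkthrough. As a rough comparison, the sketch below prepares both kinds of request without sending them and prints the resulting Content-Type and body.

```python
import json

import requests

payload = {"name": "elissa", "type": "post"}
url = "https://www.httpbin.org/post"

# data= sends the dict as a URL-encoded form body.
form_req = requests.Request("POST", url, data=payload).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=elissa&type=post

# json= serializes the dict to a JSON body instead.
json_req = requests.Request("POST", url, json=payload).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'name': 'elissa', 'type': 'post'}
```

With data=, httpbin reports the fields under "form", as in the output above. With json=, they would be expected under "json" instead.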

Handling responses

Parsing JSON

If the response body is JSON, parse it with .json(); a JSON object is returned as a dict.


import requests

response = requests.get("http://httpbin.org/get")
print(response.json()['headers'])

==========Output==========

{'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.19.1'}
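
When the body is not valid JSON (for example an HTML error page), .json() raises an exception. A minimal sketch of a defensive wrapper follows; safe_json is a hypothetical helper name, and it catches ValueError because the JSON decoding errors raised by .json() are subclasses of it.

```python
import requests


def safe_json(response: requests.Response, default=None):
    """Return the decoded JSON body, or `default` if the body is not valid JSON.

    Hypothetical helper for illustration.
    """
    try:
        return response.json()
    except ValueError:
        # The JSON decoding errors raised by .json() are ValueError subclasses.
        return default
```

For instance, safe_json(requests.get("http://httpbin.org/get"), default={}) returns the parsed dict, or {} if the body could not be decoded.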

Handling binary data

Use this for binary resources such as images, video, and audio files.

import requests

response = requests.get("https://www.bilibili.com/favicon.ico")
with open('favicon.ico', 'wb') as f:
    f.write(response.content)   # the with block closes the file automatically
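
response.content loads the whole body into memory, which is fine for a favicon but not for large video or audio files. One possible approach is to stream the body in chunks, as sketched below. download_file is a hypothetical helper, and the chunk size and timeout are illustrative defaults.

```python
import requests


def download_file(url, path, chunk_size=8192, timeout=10):
    """Stream `url` to `path` in chunks and return the number of bytes written.

    Hypothetical helper; chunk_size and timeout are illustrative defaults.
    """
    written = 0
    # stream=True defers downloading the body until it is iterated.
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # stop early on 4xx/5xx responses
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

It would be called as download_file("https://www.bilibili.com/favicon.ico", "favicon.ico").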

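Getting cookies

response.cookies is a RequestsCookieJar, which behaves like a dictionary, so its key-value pairs can be printed in a loop.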


import requests

response = requests.get("https://www.baidu.com")

for key,value in response.cookies.items():
    print(key + "=" + value)

==========Output==========

BDORZ=27315
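
Besides iteration, the cookie jar supports dict-style access and can be converted to a plain dict. The sketch below builds a jar by hand so that it runs without a network request. The domain value is an assumption made for illustration.

```python
from requests.cookies import RequestsCookieJar

# Build a jar by hand so the example runs offline;
# response.cookies is an object of the same type.
jar = RequestsCookieJar()
jar.set("BDORZ", "27315", domain=".baidu.com", path="/")  # domain is an assumed value

print(jar["BDORZ"])        # dict-style lookup: 27315
print(jar.get("missing"))  # None, like dict.get
print(jar.get_dict())      # {'BDORZ': '27315'}
```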

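Persisting a session

To carry cookies across requests, make the requests through a Session object: cookies set by one response are sent automatically with later requests made through the same session.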


import requests

s = requests.Session()
s.get("http://httpbin.org/cookies/set/id/elissa")   # have the server set a cookie
resp = s.get("http://httpbin.org/cookies")   # read back the cookies the session sent
print(resp.text)

==========Output==========

{
  "cookies": {
    "id": "elissa"
  }
}
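
A session can also hold default headers, and what it would attach to a request can be inspected without sending it. In this sketch the User-Agent string is an illustrative value, and the cookie is set directly on the session rather than by visiting /cookies/set/id/elissa.

```python
import requests

with requests.Session() as s:
    # Headers set on the session are sent with every request made through it.
    s.headers.update({"User-Agent": "example-client/1.0"})  # illustrative value
    # Set a cookie directly instead of asking the server to set it.
    s.cookies.set("id", "elissa")

    # Prepare (but do not send) a request to see what the session attaches.
    prepared = s.prepare_request(requests.Request("GET", "http://httpbin.org/cookies"))

print(prepared.headers["User-Agent"])  # example-client/1.0
print(prepared.headers["Cookie"])      # id=elissa
```

Using the session as a context manager closes its underlying connections when the block exits.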

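Exception handling

On unreliable networks, you can catch timeouts, request errors, HTTP errors, and similar failures.

The exception classes are imported from requests.exceptions, and all of them derive from RequestException; see the official API documentation for the full list.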


import requests
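from requests.exceptions import ConnectTimeout, ReadTimeout, HTTPError, RequestException  # exception classes

try:
    resp = requests.get("http://httpbin.org/", timeout=0.8)
    resp.raise_for_status()  # raises HTTPError on 4xx/5xx; without this call HTTPError is never raised
    print(resp.status_code)
except ConnectTimeout:  # most specific exceptions first
    print("ConTimeout")
except ReadTimeout:
    print("ReadTimeout")
except HTTPError:
    print("Http error")
except RequestException:  # base class, catches any other Requests error
    print("Net Error")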
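
On a flaky connection, catching the error is often followed by a retry. The following is a minimal sketch of one way to do that, not a definitive implementation. get_with_retries is a hypothetical helper, and its attempts, backoff and timeout values are illustrative.

```python
import time

import requests


def get_with_retries(url, attempts=3, backoff=0.5, timeout=5):
    """GET `url`, retrying on timeouts and connection errors.

    Hypothetical helper; attempts, backoff and timeout are illustrative.
    The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            resp = requests.get(url, timeout=timeout)
            resp.raise_for_status()  # an HTTPError is not retried here
            return resp
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
            if attempt == attempts:
                raise
            time.sleep(backoff * attempt)  # wait a little longer each time
```

Only network-level failures are retried. An HTTP error status propagates to the caller immediately as HTTPError, since repeating the same request is unlikely to change a 4xx response.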