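Damn, I got tricked into installing Java yet again
First, install BrowserMob Proxy as described in the previous post. It has gone unmaintained for years and hasn't improved at all.
Installing BrowserMob Proxy
Then remember to install Java 8 and add it to your Path.
In the Python code, remember to point to where this handy tool lives: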
__BMP = r"D:\XX\FastAPI\NG\browsermob-proxy-2.1.4\bin\browsermob-proxy.bat"
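Before starting the server, it can save some head-scratching to confirm that both prerequisites above are actually in place. The following is only a minimal sketch: `check_prerequisites` is a hypothetical helper name of mine, not part of browsermob-proxy, and it only checks that `java` resolves through PATH and that the launcher script exists (it does not verify the Java version).

```python
import os
import shutil


def check_prerequisites(bmp_path, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `which` is injectable only so the function is easy to test.
    """
    problems = []
    # BrowserMob Proxy is a Java program, so `java` must be resolvable via PATH.
    if which("java") is None:
        problems.append("java was not found on PATH")
    # The launcher script (browsermob-proxy.bat on Windows) must exist on disk.
    if not os.path.isfile(bmp_path):
        problems.append("launcher script not found: {}".format(bmp_path))
    return problems


if __name__ == "__main__":
    # Same path as in the post; adjust it to your own install location.
    bmp = r"D:\XX\FastAPI\NG\browsermob-proxy-2.1.4\bin\browsermob-proxy.bat"
    for problem in check_prerequisites(bmp):
        print("Problem:", problem)
```

If the function returns an empty list, `Server(...)` should at least be able to find what it needs to launch.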
Here's the full code, no point in hiding it:
import time

from browsermobproxy import Server
from selenium import webdriver


class ProxyManger:
    # Path to the BrowserMob Proxy launcher script
    __BMP = r"D:\XX\FastAPI\NG\browsermob-proxy-2.1.4\bin\browsermob-proxy.bat"

    def __init__(self):
        self.__server = Server(ProxyManger.__BMP)
        self.__client = None

    def start_server(self):
        self.__server.start()
        return self.__server

    def start_client(self):
        # trustAllServers makes the proxy accept any upstream certificate
        self.__client = self.__server.create_proxy(params={"trustAllServers": "true"})
        return self.__client

    @property
    def client(self):
        return self.__client

    @property
    def server(self):
        return self.__server


if __name__ == "__main__":
    # Start the proxy
    proxy = ProxyManger()
    server = proxy.start_server()
    client = proxy.start_client()

    # Configure the WebDriver to route traffic through the proxy
    options = webdriver.ChromeOptions()
    options.add_argument("--proxy-server={}".format(client.proxy))
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36")
    # options.add_argument('--headless')
    # chromePath = r"D:\AzRjN\anaconda3_7\envs\demo36\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe"
    # Newer Selenium releases take `options=`; `chrome_options=` is the old, deprecated name
    driver = webdriver.Chrome(options=options)

    try:
        # Capture the responses
        client.new_har("shopee.com.my", options={'captureHeaders': True, 'captureContent': True})
        driver.get("https://shopee.com.my/search?keyword=phone")
        time.sleep(3)
        result = client.har
        # print(result)
        for entry in result['log']['entries']:
            _url = entry['request']['url']
            # print("Request URL:", _url)
            if "/api/v4/search/search_items?" in _url:
                _response = entry['response']
                _content = _response['content']
                print("Response content:", _content)
    finally:
        # Always shut down the browser and the proxy, even if something fails
        driver.quit()
        server.stop()
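The loop over the HAR entries above can be pulled out into a small standalone function, which makes it possible to try the filtering on a saved HAR without launching a browser. This is just a sketch under my own assumptions: `matching_bodies` is a hypothetical name, and it relies on the HAR format's `response.content.text` field plus the optional `encoding: "base64"` marker for bodies stored in base64.

```python
import base64


def matching_bodies(har, url_fragment):
    """Yield (url, body_text) for HAR entries whose request URL contains url_fragment."""
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        if url_fragment not in url:
            continue
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            # The body was not captured (e.g. captureContent was off)
            continue
        if content.get("encoding") == "base64":
            # Some bodies are stored base64-encoded in the HAR
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        yield url, text


if __name__ == "__main__":
    # Tiny hand-made HAR used purely for illustration
    sample_har = {"log": {"entries": [
        {"request": {"url": "https://example.com/api/v4/search/search_items?keyword=phone"},
         "response": {"content": {"text": '{"items": []}'}}},
        {"request": {"url": "https://example.com/static/app.js"},
         "response": {"content": {"text": "console.log(1)"}}},
    ]}}
    for url, body in matching_bodies(sample_har, "/api/v4/search/search_items?"):
        print(url, body)
```

In the script above it would be called as `matching_bodies(client.har, "/api/v4/search/search_items?")`.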
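You'll quickly notice that the big players do a solid job on anti-scraping: Shopee clearly picks up on Selenium's fingerprint. Shopee already shows a popup that you have to click through, but that alone can't keep out more sophisticated scrapers, so once it detects that you're not a real browser it bounces you straight to the login page. That leaves this handy tool with nothing to do.
I'm stopping my research here. After all, the tool hasn't been updated in a long time and belongs to the Java world. After studying it for quite a while I found that capturing the HAR itself works fine: MitmProxy doesn't fit cleanly into a Python script, whereas this tool really does. Still, it's not as fast as I had imagined, and with the other drawbacks on top, it's probably not the right fit...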