Playwright: Difference between revisions
No edit summary |
No edit summary |
||
(6 intermediate revisions by the same user not shown) | |||
Line 81: | Line 81: | ||
|[https://playwright.dev/python/docs/library Playwright Python docs: Getting started] | |[https://playwright.dev/python/docs/library Playwright Python docs: Getting started] | ||
}} | }} | ||
==Browsers== | ==Browsers== | ||
===Installation and usage=== | ===Installation and usage=== | ||
Line 121: | Line 122: | ||
| | | | ||
|} | |} | ||
==Page== | |||
{| class="wikitable" | |||
! Name | |||
! Description | |||
! Example | |||
|- | |||
| goto() | |||
| | |||
| | |||
|- | |||
| | |||
| | |||
| | |||
|- | |||
| content() | |||
| HTML source of the page | |||
| <syntaxhighlight lang="python" >
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(page.content())
</syntaxhighlight> | |||
|- | |||
| | |||
| | |||
| | |||
|} | |||
==Elements== | |||
===Locating=== | |||
{| class="wikitable" | |||
! Name | |||
! Description | |||
! Example | |||
|- | |||
| | |||
| | |||
| | |||
|- | |||
| | |||
| | |||
| | |||
|- | |||
| | |||
| | |||
| | |||
|} | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/locators Playwright Python docs: Locators] | |||
}} | |||
===Attributes=== | |||
{| class="wikitable" | |||
! Name | |||
! Description | |||
! Example | |||
|- | |||
| | |||
| | |||
| | |||
|- | |||
| | |||
| | |||
| | |||
|- | |||
| | |||
| | |||
| | |||
|} | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/api/class-locator Playwright Python API: Locator class] | |||
}} | |||
== Network == | |||
===Listening for requests and responses=== | |||
<code>page.on("request", handler)</code> and <code>page.on("response", handler)</code> listen for every request and response event on the page. | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/network#network-events Playwright Python docs: Network - Network events] | |||
|[https://playwright.dev/python/docs/api/class-page#events Playwright Python API: Page class - events] | |||
|[https://playwright.dev/python/docs/api/class-response Playwright Python API: Response class] | |||
|[https://playwright.dev/python/docs/api/class-request Playwright Python API: Request class] | |||
}} | |||
===Handling requests=== | |||
<code>page.route()</code> or <code>browser_context.route()</code> can modify or abort requests. | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/network#handle-requests Playwright Python docs: Network - Handle requests] | |||
|[https://playwright.dev/python/docs/api/class-page#page-route Playwright Python API: Page class - route] | |||
}} | |||
==== Modifying requests ==== | |||
<syntaxhighlight lang="python" >
# Modify the headers: remove the "x-secret" key
def handle_route(route):
    headers = route.request.headers
    del headers["x-secret"]
    route.continue_(headers=headers)
page.route("**/*", handle_route)
# Continue requests as POST.
page.route("**/*", lambda route: route.continue_(method="POST"))
</syntaxhighlight> | |||
==== Aborting requests ==== | |||
<code>page.route()</code> together with <code>route.abort()</code> aborts requests, for example to skip loading images or other unneeded resources. | |||
<syntaxhighlight lang="python" >
page = browser.new_page()
page.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
page.goto("https://example.com")
browser.close()
</syntaxhighlight> | |||
Matching with a regular expression: | |||
<syntaxhighlight lang="python" >
import re
page = browser.new_page()
page.route(re.compile(r"(\.png$)|(\.jpg$)"), lambda route: route.abort())
page.goto("https://example.com")
browser.close()
</syntaxhighlight> | |||
The following code aborts requests based on resource type and request URL: | |||
<syntaxhighlight lang="python" >
from playwright.sync_api import sync_playwright
ABORT_TYPES = ['image', 'font', 'media']
ABORT_URL_NAME = ['bdstatic.com', '/static/superman', '.js']
def handle_route(route):
    if route.request.resource_type in ABORT_TYPES:
        return route.abort()
    elif any(name in route.request.url for name in ABORT_URL_NAME):
        return route.abort()
    else:
        route.continue_()
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.route("**/*", handle_route)
    page.goto("https://www.baidu.com")
    page.wait_for_timeout(10*1000)
    browser.close()
</syntaxhighlight> | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/network#abort-requests Playwright Python docs: Network - Abort requests] | |||
|[https://playwright.dev/python/docs/api/class-page#page-route Playwright Python API: Page class - route] | |||
}} | |||
===Handling responses=== | |||
To modify a response, first fetch the original response with <code>APIRequestContext</code>, then pass it to <code>route.fulfill()</code>. | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/network#modify-responses Playwright Python docs: Network - Modify responses] | |||
}} | |||
==Debugging tools== | |||
===Inspector=== | |||
In debug mode, running the code opens the Playwright Inspector. To enable it, set the environment variable <code>PWDEBUG=1</code>. | |||
<syntaxhighlight lang="bash" >
# Bash
PWDEBUG=1 pytest -s
# PowerShell
$env:PWDEBUG=1
# Batch
set PWDEBUG=1
</syntaxhighlight> | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/debug#playwright-inspector Playwright Python docs: Debugging - Playwright Inspector] | |||
}} | |||
===Trace Viewer=== | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/debug#trace-viewer Playwright Python docs: Debugging - Trace Viewer] | |||
}} | |||
==Code generator== | |||
The <code>playwright codegen</code> command starts code generation and opens two windows: a browser and the Playwright Inspector. Actions performed in the browser are turned into code in the Inspector window in real time. Run <code>playwright codegen -h</code> for help. | |||
<syntaxhighlight lang="bash" >
# Open www.baidu.com in the Firefox browser.
playwright codegen www.baidu.com -b firefox
# Save the generated code to test.py
playwright codegen -o test.py -b firefox
</syntaxhighlight> | |||
{{了解更多 | |||
|[https://playwright.dev/python/docs/codegen Playwright Python docs: Test generator] | |||
}} | |||
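==Detection and anti-detection== | ==Detection and anti-detection== | ||
===Anti-detection=== | ===Anti-detection=== | ||
Line 126: | Line 313: | ||
! Name | ! Name | ||
! Description | ! Description | ||
|- | |- | ||
| Set launch arguments and run add_init_script | | Set launch arguments and run add_init_script | ||
Line 151: | Line 337: | ||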
</syntaxhighlight> | </syntaxhighlight> | ||
|- | |- | ||
| | | | ||
| | | | ||
|- | |- | ||
| | | | ||
| | | | ||
Line 166: | Line 350: | ||
* Playwright Python source code: https://github.com/microsoft/playwright-python | * Playwright Python source code: https://github.com/microsoft/playwright-python | ||
* Playwright Python docs: https://playwright.dev/python/docs/intro | * Playwright Python docs: https://playwright.dev/python/docs/intro | ||
* Playwright Python API: https://playwright.dev/python/docs/api/class-playwright | |||
===Websites=== | ===Websites=== | ||
===Articles=== | ===Articles=== | ||
*[https://cuiqingcai.com/36045.html 静觅:崔庆才 - 新兴爬虫利器Playwright 的基本用法] | |||
*[https://soulteary.com/2022/11/28/playwrights-concise-introductory-tutorial-recording-automated-test-cases-and-using-it-with-docker.html#%E5%86%99%E5%9C%A8%E5%89%8D%E9%9D%A2 苏洋博客:Playwright 简明入门教程:录制自动化测试用例,结合 Docker 使用] |
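Latest revision as of 14:49, 30 April 2023 (Sunday)
Playwright is a web testing and automation framework open-sourced by Microsoft. It supports the Chromium, Firefox, and WebKit browsers; the Linux, macOS, and Windows platforms; and multiple languages, including Python, .NET, and Java.
Introduction
Timeline
Installation
Install the Python version: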
# Optional: install the pytest plugin build of Playwright
# pip install pytest-playwright
# Install Playwright
pip install playwright
# Install all supported browsers and their drivers
# playwright install
# Install only the Chrome browser and its driver; run playwright install -h for help
# Currently supported browsers: chromium, chrome, chrome-beta, msedge, msedge-beta, msedge-dev, firefox, webkit.
playwright install chrome
Quick start
Synchronous mode
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox, or playwright.webkit
# Browsers run headless by default; pass headless=False to launch() to show the window
browser = playwright.firefox.launch(headless=False)
page = browser.new_page()
page.goto("https://www.baidu.com")
page.screenshot(path="截图.png")
browser.close()
playwright.stop()
A with statement is the more common form:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.firefox.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.baidu.com/")
    # Type text into the search box
    # page.locator('//input[@id="kw"]').fill('playwright')
    page.fill('//input[@id="kw"]', 'playwright')
    # Click the search button
    # page.locator('//input[@id="su"]').click()
    page.click('//input[@id="su"]')
    # Wait 5 seconds (the argument is in milliseconds)
    page.wait_for_timeout(5*1000)
    page.screenshot(path="截图.png")
    browser.close()
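Running this code in Jupyter fails with the error: Error: It looks like you are using Playwright Sync API inside the asyncio loop. Please use the Async API instead. Workaround: save the code to 测试.py and run python 测试.py in a terminal.
Learn more >> Playwright Python docs: Getting started
Asynchronous mode
Using a with statement: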
import asyncio
from playwright.async_api import async_playwright
async def main():
    async with async_playwright() as p:
        browser = await p.firefox.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://www.baidu.com")
        print(await page.title())
        await browser.close()
asyncio.run(main())
Learn more >> Playwright Python docs: Getting started
Browsers
Installation and usage
Name | Description |
---|---|
chromium | Run playwright install chromium to install the browser and its driver.
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.baidu.com/")
    page.wait_for_timeout(5*1000)  # wait 5 seconds
    page.screenshot(path="截图.png")
    browser.close()
|
chrome | Run playwright install chrome to install the browser and its driver automatically, or install them yourself.
browser = p.chromium.launch(
    channel="chrome",
    headless=False,
    slow_mo=10,
    # Bypass automation detection
    args=['--disable-blink-features=AutomationControlled']
)
|
firefox | Run playwright install firefox to install the browser and its driver automatically.
browser = p.firefox.launch(headless=False)
|
Page
Name | Description | Example |
---|---|---|
goto() | ||
content() | HTML source of the page | with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(page.content())
|
Elements
Locating
Name | Description | Example |
---|---|---|
Learn more >> Playwright Python docs: Locators
Attributes
Name | Description | Example |
---|---|---|
Learn more >> Playwright Python API: Locator class
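Network
Listening for requests and responses
page.on("request", handler) and page.on("response", handler) listen for every request and response event on the page.
Learn more >> Playwright Python docs: Network - Network events; Playwright Python API: Page class - events; Playwright Python API: Response class; Playwright Python API: Request class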
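As a minimal sketch of these two listeners, the helper below registers one handler per event and logs a line for each request and response. The function name attach_network_loggers and the log parameter are illustrative assumptions, not part of Playwright; only page.on(), request.method, request.url, response.status, and response.url come from the library.

```python
def attach_network_loggers(page, log=print):
    """Register request/response listeners on a Playwright Page (hypothetical helper)."""

    def on_request(request):
        # Request.method and Request.url are properties of Playwright's Request
        log(f">> {request.method} {request.url}")

    def on_response(response):
        # Response.status and Response.url are properties of Playwright's Response
        log(f"<< {response.status} {response.url}")

    page.on("request", on_request)
    page.on("response", on_response)


def main():
    # Imported here so the helper above can be used and tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        attach_network_loggers(page)
        page.goto("https://example.com")
        browser.close()
```

Call main() from a terminal script (not from Jupyter, for the reason given in the quick start) to see one line per network event.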
Handling requests
page.route() or browser_context.route() can modify or abort requests.
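The examples in this section all use page.route(). As a hedged sketch of the context-level variant, the code below registers the same kind of handler on a browser context, so it applies to every page opened from that context. The names block_images and IMAGE_PATTERN are illustrative assumptions.

```python
IMAGE_PATTERN = "**/*.{png,jpg,jpeg}"  # glob pattern, same style as the page.route() examples


def block_images(context):
    """Abort image requests for all pages of a BrowserContext (hypothetical helper)."""
    context.route(IMAGE_PATTERN, lambda route: route.abort())


def main():
    # Imported here so block_images() can be exercised without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        block_images(context)      # register before creating pages
        page = context.new_page()  # pages created from this context inherit the route
        page.goto("https://example.com")
        browser.close()
```

Routing at the context level is mainly useful when a script opens several pages or popups and the same rule should hold for all of them.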
Modifying requests
# Modify the headers: remove the "x-secret" key
def handle_route(route):
    headers = route.request.headers
    del headers["x-secret"]
    route.continue_(headers=headers)
page.route("**/*", handle_route)
# Continue requests as POST.
page.route("**/*", lambda route: route.continue_(method="POST"))
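The del statement in the handler above raises KeyError for any request that does not carry the header. A possible tolerant variant is sketched below; without_headers and make_header_stripper are hypothetical helper names, and only route.request.headers and route.continue_(headers=...) are Playwright calls.

```python
def without_headers(headers, names):
    """Return a copy of headers with the given names removed, ignoring case."""
    drop = {name.lower() for name in names}
    return {key: value for key, value in headers.items() if key.lower() not in drop}


def make_header_stripper(names):
    """Build a route handler that continues each request without the named headers."""

    def handle_route(route):
        # Missing headers are simply skipped, so no KeyError is raised.
        route.continue_(headers=without_headers(route.request.headers, names))

    return handle_route
```

With a real page this would be registered as page.route("**/*", make_header_stripper(["x-secret"])).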
Aborting requests
page.route() together with route.abort() aborts requests, for example to skip loading images or other unneeded resources.
page = browser.new_page()
page.route("**/*.{png,jpg,jpeg}", lambda route: route.abort())
page.goto("https://example.com")
browser.close()
Matching with a regular expression:
import re
page = browser.new_page()
page.route(re.compile(r"(\.png$)|(\.jpg$)"), lambda route: route.abort())
page.goto("https://example.com")
browser.close()
The following code aborts requests based on resource type and request URL:
from playwright.sync_api import sync_playwright
ABORT_TYPES = ['image', 'font', 'media']
ABORT_URL_NAME = ['bdstatic.com', '/static/superman', '.js']
def handle_route(route):
    if route.request.resource_type in ABORT_TYPES:
        return route.abort()
    elif any(name in route.request.url for name in ABORT_URL_NAME):
        return route.abort()
    else:
        route.continue_()
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.route("**/*", handle_route)
    page.goto("https://www.baidu.com")
    page.wait_for_timeout(10*1000)
    browser.close()
Handling responses
To modify a response, first fetch the original response with APIRequestContext, then pass it to route.fulfill().
Learn more >> Playwright Python docs: Network - Modify responses
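The fetch-then-fulfill flow described above could look roughly like the sketch below. It assumes a Playwright version that provides route.fetch() (on older versions, page.request.fetch(route.request) is the APIRequestContext route to the same response). The handler name add_title_prefix, the prefix text, and the URL pattern in the usage note are illustrative assumptions.

```python
def add_title_prefix(route, prefix="My prefix: "):
    """Route handler: fetch the real response, edit its HTML, then fulfill with it."""
    # Perform the original request and get the untouched response.
    response = route.fetch()
    # Rewrite the body; here a prefix is inserted right after the opening <title> tag.
    body = response.text().replace("<title>", f"<title>{prefix}")
    route.fulfill(
        response=response,  # reuse status and headers from the original response
        body=body,          # override only the body
        headers={**response.headers, "content-type": "text/html"},
    )
```

With a real page this would be registered as page.route("**/title.html", add_title_prefix) before calling page.goto().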
Debugging tools
Inspector
In debug mode, running the code opens the Playwright Inspector. To enable it, set the environment variable PWDEBUG=1.
# Bash
PWDEBUG=1 pytest -s
# PowerShell
$env:PWDEBUG=1
# Batch
set PWDEBUG=1
Trace Viewer
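Code generator
The playwright codegen command starts code generation and opens two windows: a browser and the Playwright Inspector. Actions performed in the browser are turned into code in the Inspector window in real time. Run playwright codegen -h for help.
# Open www.baidu.com in the Firefox browser.
playwright codegen www.baidu.com -b firefox
# Save the generated code to test.py
playwright codegen -o test.py -b firefox
Learn more >> Playwright Python docs: Test generator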
Detection and anti-detection
Anti-detection
Name | Description |
---|---|
Set launch arguments and run add_init_script | Removes some automation fingerprints.
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(
        channel="chrome",
        headless=False,
        slow_mo=10,
        # Anti-detection
        args=['--disable-blink-features=AutomationControlled']
    )
    page = browser.new_page()
    page.add_init_script("""
        Object.defineProperties(navigator, {webdriver:{get: () => undefined}});
    """)
    page.goto("https://www.baidu.com")
    page.wait_for_timeout(5*1000)
    page.screenshot(path="截图.png")
    browser.close()
|
Resources
Official sites
- Playwright website: https://playwright.dev
- Playwright source code: https://github.com/microsoft/playwright
- Playwright Python source code: https://github.com/microsoft/playwright-python
- Playwright Python docs: https://playwright.dev/python/docs/intro
- Playwright Python API: https://playwright.dev/python/docs/api/class-playwright