首先提问:
使用python搜取https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm的内容
得到代码如下,能直接使用:
import requests
from bs4 import BeautifulSoup# 目标网页URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'# 使用requests获取网页内容
response = requests.get(url)# 检查请求是否成功
if response.status_code == 200:# 使用BeautifulSoup解析网页内容soup = BeautifulSoup(response.text, 'html.parser')# 查找您感兴趣的内容(例如,标题、段落等)title = soup.find('title').get_text()paragraphs = soup.find_all('p')# 打印内容print(title)for p in paragraphs:print(p.get_text())
else:print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
不过会有乱码,告诉智谱清言需要使用utf-8编码,于是重新生成了代码:
import requests
from bs4 import BeautifulSoup# 目标网页URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'# 使用requests获取网页内容
response = requests.get(url)# 检查请求是否成功
if response.status_code == 200:# 检查响应的encoding属性encoding = response.encodingif encoding != 'utf-8':# 如果encoding不是UTF-8,使用正确的编码解析网页内容response.encoding = 'gbk' # 假设网页使用GBK编码# 使用BeautifulSoup解析网页内容soup = BeautifulSoup(response.text, 'html.parser')# 查找您感兴趣的内容(例如,标题、段落等)title = soup.find('title').get_text()paragraphs = soup.find_all('p')# 打印内容print(title)for p in paragraphs:print(p.get_text())
else:print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
这下就完全搞定了!!!
我们需要存储在txt文件中,于是提问:
得到代码如下:
import requests
from bs4 import BeautifulSoup# 目标网页URL
url = 'https://xxgk.ecupl.edu.cn/2024/0509/c1334a213900/page.htm'# 使用requests获取网页内容
response = requests.get(url)# 检查请求是否成功
if response.status_code == 200:# 检查响应的encoding属性encoding = response.encodingif encoding != 'utf-8':# 如果encoding不是UTF-8,使用正确的编码解析网页内容response.encoding = 'gbk' # 假设网页使用GBK编码# 使用BeautifulSoup解析网页内容soup = BeautifulSoup(response.text, 'html.parser')# 查找您感兴趣的内容(例如,标题、段落等)title = soup.find('title').get_text()paragraphs = soup.find_all('p')# 准备写入的文本内容text_content = f"{title}\n\n"for p in paragraphs:text_content += p.get_text() + '\n'# 存储为.txt文件file_path = 'output.txt'with open(file_path, 'w', encoding='utf-8') as file:file.write(text_content)print(f'Content saved to {file_path}.')
else:print(f'Failed to retrieve the webpage. Status code: {response.status_code}')
得到结果:
告诉大模型,你的代码满足我的要求,请详细分析,哪儿不懂点哪里!!!
比如: