Web Crawling with BeautifulSoup (1)

What is BeautifulSoup?

A Python library for parsing HTML and XML documents.


Extracting <a> tags with the find_all() method

from bs4 import BeautifulSoup

html = """
<html><body>
<ul>
<li><a href="http://www.naver.com">naver</a></li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all("a")

for a in links:
    href = a.attrs['href']  # value of the href attribute
    text = a.string         # text inside the tag
    print(text, ">", href)
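
The example above works because its single link has an href attribute and plain text inside. As a rough sketch of how the same loop might be made a bit more forgiving, the variant below skips <a> tags that have no href and uses get_text() instead of .string, since .string returns None when a tag has more than one child. The helper name extract_links, the sample HTML, and the example.com URL are made up for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # href=True limits the match to <a> tags that actually carry an href
    for a in soup.find_all("a", href=True):
        # get_text() joins all nested text; .string would be None for the second link
        text = a.get_text(" ", strip=True)
        pairs.append((text, a["href"]))
    return pairs


# Hypothetical sample: one plain link, one with nested markup, one without href
sample = """
<ul>
<li><a href="http://www.naver.com">naver</a></li>
<li><a href="http://example.com"><b>example</b> site</a></li>
<li><a name="top">anchor without href</a></li>
</ul>
"""

for text, href in extract_links(sample):
    print(text, ">", href)
```

With this approach the third <a> tag is simply ignored rather than raising a KeyError, which is what a.attrs['href'] would do for a tag without that attribute.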