[python]웹 크롤링

웹 크롤링이란?

파이썬 웹 크롤링 용 라이브러리들

파이썬에서 웹 크롤링에 많이 쓰이는 유용한 라이브러리들이다. 3rd party 라이브러리들이므로 pip나 conda같은 명령어를 이용해 따로 설치 후 사용한다.

requests : 파이썬에서 http, https 웹 사이트에 요청을 하기 위해 만들어진 모듈

기본 이용법 ex)

  import requests
  response = requests.get("url주소값") # 해당 url주소에서 html파일을 get해서 response변수에 담아줌

selenium : 웹에서 동적으로 생성된 데이터를 요청하기 위해 만들어진 모듈

기본 이용법 ex)

  from selenium import webdriver
  from selenium.webdriver.common.keys import Keys
  import time
  driver = webdriver.Chrome("크롬드라이버 주소") # 크롬을 이용해 크롤링한다고 가정(브라우저별 드라이버가 필요하다)
  driver.get("url주소값")
  time.sleep(3) # 페이지 로딩이 완전히 되도록 일부러 3초 쉼
  search = driver.find_element_by_xpath("내가원하는동적요소찾아줌 ex. 검색input태그, 로그인input태그")
  search.send_keys("해당동적요소에 넣어줄 값")
  time.sleep(1) # 처리 1초동안 기다리기
  search.send_keys(Keys.ENTER) # 엔터로 입력값 넣기

beautifulsoup4 : 웹에서 받아온 html문서를 파싱하는 모듈( html에서 내가 원하는 요소 찾기 )

기본 이용법 ex)

  import requests
  from bs4 import BeautifulSoup as bs
  response = requests.get('https://www.google.co.kr')
  html = bs(response.text, 'html.parser') # 뷰티풀수프로 파싱해서 html변수에 저장
  a = html.find("타입", "해당값") # 원하는 태그나 요소 찾아서 a변수에 저장

Selenium

selenium 버전에 따라 명령어가 조금씩 다를 수 있다. 현재 나는 selenium 4.3.0 버전을 사용 중이다.

selenium 모듈 불러오기

  from selenium import webdriver 
  from selenium.webdriver import ActionChains
  from selenium.webdriver.common.keys import Keys
  from selenium.webdriver.common.by import By

브라우저에 맞는 드라이버 다운 후 불러오기

  # 먼저 인터넷에서 원하는 브라우저용 드라이버를 다운받아야 한다.
  # 나는 크롬을 사용하므로 크롬드라이버를 다운받았고
  # 다음과 같이 다운 받은 크롬드라이버 경로를 입력해 driver변수를 선언했다.
  driver = webdriver.Chrome("크롬드라이버 주소")

원하는 url주소로 이동
```
  driver.get("url주소값")
```
원하는 요소 찾기 : driver.find_element(By.”요소종류”, “요소값”)
- 요소종류
  - ㄴㅇ
  - ㅇㄹㄴ

[python]웹 크롤링

[python]웹 크롤링

웹 크롤링이란?

파이썬 웹 크롤링 용 라이브러리들

Selenium

results matching ""

No results matching ""