Thursday, July 2, 2015

[PYTHON] JKT NEWS (updated)




My first Python script
Fetching the latest news from the official JKT48 website
import urllib.request
import re

url = 'http://jkt48.com/news/list?lang=id'
req = urllib.request.Request(url)
resp = urllib.request.urlopen(req)
# decode the response bytes to text before running the regexes
respData = resp.read().decode('utf-8')

# grab each news container, then the headline link inside it
# (re.S lets .*? match across line breaks in the page source)
paragraphs = re.findall(r'<div class="contentpink">(.*?)</div>', respData, re.S)
for eachP in paragraphs:
    berita = re.findall(r'<h2>(.*?)</h2>', eachP, re.S)
    for eachH in berita:
        judul = re.findall(r'">(.*?)</a>', eachH)
        if judul:
            print(judul[0])
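The regexes above can be exercised offline against a small HTML snippet shaped like the JKT48 news list. The sample markup below is an assumption for illustration, not the site's real HTML:

```python
import re

# hypothetical sample mimicking the structure the script expects
sample = (
    '<div class="contentpink">'
    '<h2><a href="/news/1">First headline</a></h2>'
    '</div>'
)

# same three-step extraction as the script: container -> h2 -> link text
paragraphs = re.findall(r'<div class="contentpink">(.*?)</div>', sample, re.S)
for eachP in paragraphs:
    berita = re.findall(r'<h2>(.*?)</h2>', eachP, re.S)
    for eachH in berita:
        judul = re.findall(r'">(.*?)</a>', eachH)
        if judul:
            print(judul[0])  # prints: First headline
```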


Using BeautifulSoup

from bs4 import BeautifulSoup
import urllib.request

# target URL
url = 'http://jkt48.com/news/list?lang=id'
# build and send the request
req = urllib.request.Request(url)
resp = urllib.request.urlopen(req)
respData = resp.read()
# parse the page with the stdlib html.parser backend
soup = BeautifulSoup(respData, 'html.parser')
# every news item lives in a <div class="contentpink">
contentpink = soup.find_all('div', 'contentpink')
# counter to limit output to the 10 latest news items
i = 0
for posting in contentpink:
    if i < 10:
        # the headline is the first <h2> in the container
        title = posting.h2
        # the posting date sits in a <div class="metadata">
        posted = posting.find_all('div', 'metadata')
        print(title.string, '-', posted[0].string)
        i += 1
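If BeautifulSoup is not installed, the standard library's html.parser module can do a rough version of the same extraction, and list slicing (`titles[:10]`) replaces the manual `i` counter. This is a minimal sketch; the sample markup is an assumption, not the site's actual HTML:

```python
from html.parser import HTMLParser

class NewsTitleParser(HTMLParser):
    """Collects the text inside <h2> tags that sit in div.contentpink."""
    def __init__(self):
        super().__init__()
        self.in_content = False
        self.in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and ('class', 'contentpink') in attrs:
            self.in_content = True
        elif tag == 'h2' and self.in_content:
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == 'h2':
            self.in_h2 = False
        elif tag == 'div':
            self.in_content = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())

# hypothetical sample page with two news items
sample = ('<div class="contentpink"><h2><a href="/n/1">Headline one</a></h2></div>'
          '<div class="contentpink"><h2><a href="/n/2">Headline two</a></h2></div>')

parser = NewsTitleParser()
parser.feed(sample)
print(parser.titles[:10])  # slicing replaces the manual i counter
```

Note this simple parser does not handle nested divs inside the content block; BeautifulSoup remains the more robust choice for the real page.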
References
- https://www.daniweb.com/software-development/python/threads/213221/urllib-in-python-3-1
- http://pythonprogramming.net/parse-website-using-regular-expressions-urllib/
- http://www.crummy.com/software/BeautifulSoup/bs4/doc/
