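ROME is a handy toolkit for parsing RSS feeds, and it also supports the Atom format. I used it in a recent project and am sharing it here. Only part of the code is published.

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// ROME 1.0 package names; ROME 1.5 and later use com.rometools.rome instead.
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

// ReaderSource and getArticles belong to the unpublished part of the project.
public class FeedReader {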
public static void main(String[] args) {
String source = "http://www.36kr.com/feed";
// source = "http://news.163.com/special/00011K6L/rss_newstop.xml";
// source = "http://hi.baidu.com/ybhanxiao/rss";
// source = "http://ybhanxiao.iteye.com/rss";
// source="http://www.luoyundeng.com/?feed=rss2";
// source = "http://yanglanvip.blog.sohu.com/rss";
// source="http://feed.williamlong.info/";
try {
SyndFeed feed = readFeed(source);
ReaderSource readerSource = new ReaderSource();
if (feed != null) {
getArticles(feed, readerSource, null);
}
System.out.println(readerSource.getSourceName() + readerSource.getSourceDesc());
} catch (IllegalArgumentException e) {
e.printStackTrace();
} catch (FeedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (SecurityException e) {
System.out.print("Exception: " + e);
}
}
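/**
* Requests an RSS feed and returns it as a SyndFeed.
*
* @param source feed URL (starting with "http") or path to a local feed file
* @return the parsed feed
* @throws IllegalArgumentException if the feed type cannot be determined
* @throws FeedException if the feed XML cannot be parsed
* @throws IOException if the source cannot be read
* @author echo
* @date 2012-1-11
*/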
@SuppressWarnings("unchecked")
public static SyndFeed readFeed(String source) throws IllegalArgumentException, FeedException, IOException {
// SyndFeedInput: converts the XML content read from the source into a SyndFeedImpl instance
SyndFeedInput input = new SyndFeedInput();
// Locale.setDefault(Locale.ENGLISH);
SyndFeed feed = null;
if (source.startsWith("http")) {
URLConnection feedUrl = new URL(source).openConnection();
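/**
* Works around this exception: java.io.IOException: Server returned HTTP response code:
* 403 for URL. The same URL opens fine in a browser, so the server is probably blocking Java programs.
* Some servers' security settings refuse Java programs as clients; the fix is to set the client's User-Agent explicitly.
*/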
feedUrl.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
feedUrl.setConnectTimeout(5000);
feed = input.build(new XmlReader(feedUrl));
} else {
File feedUrl = new File(source);
feed = input.build(new XmlReader(feedUrl));
}
return feed;
}
}
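The connection setup inside readFeed can also be pulled out into a small helper. The following is only a sketch using the standard library: the class name FeedConnections, the User-Agent string, and the 10-second read timeout are assumptions made for illustration and are not part of the original project. It adds a read timeout next to the connect timeout, so that a server which accepts the connection but never sends data cannot block the reader indefinitely.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of the original FeedReader.
final class FeedConnections {
    // Assumed values; adjust them to whatever the target servers accept.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; FeedReaderSketch/1.0)";
    static final int CONNECT_TIMEOUT_MS = 5000;
    static final int READ_TIMEOUT_MS = 10000;

    /** Prepares a connection with a browser-like User-Agent and both timeouts set. */
    static URLConnection open(String source) throws IOException {
        // openConnection() only creates the object; nothing is sent until the stream is read.
        URLConnection conn = new URL(source).openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        return conn;
    }
}
```

The returned connection can be handed to new XmlReader(...) exactly as readFeed does with its own URLConnection.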
If the dates come back empty when parsing certain sites, add a rome.properties file under src and set the value of datetime.extra.masks as follows:
yyyy-MM-dd HH:mm:ss|yyyy-MM-dd HH:mm|yyyy-MM-dd
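To see what such a pipe-separated mask list means, here is a small standard-library sketch that tries each mask in turn until one matches the whole date string. It illustrates the idea only and is not ROME's actual implementation; the class name MaskDateParser is made up for this example.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical illustration; ROME has its own date parsing code.
final class MaskDateParser {
    /** Tries each pipe-separated mask in order; returns null if none matches the whole input. */
    static Date parse(String text, String masks) {
        String trimmed = text.trim();
        for (String mask : masks.split("\\|")) {
            SimpleDateFormat format = new SimpleDateFormat(mask, Locale.US);
            format.setLenient(false);
            ParsePosition pos = new ParsePosition(0);
            Date date = format.parse(trimmed, pos);
            // Require the mask to consume the entire string, so that a short mask
            // such as yyyy-MM-dd cannot silently match only a prefix.
            if (date != null && pos.getIndex() == trimmed.length()) {
                return date;
            }
        }
        return null;
    }
}
```

With the mask list above, a value such as 2012-01-11 10:30 fails the first mask (no seconds) and is picked up by the second one, while a string that fits none of the masks yields null, which matches the empty dates seen before the extra masks are configured.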