ROME is a handy toolkit for parsing RSS feeds, and it also supports the Atom format. I used it in a recent project and am sharing it here. Only part of the code is published, so getArticles and ReaderSource are not shown.

// ROME 1.0 package names; newer releases use com.rometools.rome.* instead.
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class FeedReader {

    public static void main(String[] args) {
        String source = "http://www.36kr.com/feed";
        // source = "http://news.163.com/special/00011K6L/rss_newstop.xml";
        // source = "http://hi.baidu.com/ybhanxiao/rss";
        // source = "http://ybhanxiao.iteye.com/rss";
        // source="http://www.luoyundeng.com/?feed=rss2";
        // source = "http://yanglanvip.blog.sohu.com/rss";
        // source="http://feed.williamlong.info/";
        try {
            SyndFeed feed = readFeed(source);
            ReaderSource readerSource = new ReaderSource();
            if (feed != null) {
                // getArticles belongs to the unpublished part of the project.
                getArticles(feed, readerSource, null);
            }
            System.out.println(readerSource.getSourceName() + readerSource.getSourceDesc());
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } catch (FeedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SecurityException e) {
            System.out.print("Exception: " + e);
        }
    }

    /**
     * Requests the RSS feed and returns it as a SyndFeed.
     *
     * @param source feed URL (starting with "http") or path to a local file
     * @return the parsed feed
     * @throws IllegalArgumentException
     * @throws FeedException
     * @throws IOException
     * @author echo
     * @date 2012-1-11
     */
    @SuppressWarnings("unchecked")
    public static SyndFeed readFeed(String source)
            throws IllegalArgumentException, FeedException, IOException {
        // SyndFeedInput converts the XML content that was read into a SyndFeedImpl instance.
        SyndFeedInput input = new SyndFeedInput();
        // Locale.setDefault(Locale.ENGLISH);
        SyndFeed feed = null;
        if (source.startsWith("http")) {
            URLConnection feedUrl = new URL(source).openConnection();
            /*
             * Works around this exception:
             *   java.io.IOException: Server returned HTTP response code: 403 for URL
             * The same URL opens fine in a browser, so the server is probably blocking
             * Java programs: its security settings reject the default Java client.
             * The fix is to set the client's User-Agent.
             */
            feedUrl.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
            feedUrl.setConnectTimeout(5000);
            feed = input.build(new XmlReader(feedUrl));
        } else {
            // Anything that is not an http(s) URL is treated as a local file path.
            File feedFile = new File(source);
            feed = input.build(new XmlReader(feedFile));
        }
        return feed;
    }
}
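The User-Agent workaround in readFeed can also be pulled out into a small helper that uses only java.net. This is a minimal sketch, not part of ROME or of the original project: the class name FeedConnections and the method prepare are hypothetical, and the read timeout is an extra assumption (the original code only sets a connect timeout).

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper for illustration; not part of ROME or the original FeedReader.
class FeedConnections {
    // Same browser-like User-Agent string used in readFeed.
    static final String BROWSER_USER_AGENT =
            "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";

    /** Creates a connection object with a browser-like User-Agent and timeouts set. */
    static URLConnection prepare(String source, int timeoutMillis) throws IOException {
        // openConnection() only creates the object; no request is sent yet.
        URLConnection conn = new URL(source).openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_USER_AGENT);
        conn.setConnectTimeout(timeoutMillis);
        // Assumption: also limit how long a read may block once connected.
        conn.setReadTimeout(timeoutMillis);
        return conn;
    }
}
```

The returned connection can be passed to new XmlReader(conn) exactly as feedUrl is in readFeed.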
If dates come back empty when parsing certain sites, add a rome.properties file under src and change the value of datetime.extra.masks to the following:
yyyy-MM-dd HH:mm:ss|yyyy-MM-dd HH:mm|yyyy-MM-dd
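To see why these extra masks help, here is a rough sketch of mask-based date parsing using only the JDK. It assumes the masks are SimpleDateFormat patterns separated by "|" and tried in order. It only approximates the idea and is not ROME's actual code; DateMaskDemo and parseWithMasks are made-up names.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Illustrative only: approximates mask-based parsing, not ROME's implementation.
class DateMaskDemo {
    // The value suggested above for datetime.extra.masks.
    static final String MASKS = "yyyy-MM-dd HH:mm:ss|yyyy-MM-dd HH:mm|yyyy-MM-dd";

    /** Returns the first successful parse, or null when no mask matches the whole string. */
    static Date parseWithMasks(String text, String masks) {
        for (String mask : masks.split("\\|")) {
            SimpleDateFormat format = new SimpleDateFormat(mask, Locale.US);
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(text, pos);
            // Accept the result only if this mask consumed the entire input.
            if (parsed != null && pos.getIndex() == text.length()) {
                return parsed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseWithMasks("2012-01-11 08:30:00", MASKS));
        System.out.println(parseWithMasks("2012-01-11", MASKS));
        // None of the three masks fits this RFC 822 style date.
        System.out.println(parseWithMasks("Wed, 11 Jan 2012", MASKS));
    }
}
```

A feed that writes dates like 2012-01-11 08:30 matches none of the standard RSS or Atom date formats, which is presumably why the date ends up empty until a matching mask is added.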