
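Using Microsoft.XMLHTTP in ASP to fetch web page content and filter out what you need

December 8, 2017 | 移动技术网IT编程

This ASP example uses Microsoft.XMLHTTP to fetch a web page's content (without garbled characters) and then extracts only the part that is needed.

Sample source code: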

 
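<%
Dim xmlurl, http, strhtml, strbody
' The target URL comes from the "u" query-string parameter
xmlurl = Request.QueryString("u")

' Fetch the page synchronously (the third argument to Open is False, so the call is not async)
Set http = Server.CreateObject("Microsoft.XMLHTTP")
' The request is sent as a POST with an empty body, as in the original sample
http.Open "POST", xmlurl, False
http.setRequestHeader "User-Agent", "Mozilla/4.0"
http.setRequestHeader "Connection", "Keep-Alive"
http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
http.Send

' Decode the raw response bytes ourselves instead of relying on responseText
strhtml = BytesToBstr(http.responseBody)
Set http = Nothing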

' Extract the main content between the start and end markers.
' GetBody returns "$false$" when either marker is missing.
strbody = GetBody(strhtml, "<div id=""div_newscontentc"" class=""cnt"">", "</div>", False, False)

' Strip the source site's attribution text
strbody = Replace(strbody, "（本文首发于", "")
strbody = Replace(strbody, "财富动力网</a>，转载请注明出处。）", "")
strbody = Replace(strbody, "本文首发于，转载请注明出处。）", "")
strbody = Replace(strbody, "财富动力网</a>:http://www.927953.com", "")
strbody = Replace(strbody, "本文首发于", "")

Response.Write RegRemoveHref(strbody)

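' Convert the binary response body into a string using ADODB.Stream
Function BytesToBstr(body)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1      ' adTypeBinary
    objstream.Mode = 3      ' adModeReadWrite
    objstream.Open
    objstream.Write body
    objstream.Position = 0
    objstream.Type = 2      ' adTypeText
    ' Decode the bytes as UTF-8. Set this to the page's actual encoding
    ' (for example "gb2312") if it differs; otherwise pages that contain
    ' Chinese characters come back garbled when read through XMLHTTP.
    objstream.Charset = "utf-8"
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function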


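' Return the text between startstr and overstr (plain string search, no regular expression).
' inclul / inclur: True to include the start / end marker in the result.
' Returns "$false$" if the input is empty or a marker is not found.
Function GetBody(constr, ByVal startstr, ByVal overstr, inclul, inclur)
    If constr = "$false$" Or constr = "" Or IsNull(constr) Or startstr = "" Or IsNull(startstr) Or overstr = "" Or IsNull(overstr) Then
        GetBody = "$false$"
        Exit Function
    End If
    Dim constrtemp
    Dim start, over
    ' Search lower-cased copies so that matching is case-insensitive
    constrtemp = LCase(constr)
    startstr = LCase(startstr)
    overstr = LCase(overstr)
    ' Character-based InStr/Len/Mid are used instead of the byte-based InStrB/LenB/MidB
    start = InStr(1, constrtemp, startstr, vbBinaryCompare)
    If start <= 0 Then
        GetBody = "$false$"
        Exit Function
    Else
        If inclul = False Then
            start = start + Len(startstr)
        End If
    End If
    over = InStr(start, constrtemp, overstr, vbBinaryCompare)
    If over <= 0 Or over <= start Then
        GetBody = "$false$"
        Exit Function
    Else
        If inclur = True Then
            over = over + Len(overstr)
        End If
    End If
    GetBody = Mid(constr, start, over - start)
End Function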

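' Strip <a> hyperlink tags but keep their inner content
Function RegRemoveHref(htmlstr)
    Dim ra
    Set ra = New RegExp
    ra.IgnoreCase = True
    ra.Global = True
    ' \b keeps tags such as <abbr> from matching; [\s\S] also matches line breaks, which "." does not
    ra.Pattern = "<a\b[^>]*>([\s\S]*?)</a>"

    RegRemoveHref = Replace(ra.Replace(htmlstr, "$1"), "href=""http://www.927953.com""", "")
    Set ra = Nothing
End Function
%>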
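
The sample above hard-codes "utf-8" when it decodes the response bytes. As a minimal sketch, not part of the original code, the charset could instead be read from the response's Content-Type header. The function name DetectCharset and its fallback parameter are hypothetical and introduced here only for illustration. Pages that declare their encoding only in a <meta> tag are not covered by this approach.

```vbscript
' Hypothetical helper (not in the original sample): extract the charset
' from a Content-Type value such as "text/html; charset=gb2312".
Function DetectCharset(contentType, fallback)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Pattern = "charset\s*=\s*[""']?([\w-]+)"
    ' Appending "" turns a Null or Empty header value into an empty string
    Set matches = re.Execute(contentType & "")
    If matches.Count > 0 Then
        DetectCharset = matches(0).SubMatches(0)
    Else
        ' No charset declared in the header: use the caller's default
        DetectCharset = fallback
    End If
    Set re = Nothing
End Function

' Assumed usage, before the XMLHTTP object is released:
'   charset = DetectCharset(http.getResponseHeader("Content-Type"), "utf-8")
' The result would then be assigned to the stream's Charset property
' in place of the fixed "utf-8" value.
```

Falling back to "utf-8" keeps the original behavior whenever the server sends no charset.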

