拇指快播,追爱跨世纪,谭文颖
如下所示:
#coding=utf-8 import sys, re, os def getdictlist(dict): regx = '''[\w\~`\!\@\#\$\%\^\&\*\(\)\_\-\+\=\[\]\{\}\:\;\,\.\/\<\>\?]+''' with open(dict) as f: data = f.read() return re.findall(regx, data) def rmdp(dictlist): return list(set(dictlist)) def filesave(dictrmdp, out): with open(out, 'a') as f: for line in dictrmdp: f.write(line + '\n') def main(): try: dict = sys.argv[1].strip() out = sys.argv[2].strip() except exception, e: print 'error:', e me = os.path.basename(__file__) print 'usage: %s <input> <output>' %me print 'example: %s dict.txt dict_rmdp.txt' %me exit() dictlist = getdictlist(dict) dictrmdp = rmdp(dictlist) filesave(dictrmdp, out) if __name__ == '__main__': main()
以上这篇python 高效去重复 支持gb级别大文件的示例代码就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持移动技术网。
如对本文有疑问,请在下面进行留言讨论,广大热心网友会与你互动!! 点击进行留言回复
Python 实现将numpy中的nan和inf,nan替换成对应的均值
python爬虫把url链接编码成gbk2312格式过程解析
网友评论