
Comparing VBA and Python pandas for Data Processing: A Case Study

June 23, 2020 | 移动技术网 IT编程

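Background:

We have a CSV file with two columns, 'cnum' and 'company'. The data contains blank rows as well as rows whose content is duplicated.

Requirements:

1) Remove the blank rows;
2) Keep only one row for each group of duplicate rows;
3) Rename the 'company' column to 'company_new';
4) Append six columns after it: 'c_col', 'd_col', 'e_col', 'f_col', 'g_col', 'h_col'.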


Part 1: Processing with Python pandas:

import pandas as pd

def deal_with_data(filepath, newpath):
    with open(filepath) as file_obj:
        df = pd.read_csv(file_obj)  # Read the CSV file into a DataFrame
    # Re-specify the column index; columns that do not exist yet are created empty
    df = df.reindex(columns=['cnum', 'company', 'c_col', 'd_col', 'e_col', 'f_col', 'g_col', 'h_col'], fill_value=None)
    df.rename(columns={'company': 'company_new'}, inplace=True)  # Rename the column
    df = df.dropna(axis=0, how='all')  # Drop rows that are entirely NaN, i.e. the blank rows in the file
    df['cnum'] = df['cnum'].astype('int32')  # Cast the cnum column to int32
    df = df.drop_duplicates(subset=['cnum', 'company_new'], keep='first')  # Drop duplicate rows
    df.to_csv(newpath, index=False, encoding='gbk')

if __name__ == '__main__':
    file_path = r'c:\users\12078\desktop\python\cnum_company.csv'
    file_save_path = r'c:\users\12078\desktop\python\cnum_company_output.csv'
    deal_with_data(file_path, file_save_path)
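
To try the same four steps without touching the file system, the sketch below runs them on a small in-memory CSV. The sample data and the names SAMPLE_CSV, NEW_COLUMNS, and clean are hypothetical and not part of the original script. The steps are also applied in a slightly different order (drop and de-duplicate first, then rename and add columns), which should give the same result for this kind of input.

```python
import io

import pandas as pd

# Hypothetical sample input: one blank row (",") and one duplicate row.
SAMPLE_CSV = """cnum,company
1001,Alpha
,
1002,Beta
1001,Alpha
1003,Gamma
"""

NEW_COLUMNS = ['c_col', 'd_col', 'e_col', 'f_col', 'g_col', 'h_col']


def clean(df):
    """Apply the four requirements to a DataFrame with 'cnum' and 'company'."""
    df = df.dropna(axis=0, how='all')                        # 1) remove blank rows
    df = df.drop_duplicates(subset=['cnum', 'company'],
                            keep='first')                    # 2) keep one row per duplicate
    df = df.rename(columns={'company': 'company_new'})       # 3) rename the column
    df = df.reindex(columns=['cnum', 'company_new'] + NEW_COLUMNS)  # 4) add six empty columns
    # cnum was read as float because of the blank row; cast it back to an integer type.
    return df.astype({'cnum': 'int32'}).reset_index(drop=True)


result = clean(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(result[['cnum', 'company_new']])
```

Returning a new DataFrame from each step, rather than using inplace=True, keeps the function free of side effects on the caller's data.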

Part 2: Processing with VBA:

Option Base 1
Option Explicit

Sub Main()
    On Error GoTo error_handling
    Dim wb                  As Workbook
    Dim wb_out              As Workbook
    Dim sht                 As Worksheet
    Dim sht_out             As Worksheet
    Dim rng                 As Range
    Dim usedrows            As Long     'Long rather than Byte: Byte overflows past 255 rows
    Dim usedrows_out        As Long
    Dim dict_cnum_company   As Object
    Dim str_file_path       As String
    Dim str_new_file_path   As String

    'Assign values to variables:
    str_file_path = "c:\users\12078\desktop\python\cnum_company.csv"
    str_new_file_path = "c:\users\12078\desktop\python\cnum_company_output.csv"

    Set wb = CheckAndAttachWorkbook(str_file_path)
    Set sht = wb.Worksheets("cnum_company")
    Set wb_out = Workbooks.Add
    wb_out.SaveAs str_new_file_path, xlCSV 'Create a CSV file
    Set sht_out = wb_out.Worksheets("cnum_company_output")

    Set dict_cnum_company = CreateObject("Scripting.Dictionary")
    usedrows = WorksheetFunction.Max(GetLastValidRow(sht, "a"), GetLastValidRow(sht, "b"))

    'Rename the header 'company' to 'company_new', remove blank and duplicate rows.
    Dim cnum_company As String
    cnum_company = ""
    For Each rng In sht.Range("a1", "a" & usedrows)
        If VBA.Trim(rng.Offset(0, 1).Value) = "company" Then
            rng.Offset(0, 1).Value = "company_new"
        End If
        'The key is "cnum-company"; a blank row produces just "-".
        cnum_company = rng.Value & "-" & rng.Offset(0, 1).Value
        If VBA.Trim(cnum_company) <> "-" And Not dict_cnum_company.Exists(cnum_company) Then
            dict_cnum_company.Add cnum_company, ""
        End If
    Next rng

    'Loop over the dictionary keys and split each key on "-" into the cnum and company arrays.
    'Note: Split(...)(1) truncates company names that themselves contain "-".
    Dim index_dict As Long
    Dim arr_cnum()
    Dim arr_company()
    'Size the arrays once, before the loop.
    ReDim arr_cnum(1 To UBound(dict_cnum_company.Keys) + 1)
    ReDim arr_company(1 To UBound(dict_cnum_company.Keys) + 1)
    For index_dict = 0 To UBound(dict_cnum_company.Keys)
        arr_cnum(index_dict + 1) = Split(dict_cnum_company.Keys()(index_dict), "-")(0)
        arr_company(index_dict + 1) = Split(dict_cnum_company.Keys()(index_dict), "-")(1)
        Debug.Print index_dict
    Next

    'Assign the values of the arrays to the cells.
    sht_out.Range("a1", "a" & UBound(arr_cnum)) = Application.WorksheetFunction.Transpose(arr_cnum)
    sht_out.Range("b1", "b" & UBound(arr_company)) = Application.WorksheetFunction.Transpose(arr_company)

    'Add 6 columns to the output CSV file:
    Dim arr_columns() As Variant
    arr_columns = Array("c_col", "d_col", "e_col", "f_col", "g_col", "h_col")
    sht_out.Range("c1:h1") = arr_columns
    Call CheckAndCloseWorkbook(str_file_path, False)
    Call CheckAndCloseWorkbook(str_new_file_path, True)

    Exit Sub
error_handling:
    Call CheckAndCloseWorkbook(str_file_path, False)
    Call CheckAndCloseWorkbook(str_new_file_path, False)
End Sub

'Helper functions:
'Get the last used row of column in_col in a worksheet
Function GetLastValidRow(in_ws As Worksheet, in_col As String) As Long
    GetLastValidRow = in_ws.Cells(in_ws.Rows.Count, in_col).End(xlUp).Row
End Function

'Return the workbook at in_wb_path, reusing it if it is already open
Function CheckAndAttachWorkbook(in_wb_path As String) As Workbook
    Dim wb As Workbook
    Dim mywb As String
    mywb = in_wb_path

    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(mywb) Then
            Set CheckAndAttachWorkbook = wb
            Exit Function
        End If
    Next

    Set wb = Workbooks.Open(in_wb_path, UpdateLinks:=0)
    Set CheckAndAttachWorkbook = wb

End Function

'Close the workbook at in_wb_path if it is open, saving changes when in_saved is True
Function CheckAndCloseWorkbook(in_wb_path As String, in_saved As Boolean)
    Dim wb As Workbook
    Dim mywb As String
    mywb = in_wb_path
    For Each wb In Workbooks
        If LCase(wb.FullName) = LCase(mywb) Then
            wb.Close SaveChanges:=in_saved
            Exit Function
        End If
    Next
End Function
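
For readers less familiar with VBA, the dictionary-based de-duplication used in the macro can be sketched in Python with only the standard library. This is an illustrative translation, not the author's code: the function name dedupe_rows and the sample data are hypothetical, and the header rename is left out. It uses a (cnum, company) tuple as the key instead of a "-"-joined string, so a company name that contains a hyphen comes through intact.

```python
import csv
import io


def dedupe_rows(csv_text):
    """Keep the first occurrence of each non-blank (cnum, company) row."""
    seen = {}  # dicts keep insertion order, so first occurrences stay in file order
    for row in csv.reader(io.StringIO(csv_text)):
        cells = list(row) + ['', '']          # pad so cells[0] and cells[1] always exist
        cnum, company = cells[0], cells[1]
        if not cnum.strip() and not company.strip():
            continue                          # blank row: skip it
        key = (cnum, company)
        if key not in seen:                   # same role as Dictionary.Exists in the macro
            seen[key] = None
    return list(seen)


# Hypothetical sample: one blank row, one duplicate, one hyphenated company name.
sample = "cnum,company\n1001,Alpha-Tech\n,\n1001,Alpha-Tech\n1002,Beta\n"
rows = dedupe_rows(sample)
print(rows)
```

With a joined string key, splitting "1001-Alpha-Tech" on "-" and taking the second element would return only "Alpha"; the tuple key avoids that.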

Part 3: Output:


Both approaches produce the same output.
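
One way to confirm that an output file meets the four requirements is a small standard-library check like the sketch below. The function check_output, the constant EXPECTED_HEADER, and the sample strings are hypothetical names introduced for illustration; the check works on CSV text, so it can be applied to the result of either approach.

```python
import csv
import io

EXPECTED_HEADER = ['cnum', 'company_new', 'c_col', 'd_col',
                   'e_col', 'f_col', 'g_col', 'h_col']


def check_output(csv_text):
    """Return a list of problems found; an empty list means every check passed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    problems = []
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append('unexpected header')
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        key = tuple(row[:2])                       # (cnum, company_new)
        if not any(cell.strip() for cell in row):
            problems.append(f'blank row at line {line_no}')
        elif key in seen:
            problems.append(f'duplicate row at line {line_no}')
        seen.add(key)
    return problems


# Hypothetical output text: a clean file, then one with a duplicate and a blank row.
good = ("cnum,company_new,c_col,d_col,e_col,f_col,g_col,h_col\n"
        "1001,Alpha,,,,,,\n"
        "1002,Beta,,,,,,\n")
bad = good + "1001,Alpha,,,,,,\n,,,,,,,\n"

print(check_output(good))
print(check_output(bad))
```

To check a real file, read its text first, for example with open(path, encoding='gbk', newline='') for the file written by the pandas script above, and pass the contents to check_output.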

Part 4: Comparison and summary:

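Python pandas ships with a large number of built-in data-processing methods, so there is no need to reinvent the wheel. It is convenient to use, and the code is far more concise.
Meeting the same requirements in Excel VBA took data structures such as arrays and dictionaries (real-world data sets are often large, so in some places the code avoids writing to cells one at a time), along with many string, array, and dictionary operations. The file handling is also complicated, and when something goes wrong, debugging is harder than in Python. The code has been optimized as far as possible, yet it is still much longer than the Python version.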
