Python: module DataPreProcess

DataPreProcess

d:\platform\platform\backend\datapreprocess.py

Data preprocessing class

Modules

PathSetting
Tools
numpy
pandas
pdb
sys

Classes



builtins.object

DataPreProcess

class DataPreProcess(builtins.object)

    Methods defined here:

__init__(self)
Initialize self.  See help(type(self)) for accurate signature.

cal_stand_time(self, dfin)
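Compute rest (stand) time.
Assigns the rest time before and after each drive or charge episode to the stand_time column, in minutes.
---------------- Input parameters ----------------
dfin: BMS data output by data_split_by_status()
---------------- Output parameters ----------------
Appends a stand_time column to the input data.
stand_time: set at the start and end rows of each drive or charge segment; gives the rest duration before the segment starts and after it ends, in minutes.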
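A minimal sketch of the stand-time calculation, written in plain Python on a list of segments rather than on the DataFrame. It assumes each segment is a hypothetical (status, start_s, end_s) tuple with times in seconds. It also assumes the rest time before a drive or charge segment is the gap back to the end of the previous drive or charge segment, and the rest time after it is the gap forward to the start of the next one. The module's actual gap definition may differ.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, float, float]  # hypothetical layout: (status, start_s, end_s)


def stand_minutes(segments: List[Segment]) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per segment: (minutes of rest before, minutes of rest after).

    Only drive/charge segments get values; other segments, and open ends
    with no neighbouring drive/charge segment, get None.
    """
    # Indices of the "active" segments whose surrounding rest time we measure.
    active = [i for i, seg in enumerate(segments) if seg[0] in ("drive", "charge")]
    result: List[Tuple[Optional[float], Optional[float]]] = [(None, None)] * len(segments)
    for k, i in enumerate(active):
        _, start, end = segments[i]
        before: Optional[float] = None
        after: Optional[float] = None
        if k > 0:
            # Gap from the end of the previous active segment to this start.
            before = (start - segments[active[k - 1]][2]) / 60.0
        if k + 1 < len(active):
            # Gap from this end to the start of the next active segment.
            after = (segments[active[k + 1]][1] - end) / 60.0
        result[i] = (before, after)
    return result
```

In the real method these values end up in the stand_time column at the first and last rows of each drive or charge segment, rather than in a separate list.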

combine_drive_stand(self, dfin)
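Merge discharge and stand segments: all data segments between two charges are merged into one segment, so each segment's status is either charge or not charge.
---------------- Input ----------------
dfin: BMS data output by data_split_by_status()
---------------- Output ----------------
Appends two columns, data_split_by_status_after_combine and data_status_after_combine, to the input data.
data_split_by_status_after_combine: segment index after merging the data between two charges
data_status_after_combine: status label of each segment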
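The merge rule can be sketched on two parallel per-row lists holding the data_split_by_status indices and data_status labels. This is an illustration, not the module's code: the function name is hypothetical, and it assumes that two back-to-back charge segments with different indices stay separate while every non-charge row between two charges collapses into one "not charge" segment.

```python
from typing import List, Optional, Tuple


def combine_drive_stand_sketch(
    seg_ids: List[int], statuses: List[str]
) -> Tuple[List[int], List[str]]:
    """Return per-row (data_split_by_status_after_combine, data_status_after_combine)."""
    combined_ids: List[int] = []
    combined_status: List[str] = []
    current = -1
    prev_seg: Optional[int] = None
    prev_charge: Optional[bool] = None
    for seg, status in zip(seg_ids, statuses):
        is_charge = status == "charge"
        starts_new = (
            prev_charge is None                 # first row
            or is_charge != prev_charge         # charge <-> not-charge boundary
            or (is_charge and seg != prev_seg)  # keep distinct charge segments apart
        )
        if starts_new:
            current += 1
        combined_ids.append(current)
        combined_status.append("charge" if is_charge else "not charge")
        prev_seg, prev_charge = seg, is_charge
    return combined_ids, combined_status
```

Drive, stand, and none rows are treated alike here, which matches the docstring's two-way charge / not charge split.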

data_gps_judge_after_combine(self, df_bms, df_gps, time_diff_thre=600, odo_sum_thre=200, drive_spd_thre=80, parking_spd_thre=2)
GPS data reliability check, version 2 (based on the segments after combining). Uses the same criteria as gps_data_judge.

data_split_by_status(self, dfin, drive_interval_threshold=120, charge_interval_threshold=300, drive_stand_threshold=120, charge_stand_threshold=300)
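Preprocessing segmentation: splits the raw data into charge, drive, stand, and none segments.
Status rules:
1. drive: (status is 2 or 3 and some current > 0) or (current stays at 0, the duration is below the threshold, and the previous segment is drive)
2. charge: (status is 2 or 3 and no current > 0) or (current stays at 0, the duration is below the threshold, and the previous segment is charge)
3. stand: (current stays at 0 and this is the first segment of the data) or (current stays at 0 and the duration is above the threshold)
4. none: everything else
---------------- Input parameters ----------------
drive_interval_threshold: drive merge threshold; if the gap between two drive segments is less than this value, the two are merged.
charge_interval_threshold: charge merge threshold; if the gap between two charge segments is less than this value, the two are merged.
drive_stand_threshold: threshold for merging a stand segment into a drive segment; if the stand time is less than this value, it is merged into the preceding drive segment.
charge_stand_threshold: threshold for merging a stand segment into a charge segment; if the stand time is less than this value, it is merged into the preceding charge segment.
---------------- Output ----------------
Appends three columns to the raw data: data_split_by_crnt, data_split_by_status, data_status.
data_split_by_crnt: segment index from splitting by current
data_split_by_status: segment index from splitting by current and status
data_status: status label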
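A simplified sketch of the status rules, in plain Python. Several details are assumptions, not taken from the module: each row is a hypothetical (time_s, current, state) tuple; a zero-current run's duration is the time from its first to its last row; drive_stand_threshold applies after a drive or none segment and charge_stand_threshold after a charge; current is compared to zero exactly. The drive/charge gap merging controlled by drive_interval_threshold and charge_interval_threshold is left out.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[float, float, int]  # hypothetical layout: (time_s, current, state)


def split_by_status_sketch(
    rows: List[Row],
    drive_stand_threshold: float = 120,
    charge_stand_threshold: float = 300,
) -> Tuple[List[int], List[int], List[str]]:
    """Return per-row (data_split_by_crnt, data_split_by_status, data_status)."""
    # Runs of consecutive rows whose current is (or is not) exactly zero.
    runs = [list(g) for _, g in groupby(rows, key=lambda r: r[1] == 0)]

    labels: List[str] = []
    for run in runs:
        currents = [r[1] for r in run]
        prev = labels[-1] if labels else None
        if currents[0] == 0:
            # Zero-current run: stand, or absorbed into the previous drive/charge.
            duration = run[-1][0] - run[0][0]
            limit = charge_stand_threshold if prev == "charge" else drive_stand_threshold
            if prev is None or duration >= limit:
                label = "stand"
            elif prev in ("drive", "charge"):
                label = prev
            else:
                label = "none"
        elif all(r[2] in (2, 3) for r in run):
            # Active state: any positive current means drive, otherwise charge.
            label = "drive" if any(c > 0 for c in currents) else "charge"
        else:
            label = "none"
        labels.append(label)

    crnt_ids: List[int] = []
    status_ids: List[int] = []
    statuses: List[str] = []
    status_id = -1
    for k, (run, label) in enumerate(zip(runs, labels)):
        # Adjacent runs with the same label share one data_split_by_status index.
        if k == 0 or label != labels[k - 1]:
            status_id += 1
        crnt_ids.extend([k] * len(run))
        status_ids.extend([status_id] * len(run))
        statuses.extend([label] * len(run))
    return crnt_ids, status_ids, statuses
```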

data_split_by_time(self, dfin, default_time_threshold=300, drive_time_threshold=300, charge_time_threshold=300, stand_time_threshold=1800)
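Fixes abnormal segment indices caused by missing data. For data already segmented by data_split_by_status, whenever the time jump between two consecutive rows within a segment exceeds the threshold, the segment is split in two.
---------------- Input parameters ----------------
dfin: data output by data_split_by_status()
default_time_threshold: default time threshold; if a time jump within a state exceeds this value, the segment is split in two
drive_time_threshold: drive time threshold; if a time jump within a drive state exceeds this value, the segment is split in two
charge_time_threshold: charge time threshold; if a time jump within a charge state exceeds this value, the segment is split in two
stand_time_threshold: stand time threshold; if a time jump within a stand state exceeds this value, the segment is split in two
---------------- Output ----------------
Appends one column, data_split_by_status_time, to the input data.
data_split_by_status_time: segment index after splitting by status and time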
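The time-gap split can be sketched on three parallel per-row lists (timestamps in seconds, data_split_by_status indices, data_status labels). The function name and list layout are hypothetical; the assumption is that statuses other than drive, charge, and stand (for example none) fall back to default_time_threshold.

```python
from typing import List


def split_by_time_sketch(
    times: List[float],
    seg_ids: List[int],
    statuses: List[str],
    default_time_threshold: float = 300,
    drive_time_threshold: float = 300,
    charge_time_threshold: float = 300,
    stand_time_threshold: float = 1800,
) -> List[int]:
    """Return a per-row data_split_by_status_time index."""
    limits = {
        "drive": drive_time_threshold,
        "charge": charge_time_threshold,
        "stand": stand_time_threshold,
    }
    out: List[int] = []
    new_id = -1
    for i, (t, seg, status) in enumerate(zip(times, seg_ids, statuses)):
        if i == 0 or seg != seg_ids[i - 1]:
            new_id += 1  # a new status segment always starts a new index
        elif t - times[i - 1] > limits.get(status, default_time_threshold):
            new_id += 1  # large jump inside a segment: treat as missing data and split
        out.append(new_id)
    return out
```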

gps_data_judge(self, df_bms, df_gps, time_diff_thre=300, odo_sum_thre=200, drive_spd_thre=80, parking_spd_thre=2)
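GPS data reliability check (based on the segments before combining).
The GPS data is judged unreliable in any of the following cases:
1) fewer than 2 location points correspond to the segment;
2) the start or end time of the extracted GPS data differs from the start or end time of the BMS data segment by more than the threshold;
3) in a drive segment, the cumulative mileage or the vehicle speed exceeds its threshold;
4) in a non-drive segment, the vehicle speed exceeds its threshold.
---------------- Input parameters ----------------
time_diff_thre: time-difference threshold
odo_sum_thre: cumulative mileage threshold
drive_spd_thre: speed threshold for drive segments
parking_spd_thre: speed threshold for non-drive segments
---------------- Output parameters ----------------
Adds a gps_rely column to df_bms indicating whether the corresponding GPS data is reliable:
        1: reliable
        <0: unreliable; the value indicates the reason
Adds two columns, odo and speed, to df_gps: the distance and speed between each point and the previous one.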
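A sketch of the four checks for a single BMS segment, in plain Python. Everything here is an assumption rather than the module's code: GPS points are hypothetical (time_s, lat_deg, lon_deg) tuples sorted by time; distances use the haversine great-circle formula (swapped in, since the module's distance method is not documented); odo is in km and speed in km/h; the speed checks use the largest point-to-point speed; and the reason codes -1 to -4 simply follow the rule order.

```python
import math
from typing import List, Tuple

GpsPoint = Tuple[float, float, float]  # hypothetical layout: (time_s, lat_deg, lon_deg)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere with the mean Earth radius."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def judge_segment(
    bms_start: float, bms_end: float, is_drive: bool, gps: List[GpsPoint],
    time_diff_thre: float = 300, odo_sum_thre: float = 200,
    drive_spd_thre: float = 80, parking_spd_thre: float = 2,
) -> int:
    """Return 1 if the segment's GPS data looks reliable, else a hypothetical negative code."""
    if len(gps) < 2:
        return -1  # rule 1: too few points
    if abs(gps[0][0] - bms_start) > time_diff_thre or abs(gps[-1][0] - bms_end) > time_diff_thre:
        return -2  # rule 2: GPS time span does not match the BMS segment
    odo = 0.0
    speeds: List[float] = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(gps, gps[1:]):
        d = haversine_km(la0, lo0, la1, lo1)  # per-step distance ("odo")
        odo += d
        dt_h = (t1 - t0) / 3600.0
        speeds.append(d / dt_h if dt_h > 0 else 0.0)  # per-step speed; duplicate timestamps -> 0
    max_speed = max(speeds)
    if is_drive and (odo > odo_sum_thre or max_speed > drive_spd_thre):
        return -3  # rule 3: implausible mileage or speed while driving
    if not is_drive and max_speed > parking_spd_thre:
        return -4  # rule 4: moving while not driving
    return 1
```

The per-step distances and speeds computed in the loop correspond to the odo and speed columns that the method adds to df_gps.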

time_filter(self, df_bms, df_gps)

Data descriptors defined here:

__dict__

dictionary for instance variables (if defined)

__weakref__

list of weak references to the object (if defined)

Data

CONF_PATH = 'D:\\Platform\\platform\\CONFIGURE\\'
defpath = r'.;C:\bin'

Author

lmstack
