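Feathr is the feature store that LinkedIn built to simplify machine learning (ML) feature management and improve developer productivity. It has been used in production for many years, and the project was officially open-sourced in April of this year.
Important changes:
The execution engine for derived features has been changed to Spark SQL, so this may be a somewhat breaking change for users who are not running the latest sample notebooks. Specifically, they may run into the following error: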
Preprocessed DataFrames are:
{'feature_user_age,feature_user_gift_card_balance,feature_user_has_valid_credit_card,feature_user_tax_rate': JavaObject id=o243}
Traceback (most recent call last):
  File "feathr_pyspark_driver.py", line 107, in <module>
    submit_spark_job(feature_names_funcs)
  File "feathr_pyspark_driver.py", line 85, in submit_spark_job
    py4j_feature_job.mainWithPreprocessedDataFrame(job_param_java_array, new_preprocessed_df_map)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.AnalysisException: Undefined function: 'toBoolean'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 84
Users should change the following code:
feature_user_purchasing_power = DerivedFeature(name="feature_user_purchasing_power",
                                               key=user_id,
                                               feature_type=FLOAT,
                                               input_features=[
                                                   feature_user_gift_card_balance, feature_user_has_valid_credit_card],
                                               transform="feature_user_gift_card_balance + if_else(toBoolean(feature_user_has_valid_credit_card), 100, 0)")
to this:
feature_user_purchasing_power = DerivedFeature(name="feature_user_purchasing_power",
                                               key=user_id,
                                               feature_type=FLOAT,
                                               input_features=[
                                                   feature_user_gift_card_balance, feature_user_has_valid_credit_card],
                                               transform="feature_user_gift_card_balance + if(boolean(feature_user_has_valid_credit_card), 100, 0)")
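For projects with many derived features, the rename shown above could be applied mechanically. The following is only a minimal sketch and not part of Feathr: the helper `to_spark_sql_transform` and the mapping `LEGACY_TO_SPARK_SQL` are hypothetical names, and the mapping covers only the two functions that appear in the example above (`toBoolean` and `if_else`). Other legacy functions may need different handling.

```python
import re

# Hypothetical mapping, limited to the two renames shown in the release note.
LEGACY_TO_SPARK_SQL = {
    "toBoolean": "boolean",
    "if_else": "if",
}

# Match a legacy function name only when it starts at a word boundary
# and is directly followed by an opening parenthesis (i.e. a call).
_LEGACY_CALL = re.compile(
    r"\b(" + "|".join(map(re.escape, LEGACY_TO_SPARK_SQL)) + r")\s*\("
)


def to_spark_sql_transform(expr: str) -> str:
    """Rewrite legacy function calls in a transform string to Spark SQL names."""
    return _LEGACY_CALL.sub(lambda m: LEGACY_TO_SPARK_SQL[m.group(1)] + "(", expr)


old_transform = (
    "feature_user_gift_card_balance + "
    "if_else(toBoolean(feature_user_has_valid_credit_card), 100, 0)"
)
new_transform = to_spark_sql_transform(old_transform)
print(new_transform)
```

Because this is a plain text substitution, it would also rewrite matching names inside string literals within an expression, so the rewritten transforms should be reviewed before use.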
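Other changes:
- Fix feature type error #701
- Fix web app issues for the Purview+RBAC registry #700
- Remove hard-coded resources from the docs #696
- Add e2e tests for the Purview registry and the RBAC registry #689
- Improve error messages for Databricks submission #710
- Improve error messages for the Purview registry #709
- [WIP] Hotfix for the Databricks ES dependency issue #713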
- Fix materialize to sql e2e test failure by @blrchen in #717
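- Add data models in Feathr #659
- Fix missing lookup features when converting feature definitions to a HOCON file
- Fix function string parsing issue #725
- Remove unused credentials and deprecated Purview settings
- Revoke the adb token that was committed by mistake #730
- Fix Synapse errors not being printed #734
- Fix Spark configuration passing error #729
- Support SQL expressions in derived feature transformations #731
For more details, see: https://github.com/feathr-ai/feathr/releases/tag/v0.9.0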