
Profiles 0.2.x Changelog

Changelog for Profiles v0.2.x.

Version 0.2.2

12 November 2022

Our November release brings several fixes and improvements for a better experience. Check it out and be sure to let us know your feedback.

What’s New

  • ID Stitcher / Feature Table - You can now define a view as a source, in addition to a table, in the inputs file. This is particularly useful when you need to support an SQL query that is complex or out of scope for PB. To use it, define edge_source in your inputs file as view: <view_name> instead of table: <table_name>.
  • Inputvars - A new identifier that adds temporary helper columns to an input table for use in calculating a feature table.
  • Window Functions - In your model file, you can now use window functions in features, tablevars, tablefeatures, and inputvars. You can also add filters to features.
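The edge_source change described above can be sketched as a minimal inputs-file fragment. Only edge_source, view, and table come from these notes; the exact nesting and surrounding structure are assumptions, so treat the YAML files generated by pb init pb-project as the authoritative layout.

```yaml
# Hypothetical inputs-file fragment: the nesting shown here is assumed.
edge_source:
  view: <view_name>        # new in 0.2.2: read edges from a view
  # table: <table_name>    # previous form, still supported as a source
```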

Improvements

  • Schema version 9 streamlines model definition. We welcome your feedback for further improvements on this.
  • The compile command now shows errors if the input SQL is buggy.
  • Discover - The entities and features subcommands now show a few more fields.
  • Discover - Export to CSV now works for subcommands and generates files in the output folder.
  • Init pb-project - Based on feedback, it now generates a README file along with simpler, commented YAML files. This should make it easier for users to create a model and get it running.
  • Several internal refactorings of the application.
  • Web app - Massive improvements under the hood related to UI elements, preserving state when entering data, showing correct data and validations, and displaying run time in the user’s local time zone.

Bug Fixes

  • Fixed an issue where every execution of pb run for a feature table added a new row to the output of pb discover features.
  • Resolved a bug where no error was shown if an unknown flag was used.
  • Resolved an issue with generating material tables on a new schema.
  • Fixed a bug related to generating empty SQL files from input models.
  • Fixed a bug where model names containing _ would sometimes fail to update the latest view pointer correctly.
  • Web app - Artifacts list now shows different folders for different runs to isolate them.
  • Web app - When the PB project is running, the screen now shows correct start timestamp.
  • Web app - Date filters to find PB runs are now working.
  • Web app - Scheduling UI is now fully responsive about when the run will take place.
  • Web app - Resolved an issue where a project would run only once and then show an error.

Known Issues

  • Warning: While the run command is executing, canceling it with Ctrl+C doesn’t work as expected. Although it stops the program’s execution in the CLI, the query keeps running on the data warehouse. This is documented Snowflake behavior.
  • In a model, an input can’t use columns named “MAIN_ID”, “OTHER_ID”, “OTHER_ID_TYPE”, or “VALID_AT” in its ID SQL.
  • When creating a profile via the init command, pressing Ctrl+C doesn’t exit the application.
  • Logger file generation is disabled at the moment.
  • Some no-op parameters are shown when passing the help flag (-h) to the validate access command.

Version 0.2.0

5 October 2022

The September release is our largest update yet. We have added many quality-of-life improvements and net new features to the PB product line. In our mid-October release, we plan to ship even more features that improve the usability of the product and help form its core. Many of the features in this release are based directly on feedback from the first beta test with external users and internal stakeholders. Please feel free to walk through our newest release. We welcome and encourage all constructive feedback on the product.

What’s New

  • Feature Table - After encouraging feedback from beta testing of the ID Stitcher, we are now confident enough to share our C360 feature table functionality with beta customers. During testing of this release, we benchmarked ourselves against the feature set that our E-Commerce ML models expect. Many features were implemented successfully. Others depend on functionality that could not pass QA gates in time for this September release. Nevertheless, the feature table YAML is now ready for internal customers to explore.
  • Web App - We are now ready to share the scheduling functionality within the web app. It allows users to schedule and automatically run PB models from the Rudder backplane. Any artifacts and log files created while PB projects execute are also available for users to explore. This critical functionality enables users to debug their cloud PB runs.
  • Validate - A new command, pb validate, allows users to run various tests on project-related configurations and validate the privileges associated with the role used to run the project. For example, the subcommand pb validate access runs an exhaustive check of the required privileges before accessing the warehouse.
  • Version - This is another new command that provides information on the current version of the app.
  • Logger - When you execute the compile and run commands, all error and success messages that were previously only displayed on screen are now also logged to a file inside the project output folder.
  • Discover - You can now export the output of the discover command to a CSV file. You can also discover across all schemas in your warehouse.

Improvements

  • We have made many changes to the way the ID Stitcher config is written, as we form a more complete opinion on the semantic model representation of customers’ data. Entities, IDs, and ID types are now defined in the PB project file. The model file syntax is also more organized and easier to write. To see examples of the new syntax, check out the section on Identity stitching or generate sample files by executing the command pb init pb-project. The sample project file also contains include and exclude filters to illustrate their usage.
  • In PB command invocations, whenever a file is written, its location is now shown on the console and in log files.
  • Many enhancements to error handling within the application.
  • Massive improvements under the hood.

Bug Fixes

  • Fixed an issue in ID stitching where singleton components (i.e., components with only one edge) were not picked up, which caused them to be skipped in the final output table.
  • Fixed an issue in the init command where leaving target empty didn’t set it to the default value, “dev”.
  • Fixed an issue where pressing Ctrl+C didn’t exit the application.
  • The command init profile now appends to an existing profile, instead of overwriting it.
  • Fixed the issue in discover command where the material table name was being displayed instead of the model name.

Known Issues

  • Warning: While the run command is executing, canceling it with Ctrl+C doesn’t work as expected. Although it stops the program’s execution in the CLI, the query keeps running on the data warehouse. This is documented Snowflake behavior.
  • In a model, an input can’t use columns named “MAIN_ID”, “OTHER_ID”, “OTHER_ID_TYPE”, or “VALID_AT” in its ID SQL.
  • The web app doesn’t show the description and last run on the landing page.
  • In the web app, date filters to find PB runs aren’t working.
  • In the web app, when the PB project is running, the screen shows an incorrect start timestamp.
  • The artifacts list changes when a project is running versus when it completes execution. Since all runs on the same Kubernetes pod share the same project folder, artifacts of different runs are created under the same parent folder. As a result, the same folder is currently shown for different runs of the project. In the next release, we will configure different folders for different runs to isolate them.
  • For feature table models, the compile command doesn’t always show an error if the input SQL is buggy. This error may still surface when the model is run.
  • When creating a profile via the init command, pressing Ctrl+C doesn’t exit the application.
  • Creating a PB Project doesn’t currently include a sample independent ID stitcher. Instead, the ID stitcher is generated as a child model of the feature table model.
  • We are working toward better readability of the logger file. We welcome any feedback here.
  • The command pb discover features needs to show a few more fields.
  • Every time pb run is executed for a feature table, it adds a new row to the output of pb discover features. Only one row should appear for each feature.
  • Export to CSV for the discover command should work for subcommands and also generate files in an output folder.
  • Some no-op parameters are shown when passing the help flag (-h) to the validate access command.
  • In some cases, an error isn’t shown if an unknown flag is used.
  • The scheduling UI is sometimes not fully responsive about when the run will take place.
The documentation for the September release does not completely match the current release. We are updating the documentation and will publish new versions soon. Please contact the Data Apps team if you are confused by any deviation.
