Smart Feature Sampling and Performance Profiling
Smart Feature Sampling
For large datasets, you can speed up RIME by running it on a subset of features. You specify the number of features to test on, and RIME automatically calculates feature importances, both with and without a model (as long as labels or predictions are provided), then runs its suite of tests across the selected features.
To set up smart feature sampling, specify the num_feats_to_profile option in DataProfilingInfo.
The maximum number of features that RIME can be run on is 750.
More information can be found in DataProfilingInfo Configuration.
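As a rough illustration of the setting described above, the following sketch builds the option as a plain Python dictionary and checks it against the 750-feature maximum. Only the option name num_feats_to_profile comes from this page. The helper name build_profiling_options is hypothetical, and where the option nests inside the full RIME configuration is not shown here; refer to DataProfilingInfo Configuration for the actual structure. The sketch also assumes the 750 limit applies to this option.

```python
MAX_FEATURES = 750  # maximum number of features RIME can be run on (from this page)


def build_profiling_options(num_feats_to_profile: int) -> dict:
    """Return a DataProfilingInfo-style options fragment (hypothetical helper).

    Only the key name `num_feats_to_profile` is taken from the documentation;
    how this fragment is embedded in a full config is an assumption left open.
    """
    if not isinstance(num_feats_to_profile, int) or num_feats_to_profile < 1:
        raise ValueError("num_feats_to_profile must be a positive integer")
    if num_feats_to_profile > MAX_FEATURES:
        raise ValueError(
            f"num_feats_to_profile must not exceed {MAX_FEATURES}"
        )
    return {"num_feats_to_profile": num_feats_to_profile}


# Example: profile the 100 most important features.
options = build_profiling_options(100)
```

Validating the value before launching a run avoids discovering an out-of-range setting only after the data has been loaded.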
Performance Profiling
RIME’s analysis of datasets augments your data with additional profiles and relationships. The following tables show RIME’s limits for data in CSV format.
Each feature/row pair gives the recommended maximum number of rows to run RIME on for that feature count, assuming the stated memory ceiling. Memory scales roughly linearly, so to estimate the maximum advisable number of rows for your feature count, find the closest feature count in the table and scale its row guideline by the ratio of that feature count to yours.
For ranking use cases, recommended row values should be roughly 80% of the values provided below.
If your data is in Parquet format, RIME supports 20% more rows than the values recommended below.
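The ratio calculation and the two adjustments above can be sketched as follows. This is one interpretation of the guidance, not an official RIME utility: the function name estimate_max_rows is hypothetical, the lookup values are copied from the 8GB table below, and combining the ranking and Parquet adjustments by multiplying them is an assumption.

```python
# Row guidelines copied from the 8GB table (feature count -> recommended rows).
GUIDELINES_8GB = {
    25: 13_000_000,
    50: 6_400_000,
    75: 4_200_000,
    100: 3_000_000,
    200: 1_600_000,
    300: 1_000_000,
    400: 750_000,
    500: 600_000,
    750: 175_000,
}


def estimate_max_rows(num_features, guidelines=GUIDELINES_8GB,
                      ranking=False, parquet=False):
    """Estimate the advisable row count for a given feature count.

    Picks the closest feature count in the table and scales its row
    guideline by (table feature count / your feature count).
    """
    if num_features < 1:
        raise ValueError("num_features must be at least 1")
    if num_features > 750:
        raise ValueError("RIME can be run on at most 750 features")

    # Closest tabulated feature count; ties resolve to the smaller count.
    closest = min(guidelines, key=lambda f: abs(f - num_features))
    rows = guidelines[closest] * closest / num_features

    if ranking:
        rows *= 0.8  # ranking use cases: roughly 80% of the table value
    if parquet:
        rows *= 1.2  # Parquet input: 20% more rows than the table value
    return round(rows)


# Example: 120 features under an 8GB ceiling, CSV input.
rows_for_120 = estimate_max_rows(120)
```

To use a different memory ceiling, pass a dictionary built from the corresponding table as the guidelines argument. Treat the result as a starting point, since the tables themselves are guidelines rather than hard limits.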
8GB Memory Ceiling Recommendations
| Feature Count | Row Guidelines for Standard Production Resources | 
|---|---|
| 25 | 13,000,000 | 
| 50 | 6,400,000 | 
| 75 | 4,200,000 | 
| 100 | 3,000,000 | 
| 200 | 1,600,000 | 
| 300 | 1,000,000 | 
| 400 | 750,000 | 
| 500 | 600,000 | 
| 750 | 175,000 | 
16GB Memory Ceiling Recommendations
| Feature Count | Row Guidelines for Standard Production Resources | 
|---|---|
| 25 | 26,000,000 | 
| 50 | 12,800,000 | 
| 75 | 8,400,000 | 
| 100 | 6,000,000 | 
| 200 | 3,200,000 | 
| 300 | 2,000,000 | 
| 400 | 1,500,000 | 
| 500 | 1,200,000 | 
| 750 | 350,000 | 
32GB Memory Ceiling Recommendations
| Feature Count | Row Guidelines for Standard Production Resources | 
|---|---|
| 25 | 50,000,000 | 
| 50 | 24,500,000 | 
| 75 | 16,200,000 | 
| 100 | 11,500,000 | 
| 200 | 6,100,000 | 
| 300 | 3,800,000 | 
| 400 | 2,800,000 | 
| 500 | 2,200,000 | 
| 750 | 650,000 | 
