.. SPDX-License-Identifier: GPL-2.0

============================================================
Intel(R) Speed Select Technology User Guide
============================================================

The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
collection of features that give more granular control over CPU performance.
With Intel(R) SST, one server can be configured for power and performance for a
variety of diverse workload requirements.

Refer to the links below for an overview of the technology:

- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf

These capabilities are further enhanced in some of the newer generations of
server platforms where these features can be enumerated and controlled
dynamically without pre-configuring via BIOS setup options. This dynamic
configuration is done via mailbox commands to the hardware. One way to enumerate
and configure these features is by using the Intel Speed Select utility.

This document explains how to use the Intel Speed Select tool to enumerate and
control Intel(R) SST features. This document gives example commands and explains
how these commands change the power and performance profile of the system under
test. Using this tool as an example, customers can replicate the messaging
implemented in the tool in their production software.

intel-speed-select configuration tool
======================================

Most Linux distribution packages may include the "intel-speed-select" tool. If not,
it can be built by downloading the Linux kernel tree from kernel.org. Once
downloaded, the tool can be built without building the full kernel.

From the kernel tree, run the following commands::

 # cd tools/power/x86/intel-speed-select/
 # make
 # make install

Getting Help
------------

To get help with the tool, execute the command below::

 # intel-speed-select --help

The top-level help describes arguments and features. Notice that there is a
multi-level help structure in the tool. For example, to get help for the feature
"perf-profile"::

 # intel-speed-select perf-profile --help

To get help on a command, another level of help is provided. For example, for
the command "info"::

 # intel-speed-select perf-profile info --help

Summary of platform capability
------------------------------

To check the current platform and driver capabilities, execute::

 # intel-speed-select --info

For example on a test system::

 # intel-speed-select --info
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 Platform: API version : 1
 Platform: Driver version : 1
 Platform: mbox supported : 1
 Platform: mmio supported : 1
 Intel(R) SST-PP (feature perf-profile) is supported
 TDP level change control is unlocked, max level: 4
 Intel(R) SST-TF (feature turbo-freq) is supported
 Intel(R) SST-BF (feature base-freq) is not supported
 Intel(R) SST-CP (feature core-power) is supported

Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
------------------------------------------------------------------------

This feature allows configuration of a server dynamically based on workload
performance requirements. This helps users during deployment as they do not have
to choose a specific server configuration statically. This Intel(R) Speed Select
Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
that allows multiple optimized performance profiles per system. Each profile
defines a set of CPUs that need to be online, with the rest offline, to sustain a
guaranteed base frequency.
Once the user issues a command to use a specific performance profile and meets
the CPU online/offline requirement, the user can expect a change in the base
frequency dynamically. This feature is called "perf-profile" when using the
Intel Speed Select tool.

Number of performance levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There can be multiple performance profiles on a system. To get the number of
profiles, execute the command below::

 # intel-speed-select perf-profile get-config-levels
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
        get-config-levels:4
 package-1
  die-0
    cpu-14
        get-config-levels:4

On this system under test, there are 4 performance profiles in addition to the
base performance profile (which is performance level 0).

Lock/Unlock status
~~~~~~~~~~~~~~~~~~

Even if there are multiple performance profiles, it is possible that they
are locked. If they are locked, users cannot issue a command to change the
performance state. There may be a BIOS setup option to unlock them; otherwise,
check with your system vendor.

To check if the system is locked, execute the following command::

 # intel-speed-select perf-profile get-lock-status
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
        get-lock-status:0
 package-1
  die-0
    cpu-14
        get-lock-status:0

In this case, lock status is 0, which means that the system is unlocked.

Properties of a performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get properties of a specific performance level (for example, level 0 below),
execute the command below::

 # intel-speed-select perf-profile info -l 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      perf-profile-level-0
        cpu-count:28
        enable-cpu-mask:000003ff,f0003fff
        enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
        thermal-design-power-ratio:26
        base-frequency(MHz):2600
        speed-select-turbo-freq:disabled
        speed-select-base-freq:disabled
        ...
        ...

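
The "enable-cpu-mask" and "enable-cpu-list" fields in the output above describe
the same set of CPUs. The sketch below shows one way to decode such a mask in a
POSIX shell. It assumes the mask is a comma-separated list of 32-bit hexadecimal
words with the most significant word first, which is consistent with the sample
output but is not stated by the tool. The function name "mask_to_cpu_list" is
hypothetical.

.. code-block:: shell

   #!/bin/sh
   # Decode a comma-separated hex CPU mask into a comma-separated CPU list.
   # Assumption: 32 bits per word, most significant word first.
   mask_to_cpu_list() {
       # Split the mask on commas into positional parameters.
       old_ifs=$IFS
       IFS=,
       set -- $1
       IFS=$old_ifs

       cpus=""
       word_index=$#    # the leftmost word holds the highest CPU numbers
       for word in "$@"; do
           word_index=$((word_index - 1))
           value=$((0x$word))
           bit=0
           while [ "$bit" -lt 32 ]; do
               if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                   cpus="$cpus $((word_index * 32 + bit))"
               fi
               bit=$((bit + 1))
           done
       done
       # Sort numerically and join with commas.
       printf '%s\n' $cpus | sort -n | paste -s -d, -
   }

   # Mask taken from the performance level 0 sample output.
   mask_to_cpu_list 000003ff,f0003fff

For the level 0 mask this prints CPUs 0 to 13 and 28 to 41, matching the
"enable-cpu-list" field.
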

Here the -l option is used to specify a performance level.

If the option -l is omitted, then this command will print information about all
the performance levels. The above command is printing properties of the
performance level 0.

For this performance profile, the CPUs displayed by the
"enable-cpu-mask/enable-cpu-list" are the maximum set of CPUs that can be
"online." When that condition is met, then base frequency of 2600 MHz can be
maintained. To understand more, execute "intel-speed-select perf-profile info"
for performance level 4::

 # intel-speed-select perf-profile info -l 4
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      perf-profile-level-4
        cpu-count:28
        enable-cpu-mask:000000fa,f0000faf
        enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
        thermal-design-power-ratio:28
        base-frequency(MHz):2800
        speed-select-turbo-freq:disabled
        speed-select-base-freq:unsupported
        ...
        ...

There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
the user only keeps these CPUs online and the rest "offline," then the base
frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.

Get current performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the current performance level, execute::

 # intel-speed-select perf-profile get-config-current-level
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
        get-config-current_level:0

First verify that the base_frequency displayed by the cpufreq sysfs is correct::

 # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
 2600000

This matches the base-frequency (MHz) field value displayed from the
"perf-profile info" command for performance level 0 (the cpufreq frequency is in
kHz).
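
Because cpufreq reports kHz while the tool reports MHz, the comparison needs a
unit conversion. The following is a minimal sketch of that check; the function
name "check_base_frequency" is hypothetical, and the values passed to it are the
sample values from this guide rather than values read from a live system.

.. code-block:: shell

   #!/bin/sh
   # Convert a cpufreq value in kHz to MHz and compare it with the
   # base-frequency(MHz) value reported by "perf-profile info".
   check_base_frequency() {
       khz=$1
       expected_mhz=$2
       mhz=$((khz / 1000))
       if [ "$mhz" -eq "$expected_mhz" ]; then
           echo "base frequency ${mhz} MHz matches"
       else
           echo "base frequency ${mhz} MHz differs from expected ${expected_mhz} MHz"
           return 1
       fi
   }

   # Sample values from this guide: 2600000 kHz in sysfs, 2600 MHz at level 0.
   check_base_frequency 2600000 2600

On a live system the first argument could instead be read with
``cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency``.
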

To check if the average frequency is equal to the base frequency for a 100% busy
workload, disable turbo::

 # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Then run a busy workload on all CPUs, for example::

 # stress -c 64

To verify the base frequency, run turbostat::

 # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1

  Package Core CPU Bzy_MHz
          -    -   2600
  0       0    0   2600
  0       1    1   2600
  0       2    2   2600
  0       3    3   2600
  0       4    4   2600
  .       .    .   .

Changing performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~

To change the performance level to 4, execute::

 # intel-speed-select -d perf-profile set-config-level -l 4 -o
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      perf-profile
        set_tdp_level:success

In the command above, "-o" is optional. If it is specified, then it will also
offline CPUs which are not present in the enable_cpu_mask for this performance
level.

Now if the base_frequency is checked::

 # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
 2800000

This shows that the base frequency has now increased from 2600 MHz at performance
level 0 to 2800 MHz at performance level 4. As a result, any workload that can
use fewer CPUs can see a boost of 200 MHz compared to performance level 0.

Changing performance level via BMC Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to change the SST-PP level using an out of band (OOB) agent (via
some remote management console, through the BMC "Baseboard Management
Controller" interface). This mode is supported from the Sapphire Rapids processor
generation. The kernel and tool changes to support this mode were added in Linux
kernel version 5.18. To enable this feature, kernel config
"CONFIG_INTEL_HFI_THERMAL" is required. The minimum version of the tool
is "v1.12" to support this feature, which is part of Linux kernel version 5.18.

To support such configuration, this tool can be used as a daemon.
Add a command line option --oob::

 # intel-speed-select --oob
 Intel(R) Speed Select Technology
 Executing on CPU model:143[0x8f]
 OOB mode is enabled and will run as daemon

In this mode the tool will online/offline CPUs based on the new performance
level.

Check presence of other Intel(R) SST features
---------------------------------------------

Each of the performance profiles also specifies whether there is support for the
other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency
(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency
(Intel(R) SST-TF)).

For example, from the output of "perf-profile info" above, for level 0 and level
4:

For level 0::

       speed-select-turbo-freq:disabled
       speed-select-base-freq:disabled

For level 4::

       speed-select-turbo-freq:disabled
       speed-select-base-freq:unsupported

Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
changed from "disabled" to "unsupported" compared to performance level 0.

This means that at performance level 4, the "speed-select-base-freq" feature is
not supported. However, at performance level 0, this feature is "supported", but
currently "disabled", meaning the user has not activated this feature. Whereas
"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance
levels, but currently not activated by the user.

The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
is supported on a platform.

Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
---------------------------------------------------------------

Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
allows users to define per core priority. This defines a mechanism to distribute
power among cores when there is a power constrained scenario.
This defines a class of service (CLOS) configuration.

The user can configure up to 4 class of service configurations. Each CLOS group
configuration allows definitions of parameters, which affect how the frequency
can be limited and power is distributed. Each CPU core can be tied to a class of
service and hence an associated priority. The granularity is at core level, not
at per CPU level.

Enable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use the CLOS based prioritization feature, firmware must be informed to enable
and use a priority type. There is a default per platform priority type, which
can be changed with an optional command line parameter.

To enable and check the options, execute::

 # intel-speed-select core-power enable --help
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 Enable core-power for a package/die
        Clos Enable: Specify priority type with [--priority|-p]
                 0: Proportional, 1: Ordered

There are two priority types:

- Ordered

Priority for ordered throttling is defined based on the index of the assigned
CLOS group, where CLOS0 gets the highest priority (throttled last).

Priority order is:
CLOS0 > CLOS1 > CLOS2 > CLOS3.

- Proportional

When proportional priority is used, there is an additional parameter called
frequency_weight, which can be specified per CLOS group. The goal of
proportional priority is to provide each core with the requested min., then
distribute all remaining (excess/deficit) budgets in proportion to a defined
weight. This proportional priority can be configured using the "core-power
config" command.

To enable with the platform default priority type, execute::

 # intel-speed-select core-power enable
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      core-power
        enable:success
 package-1
  die-0
    cpu-6
      core-power
        enable:success

The scope of this enable is per package or die scoped when a package contains
multiple dies.
To check if CLOS is enabled and get priority type, "core-power info"
command can be used. For example to check the status of core-power feature
on CPU 0, execute::

 # intel-speed-select -c 0 core-power info
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      core-power
        support-status:supported
        enable-status:enabled
        clos-enable-status:enabled
        priority-type:proportional
 package-1
  die-0
    cpu-24
      core-power
        support-status:supported
        enable-status:enabled
        clos-enable-status:enabled
        priority-type:proportional

Configuring CLOS groups
~~~~~~~~~~~~~~~~~~~~~~~

Each CLOS group has its own attributes including min, max, freq_weight and
desired. These parameters can be configured with "core-power config" command.
Defaults will be used if user skips setting a parameter except clos id, which is
mandatory. To check core-power config options, execute::

 # intel-speed-select core-power config --help
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 Set core-power configuration for one of the four clos ids
        Specify targeted clos id with [--clos|-c]
        Specify clos Proportional Priority [--weight|-w]
        Specify clos min in MHz with [--min|-n]
        Specify clos max in MHz with [--max|-m]

For example::

 # intel-speed-select core-power config -c 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 clos epp is not specified, default: 0
 clos frequency weight is not specified, default: 0
 clos min is not specified, default: 0 MHz
 clos max is not specified, default: 25500 MHz
 clos desired is not specified, default: 0
 package-0
  die-0
    cpu-0
      core-power
        config:success
 package-1
  die-0
    cpu-6
      core-power
        config:success

The user has the option to change defaults. For example, the user can change the
"min" and set the base frequency to always get guaranteed base frequency.

Get the current CLOS configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To check the current configuration, "core-power get-config" can be used.
For example, to get the configuration of CLOS 0::

 # intel-speed-select core-power get-config -c 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      core-power
        clos:0
        epp:0
        clos-proportional-priority:0
        clos-min:0 MHz
        clos-max:Max Turbo frequency
        clos-desired:0 MHz
 package-1
  die-0
    cpu-24
      core-power
        clos:0
        epp:0
        clos-proportional-priority:0
        clos-min:0 MHz
        clos-max:Max Turbo frequency
        clos-desired:0 MHz

Associating a CPU with a CLOS group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To associate a CPU with a CLOS group, the "core-power assoc" command can be used::

 # intel-speed-select core-power assoc --help
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 Associate a clos id to a CPU
        Specify targeted clos id with [--clos|-c]

For example, to associate CPU 10 with CLOS group 3, execute::

 # intel-speed-select -c 10 core-power assoc -c 3
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-10
      core-power
        assoc:success

Once a CPU is associated, its sibling CPUs are also associated with the same CLOS
group. Once associated, avoid changing Linux "cpufreq" subsystem scaling
frequency limits.

To check the existing association for a CPU, the "core-power get-assoc" command
can be used. For example, to get the association of CPU 10, execute::

 # intel-speed-select -c 10 core-power get-assoc
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-1
  die-0
    cpu-10
      get-assoc
        clos:3

This shows that CPU 10 is part of CLOS group 3.

Disable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To disable, execute::

 # intel-speed-select core-power disable

Some features like Intel(R) SST-TF can only be enabled when CLOS based
prioritization is enabled. For this reason, disabling while Intel(R) SST-TF is
enabled can cause Intel(R) SST-TF to fail. This will cause the "disable" command
to display an error if Intel(R) SST-TF is already enabled. In turn, to disable,
the Intel(R) SST-TF feature must be disabled first.

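
The ordering constraint above can be sketched as a small script. This is only an
illustration under stated assumptions: the "turbo-freq disable" command is not
shown in this guide and is assumed here to mirror the "base-freq disable"
command, and the helper names "run" and "disable_clos_prioritization" are
hypothetical. By default the script only prints the commands; set DRY_RUN=0 to
execute them.

.. code-block:: shell

   #!/bin/sh
   # Print each command; execute it only when DRY_RUN=0.
   run() {
       echo "+ $*"
       if [ "${DRY_RUN:-1}" = "0" ]; then
           "$@"
       fi
   }

   # Disable Intel(R) SST-TF first, then CLOS based prioritization.
   # Assumption: "turbo-freq disable" is the counterpart of "turbo-freq enable".
   disable_clos_prioritization() {
       run intel-speed-select turbo-freq disable || return 1
       run intel-speed-select core-power disable
   }

   disable_clos_prioritization

Stopping at the first failure keeps "core-power disable" from running while
Intel(R) SST-TF is still enabled.
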

Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
-------------------------------------------------------------------

The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature
lets the user control base frequency. If some critical workload threads demand
constant high guaranteed performance, then this feature can be used to execute
the thread at higher base frequency on specific sets of CPUs (high priority
CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs.
This feature does not require offline of the low priority CPUs.

The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
Performance Profile (Intel(R) SST-PP) performance level configuration. It is
possible that only certain performance levels support Intel(R) SST-BF. It is also
possible that only the base performance level (level = 0) has support for
Intel(R) SST-BF. Consequently, first select the desired performance level to
enable this feature.

In the system under test here, Intel(R) SST-BF is supported at the base
performance level 0, but currently disabled. For example, for level 0::

 # intel-speed-select -c 0 perf-profile info -l 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      perf-profile-level-0
        ...

        speed-select-base-freq:disabled
        ...

Before enabling Intel(R) SST-BF and measuring its impact on a workload
performance, execute some workload and measure performance and get a baseline
performance to compare against.

Here the user wants more guaranteed performance. For this reason, it is likely
that turbo is disabled. To disable turbo, execute::

 # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Based on the output of "intel-speed-select perf-profile info -l 0", the
guaranteed base frequency is 2600 MHz.


Measure baseline performance for comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To compare, pick a multi-threaded workload where each thread can be scheduled on
separate CPUs. The "Hackbench pipe" test is a good example of how to improve
performance using Intel(R) SST-BF.

Below, the workload is measuring average scheduler wakeup latency, so a lower
number means better performance::

 # taskset -c 3,4 perf bench -r 100 sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes
      Total time: 6.102 [sec]
        6.102445 usecs/op
          163868 ops/sec

While running the above test, the turbostat output shows that 2 of the CPUs are
busy and reaching max. frequency (which would be the base frequency as turbo is
disabled). The turbostat output::

 # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
 Package Core CPU Bzy_MHz
 0       0    0   1000
 0       1    1   1005
 0       2    2   1000
 0       3    3   2600
 0       4    4   2600
 0       5    5   1000
 0       6    6   1000
 0       7    7   1005
 0       8    8   1005
 0       9    9   1000
 0       10   10  1000
 0       11   11  995
 0       12   12  1000
 0       13   13  1000

From the above turbostat output, both CPU 3 and 4 are very busy and reaching
the full guaranteed frequency of 2600 MHz.

Intel(R) SST-BF Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get capabilities of Intel(R) SST-BF for the current performance level 0,
execute::

 # intel-speed-select base-freq info -l 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      speed-select-base-freq
        high-priority-base-frequency(MHz):3000
        high-priority-cpu-mask:00000216,00002160
        high-priority-cpu-list:5,6,8,13,33,34,36,41
        low-priority-base-frequency(MHz):2400
        tjunction-temperature(C):125
        thermal-design-power(W):205

The above capabilities show that there are some CPUs on this system that can
offer a base frequency of 3000 MHz compared to the standard base frequency at
this performance level. Nevertheless, these CPUs are fixed, and they are
presented via high-priority-cpu-list/high-priority-cpu-mask.
But if this Intel(R) SST-BF feature is selected, the low priority CPUs (which are
not in high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if
this clipping of low priority CPUs is acceptable, then the user can enable the
Intel(R) SST-BF feature, particularly for the above "sched pipe" workload: since
only two CPUs are used, they can be scheduled on high priority CPUs and can get
a boost of 400 MHz.

Enable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~

To enable the Intel(R) SST-BF feature, execute::

 # intel-speed-select base-freq enable -a
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      base-freq
        enable:success
 package-1
  die-0
    cpu-14
      base-freq
        enable:success

In this case, the -a option is optional. This not only enables Intel(R) SST-BF,
but it also adjusts the priority of cores using Intel(R) Speed Select Technology
Core Power (Intel(R) SST-CP) features. This option sets the minimum performance
of each Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
class to maximum performance so that the hardware will give maximum performance
possible for each CPU.

If the -a option is not used, then the following steps are required before
enabling Intel(R) SST-BF:

- Discover Intel(R) SST-BF and note low and high priority base frequency
- Note the high priority CPU list
- Enable CLOS using core-power feature set
- Configure CLOS parameters. Use CLOS.min to set to minimum performance
- Subscribe desired CPUs to CLOS groups

With this configuration, if the same workload is executed by pinning the
workload to high priority CPUs (CPU 5 and 6 in this case)::

 # taskset -c 5,6 perf bench -r 100 sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes
      Total time: 5.627 [sec]
        5.627922 usecs/op
          177685 ops/sec

This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
improved (latency reduced) by about 7.8%.

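
The latency reduction can be derived from the two "usecs/op" figures reported by
"perf bench". The snippet below is a minimal sketch of that arithmetic using awk;
the function name "latency_reduction_pct" is hypothetical, and the inputs are the
sample results from this guide.

.. code-block:: shell

   #!/bin/sh
   # Percentage by which latency (usecs/op) dropped between two runs.
   latency_reduction_pct() {
       awk -v before="$1" -v after="$2" \
           'BEGIN { printf "%.2f\n", (before - after) / before * 100 }'
   }

   # Baseline on CPUs 3,4 versus Intel(R) SST-BF high priority CPUs 5,6.
   latency_reduction_pct 6.102445 5.627922

With the sample figures this prints 7.78, that is, a latency reduction of
roughly 7.8%.
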

From the turbostat output, it can be observed that the high priority CPUs
reached 3000 MHz compared to 2600 MHz. The turbostat output::

 # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
 Package Core CPU Bzy_MHz
 0       0    0   2151
 0       1    1   2166
 0       2    2   2175
 0       3    3   2175
 0       4    4   2175
 0       5    5   3000
 0       6    6   3000
 0       7    7   2180
 0       8    8   2662
 0       9    9   2176
 0       10   10  2175
 0       11   11  2176
 0       12   12  2176
 0       13   13  2661

Disable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~~

To disable the Intel(R) SST-BF feature, execute::

 # intel-speed-select base-freq disable -a

Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
--------------------------------------------------------------------

This feature enables the ability to set different "All core turbo ratio limits"
to cores based on the priority. By using this feature, some cores can be
configured to get higher turbo frequency by designating them as high priority at
the cost of lower or no turbo frequency on the low priority cores.

For this reason, this feature is only useful when the system is busy utilizing
all CPUs, but the user wants some configurable option to get high performance on
some CPUs.

The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R)
SST-TF) depends on the Intel(R) Speed Select Technology - Performance Profile
(Intel(R) SST-PP) performance level configuration. It is possible that only a
certain performance level supports Intel(R) SST-TF. It is also possible that only
the base performance level (level = 0) has support for Intel(R) SST-TF. Hence,
first select the desired performance level to enable this feature.

In the system under test here, Intel(R) SST-TF is supported at the base
performance level 0, but currently disabled::

 # intel-speed-select -c 0 perf-profile info -l 0
 Intel(R) Speed Select Technology
 package-0
  die-0
    cpu-0
      perf-profile-level-0
        ...
        ...
        speed-select-turbo-freq:disabled
        ...
        ...


To check if performance can be improved using the Intel(R) SST-TF feature, get
the turbo frequency properties with Intel(R) SST-TF enabled and compare to the
base turbo capability of this system.

Get Base turbo capability
~~~~~~~~~~~~~~~~~~~~~~~~~

To get the base turbo capability of performance level 0, execute::

 # intel-speed-select perf-profile info -l 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      perf-profile-level-0
        ...
        ...
        turbo-ratio-limits-sse
          bucket-0
            core-count:2
            max-turbo-frequency(MHz):3200
          bucket-1
            core-count:4
            max-turbo-frequency(MHz):3100
          bucket-2
            core-count:6
            max-turbo-frequency(MHz):3100
          bucket-3
            core-count:8
            max-turbo-frequency(MHz):3100
          bucket-4
            core-count:10
            max-turbo-frequency(MHz):3100
          bucket-5
            core-count:12
            max-turbo-frequency(MHz):3100
          bucket-6
            core-count:14
            max-turbo-frequency(MHz):3100
          bucket-7
            core-count:16
            max-turbo-frequency(MHz):3100

Based on the data above, when all the CPUs are busy, the max. frequency of 3100
MHz can be achieved. To observe this, run a busy workload (e.g. stress) on CPUs
0-11 and execute the "hackbench pipe" workload on CPUs 12 and 13::

 # taskset -c 12,13 perf bench -r 100 sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes
      Total time: 5.705 [sec]
        5.705488 usecs/op
          175269 ops/sec

The turbostat output::

 # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
 Package Core CPU Bzy_MHz
 0       0    0   3000
 0       1    1   3000
 0       2    2   3000
 0       3    3   3000
 0       4    4   3000
 0       5    5   3100
 0       6    6   3100
 0       7    7   3000
 0       8    8   3100
 0       9    9   3000
 0       10   10  3000
 0       11   11  3000
 0       12   12  3100
 0       13   13  3100

Based on the turbostat output, the performance is limited by the frequency cap
of 3100 MHz. To check if the hackbench performance can be improved for CPU 12
and CPU 13, first check the capability of the Intel(R) SST-TF feature for this
performance level.

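
The "turbo-ratio-limits-sse" buckets shown earlier can be read as a lookup table.
The sketch below assumes that each bucket's "core-count" is the largest number of
active cores for which that bucket's limit applies; this is an interpretation of
the sample output, not something the tool states. The variable "BUCKETS" and the
function name "max_turbo_for_active_cores" are hypothetical.

.. code-block:: shell

   #!/bin/sh
   # Bucket table from the sample output: "core-count:max-turbo-frequency(MHz)".
   BUCKETS="2:3200 4:3100 6:3100 8:3100 10:3100 12:3100 14:3100 16:3100"

   # Print the turbo limit of the first bucket that covers the active core count.
   max_turbo_for_active_cores() {
       active=$1
       for bucket in $BUCKETS; do
           count=${bucket%%:*}
           freq=${bucket##*:}
           if [ "$active" -le "$count" ]; then
               echo "$freq"
               return 0
           fi
       done
       echo "no bucket covers $active active cores" >&2
       return 1
   }

   max_turbo_for_active_cores 2     # two busy cores
   max_turbo_for_active_cores 14    # all 14 cores of a package busy

Under that assumption the two calls print 3200 and 3100, which agrees with the
3100 MHz cap seen when all CPUs are busy.
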

Get Intel(R) SST-TF Capability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the capability, the "turbo-freq info" command can be used::

 # intel-speed-select turbo-freq info -l 0
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-0
      speed-select-turbo-freq
          bucket-0
            high-priority-cores-count:2
            high-priority-max-frequency(MHz):3200
            high-priority-max-avx2-frequency(MHz):3200
            high-priority-max-avx512-frequency(MHz):3100
          bucket-1
            high-priority-cores-count:4
            high-priority-max-frequency(MHz):3100
            high-priority-max-avx2-frequency(MHz):3000
            high-priority-max-avx512-frequency(MHz):2900
          bucket-2
            high-priority-cores-count:6
            high-priority-max-frequency(MHz):3100
            high-priority-max-avx2-frequency(MHz):3000
            high-priority-max-avx512-frequency(MHz):2900
          speed-select-turbo-freq-clip-frequencies
            low-priority-max-frequency(MHz):2600
            low-priority-max-avx2-frequency(MHz):2400
            low-priority-max-avx512-frequency(MHz):2100

Based on the output above, there is an Intel(R) SST-TF bucket for which there are
two high priority cores. If only two high priority cores are set, then the max.
turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz
more than the base turbo capability for all cores.

In turn, for the hackbench workload, two CPUs can be set as high priority and
the rest as low priority. One side effect is that once enabled, the low priority
cores will be clipped to a lower frequency of 2600 MHz.

Enable Intel(R) SST-TF
~~~~~~~~~~~~~~~~~~~~~~

To enable Intel(R) SST-TF, execute::

 # intel-speed-select -c 12,13 turbo-freq enable -a
 Intel(R) Speed Select Technology
 Executing on CPU model: X
 package-0
  die-0
    cpu-12
      turbo-freq
        enable:success
 package-0
  die-0
    cpu-13
      turbo-freq
        enable:success
 package--1
  die-0
    cpu-63
      turbo-freq --auto
        enable:success

In this case, the option "-a" is optional. If set, it enables the Intel(R) SST-TF
feature and also sets the CPUs to high and low priority using Intel(R) Speed
Select Technology Core Power (Intel(R) SST-CP) features.
The CPU numbers passed with the "-c" argument are marked as high priority,
including their siblings.

If the -a option is not used, then the following steps are required before
enabling Intel(R) SST-TF:

- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency

- Enable CLOS using core-power feature set

- Configure CLOS parameters

- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency

If the same hackbench workload is executed, schedule hackbench threads on high
priority CPUs::

 # taskset -c 12,13 perf bench -r 100 sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes
      Total time: 5.510 [sec]
        5.510165 usecs/op
          180826 ops/sec

This improves performance by around 3.3% on a busy system. Here the turbostat
output will show that CPU 12 and CPU 13 are getting a 100 MHz boost. The
turbostat output::

 # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
 Package Core CPU Bzy_MHz
 ...
 0       12   12  3200
 0       13   13  3200