<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>P1436r1: Executor properties for affinity-based execution</title>
  <meta name="generator" content="Haroopad 0.13.1" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <style>div.oembedall-githubrepos{border:1px solid #DDD;border-radius:4px;list-style-type:none;margin:0 0 10px;padding:8px 10px 0;font:13.34px/1.4 helvetica,arial,freesans,clean,sans-serif;width:452px;background-color:#fff}div.oembedall-githubrepos .oembedall-body{background:-moz-linear-gradient(center top,#FAFAFA,#EFEFEF);background:-webkit-gradient(linear,left top,left bottom,from(#FAFAFA),to(#EFEFEF));border-bottom-left-radius:4px;border-bottom-right-radius:4px;border-top:1px solid #EEE;margin-left:-10px;margin-top:8px;padding:5px 10px;width:100%}div.oembedall-githubrepos h3{font-size:14px;margin:0;padding-left:18px;white-space:nowrap}div.oembedall-githubrepos p.oembedall-description{color:#444;font-size:12px;margin:0 0 3px}div.oembedall-githubrepos p.oembedall-updated-at{color:#888;font-size:11px;margin:0}div.oembedall-githubrepos ul.oembedall-repo-stats{border:none;float:right;font-size:11px;font-weight:700;padding-left:15px;position:relative;z-index:5;margin:0}div.oembedall-githubrepos ul.oembedall-repo-stats li{border:none;color:#666;display:inline-block;list-style-type:none;margin:0!important}div.oembedall-githubrepos ul.oembedall-repo-stats li a{background-color:transparent;border:none;color:#666!important;background-position:5px -2px;background-repeat:no-repeat;border-left:1px solid #DDD;display:inline-block;height:21px;line-height:21px;padding:0 5px 0 23px}div.oembedall-githubrepos ul.oembedall-repo-stats li:first-child a{border-left:medium none;margin-right:-3px}div.oembedall-githubrepos ul.oembedall-repo-stats li a:hover{background:5px -27px no-repeat #4183C4;color:#FFF!important;text-decoration:none}div.oembedall-githubrepos ul.oembedall-repo-stats li:first-child a:hover{border-bottom-left-radius:3px;border-top-left-radius:3px}ul.oembedall-repo-stats li:last-child 
a:hover{border-bottom-right-radius:3px;border-top-right-radius:3px}span.oembedall-closehide{background-color:#aaa;border-radius:2px;cursor:pointer;margin-right:3px}div.oembedall-container{margin-top:5px;text-align:left}.oembedall-ljuser{font-weight:700}.oembedall-ljuser img{vertical-align:bottom;border:0;padding-right:1px}.oembedall-stoqembed{border-bottom:1px dotted #999;float:left;overflow:hidden;width:730px;line-height:1;background:#FFF;color:#000;font-family:Arial,Liberation Sans,DejaVu Sans,sans-serif;font-size:80%;text-align:left;margin:0;padding:0}.oembedall-stoqembed a{color:#07C;text-decoration:none;margin:0;padding:0}.oembedall-stoqembed a:hover{text-decoration:underline}.oembedall-stoqembed a:visited{color:#4A6B82}.oembedall-stoqembed h3{font-family:Trebuchet MS,Liberation Sans,DejaVu Sans,sans-serif;font-size:130%;font-weight:700;margin:0;padding:0}.oembedall-stoqembed .oembedall-reputation-score{color:#444;font-size:120%;font-weight:700;margin-right:2px}.oembedall-stoqembed .oembedall-user-info{height:35px;width:185px}.oembedall-stoqembed .oembedall-user-info .oembedall-user-gravatar32{float:left;height:32px;width:32px}.oembedall-stoqembed .oembedall-user-info .oembedall-user-details{float:left;margin-left:5px;overflow:hidden;white-space:nowrap;width:145px}.oembedall-stoqembed .oembedall-question-hyperlink{font-weight:700}.oembedall-stoqembed .oembedall-stats{background:#EEE;margin:0 0 0 7px;padding:4px 7px 6px;width:58px}.oembedall-stoqembed .oembedall-statscontainer{float:left;margin-right:8px;width:86px}.oembedall-stoqembed .oembedall-votes{color:#555;padding:0 0 7px;text-align:center}.oembedall-stoqembed .oembedall-vote-count-post{font-size:240%;color:#808185;display:block;font-weight:700}.oembedall-stoqembed .oembedall-views{color:#999;padding-top:4px;text-align:center}.oembedall-stoqembed .oembedall-status{margin-top:-3px;padding:4px 0;text-align:center;background:#75845C;color:#FFF}.oembedall-stoqembed .oembedall-status 
strong{color:#FFF;display:block;font-size:140%}.oembedall-stoqembed .oembedall-summary{float:left;width:635px}.oembedall-stoqembed .oembedall-excerpt{line-height:1.2;margin:0;padding:0 0 5px}.oembedall-stoqembed .oembedall-tags{float:left;line-height:18px}.oembedall-stoqembed .oembedall-tags a:hover{text-decoration:none}.oembedall-stoqembed .oembedall-post-tag{background-color:#E0EAF1;border-bottom:1px solid #3E6D8E;border-right:1px solid #7F9FB6;color:#3E6D8E;font-size:90%;line-height:2.4;margin:2px 2px 2px 0;padding:3px 4px;text-decoration:none;white-space:nowrap}.oembedall-stoqembed .oembedall-post-tag:hover{background-color:#3E6D8E;border-bottom:1px solid #37607D;border-right:1px solid #37607D;color:#E0EAF1}.oembedall-stoqembed .oembedall-fr{float:right}.oembedall-stoqembed .oembedall-statsarrow{background-image:url(http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=3);background-repeat:no-repeat;overflow:hidden;background-position:0 -435px;float:right;height:13px;margin-top:12px;width:7px}.oembedall-facebook1{border:1px solid #1A3C6C;padding:0;font:13.34px/1.4 verdana;width:500px}.oembedall-facebook2{background-color:#627add}.oembedall-facebook2 a{color:#e8e8e8;text-decoration:none}.oembedall-facebookBody{background-color:#fff;vertical-align:top;padding:5px}.oembedall-facebookBody .contents{display:inline-block;width:100%}.oembedall-facebookBody div img{float:left;margin-right:5px}div.oembedall-lanyard{-webkit-box-shadow:none;-webkit-transition-delay:0s;-webkit-transition-duration:.4000000059604645s;-webkit-transition-property:width;-webkit-transition-timing-function:cubic-bezier(0.42,0,.58,1);background-attachment:scroll;background-clip:border-box;background-color:transparent;background-image:none;background-origin:padding-box;border-width:0;box-shadow:none;color:#112644;display:block;float:left;font-family:'Trebuchet MS',Trebuchet,sans-serif;font-size:16px;height:253px;line-height:19px;margin:0;max-width:none;min-height:0;outline:#112644 
0;overflow-x:visible;overflow-y:visible;padding:0;position:relative;text-align:left;vertical-align:baseline;width:804px}div.oembedall-lanyard .tagline{font-size:1.5em}div.oembedall-lanyard .wrapper{overflow:hidden;clear:both}div.oembedall-lanyard .split{float:left;display:inline}div.oembedall-lanyard .prominent-place .flag:active,div.oembedall-lanyard .prominent-place .flag:focus,div.oembedall-lanyard .prominent-place .flag:hover,div.oembedall-lanyard .prominent-place .flag:link,div.oembedall-lanyard .prominent-place .flag:visited{float:left;display:block;width:48px;height:48px;position:relative;top:-5px;margin-right:10px}div.oembedall-lanyard .place-context{font-size:.889em}div.oembedall-lanyard .prominent-place .sub-place{display:block}div.oembedall-lanyard .prominent-place{font-size:1.125em;line-height:1.1em;font-weight:400}div.oembedall-lanyard .main-date{color:#8CB4E0;font-weight:700;line-height:1.1}div.oembedall-lanyard .first{width:48.57%;margin:0 0 0 2.857%}.mermaid .label{color:#333}.node circle,.node polygon,.node rect{fill:#cde498;stroke:#13540c;stroke-width:1px}.edgePath .path{stroke:green;stroke-width:1.5px}.cluster rect{fill:#cdffb2;rx:40;stroke:#6eaa49;stroke-width:1px}.cluster text{fill:#333}.actor{stroke:#13540c;fill:#cde498}text.actor{fill:#000;stroke:none}.actor-line{stroke:grey}.messageLine0{stroke-width:1.5;stroke-dasharray:"2 2";marker-end:"url(#arrowhead)";stroke:#333}.messageLine1{stroke-width:1.5;stroke-dasharray:"2 2";stroke:#333}#arrowhead{fill:#333}#crosshead path{fill:#333!important;stroke:#333!important}.messageText{fill:#333;stroke:none}.labelBox{stroke:#326932;fill:#cde498}.labelText,.loopText{fill:#000;stroke:none}.loopLine{stroke-width:2;stroke-dasharray:"2 2";marker-end:"url(#arrowhead)";stroke:#326932}.note{stroke:#6eaa49;fill:#fff5ad}.noteText{fill:#000;stroke:none;font-family:'trebuchet 
ms',verdana,arial;font-size:14px}.section{stroke:none;opacity:.2}.section0,.section2{fill:#6eaa49}.section1,.section3{fill:#fff;opacity:.2}.sectionTitle0,.sectionTitle1,.sectionTitle2,.sectionTitle3{fill:#333}.sectionTitle{text-anchor:start;font-size:11px;text-height:14px}.grid .tick{stroke:lightgrey;opacity:.3;shape-rendering:crispEdges}.grid path{stroke-width:0}.today{fill:none;stroke:red;stroke-width:2px}.task{stroke-width:2}.taskText{text-anchor:middle;font-size:11px}.taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px}.taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}.taskText0,.taskText1,.taskText2,.taskText3{fill:#fff}.task0,.task1,.task2,.task3{fill:#487e3a;stroke:#13540c}.taskTextOutside0,.taskTextOutside1,.taskTextOutside2,.taskTextOutside3{fill:#000}.active0,.active1,.active2,.active3{fill:#cde498;stroke:#13540c}.activeText0,.activeText1,.activeText2,.activeText3{fill:#000!important}.done0,.done1,.done2,.done3{stroke:grey;fill:lightgrey;stroke-width:2}.doneText0,.doneText1,.doneText2,.doneText3{fill:#000!important}.crit0,.crit1,.crit2,.crit3{stroke:#f88;fill:red;stroke-width:2}.activeCrit0,.activeCrit1,.activeCrit2,.activeCrit3{stroke:#f88;fill:#cde498;stroke-width:2}.doneCrit0,.doneCrit1,.doneCrit2,.doneCrit3{stroke:#f88;fill:lightgrey;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}.activeCritText0,.activeCritText1,.activeCritText2,.activeCritText3,.doneCritText0,.doneCritText1,.doneCritText2,.doneCritText3{fill:#000!important}.titleText{text-anchor:middle;font-size:18px;fill:#000}text{font-family:'trebuchet ms',verdana,arial;font-size:14px}html{height:100%}body{margin:0!important;padding:5px 20px 26px!important;background-color:#fff;font-family:"Lucida Grande","Segoe UI","Apple SD Gothic Neo","Malgun Gothic","Lucida Sans 
Unicode",Helvetica,Arial,sans-serif;font-size:.9em;overflow-x:hidden;overflow-y:auto}br,h1,h2,h3,h4,h5,h6{clear:both}hr.page{background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAYAAAAECAYAAACtBE5DAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyJpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBNYWNpbnRvc2giIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OENDRjNBN0E2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OENDRjNBN0I2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4Q0NGM0E3ODY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4Q0NGM0E3OTY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PqqezsUAAAAfSURBVHjaYmRABcYwBiM2QSA4y4hNEKYDQxAEAAIMAHNGAzhkPOlYAAAAAElFTkSuQmCC) repeat-x;border:0;height:3px;padding:0}hr.underscore{border-top-style:dashed!important}body >:first-child{margin-top:0!important}img.plugin{box-shadow:0 1px 3px rgba(0,0,0,.1);border-radius:3px}iframe{border:0}figure{-webkit-margin-before:0;-webkit-margin-after:0;-webkit-margin-start:0;-webkit-margin-end:0}kbd{border:1px solid #aaa;-moz-border-radius:2px;-webkit-border-radius:2px;border-radius:2px;-moz-box-shadow:1px 2px 2px #ddd;-webkit-box-shadow:1px 2px 2px #ddd;box-shadow:1px 2px 2px 
#ddd;background-color:#f9f9f9;background-image:-moz-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:-o-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:-webkit-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:linear-gradient(top,#eee,#f9f9f9,#eee);padding:1px 3px;font-family:inherit;font-size:.85em}.oembeded .oembed_photo{display:inline-block}img[data-echo]{margin:25px 0;width:100px;height:100px;background:url(../img/ajax.gif) center center no-repeat #fff}.spinner{display:inline-block;width:10px;height:10px;margin-bottom:-.1em;border:2px solid rgba(0,0,0,.5);border-top-color:transparent;border-radius:100%;-webkit-animation:spin 1s infinite linear;animation:spin 1s infinite linear}.spinner:after{content:'';display:block;width:0;height:0;position:absolute;top:-6px;left:0;border:4px solid transparent;border-bottom-color:rgba(0,0,0,.5);-webkit-transform:rotate(45deg);transform:rotate(45deg)}@-webkit-keyframes spin{to{-webkit-transform:rotate(360deg)}}@keyframes spin{to{transform:rotate(360deg)}}p.toc{margin:0!important}p.toc ul{padding-left:10px}p.toc>ul{padding:10px;margin:0 10px;display:inline-block;border:1px solid #ededed;border-radius:5px}p.toc li,p.toc ul{list-style-type:none}p.toc li{width:100%;padding:0;overflow:hidden}p.toc li a::after{content:"."}p.toc li a:before{content:"• "}p.toc h5{text-transform:uppercase}p.toc .title{float:left;padding-right:3px}p.toc .number{margin:0;float:right;padding-left:3px;background:#fff;display:none}input.task-list-item{margin-left:-1.62em}.markdown{font-family:"Hiragino Sans GB","Microsoft YaHei",STHeiti,SimSun,"Lucida Grande","Lucida Sans Unicode","Lucida Sans",'Segoe UI',AppleSDGothicNeo-Medium,'Malgun Gothic',Verdana,Tahoma,sans-serif;padding:20px}.markdown a{text-decoration:none;vertical-align:baseline}.markdown a:hover{text-decoration:underline}.markdown h1{font-size:2.2em;font-weight:700;margin:1.5em 0 1em}.markdown h2{font-size:1.8em;font-weight:700;margin:1.275em 0 .85em}.markdown 
h3{font-size:1.6em;font-weight:700;margin:1.125em 0 .75em}.markdown h4{font-size:1.4em;font-weight:700;margin:.99em 0 .66em}.markdown h5{font-size:1.2em;font-weight:700;margin:.855em 0 .57em}.markdown h6{font-size:1em;font-weight:700;margin:.75em 0 .5em}.markdown h1+p,.markdown h1:first-child,.markdown h2+p,.markdown h2:first-child,.markdown h3+p,.markdown h3:first-child,.markdown h4+p,.markdown h4:first-child,.markdown h5+p,.markdown h5:first-child,.markdown h6+p,.markdown h6:first-child{margin-top:0}.markdown hr{border:1px solid #ccc}.markdown p{margin:1em 0;word-wrap:break-word}.markdown ol{list-style-type:decimal}.markdown li{display:list-item;line-height:1.4em}.markdown blockquote{margin:1em 20px}.markdown blockquote>:first-child{margin-top:0}.markdown blockquote>:last-child{margin-bottom:0}.markdown blockquote cite:before{content:'\2014 \00A0'}.markdown .code{border-radius:3px;word-wrap:break-word}.markdown pre{border-radius:3px;word-wrap:break-word;border:1px solid #ccc;overflow:auto;padding:.5em}.markdown pre code{border:0;display:block}.markdown pre>code{font-family:Consolas,Inconsolata,Courier,monospace;font-weight:700;white-space:pre;margin:0}.markdown code{border-radius:3px;word-wrap:break-word;border:1px solid #ccc;padding:0 5px;margin:0 2px}.markdown img{max-width:100%}.markdown mark{color:#000;background-color:#fcf8e3}.markdown table{padding:0;border-collapse:collapse;border-spacing:0;margin-bottom:16px}.markdown table tr td,.markdown table tr th{border:1px solid #ccc;margin:0;padding:6px 13px}.markdown table tr th{font-weight:700}.markdown table tr th>:first-child{margin-top:0}.markdown table tr th>:last-child{margin-bottom:0}.markdown table tr td>:first-child{margin-top:0}.markdown table tr td>:last-child{margin-bottom:0}@import url(http://fonts.googleapis.com/css?family=Roboto+Condensed:300italic,400italic,700italic,400,300,700);.haroopad{padding:20px;color:#222;font-size:15px;font-family:"Roboto Condensed",Tauri,"Hiragino Sans GB","Microsoft 
YaHei",STHeiti,SimSun,"Lucida Grande","Lucida Sans Unicode","Lucida Sans",'Segoe UI',AppleSDGothicNeo-Medium,'Malgun Gothic',Verdana,Tahoma,sans-serif;background:#fff;line-height:1.6;-webkit-font-smoothing:antialiased}.haroopad a{color:#3269a0}.haroopad a:hover{color:#4183c4}.haroopad h2{border-bottom:1px solid #e6e6e6}.haroopad h6{color:#777}.haroopad hr{border:1px solid #e6e6e6}.haroopad blockquote>code,.haroopad h1>code,.haroopad h2>code,.haroopad h3>code,.haroopad h4>code,.haroopad h5>code,.haroopad h6>code,.haroopad li>code,.haroopad p>code,.haroopad td>code{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:85%;background-color:rgba(0,0,0,.02);padding:.2em .5em;border:1px solid #efefef}.haroopad pre>code{font-size:1em;letter-spacing:-1px;font-weight:700}.haroopad blockquote{border-left:4px solid #e6e6e6;padding:0 15px;color:#777}.haroopad table{background-color:#fafafa}.haroopad table tr td,.haroopad table tr th{border:1px solid #e6e6e6}.haroopad table tr:nth-child(2n){background-color:#f2f2f2}.hljs{display:block;overflow-x:auto;padding:.5em;background:#fdf6e3;color:#657b83;-webkit-text-size-adjust:none}.diff .hljs-header,.hljs-comment,.hljs-doctype,.hljs-javadoc,.hljs-pi,.lisp .hljs-string{color:#93a1a1}.css .hljs-tag,.hljs-addition,.hljs-keyword,.hljs-request,.hljs-status,.hljs-winutils,.method,.nginx .hljs-title{color:#859900}.hljs-command,.hljs-dartdoc,.hljs-hexcolor,.hljs-link_url,.hljs-number,.hljs-phpdoc,.hljs-regexp,.hljs-rules .hljs-value,.hljs-string,.hljs-tag .hljs-value,.tex .hljs-formula{color:#2aa198}.css .hljs-function,.hljs-built_in,.hljs-chunk,.hljs-decorator,.hljs-id,.hljs-identifier,.hljs-localvars,.hljs-title,.vhdl .hljs-literal{color:#268bd2}.hljs-attribute,.hljs-class .hljs-title,.hljs-constant,.hljs-link_reference,.hljs-parent,.hljs-type,.hljs-variable,.lisp .hljs-body,.smalltalk .hljs-number{color:#b58900}.css .hljs-pseudo,.diff 
.hljs-change,.hljs-attr_selector,.hljs-cdata,.hljs-header,.hljs-pragma,.hljs-preprocessor,.hljs-preprocessor .hljs-keyword,.hljs-shebang,.hljs-special,.hljs-subst,.hljs-symbol,.hljs-symbol .hljs-string{color:#cb4b16}.hljs-deletion,.hljs-important{color:#dc322f}.hljs-link_label{color:#6c71c4}.tex .hljs-formula{background:#eee8d5}.MathJax_Hover_Frame{border-radius:.25em;-webkit-border-radius:.25em;-moz-border-radius:.25em;-khtml-border-radius:.25em;box-shadow:0 0 15px #83A;-webkit-box-shadow:0 0 15px #83A;-moz-box-shadow:0 0 15px #83A;-khtml-box-shadow:0 0 15px #83A;border:1px solid #A6D!important;display:inline-block;position:absolute}.MathJax_Hover_Arrow{position:absolute;width:15px;height:11px;cursor:pointer}#MathJax_About{position:fixed;left:50%;width:auto;text-align:center;border:3px outset;padding:1em 2em;background-color:#DDD;color:#000;cursor:default;font-family:message-box;font-size:120%;font-style:normal;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;z-index:201;border-radius:15px;-webkit-border-radius:15px;-moz-border-radius:15px;-khtml-border-radius:15px;box-shadow:0 10px 20px gray;-webkit-box-shadow:0 10px 20px gray;-moz-box-shadow:0 10px 20px gray;-khtml-box-shadow:0 10px 20px gray;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}.MathJax_Menu{position:absolute;background-color:#fff;color:#000;width:auto;padding:2px;border:1px solid #CCC;margin:0;cursor:default;font:menu;text-align:left;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;z-index:201;box-shadow:0 10px 20px gray;-webkit-box-shadow:0 10px 20px gray;-moz-box-shadow:0 10px 20px gray;-khtml-box-shadow:0 10px 20px gray;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}.MathJax_MenuItem{padding:2px 2em;background:0 
0}.MathJax_MenuArrow{position:absolute;right:.5em;color:#666}.MathJax_MenuActive .MathJax_MenuArrow{color:#fff}.MathJax_MenuArrow.RTL{left:.5em;right:auto}.MathJax_MenuCheck{position:absolute;left:.7em}.MathJax_MenuCheck.RTL{right:.7em;left:auto}.MathJax_MenuRadioCheck{position:absolute;left:1em}.MathJax_MenuRadioCheck.RTL{right:1em;left:auto}.MathJax_MenuLabel{padding:2px 2em 4px 1.33em;font-style:italic}.MathJax_MenuRule{border-top:1px solid #CCC;margin:4px 1px 0}.MathJax_MenuDisabled{color:GrayText}.MathJax_MenuActive{background-color:Highlight;color:HighlightText}.MathJax_Menu_Close{position:absolute;width:31px;height:31px;top:-15px;left:-15px}#MathJax_Zoom{position:absolute;background-color:#F0F0F0;overflow:auto;display:block;z-index:301;padding:.5em;border:1px solid #000;margin:0;font-weight:400;font-style:normal;text-align:left;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;box-shadow:5px 5px 15px #AAA;-webkit-box-shadow:5px 5px 15px #AAA;-moz-box-shadow:5px 5px 15px #AAA;-khtml-box-shadow:5px 5px 15px #AAA;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}#MathJax_ZoomOverlay{position:absolute;left:0;top:0;z-index:300;display:inline-block;width:100%;height:100%;border:0;padding:0;margin:0;background-color:#fff;opacity:0;filter:alpha(opacity=0)}#MathJax_ZoomFrame{position:relative;display:inline-block;height:0;width:0}#MathJax_ZoomEventTrap{position:absolute;left:0;top:0;z-index:302;display:inline-block;border:0;padding:0;margin:0;background-color:#fff;opacity:0;filter:alpha(opacity=0)}.MathJax_Preview{color:#888}#MathJax_Message{position:fixed;left:1px;bottom:2px;background-color:#E6E6E6;border:1px solid #959595;margin:0;padding:2px 
8px;z-index:102;color:#000;font-size:80%;width:auto;white-space:nowrap}#MathJax_MSIE_Frame{position:absolute;top:0;left:0;width:0;z-index:101;border:0;margin:0;padding:0}.MathJax_Error{color:#C00;font-style:italic}footer{position:fixed;font-size:.8em;text-align:right;bottom:0;margin-left:-25px;height:20px;width:100%}</style>
</head>
<body class="markdown haroopad">
<h1 id="p1436r1:-executor-properties-for-affinity-based-execution"><a name="p1436r1:-executor-properties-for-affinity-based-execution" href="#p1436r1:-executor-properties-for-affinity-based-execution"></a>P1436r1: Executor properties for affinity-based execution</h1><p><strong>Date: 2019-03-31</strong></p><p><strong>Audience: SG1, SG14, LEWG</strong></p><p><strong>Authors: Gordon Brown, Ruyman Reyes, Michael Wong, H. Carter Edwards, Thomas Rodgers, Mark Hoemmen</strong></p><p><strong>Contributors: Patrice Roy, Carl Cook, Jeff Hammond, Hartmut Kaiser, Christian Trott, Paul Blinzer, Alex Voicu, Nat Goodspeed, Tony Tye, Chris Kohlhoff</strong></p><p><strong>Emails: gordon@codeplay.com, ruyman@codeplay.com, michael@codeplay.com, hedwards@nvidia.com, rodgert@twrodgers.com, mhoemme@sandia.gov</strong></p><p><strong>Reply to: gordon@codeplay.com</strong></p><h1 id="changelog"><a name="changelog" href="#changelog"></a>Changelog</h1><h3 id="p1436r1-(col-2019)"><a name="p1436r1-(col-2019)" href="#p1436r1-(col-2019)"></a>P1436r1 (COL 2019)</h3><ul>
<li>Introduce wording to clarify when two invocations of <code>bulk_execute</code> are expected to have consistent binding.</li><li>Introduce wording to describe how <code>bulk_execute</code> should handle an execution context failing to provide the guaranteed binding.</li><li>Update the wording of <code>bulk_execution_affinity.scatter</code> and <code>bulk_execution_affinity.balance</code> to better describe the expected binding pattern.</li></ul><h3 id="p1436r0-(kon-2019)"><a name="p1436r0-(kon-2019)" href="#p1436r0-(kon-2019)"></a>P1436r0 (KON 2019)</h3><ul>
<li>Separation of high-level features from P0796r3 <a href="http://wg21.link/p0796">[35]</a>.</li><li>Update motivational examples.</li><li>Introduce new executor property <code>concurrency_t</code>.</li><li>Introduce new executor property <code>execution_locality_intersection_t</code>.</li><li>Introduce new executor property <code>memory_locality_intersection_t</code>.</li><li>Update direction for future work.</li></ul><h3 id="p0796r3-(san-2018)"><a name="p0796r3-(san-2018)" href="#p0796r3-(san-2018)"></a>P0796r3 (SAN 2018)</h3><ul>
<li>Remove reference counting requirement from <code>execution_resource</code>.</li><li>Change lifetime model of <code>execution_resource</code>: it now either consistently identifies some underlying resource, or is invalid; context creation rejects an invalid resource.</li><li>Remove <code>this_thread::bind</code> &amp; <code>this_thread::unbind</code> interfaces.</li><li>Make <code>execution_resource</code>s iterable by replacing <code>execution_resource::resources</code> with <code>execution_resource::begin</code> and <code>execution_resource::end</code>.</li><li>Add <code>size</code> and <code>operator[]</code> for <code>execution_resource</code>.</li><li>Rename <code>this_system::get_resources</code> to <code>this_system::discover_topology</code>.</li><li>Introduce <code>memory_resource</code> to represent the memory component of a system topology.</li><li>Remove <code>can_place_memory</code> and <code>can_place_agents</code> from the <code>execution_resource</code> as these are no longer required.</li><li>Remove <code>memory_resource</code> and <code>allocator</code> from the <code>execution_context</code> as these no longer make sense.</li><li>Update the wording to describe how execution resources and memory resources are structured.</li><li>Refactor <code>affinity_query</code> to be between an <code>execution_resource</code> and a <code>memory_resource</code>.</li></ul><h3 id="p0796r2-(rap-2018)"><a name="p0796r2-(rap-2018)" href="#p0796r2-(rap-2018)"></a>P0796r2 (RAP 2018)</h3><ul>
<li>Introduce a free function for retrieving the execution resource underlying the current thread of execution.</li><li>Introduce <code>this_thread::bind</code> &amp; <code>this_thread::unbind</code> for binding and unbinding a thread of execution to an execution resource.</li><li>Introduce <code>bulk_execution_affinity</code> executor properties for specifying affinity binding patterns on bulk execution functions.</li></ul><h3 id="p0796r1-(jax-2018)"><a name="p0796r1-(jax-2018)" href="#p0796r1-(jax-2018)"></a>P0796r1 (JAX 2018)</h3><ul>
<li>Introduce proposed wording.</li><li>Based on feedback from SG1, introduce a pair-wise interface for querying the relative affinity between execution resources.</li><li>Introduce an interface for retrieving an allocator or polymorphic memory resource.</li><li>Based on feedback from SG1, remove the requirement for a hierarchical system topology structure, so that a root resource is no longer required.</li></ul><h3 id="p0796r0-(abq-2017)"><a name="p0796r0-(abq-2017)" href="#p0796r0-(abq-2017)"></a>P0796r0 (ABQ 2017)</h3><ul>
<li>Initial proposal.</li><li>Enumerate design space, hierarchical affinity, issues to the committee.</li></ul><h1 id="abstract"><a name="abstract" href="#abstract"></a>Abstract</h1><p>This paper is the result of a request from SG1 at the 2018 San Diego meeting to split P0796: Supporting Heterogeneous &amp; Distributed Computing Through Affinity <a href="http://wg21.link/p0796">[35]</a> into two separate papers, one for the high-level interface and one for the low-level interface. This paper focuses on the high-level interface: a series of properties for querying affinity relationships and requesting affinity on work being executed. P1437 focuses on the low-level interface: a mechanism for discovering the topology and affinity properties of a given system; however, that paper was not submitted in this mailing.</p><p>The aim of this paper is to provide a number of executor properties that, if supported, allow the user of an executor to query and manipulate the binding of <em>execution agents</em> and the underlying <em>execution resources</em> of the <em>threads of execution</em> they are run on.</p><h1 id="motivation"><a name="motivation" href="#motivation"></a>Motivation</h1><p><em>Affinity</em> refers to the “closeness”, in terms of memory access performance, between running code, the hardware execution resource on which the code runs, and the data that the code accesses. A hardware execution resource has “more affinity” to a part of memory or to some data if it has lower latency and/or higher bandwidth when accessing that memory or those data.</p><p>On almost all computer architectures, the cost of accessing different data may differ. Most computers have caches that are associated with specific processing units. If the operating system moves a thread or process from one processing unit to another, the thread or process will no longer have data in its new cache that it had in its old cache. This may make the next access to those data slower. 
Many computers also have a Non-Uniform Memory Architecture (NUMA), which means that even though all processing units see a single memory in terms of the programming model, different processing units may still have more affinity to some parts of memory than others. NUMA exists because it is difficult to scale non-NUMA memory systems to the performance needed by today’s highly parallel computers and applications.</p><p>One strategy to improve applications’ performance, given the importance of affinity, is processor and memory <em>binding</em>. Keeping a thread bound to a specific processing unit and a local memory region optimizes cache affinity. It also reduces context switching and unnecessary scheduler activity. Since memory accesses to remote locations incur higher latency and/or lower bandwidth, control of thread placement to enforce affinity within parallel applications is crucial to keep all the cores busy and to exploit the full performance of the memory subsystem on NUMA computers. </p><p>Operating systems (OSes) traditionally take responsibility for assigning threads or processes to run on processing units. However, OSes may use high-level policies for this assignment that do not necessarily match the optimal usage pattern for a given application. Application developers must leverage the placement of memory and <em>placement of threads</em> for best performance on current and future architectures. For C++ developers to achieve this, native support for <em>placement of threads and memory</em> is critical for application portability. We will refer to this as the <em>affinity problem</em>. </p><p>The affinity problem is especially challenging for applications whose behavior changes over time or is hard to predict, or when different applications interfere with each other’s performance. 
Today, most OSes can already group processing units according to their locality and distribute processes, while keeping threads close to the initial thread, or even avoid migrating threads in order to maintain a first-touch policy. Nevertheless, most programs can change their work distribution, especially in the presence of nested parallelism.</p><p>Frequently, data are initialized at the beginning of the program by the initial thread and are used by multiple threads. While some OSes automatically migrate threads or data for better affinity, migration may have high overhead. In an optimal case, the OS may automatically detect which threads access which data most frequently, or it may replicate data which are read by multiple threads, or migrate data which are modified and used by threads residing on remote locality groups. However, the OS typically does a reasonable job only if the machine is not overloaded, if the application carefully uses first-touch allocation, and if the program does not change its behavior with respect to locality.</p><p>The affinity interface we propose should help computers achieve a much higher fraction of peak memory bandwidth when using parallel algorithms. In the future, we plan to extend this to heterogeneous and distributed computing. This follows the lead of OpenMP <a href="https://link.springer.com/chapter/10.1007/978-3-642-30961-8_2">[2]</a>, which has plans to integrate its affinity model with its heterogeneity model [3]. 
(One of the authors of this document participated in the design of OpenMP’s affinity model.)</p><h2 id="motivational-examples"><a name="motivational-examples" href="#motivational-examples"></a>Motivational examples</h2><p>To identify the requirements for supporting affinity, we have looked at a number of use cases where affinity between memory locality and execution can provide better performance.</p><p>Consider the following code example <em>(Listing 1)</em>, where the C++17 parallel STL algorithm <code>for_each</code> is used to modify the elements of a <code>std::vector</code> <code>data</code> on an <em>executor</em> that will execute on a NUMA system with a number of CPU cores. However, the memory is allocated by the <code>std::vector</code> default allocator immediately during the construction of <code>data</code>, in memory local to the calling thread of execution. This means that the memory allocated for <code>data</code> may have poor locality to all of the NUMA regions on the system, other than the one in which the constructor executed. Therefore, accesses in the parallel <code>for_each</code> made by threads in other NUMA regions will incur high latency. In this example, this is avoided by migrating <code>data</code> to have better affinity with the NUMA regions on the system using an <em>executor</em> with the <code>bulk_execution_affinity.scatter</code> property applied, before it is accessed by the <code>for_each</code>. Note that a mechanism for migration is not yet specified in this paper, so this example currently uses an arbitrary vendor API, <code>vendor_api::migrate</code>. Our intention is that a future revision of this paper will specify a standard mechanism for migration.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>// NUMA executor representing N NUMA regions.
numa_executor exec;

// Storage required for vector allocated on construction local to current thread
// of execution (NUMA region 0).
std::vector&amp;lt;float&amp;gt; data(N * SIZE);

// Require the NUMA executor to bind its migration of memory to the underlying
// memory resources in a scatter pattern.
auto affinityExec = std::execution::require(exec,
  bulk_execution_affinity.scatter);

// Migrate the memory allocated for the vector across the NUMA regions in a
// scatter pattern.
vendor_api::migrate(data, affinityExec);

// Placement of data is local to NUMA region 0, so data for execution on other
// NUMA nodes must be migrated when accessed.
std::for_each(std::execution::par.on(affinityExec), std::begin(data),
  std::end(data), [=](float &amp;amp;value) { do_something(value); });
</code></pre>"><span class="hljs-comment">// NUMA executor representing N NUMA regions.</span>
numa_executor exec;

<span class="hljs-comment">// Storage required for vector allocated on construction local to current thread</span>
<span class="hljs-comment">// of execution (NUMA region 0).</span>
<span class="hljs-built_in">std</span>::<span class="hljs-built_in">vector</span>&lt;<span class="hljs-keyword">float</span>&gt; data(N * SIZE);

<span class="hljs-comment">// Require the NUMA executor to bind its migration of memory to the underlying</span>
<span class="hljs-comment">// memory resources in a scatter pattern.</span>
<span class="hljs-keyword">auto</span> affinityExec = <span class="hljs-built_in">std</span>::execution::require(exec,
  bulk_execution_affinity.scatter);

<span class="hljs-comment">// Migrate the memory allocated for the vector across the NUMA regions in a</span>
<span class="hljs-comment">// scatter pattern.</span>
vendor_api::migrate(data, affinityExec);

<span class="hljs-comment">// Placement of data is local to NUMA region 0, so data for execution on other</span>
<span class="hljs-comment">// NUMA nodes must be migrated when accessed.</span>
<span class="hljs-built_in">std</span>::for_each(<span class="hljs-built_in">std</span>::execution::par.on(affinityExec), <span class="hljs-built_in">std</span>::begin(data),
  <span class="hljs-built_in">std</span>::end(data), [=](<span class="hljs-keyword">float</span> &amp;value) { do_something(value); });
</code></pre><p><em>Listing 1: Migrating previously allocated memory.</em></p><p>Now consider a similar code example <em>(Listing 2)</em> where again the C++17 parallel STL algorithm <code>for_each</code> is used to modify the elements of a <code>std::vector</code> <code>data</code> on an <em>executor</em> that will execute on a NUMA system with a number of CPU cores. However, instead of migrating <code>data</code> to have affinity with the NUMA regions, <code>data</code> is allocated within a bulk execution by an <em>executor</em> with the <code>bulk_execution_affinity.scatter</code> property applied so that <code>data</code> is allocated with affinity. Then when the <code>for_each</code> is called with the same executor, <code>data</code> maintains its affinity with the NUMA regions.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>// NUMA executor representing N NUMA regions.
numa_executor exec;

// Reserve space in a vector for a unique_ptr for each index in the bulk
// execution.
std::vector&amp;lt;std::unique_ptr&amp;lt;float[]&amp;gt;&amp;gt; data{};
data.reserve(N);

// Require the NUMA executor to bind its allocation of memory to the underlying
// memory resources in a scatter pattern.
auto affinityExec = std::execution::require(exec,
  bulk_execution_affinity.scatter);

// Launch a bulk execution that will allocate each unique_ptr in the vector with
// locality to the nearest NUMA region.
affinityExec.bulk_execute([&amp;amp;](size_t id) {
  data[id] = std::make_unique&amp;lt;float[]&amp;gt;(SIZE); }, N, sharedFactory);

// Execute a for_each using the same executor so that each unique_ptr in the
// vector maintains its locality.
std::for_each(std::execution::par.on(affinityExec), std::begin(data),
  std::end(data), [=](auto &amp;amp;chunk) { do_something(chunk); });
</code></pre>"><span class="hljs-comment">// NUMA executor representing N NUMA regions.</span>
numa_executor exec;

<span class="hljs-comment">// Reserve space in a vector for a unique_ptr for each index in the bulk</span>
<span class="hljs-comment">// execution.</span>
<span class="hljs-built_in">std</span>::<span class="hljs-built_in">vector</span>&lt;<span class="hljs-built_in">std</span>::unique_ptr&lt;<span class="hljs-keyword">float</span>[SIZE]&gt;&gt; data{};
data.reserve(N);

<span class="hljs-comment">// Require the NUMA executor to bind its allocation of memory to the underlying</span>
<span class="hljs-comment">// memory resources in a scatter pattern.</span>
<span class="hljs-keyword">auto</span> affinityExec = <span class="hljs-built_in">std</span>::execution::require(exec,
  bulk_execution_affinity.scatter);

<span class="hljs-comment">// Launch a bulk execution that will allocate each unique_ptr in the vector with</span>
<span class="hljs-comment">// locality to the nearest NUMA region.</span>
affinityExec.bulk_execute([&amp;](size_t id) {
  data[id] = <span class="hljs-built_in">std</span>::make_unique&lt;<span class="hljs-keyword">float</span>[]&gt;(SIZE); }, N, sharedFactory);

<span class="hljs-comment">// Execute a for_each using the same executor so that each unique_ptr in the</span>
<span class="hljs-comment">// vector maintains its locality.</span>
<span class="hljs-built_in">std</span>::for_each(<span class="hljs-built_in">std</span>::execution::par.on(affinityExec), <span class="hljs-built_in">std</span>::begin(data),
  <span class="hljs-built_in">std</span>::end(data), [=](<span class="hljs-keyword">auto</span> &amp;chunk) { do_something(chunk); });
</code></pre><p><em>Listing 2: Aligning memory and process affinity.</em></p><h1 id="background-research"><a name="background-research" href="#background-research"></a>Background Research</h1><p>In this paper we describe the problem space of affinity for C++, the various challenges which need to be addressed in defining a partitioning and affinity interface for C++, and some suggested solutions. These include:</p><ul>
<li>How to migrate work and memory allocations between execution resources.</li><li>How to query affinity properties between different <em>executors</em>.</li><li>How to bind execution and allocation to particular <em>executors</em>.</li></ul><p>Wherever possible, we also evaluate how an affinity-based solution could be scaled to support both distributed and heterogeneous systems.</p><h2 id="state-of-the-art"><a name="state-of-the-art" href="#state-of-the-art"></a>State of the art</h2><p>The <em>affinity problem</em> has existed for some time, and there are a number of third-party libraries and standards which provide APIs to solve the problem. In order to standardize this process for C++, we must carefully look at all of these approaches and identify which ideas are suitable for adoption into C++. Below is a list of the libraries and standards from which this proposal will draw:</p><ul>
<li>Portable Hardware Locality <a href="https://www.open-mpi.org/projects/hwloc/">[4]</a></li><li>SYCL 1.2 <a href="https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf">[5]</a></li><li>OpenCL 2.2 <a href="https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf">[6]</a></li><li>HSA <a href="http://www.hsafoundation.com/standards/">[7]</a></li><li>OpenMP 5.0 <a href="http://www.openmp.org/wp-content/uploads/openmp-TR5-final.pdf">[8]</a></li><li>cpuaff <a href="https://github.com/dcdillon/cpuaff">[9]</a></li><li>Persistent Memory Programming <a href="http://pmem.io/">[10]</a></li><li>MEMKIND <a href="https://github.com/memkind/memkind">[11]</a></li><li>Solaris pbind() <a href="https://docs.oracle.com/cd/E26502_01/html/E29031/pbind-1m.html">[12]</a></li><li>Linux sched_setaffinity() <a href="https://linux.die.net/man/2/sched_setaffinity">[13]</a></li><li>Windows SetThreadAffinityMask() <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx">[14]</a></li><li>Chapel <a href="https://chapel-lang.org/">[15]</a></li><li>X10 <a href="http://x10-lang.org/">[16]</a></li><li>UPC++ <a href="https://bitbucket.org/berkeleylab/upcxx/wiki/Home">[17]</a></li><li>TBB <a href="https://www.threadingbuildingblocks.org/">[18]</a></li><li>HPX <a href="https://github.com/STEllAR-GROUP/hpx">[19]</a></li><li>MADNESS <a href="https://github.com/m-a-d-n-e-s-s/madness">[20]</a><a href="http://dx.doi.org/10.1137/15M1026171">[32]</a></li></ul><p>Libraries such as the <a href="https://www.open-mpi.org/projects/hwloc/">Portable Hardware Locality (hwloc) library</a> provide a low-level hardware abstraction, and offer a solution to the portability problem by supporting many platforms and operating systems. This and similar approaches use a tree structure to represent details of CPUs and the memory system. 
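</p><p>The tree representation can be illustrated with a toy model (ours, not hwloc’s actual API), in which the affinity distance between two cores is the number of tree edges on the path between them:</p><pre class="cpp hljs"><code class="cpp">#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Toy locality tree: machine -&gt; sockets -&gt; cores. Cores are identified by
// index; the distance between two cores is the number of tree edges between
// them. This mirrors the tree abstraction hwloc uses, not hwloc's API.
std::size_t tree_distance(std::size_t a, std::size_t b,
                          std::size_t cores_per_socket) {
  if (a == b) return 0;
  if (a / cores_per_socket == b / cores_per_socket) return 2;  // via socket
  return 4;  // via machine root
}

int main() {
  // 2 sockets x 4 cores each.
  assert(tree_distance(0, 1, 4) == 2);  // same socket: close
  assert(tree_distance(0, 5, 4) == 4);  // across sockets: farther
  // Limitation: a tree gives every cross-socket pair the same distance, so
  // it cannot express machines where hop counts differ between socket pairs.
  assert(tree_distance(0, 5, 4) == tree_distance(3, 7, 4));
  return 0;
}
</code></pre><p>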
However, even some current systems cannot be represented correctly by a tree if the number of hops between two sockets varies between socket pairs <a href="https://link.springer.com/chapter/10.1007/978-3-642-30961-8_2">[2]</a>.</p><p>Some systems give additional user control through explicit binding of threads to processors via environment variables consumed by various compilers, via system commands, or via system calls.  Examples of system commands include Linux’s <code>taskset</code> and <code>numactl</code>, and Windows’ <code>start /affinity</code>.  System call examples include Solaris’ <code>pbind()</code>, Linux’s <code>sched_setaffinity()</code>, and Windows’ <code>SetThreadAffinityMask()</code>.</p><h2 id="relative-affinity-of-execution-resources"><a name="relative-affinity-of-execution-resources" href="#relative-affinity-of-execution-resources"></a>Relative affinity of execution resources</h2><p>In order to make decisions about where to place execution or allocate memory in a given <em>system’s resource topology</em>, it is important to understand the concept of affinity between different hardware and software resources. This is usually expressed in terms of the latency between two resources, and distance need not be symmetric in all architectures. The relative position of two components in a system’s topology does not necessarily indicate their affinity. For example, two cores from two different CPU sockets may have the same latency to access the same NUMA memory node.</p><p>This concept scales to heterogeneous and distributed systems, as the relative affinity between components applies to discrete heterogeneous and distributed systems as well.</p><h2 id="inaccessible-memory"><a name="inaccessible-memory" href="#inaccessible-memory"></a>Inaccessible memory</h2><p>The initial solution proposed by this paper may only target systems with a single addressable memory region. It may therefore exclude certain heterogeneous devices such as some discrete GPUs. 
However, in order to maintain a unified interface going forward, the initial solution should consider these devices and be able to scale to support them in the future.</p><h1 id="proposal"><a name="proposal" href="#proposal"></a>Proposal</h1><h2 id="overview"><a name="overview" href="#overview"></a>Overview</h2><p>In this paper we propose executor properties that can be used for querying the affinity between the different hardware and software resources within a system that are available to executors, and for requiring the binding of <em>execution agents</em> to the underlying hardware or software resources in order to achieve performance through data locality. These properties provide a coarse granularity and are aimed at users who may have little or no knowledge of the system architecture.</p><p>The interface described in this paper builds on the existing interface for executors and execution contexts defined in the executors proposal <a href="http://wg21.link/p0443">[22]</a>.</p><h2 id="execution-resources"><a name="execution-resources" href="#execution-resources"></a>Execution resources</h2><p>An <em>execution resource</em> represents an abstraction of a hardware or software layer that guarantees a particular set of affinity properties, where the level of abstraction is implementation-defined. An implementation is permitted to migrate any underlying resources provided it guarantees that the affinity properties remain consistent. This allows freedom for the implementor but also consistency for users.</p><p>If an <em>execution resource</em> is valid, then it must always point to the same underlying thing. For example, a <em>resource</em> cannot first point to one CPU core, and then suddenly point to a different CPU core. An <em>execution context</em> can thus rely on properties like binding of operating system threads to CPU cores. However, the “thing” to which an <em>execution resource</em> points may be a dynamic, possibly software-managed, pool of hardware. 
Here are three examples of this phenomenon:</p><ol>
<li>The “hardware” may actually be a virtual machine (VM). At any point, the VM may pause, migrate to different physical hardware, and resume. If the VM presents the same virtual hardware before and after the migration, then the <em>resources</em> that an application running on the VM sees should not change.</li><li>The OS may maintain a pool of a varying number of CPU cores as a shared resource among different user-level processes. When a process stops using the resource, the OS may reclaim cores. It may make sense to present this pool as an <em>execution resource</em>.</li><li>A low-level device driver on a laptop may switch between a “discrete” GPU and an “integrated” GPU, depending on utilization and power constraints. If the two GPUs have the same instruction set and can access the same memory, it may make sense to present them as a “virtualized” single <em>execution resource</em>.</li></ol><p>In summary, an <em>execution resource</em> either identifies a thing uniquely, or harmlessly points to nothing.</p><h2 id="header-``-synopsis"><a name="header-``-synopsis" href="#header-``-synopsis"></a>Header <code>&lt;execution&gt;</code> synopsis</h2><p>Below <em>(Listing 3)</em> is a proposed extension to the <code>&lt;execution&gt;</code> header.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>namespace std {
namespace experimental {
namespace execution {

// Bulk execution affinity properties

struct bulk_execution_affinity_t;

constexpr bulk_execution_affinity_t bulk_execution_affinity;

// Concurrency property

struct concurrency_t;

constexpr concurrency_t concurrency;

// Execution locality intersection property

struct execution_locality_intersection_t;

constexpr execution_locality_intersection_t execution_locality_intersection;

// Memory locality intersection property

struct memory_locality_intersection_t;

constexpr memory_locality_intersection_t memory_locality_intersection;

}  // execution
}  // experimental
}  // std
</code></pre>"><span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span> {
<span class="hljs-keyword">namespace</span> experimental {
<span class="hljs-keyword">namespace</span> execution {

<span class="hljs-comment">// Bulk execution affinity properties</span>

<span class="hljs-keyword">struct</span> bulk_execution_affinity_t;

<span class="hljs-keyword">constexpr</span> bulk_execution_affinity_t bulk_execution_affinity;

<span class="hljs-comment">// Concurrency property</span>

<span class="hljs-keyword">struct</span> concurrency_t;

<span class="hljs-keyword">constexpr</span> concurrency_t concurrency;

<span class="hljs-comment">// Execution locality intersection property</span>

<span class="hljs-keyword">struct</span> execution_locality_intersection_t;

<span class="hljs-keyword">constexpr</span> execution_locality_intersection_t&lt;DestExecutor&gt;;

<span class="hljs-comment">// Memory locality intersection property</span>

<span class="hljs-keyword">struct</span> memory_locality_intersection_t;

<span class="hljs-keyword">constexpr</span> memory_locality_intersection_t memory_locality_intersection;

}  <span class="hljs-comment">// execution</span>
}  <span class="hljs-comment">// experimental</span>
}  <span class="hljs-comment">// std</span>
</code></pre><p><em>Listing 3: Header synopsis</em></p><h2 id="bulk-execution-affinity-properties"><a name="bulk-execution-affinity-properties" href="#bulk-execution-affinity-properties"></a>Bulk execution affinity properties</h2><p>We propose an executor property group called <code>bulk_execution_affinity</code> which contains the nested properties <code>none</code>, <code>balanced</code>, <code>scatter</code>, or <code>compact</code>. Each of these properties, if applied to an <em>executor</em>, enforces a guarantee that <em>execution agents</em> are bound to the <em>execution resources</em> associated with that <em>executor</em> in a particular pattern.</p><h3 id="example"><a name="example" href="#example"></a>Example</h3><p>Below is an example <em>(Listing 4)</em> of executing a parallel task over 8 threads using <code>bulk_execute</code>, with the affinity binding <code>bulk_execution_affinity.scatter</code>. We request affinity binding using <code>prefer</code> and then check to see if the executor is able to support it using <code>query</code>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>{
  bulk_executor exec;

  auto affExec = execution::prefer(exec,
    execution::bulk_execution_affinity.scatter);

  if (execution::query(affExec, execution::bulk_execution_affinity.scatter)) {
    std::cout &amp;lt;&amp;lt; &quot;bulk_execute using bulk_execution_affinity.scatter&quot;
      &amp;lt;&amp;lt; std::endl;
  }

  affExec.bulk_execute([](std::size_t i, shared s) {
    func(i);
  }, 8, sharedFactory);
}
</code></pre>">{
  bulk_executor exec;

  <span class="hljs-keyword">auto</span> affExec = execution::prefer(exec,
    execution::bulk_execution_affinity.scatter);

  <span class="hljs-keyword">if</span> (execution::query(affExec, execution::bulk_execution_affinity.scatter)) {
    <span class="hljs-built_in">std</span>::<span class="hljs-built_in">cout</span> &lt;&lt; <span class="hljs-string">"bulk_execute using bulk_execution_affinity.scatter"</span>
      &lt;&lt; <span class="hljs-built_in">std</span>::endl;
  }

  affExec.bulk_execute([](<span class="hljs-built_in">std</span>::size_t i, shared s) {
    func(i);
  }, <span class="hljs-number">8</span>, sharedFactory);
}
</code></pre><p><em>Listing 4: Example of using the bulk_execution_affinity property</em></p><h3 id="proposed-wording"><a name="proposed-wording" href="#proposed-wording"></a>Proposed Wording</h3><p>The <code>bulk_execution_affinity_t</code> property is a behavioral property as defined in P0443 <a href="http://wg21.link/p0443">[22]</a> which describes the guarantees an executor provides to the binding of <em>execution agents</em> created by a call to <code>bulk_execute</code> to the underlying <em>threads of execution</em> and to the locality of those <em>threads of execution</em>.</p><p>The <code>bulk_execution_affinity_t</code> property provides nested property types and objects as described below, where:</p><ul>
<li><code>e</code> denotes an executor object of type <code>E</code>,</li><li><code>f</code> denotes a function object of type <code>F&amp;&amp;</code>,</li><li><code>s</code> denotes a shape object of type <code>execution::executor_shape&lt;E&gt;</code>, and</li><li><code>sf</code> denotes a function object of type <code>SF</code>.</li></ul><table>
<thead>
<tr>
<th>Nested Property Type</th>
<th>Nested Property Name</th>
<th>Requirements</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bulk_execution_affinity_t::none_t</code></td>
<td><code>bulk_execution_affinity_t::none</code></td>
<td>A call to <code>e.bulk_execute(f, s, sf)</code> has no requirements on the binding of <em>execution agents</em> to the underlying <em>execution resources</em>.</td>
</tr>
<tr>
<td><code>bulk_execution_affinity_t::scatter_t</code></td>
<td><code>bulk_execution_affinity_t::scatter</code></td>
<td>A call to <code>e.bulk_execute(f, s, sf)</code> must bind the created <em>execution agents</em> to the underlying <em>execution resources</em> (ordered by physical closeness) such that they are distributed equally across the <em>execution resources</em> in a round-robin fashion. <br><br> If the execution context associated with <code>e</code> fails to bind the created <em>execution agents</em>  to the underlying <em>execution resources</em> then <code>bulk_execute</code> must throw an exception.</td>
</tr>
<tr>
<td><code>bulk_execution_affinity_t::compact_t</code></td>
<td><code>bulk_execution_affinity_t::compact</code></td>
<td>A call to <code>e.bulk_execute(f, s, sf)</code> must bind the created <em>execution agents</em> to the underlying <em>execution resources</em> such that they are distributed as close as possible to the <em>execution resource</em> of the <em>thread of execution</em> which created them. <br><br> If the execution context associated with <code>e</code> fails to bind the created <em>execution agents</em>  to the underlying <em>execution resources</em> then <code>bulk_execute</code> must throw an exception.</td>
</tr>
<tr>
<td><code>bulk_execution_affinity_t::balanced_t</code></td>
<td><code>bulk_execution_affinity_t::balanced</code></td>
<td>A call to <code>e.bulk_execute(f, s, sf)</code> must bind the created <em>execution agents</em> to the underlying <em>execution resources</em> (ordered by physical closeness) such that they are distributed equally across the <em>execution resources</em> in a bin packing fashion. <br><br> If the execution context associated with <code>e</code> fails to bind the created <em>execution agents</em>  to the underlying <em>execution resources</em> then <code>bulk_execute</code> must throw an exception.</td>
</tr>
</tbody>
</table><blockquote>
<p>[<em>Note:</em> The requirements of the <code>bulk_execution_affinity_t</code> nested properties do not enforce a specific binding; they require only that the binding follows the requirements set out above and that the pattern is consistent across invocations of the bulk execution functions. <em>—end note</em>]</p>
<p>[<em>Note:</em> It’s expected that the default value of <code>bulk_execution_affinity_t</code> for most executors will be <code>bulk_execution_affinity_t::none</code>. <em>—end note</em>]</p>
<p>[<em>Note:</em> The terms used for the <code>bulk_execution_affinity_t</code> nested properties are derived from the OpenMP affinity properties <a href="http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-affinity.html">[33]</a>, including the Intel-specific balanced affinity binding <a href="https://software.intel.com/en-us/node/522518">[34]</a>. <em>—end note</em>]</p>
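<p>As an illustration, the <code>scatter</code> and <code>compact</code> patterns above can be modelled with a toy calculation (ours, for exposition only; it is not part of the proposed wording):</p><pre class="cpp hljs"><code class="cpp">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;vector&gt;

// scatter: agent i is bound to resource i % r (round-robin across the
// resources, ordered by physical closeness).
std::vector&lt;std::size_t&gt; scatter(std::size_t n, std::size_t r) {
  std::vector&lt;std::size_t&gt; binding(n);
  for (std::size_t i = 0; i &lt; n; ++i) binding[i] = i % r;
  return binding;
}

// compact: agents fill up each resource before spilling to the next.
std::vector&lt;std::size_t&gt; compact(std::size_t n, std::size_t r) {
  std::vector&lt;std::size_t&gt; binding(n);
  const std::size_t per = (n + r - 1) / r;  // agents per resource, rounded up
  for (std::size_t i = 0; i &lt; n; ++i) binding[i] = i / per;
  return binding;
}

int main() {
  // 8 agents over 2 resources (e.g. two NUMA regions).
  assert(scatter(8, 2) == (std::vector&lt;std::size_t&gt;{0, 1, 0, 1, 0, 1, 0, 1}));
  assert(compact(8, 2) == (std::vector&lt;std::size_t&gt;{0, 0, 0, 0, 1, 1, 1, 1}));
  return 0;
}
</code></pre>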
</blockquote><p>For any two invocations <code>e1.bulk_execute(f1, s1, sf1)</code> and <code>e2.bulk_execute(f2, s2, sf2)</code>, the binding of <em>execution agents</em> to the underlying <em>execution resources</em> must be consistent if:</p><ul>
<li><code>e1 == e2</code>,</li><li><code>execution::query(e1, execution::bulk_execution_affinity) != execution::bulk_execution_affinity.none</code>, and</li><li><code>s1 == s2</code>.</li></ul><blockquote>
<p>[<em>Note:</em> If two invocations of <code>bulk_execute</code> are guaranteed to bind <em>execution agents</em> to the underlying <em>execution resources</em> consistently, this can limit resource utilization. <em>—end note</em>]</p>
<p>[<em>Note:</em> If two <em>executors</em> <code>e1</code> and <code>e2</code> invoke a bulk execution function in order, where <code>execution::query(e1, execution::context) == query(e2, execution::context)</code> is <code>true</code> and <code>execution::query(e1, execution::bulk_execution_affinity) == query(e2, execution::bulk_execution_affinity)</code> is <code>false</code>, this will likely result in <code>e1</code> binding <em>execution agents</em> if necessary to achieve the requested affinity pattern and then <code>e2</code> rebinding to achieve the new affinity pattern. Rebinding <em>execution agents</em> to <em>execution resources</em> may take substantial time and may affect performance of subsequent code. <em>—end note</em>]</p>
</blockquote><h2 id="concurrency-property"><a name="concurrency-property" href="#concurrency-property"></a>Concurrency property</h2><p>We propose a query-only executor property called <code>concurrency_t</code> which returns the maximum potential concurrency available to an <em>executor</em>.</p><h3 id="example"><a name="example" href="#example"></a>Example</h3><p>Below is an example <em>(Listing 5)</em> of querying an executor for the maximum concurrency it can provide via <code>concurrency</code>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>{
  executor exec;

  auto maxConcurrency = execution::query(exec, execution::concurrency);
}
</code></pre>">{
  executor exec;

  <span class="hljs-keyword">auto</span> maxConcurrency = execution::query(exec, execution::concurrency);
}
</code></pre><p><em>Listing 5: Example of using the concurrency property</em></p><h2 id="proposed-wording"><a name="proposed-wording" href="#proposed-wording"></a>Proposed Wording</h2><p>The <code>concurrency_t</code> property <em>(Listing 6)</em> is a query-only property as defined in P0443 <a href="http://wg21.link/p0443">[22]</a>. </p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>struct concurrency_t
{
  static constexpr bool is_requirable = false;
  static constexpr bool is_preferable = false;

  using polymorphic_query_result_type = size_t;

  template&amp;lt;class Executor&amp;gt;
    static constexpr decltype(auto) static_query_v
      = Executor::query(concurrency_t());
};
</code></pre>"><span class="hljs-keyword">struct</span> concurrency_t
{
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_requirable = <span class="hljs-keyword">false</span>;
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_preferable = <span class="hljs-keyword">false</span>;

  <span class="hljs-keyword">using</span> polymorphic_query_result_type = size_t;

  <span class="hljs-keyword">template</span>&lt;<span class="hljs-keyword">class</span> Executor&gt;
    <span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-title">decltype</span><span class="hljs-params">(<span class="hljs-keyword">auto</span>)</span> static_query_v
      </span>= Executor::query(concurrency_t());
};
</code></pre><p><em>Listing 6: Proposed specification for concurrency_t</em></p><p>The <code>concurrency_t</code> property can be used only with <code>query</code>, which returns the maximum potential concurrency available to the executor. If the value is not well defined or not computable, <code>0</code> is returned.</p><p>The value returned from <code>execution::query(e, concurrency)</code>, where <code>e</code> is an executor, shall not change between invocations.</p><blockquote>
<p>[<em>Note:</em> The expectation is that the maximum available concurrency for an <em>executor</em> as described here is equivalent to calling <code>std::thread::hardware_concurrency()</code>. <em>—end note</em>]</p>
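<p>For illustration, a hypothetical thread-pool executor (the names below are ours, not taken from this proposal or P0443) could answer the query by delegating to <code>std::thread::hardware_concurrency()</code>:</p><pre class="cpp hljs"><code class="cpp">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;thread&gt;

// Illustrative property tag; the proposal's concurrency_t carries more.
struct concurrency_t {};

struct static_thread_pool_executor {
  // Query returns the maximum potential concurrency; 0 would mean
  // "not computable", matching std::thread::hardware_concurrency().
  static std::size_t query(concurrency_t) {
    return std::thread::hardware_concurrency();
  }
};

int main() {
  assert(static_thread_pool_executor::query(concurrency_t{}) ==
         std::thread::hardware_concurrency());
  return 0;
}
</code></pre>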
</blockquote><h2 id="execution-locality-intersection-property"><a name="execution-locality-intersection-property" href="#execution-locality-intersection-property"></a>Execution locality intersection property</h2><p>We propose a query-only executor property called <code>execution_locality_intersection_t</code> which returns the maximum potential concurrency that is available to both of two <em>executors</em>.</p><h3 id="example"><a name="example" href="#example"></a>Example</h3><p>Below is an example <em>(Listing 7)</em> of querying whether two <em>executors</em> have overlapping maximum concurrency using <code>execution_locality_intersection</code>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>{
  executor_a execA;
  executor_b execB;

  auto concurrencyOverlap = execution::query(execA,
    execution::execution_locality_intersection(execB));
}
</code></pre>">{
  executor_a execA;
  executor_b execB;

  <span class="hljs-keyword">auto</span> concurrencyOverlap = execution::query(execA,
    execution::execution_locality_intersection(execB));
}
</code></pre><p><em>Listing 7: Example of using the execution_locality_intersection property</em></p><h2 id="proposed-wording"><a name="proposed-wording" href="#proposed-wording"></a>Proposed Wording</h2><p>The <code>execution_locality_intersection_t</code> property <em>(Listing 8)</em> is a query-only property as defined in P0443 <a href="http://wg21.link/p0443">[22]</a>. </p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>struct execution_locality_intersection_t
{
  static constexpr bool is_requirable = false;
  static constexpr bool is_preferable = false;

  using polymorphic_query_result_type = size_t;

  template&amp;lt;class Executor, class DestExecutor&amp;gt;
    static constexpr decltype(auto) static_query_v
      = Executor::query(execution_locality_intersection_t{}(DestExecutor{}));

  template &amp;lt;class DestExecutor&amp;gt;
  size_t operator()(DestExecutor &amp;amp;&amp;amp;d);
};
</code></pre>"><span class="hljs-keyword">struct</span> execution_locality_intersection_t
{
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_requirable = <span class="hljs-keyword">false</span>;
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_preferable = <span class="hljs-keyword">false</span>;

  <span class="hljs-keyword">using</span> polymorphic_query_result_type = size_t;

  <span class="hljs-keyword">template</span>&lt;<span class="hljs-keyword">class</span> Executor, <span class="hljs-keyword">class</span> DestExecutor&gt;
    <span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-title">decltype</span><span class="hljs-params">(<span class="hljs-keyword">auto</span>)</span> static_query_v
      </span>= Executor::query(execution_locality_intersection_t{}(DestExecutor{}));

  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">class</span> DestExecutor&gt;
  <span class="hljs-function">size_t <span class="hljs-title">operator</span><span class="hljs-params">()</span><span class="hljs-params">(DestExecutor &amp;&amp;d)</span></span>;
};
</code></pre><p><em>Listing 8: Proposed specification for execution_locality_intersection_t</em></p><p>The <code>execution_locality_intersection_t</code> property can be used only with <code>query</code>, which returns the maximum potential concurrency available to both <em>executors</em>. If the value is not well defined or not computable, <code>0</code> is returned.</p><p>The value returned from <code>execution::query(e1, execution_locality_intersection(e2))</code>, where <code>e1</code> and <code>e2</code> are executors, shall not change between invocations.</p><blockquote>
<p>[<em>Note:</em> The expectation is that the maximum available concurrency as described here is the concurrency of the <em>execution resources</em> available to both <em>executors</em>. <em>—end note</em>]</p>
</blockquote><h2 id="memory-locality-intersection-property"><a name="memory-locality-intersection-property" href="#memory-locality-intersection-property"></a>Memory locality intersection property</h2><p>We propose a query-only executor property called <code>memory_locality_intersection_t</code> which specifies whether two <em>executors</em> share a common memory locality, such that memory allocated by either <em>executor</em> has similar affinity to both.</p><p>This is useful for determining whether memory local to one <em>executor</em> would require migration in order to be local to another <em>executor</em>.</p><h3 id="example"><a name="example" href="#example"></a>Example</h3><p>Below is an example <em>(Listing 9)</em> of querying whether two <em>executors</em> have common memory locality using <code>memory_locality_intersection</code>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>{
  executor_a execA;
  executor_b execB;

  auto sharedLocality = execution::query(execA,
    execution::memory_locality_intersection(execB));
}
</code></pre>">{
  executor_a execA;
  executor_b execB;

  <span class="hljs-keyword">auto</span> sharedLocality = execution::query(execA,
    execution::memory_locality_intersection(execB));
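
  // If the query reports that the two executors do not share a memory
  // locality, data allocated local to execA would need to be migrated
  // before execB could access it with similar affinity; a migration
  // mechanism is left as future work in this paper.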
}
</code></pre><p><em>Listing 9: Example of using the memory locality intersection property</em></p><p>The <code>memory_locality_intersection_t</code> property <em>(Listing 10)</em> is a query-only property as defined in P0443 <a href="http://wg21.link/p0443">[22]</a>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>struct memory_locality_intersection_t
{
  static constexpr bool is_requirable = false;
  static constexpr bool is_preferable = false;

  using polymorphic_query_result_type = bool;

  template&amp;lt;class Executor, class DestExecutor&amp;gt;
    static constexpr decltype(auto) static_query_v
      = Executor::query(memory_locality_intersection_t{}(DestExecutor{}));

  template &amp;lt;class DestExecutor&amp;gt;
  bool operator()(DestExecutor &amp;amp;&amp;amp;d);
};
</code></pre>"><span class="hljs-keyword">struct</span> memory_locality_intersection_t
{
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_requirable = <span class="hljs-keyword">false</span>;
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-keyword">bool</span> is_preferable = <span class="hljs-keyword">false</span>;

  <span class="hljs-keyword">using</span> polymorphic_query_result_type = <span class="hljs-keyword">bool</span>;

  <span class="hljs-keyword">template</span>&lt;<span class="hljs-keyword">class</span> Executor, <span class="hljs-keyword">class</span> DestExecutor&gt;
    <span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">constexpr</span> <span class="hljs-title">decltype</span><span class="hljs-params">(<span class="hljs-keyword">auto</span>)</span> static_query_v
      </span>= Executor::query(memory_locality_intersection_t{}(DestExecutor{}));

  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">class</span> DestExecutor&gt;
  <span class="hljs-function"><span class="hljs-keyword">bool</span> <span class="hljs-title">operator</span><span class="hljs-params">()</span><span class="hljs-params">(DestExecutor &amp;&amp;d)</span></span>;
};
</code></pre><p><em>Listing 10: Proposed specification for memory_locality_intersection_t</em></p><p>The <code>memory_locality_intersection_t</code> property can be used only with <code>query</code>, which returns <code>true</code> if both <em>executors</em> share a common address space, and <code>false</code> otherwise. If the value is not well defined or not computable, <code>false</code> is returned.</p><p>The value returned from <code>execution::query(e1, memory_locality_intersection_t(e2))</code>, where <code>e1</code> and <code>e2</code> are executors, shall not change between invocations.</p><h1 id="future-work"><a name="future-work" href="#future-work"></a>Future Work</h1><p>There are a number of additional features which we are considering for inclusion in this paper but are not ready yet.</p><h2 id="migrating-data"><a name="migrating-data" href="#migrating-data"></a>Migrating data</h2><p>This paper currently provides a mechanism for detecting whether two <em>executors</em> share a common memory locality. 
However, it does not provide a way to invoke migration of data allocated local to one <em>executor</em> into the locality of another <em>executor</em>.</p><p>We envision that this mechanism could be facilitated by a customization point on two <em>executors</em> and perhaps a <code>span</code> or <code>mdspan</code> accessor.</p><h2 id="supporting-different-affinity-domains"><a name="supporting-different-affinity-domains" href="#supporting-different-affinity-domains"></a>Supporting different affinity domains</h2><p>This paper currently assumes a NUMA-like system; however, there are many other kinds of systems, with different architectures and different kinds of processors, memory, and connections between them.</p><p>In order to take full advantage of the range of systems available now and in the future, we will need some way to parameterize or enumerate the different affinity domains around which an executor can be structured.</p><p>Furthermore, in order to have control over those affinity domains, we need a way to mask out the components of a domain that we wish to work with.</p><p>Whichever option we opt for, it must allow further additions as new system architectures become available.</p><h1 id="acknowledgments"><a name="acknowledgments" href="#acknowledgments"></a>Acknowledgments</h1><p>Thanks to Christopher Di Bella, Toomas Remmelg, and Morris Hafner for their reviews and suggestions.</p><h1 id="references"><a name="references" href="#references"></a>References</h1><p><a href="http://wg21.link/p0687">[1]</a> P0687: Data Movement in C++</p><p><a href="https://link.springer.com/chapter/10.1007/978-3-642-30961-8_2">[2]</a> The Design of OpenMP Thread Affinity</p><p>[3] Euro-Par 2011 Parallel Processing: 17th International, Affinity Matters</p><p><a href="https://www.open-mpi.org/projects/hwloc/">[4]</a> Portable Hardware Locality</p><p><a href="https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf">[5]</a> SYCL 
1.2.1</p><p><a href="https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf">[6]</a> OpenCL 2.2</p><p><a href="http://www.hsafoundation.com/standards/">[7]</a> HSA</p><p><a href="http://www.openmp.org/wp-content/uploads/openmp-TR5-final.pdf">[8]</a> OpenMP 5.0</p><p><a href="https://github.com/dcdillon/cpuaff">[9]</a> cpuaff</p><p><a href="http://pmem.io/">[10]</a> Persistent Memory Programming</p><p><a href="https://github.com/memkind/memkind">[11]</a> MEMKIND</p><p><a href="https://docs.oracle.com/cd/E26502_01/html/E29031/pbind-1m.html">[12]</a> Solaris pbind()</p><p><a href="https://linux.die.net/man/2/sched_setaffinity">[13]</a> Linux sched_setaffinity()</p><p><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms686247(v=vs.85).aspx">[14]</a> Windows SetThreadAffinityMask()</p><p><a href="https://chapel-lang.org/">[15]</a> Chapel</p><p><a href="http://x10-lang.org/">[16]</a> X10</p><p><a href="https://bitbucket.org/berkeleylab/upcxx/wiki/Home">[17]</a> UPC++</p><p><a href="https://www.threadingbuildingblocks.org/">[18]</a> TBB</p><p><a href="https://github.com/STEllAR-GROUP/hpx">[19]</a> HPX</p><p><a href="https://github.com/m-a-d-n-e-s-s/madness">[20]</a> MADNESS</p><p><a href="https://www.open-mpi.org/projects/hwloc/lstopo/">[21]</a> Portable Hardware Locality lstopo</p><p><a href="http://wg21.link/p0443">[22]</a> A Unified Executors Proposal for C++</p><p><a href="http://wg21.link/p0737">[23]</a> P0737: Execution Context of Execution Agents</p><p><a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=bGJsLmdvdnxwYWRhbC13b3Jrc2hvcHxneDozOWE0MjZjOTMxOTk3NGU3">[24]</a> Exposing the Locality of new Memory Hierarchies to HPC Applications</p><p><a href="http://mpi-forum.org/docs/">[25]</a> MPI</p><p><a href="http://www.csm.ornl.gov/pvm/">[26]</a> Parallel Virtual Machine</p><p><a 
href="http://etutorials.org/Linux+systems/cluster+computing+with+linux/Part+II+Parallel+Programming/Chapter+11+Fault-Tolerant+and+Adaptive+Programs+with+PVM/11.2+Building+Fault-Tolerant+Parallel+Applications/">[27]</a> Building Fault-Tolerant Parallel Applications</p><p><a href="http://journals.sagepub.com/doi/10.1177/1094342013488238">[28]</a> Post-failure recovery of MPI communication capability</p><p><a href="http://www.mcs.anl.gov/~lusk/papers/fault-tolerance.pdf">[29]</a> Fault Tolerance in MPI Programs</p><p><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0323r4.html">[30]</a> P0323R4: std::expected</p><p><a href="https://developer.movidius.com/">[31]</a> Intel® Movidius™ Neural Compute Stick</p><p><a href="http://dx.doi.org/10.1137/15M1026171">[32]</a> MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation</p><p><a href="http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-affinity.html">[33]</a> OpenMP topic: Affinity</p><p><a href="https://software.intel.com/en-us/node/522518">[34]</a> Balanced Affinity Type</p><p><a href="http://wg21.link/p0796">[35]</a> Supporting Heterogeneous &amp; Distributed Computing Through Affinity</p><p><a href="http://wg21.link/p1437">[36]</a> System topology discovery for heterogeneous &amp; distributed computing</p>

<footer style="position:fixed; font-size:.8em; text-align:right; bottom:0px; margin-left:-25px; height:20px; width:100%;">generated by <a href="http://pad.haroopress.com" target="_blank">haroopad</a></footer>
</body>
</html>
