<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>index.html</title>
  <meta name="generator" content="Haroopad 0.13.1" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <style>div.oembedall-githubrepos{border:1px solid #DDD;border-radius:4px;list-style-type:none;margin:0 0 10px;padding:8px 10px 0;font:13.34px/1.4 helvetica,arial,freesans,clean,sans-serif;width:452px;background-color:#fff}div.oembedall-githubrepos .oembedall-body{background:-moz-linear-gradient(center top,#FAFAFA,#EFEFEF);background:-webkit-gradient(linear,left top,left bottom,from(#FAFAFA),to(#EFEFEF));border-bottom-left-radius:4px;border-bottom-right-radius:4px;border-top:1px solid #EEE;margin-left:-10px;margin-top:8px;padding:5px 10px;width:100%}div.oembedall-githubrepos h3{font-size:14px;margin:0;padding-left:18px;white-space:nowrap}div.oembedall-githubrepos p.oembedall-description{color:#444;font-size:12px;margin:0 0 3px}div.oembedall-githubrepos p.oembedall-updated-at{color:#888;font-size:11px;margin:0}div.oembedall-githubrepos ul.oembedall-repo-stats{border:none;float:right;font-size:11px;font-weight:700;padding-left:15px;position:relative;z-index:5;margin:0}div.oembedall-githubrepos ul.oembedall-repo-stats li{border:none;color:#666;display:inline-block;list-style-type:none;margin:0!important}div.oembedall-githubrepos ul.oembedall-repo-stats li a{background-color:transparent;border:none;color:#666!important;background-position:5px -2px;background-repeat:no-repeat;border-left:1px solid #DDD;display:inline-block;height:21px;line-height:21px;padding:0 5px 0 23px}div.oembedall-githubrepos ul.oembedall-repo-stats li:first-child a{border-left:medium none;margin-right:-3px}div.oembedall-githubrepos ul.oembedall-repo-stats li a:hover{background:5px -27px no-repeat #4183C4;color:#FFF!important;text-decoration:none}div.oembedall-githubrepos ul.oembedall-repo-stats li:first-child a:hover{border-bottom-left-radius:3px;border-top-left-radius:3px}ul.oembedall-repo-stats li:last-child 
a:hover{border-bottom-right-radius:3px;border-top-right-radius:3px}span.oembedall-closehide{background-color:#aaa;border-radius:2px;cursor:pointer;margin-right:3px}div.oembedall-container{margin-top:5px;text-align:left}.oembedall-ljuser{font-weight:700}.oembedall-ljuser img{vertical-align:bottom;border:0;padding-right:1px}.oembedall-stoqembed{border-bottom:1px dotted #999;float:left;overflow:hidden;width:730px;line-height:1;background:#FFF;color:#000;font-family:Arial,Liberation Sans,DejaVu Sans,sans-serif;font-size:80%;text-align:left;margin:0;padding:0}.oembedall-stoqembed a{color:#07C;text-decoration:none;margin:0;padding:0}.oembedall-stoqembed a:hover{text-decoration:underline}.oembedall-stoqembed a:visited{color:#4A6B82}.oembedall-stoqembed h3{font-family:Trebuchet MS,Liberation Sans,DejaVu Sans,sans-serif;font-size:130%;font-weight:700;margin:0;padding:0}.oembedall-stoqembed .oembedall-reputation-score{color:#444;font-size:120%;font-weight:700;margin-right:2px}.oembedall-stoqembed .oembedall-user-info{height:35px;width:185px}.oembedall-stoqembed .oembedall-user-info .oembedall-user-gravatar32{float:left;height:32px;width:32px}.oembedall-stoqembed .oembedall-user-info .oembedall-user-details{float:left;margin-left:5px;overflow:hidden;white-space:nowrap;width:145px}.oembedall-stoqembed .oembedall-question-hyperlink{font-weight:700}.oembedall-stoqembed .oembedall-stats{background:#EEE;margin:0 0 0 7px;padding:4px 7px 6px;width:58px}.oembedall-stoqembed .oembedall-statscontainer{float:left;margin-right:8px;width:86px}.oembedall-stoqembed .oembedall-votes{color:#555;padding:0 0 7px;text-align:center}.oembedall-stoqembed .oembedall-vote-count-post{font-size:240%;color:#808185;display:block;font-weight:700}.oembedall-stoqembed .oembedall-views{color:#999;padding-top:4px;text-align:center}.oembedall-stoqembed .oembedall-status{margin-top:-3px;padding:4px 0;text-align:center;background:#75845C;color:#FFF}.oembedall-stoqembed .oembedall-status 
strong{color:#FFF;display:block;font-size:140%}.oembedall-stoqembed .oembedall-summary{float:left;width:635px}.oembedall-stoqembed .oembedall-excerpt{line-height:1.2;margin:0;padding:0 0 5px}.oembedall-stoqembed .oembedall-tags{float:left;line-height:18px}.oembedall-stoqembed .oembedall-tags a:hover{text-decoration:none}.oembedall-stoqembed .oembedall-post-tag{background-color:#E0EAF1;border-bottom:1px solid #3E6D8E;border-right:1px solid #7F9FB6;color:#3E6D8E;font-size:90%;line-height:2.4;margin:2px 2px 2px 0;padding:3px 4px;text-decoration:none;white-space:nowrap}.oembedall-stoqembed .oembedall-post-tag:hover{background-color:#3E6D8E;border-bottom:1px solid #37607D;border-right:1px solid #37607D;color:#E0EAF1}.oembedall-stoqembed .oembedall-fr{float:right}.oembedall-stoqembed .oembedall-statsarrow{background-image:url(http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=3);background-repeat:no-repeat;overflow:hidden;background-position:0 -435px;float:right;height:13px;margin-top:12px;width:7px}.oembedall-facebook1{border:1px solid #1A3C6C;padding:0;font:13.34px/1.4 verdana;width:500px}.oembedall-facebook2{background-color:#627add}.oembedall-facebook2 a{color:#e8e8e8;text-decoration:none}.oembedall-facebookBody{background-color:#fff;vertical-align:top;padding:5px}.oembedall-facebookBody .contents{display:inline-block;width:100%}.oembedall-facebookBody div img{float:left;margin-right:5px}div.oembedall-lanyard{-webkit-box-shadow:none;-webkit-transition-delay:0s;-webkit-transition-duration:.4000000059604645s;-webkit-transition-property:width;-webkit-transition-timing-function:cubic-bezier(0.42,0,.58,1);background-attachment:scroll;background-clip:border-box;background-color:transparent;background-image:none;background-origin:padding-box;border-width:0;box-shadow:none;color:#112644;display:block;float:left;font-family:'Trebuchet MS',Trebuchet,sans-serif;font-size:16px;height:253px;line-height:19px;margin:0;max-width:none;min-height:0;outline:#112644 
0;overflow-x:visible;overflow-y:visible;padding:0;position:relative;text-align:left;vertical-align:baseline;width:804px}div.oembedall-lanyard .tagline{font-size:1.5em}div.oembedall-lanyard .wrapper{overflow:hidden;clear:both}div.oembedall-lanyard .split{float:left;display:inline}div.oembedall-lanyard .prominent-place .flag:active,div.oembedall-lanyard .prominent-place .flag:focus,div.oembedall-lanyard .prominent-place .flag:hover,div.oembedall-lanyard .prominent-place .flag:link,div.oembedall-lanyard .prominent-place .flag:visited{float:left;display:block;width:48px;height:48px;position:relative;top:-5px;margin-right:10px}div.oembedall-lanyard .place-context{font-size:.889em}div.oembedall-lanyard .prominent-place .sub-place{display:block}div.oembedall-lanyard .prominent-place{font-size:1.125em;line-height:1.1em;font-weight:400}div.oembedall-lanyard .main-date{color:#8CB4E0;font-weight:700;line-height:1.1}div.oembedall-lanyard .first{width:48.57%;margin:0 0 0 2.857%}.mermaid .label{color:#333}.node circle,.node polygon,.node rect{fill:#cde498;stroke:#13540c;stroke-width:1px}.edgePath .path{stroke:green;stroke-width:1.5px}.cluster rect{fill:#cdffb2;rx:40;stroke:#6eaa49;stroke-width:1px}.cluster text{fill:#333}.actor{stroke:#13540c;fill:#cde498}text.actor{fill:#000;stroke:none}.actor-line{stroke:grey}.messageLine0{stroke-width:1.5;stroke-dasharray:"2 2";marker-end:"url(#arrowhead)";stroke:#333}.messageLine1{stroke-width:1.5;stroke-dasharray:"2 2";stroke:#333}#arrowhead{fill:#333}#crosshead path{fill:#333!important;stroke:#333!important}.messageText{fill:#333;stroke:none}.labelBox{stroke:#326932;fill:#cde498}.labelText,.loopText{fill:#000;stroke:none}.loopLine{stroke-width:2;stroke-dasharray:"2 2";marker-end:"url(#arrowhead)";stroke:#326932}.note{stroke:#6eaa49;fill:#fff5ad}.noteText{fill:#000;stroke:none;font-family:'trebuchet 
ms',verdana,arial;font-size:14px}.section{stroke:none;opacity:.2}.section0,.section2{fill:#6eaa49}.section1,.section3{fill:#fff;opacity:.2}.sectionTitle0,.sectionTitle1,.sectionTitle2,.sectionTitle3{fill:#333}.sectionTitle{text-anchor:start;font-size:11px;text-height:14px}.grid .tick{stroke:lightgrey;opacity:.3;shape-rendering:crispEdges}.grid path{stroke-width:0}.today{fill:none;stroke:red;stroke-width:2px}.task{stroke-width:2}.taskText{text-anchor:middle;font-size:11px}.taskTextOutsideRight{fill:#000;text-anchor:start;font-size:11px}.taskTextOutsideLeft{fill:#000;text-anchor:end;font-size:11px}.taskText0,.taskText1,.taskText2,.taskText3{fill:#fff}.task0,.task1,.task2,.task3{fill:#487e3a;stroke:#13540c}.taskTextOutside0,.taskTextOutside1,.taskTextOutside2,.taskTextOutside3{fill:#000}.active0,.active1,.active2,.active3{fill:#cde498;stroke:#13540c}.activeText0,.activeText1,.activeText2,.activeText3{fill:#000!important}.done0,.done1,.done2,.done3{stroke:grey;fill:lightgrey;stroke-width:2}.doneText0,.doneText1,.doneText2,.doneText3{fill:#000!important}.crit0,.crit1,.crit2,.crit3{stroke:#f88;fill:red;stroke-width:2}.activeCrit0,.activeCrit1,.activeCrit2,.activeCrit3{stroke:#f88;fill:#cde498;stroke-width:2}.doneCrit0,.doneCrit1,.doneCrit2,.doneCrit3{stroke:#f88;fill:lightgrey;stroke-width:2;cursor:pointer;shape-rendering:crispEdges}.activeCritText0,.activeCritText1,.activeCritText2,.activeCritText3,.doneCritText0,.doneCritText1,.doneCritText2,.doneCritText3{fill:#000!important}.titleText{text-anchor:middle;font-size:18px;fill:#000}text{font-family:'trebuchet ms',verdana,arial;font-size:14px}html{height:100%}body{margin:0!important;padding:5px 20px 26px!important;background-color:#fff;font-family:"Lucida Grande","Segoe UI","Apple SD Gothic Neo","Malgun Gothic","Lucida Sans 
Unicode",Helvetica,Arial,sans-serif;font-size:.9em;overflow-x:hidden;overflow-y:auto}br,h1,h2,h3,h4,h5,h6{clear:both}hr.page{background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAYAAAAECAYAAACtBE5DAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyJpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMC1jMDYwIDYxLjEzNDc3NywgMjAxMC8wMi8xMi0xNzozMjowMCAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNSBNYWNpbnRvc2giIHhtcE1NOkluc3RhbmNlSUQ9InhtcC5paWQ6OENDRjNBN0E2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiIHhtcE1NOkRvY3VtZW50SUQ9InhtcC5kaWQ6OENDRjNBN0I2NTZBMTFFMEI3QjRBODM4NzJDMjlGNDgiPiA8eG1wTU06RGVyaXZlZEZyb20gc3RSZWY6aW5zdGFuY2VJRD0ieG1wLmlpZDo4Q0NGM0E3ODY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIgc3RSZWY6ZG9jdW1lbnRJRD0ieG1wLmRpZDo4Q0NGM0E3OTY1NkExMUUwQjdCNEE4Mzg3MkMyOUY0OCIvPiA8L3JkZjpEZXNjcmlwdGlvbj4gPC9yZGY6UkRGPiA8L3g6eG1wbWV0YT4gPD94cGFja2V0IGVuZD0iciI/PqqezsUAAAAfSURBVHjaYmRABcYwBiM2QSA4y4hNEKYDQxAEAAIMAHNGAzhkPOlYAAAAAElFTkSuQmCC) repeat-x;border:0;height:3px;padding:0}hr.underscore{border-top-style:dashed!important}body >:first-child{margin-top:0!important}img.plugin{box-shadow:0 1px 3px rgba(0,0,0,.1);border-radius:3px}iframe{border:0}figure{-webkit-margin-before:0;-webkit-margin-after:0;-webkit-margin-start:0;-webkit-margin-end:0}kbd{border:1px solid #aaa;-moz-border-radius:2px;-webkit-border-radius:2px;border-radius:2px;-moz-box-shadow:1px 2px 2px #ddd;-webkit-box-shadow:1px 2px 2px #ddd;box-shadow:1px 2px 2px 
#ddd;background-color:#f9f9f9;background-image:-moz-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:-o-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:-webkit-linear-gradient(top,#eee,#f9f9f9,#eee);background-image:linear-gradient(top,#eee,#f9f9f9,#eee);padding:1px 3px;font-family:inherit;font-size:.85em}.oembeded .oembed_photo{display:inline-block}img[data-echo]{margin:25px 0;width:100px;height:100px;background:url(../img/ajax.gif) center center no-repeat #fff}.spinner{display:inline-block;width:10px;height:10px;margin-bottom:-.1em;border:2px solid rgba(0,0,0,.5);border-top-color:transparent;border-radius:100%;-webkit-animation:spin 1s infinite linear;animation:spin 1s infinite linear}.spinner:after{content:'';display:block;width:0;height:0;position:absolute;top:-6px;left:0;border:4px solid transparent;border-bottom-color:rgba(0,0,0,.5);-webkit-transform:rotate(45deg);transform:rotate(45deg)}@-webkit-keyframes spin{to{-webkit-transform:rotate(360deg)}}@keyframes spin{to{transform:rotate(360deg)}}p.toc{margin:0!important}p.toc ul{padding-left:10px}p.toc>ul{padding:10px;margin:0 10px;display:inline-block;border:1px solid #ededed;border-radius:5px}p.toc li,p.toc ul{list-style-type:none}p.toc li{width:100%;padding:0;overflow:hidden}p.toc li a::after{content:"."}p.toc li a:before{content:"• "}p.toc h5{text-transform:uppercase}p.toc .title{float:left;padding-right:3px}p.toc .number{margin:0;float:right;padding-left:3px;background:#fff;display:none}input.task-list-item{margin-left:-1.62em}.markdown{font-family:"Hiragino Sans GB","Microsoft YaHei",STHeiti,SimSun,"Lucida Grande","Lucida Sans Unicode","Lucida Sans",'Segoe UI',AppleSDGothicNeo-Medium,'Malgun Gothic',Verdana,Tahoma,sans-serif;padding:20px}.markdown a{text-decoration:none;vertical-align:baseline}.markdown a:hover{text-decoration:underline}.markdown h1{font-size:2.2em;font-weight:700;margin:1.5em 0 1em}.markdown h2{font-size:1.8em;font-weight:700;margin:1.275em 0 .85em}.markdown 
h3{font-size:1.6em;font-weight:700;margin:1.125em 0 .75em}.markdown h4{font-size:1.4em;font-weight:700;margin:.99em 0 .66em}.markdown h5{font-size:1.2em;font-weight:700;margin:.855em 0 .57em}.markdown h6{font-size:1em;font-weight:700;margin:.75em 0 .5em}.markdown h1+p,.markdown h1:first-child,.markdown h2+p,.markdown h2:first-child,.markdown h3+p,.markdown h3:first-child,.markdown h4+p,.markdown h4:first-child,.markdown h5+p,.markdown h5:first-child,.markdown h6+p,.markdown h6:first-child{margin-top:0}.markdown hr{border:1px solid #ccc}.markdown p{margin:1em 0;word-wrap:break-word}.markdown ol{list-style-type:decimal}.markdown li{display:list-item;line-height:1.4em}.markdown blockquote{margin:1em 20px}.markdown blockquote>:first-child{margin-top:0}.markdown blockquote>:last-child{margin-bottom:0}.markdown blockquote cite:before{content:'\2014 \00A0'}.markdown .code{border-radius:3px;word-wrap:break-word}.markdown pre{border-radius:3px;word-wrap:break-word;border:1px solid #ccc;overflow:auto;padding:.5em}.markdown pre code{border:0;display:block}.markdown pre>code{font-family:Consolas,Inconsolata,Courier,monospace;font-weight:700;white-space:pre;margin:0}.markdown code{border-radius:3px;word-wrap:break-word;border:1px solid #ccc;padding:0 5px;margin:0 2px}.markdown img{max-width:100%}.markdown mark{color:#000;background-color:#fcf8e3}.markdown table{padding:0;border-collapse:collapse;border-spacing:0;margin-bottom:16px}.markdown table tr td,.markdown table tr th{border:1px solid #ccc;margin:0;padding:6px 13px}.markdown table tr th{font-weight:700}.markdown table tr th>:first-child{margin-top:0}.markdown table tr th>:last-child{margin-bottom:0}.markdown table tr td>:first-child{margin-top:0}.markdown table tr td>:last-child{margin-bottom:0}@import url(http://fonts.googleapis.com/css?family=Roboto+Condensed:300italic,400italic,700italic,400,300,700);.haroopad{padding:20px;color:#222;font-size:15px;font-family:"Roboto Condensed",Tauri,"Hiragino Sans GB","Microsoft 
YaHei",STHeiti,SimSun,"Lucida Grande","Lucida Sans Unicode","Lucida Sans",'Segoe UI',AppleSDGothicNeo-Medium,'Malgun Gothic',Verdana,Tahoma,sans-serif;background:#fff;line-height:1.6;-webkit-font-smoothing:antialiased}.haroopad a{color:#3269a0}.haroopad a:hover{color:#4183c4}.haroopad h2{border-bottom:1px solid #e6e6e6}.haroopad h6{color:#777}.haroopad hr{border:1px solid #e6e6e6}.haroopad blockquote>code,.haroopad h1>code,.haroopad h2>code,.haroopad h3>code,.haroopad h4>code,.haroopad h5>code,.haroopad h6>code,.haroopad li>code,.haroopad p>code,.haroopad td>code{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;font-size:85%;background-color:rgba(0,0,0,.02);padding:.2em .5em;border:1px solid #efefef}.haroopad pre>code{font-size:1em;letter-spacing:-1px;font-weight:700}.haroopad blockquote{border-left:4px solid #e6e6e6;padding:0 15px;color:#777}.haroopad table{background-color:#fafafa}.haroopad table tr td,.haroopad table tr th{border:1px solid #e6e6e6}.haroopad table tr:nth-child(2n){background-color:#f2f2f2}.hljs{display:block;overflow-x:auto;padding:.5em;background:#fdf6e3;color:#657b83;-webkit-text-size-adjust:none}.diff .hljs-header,.hljs-comment,.hljs-doctype,.hljs-javadoc,.hljs-pi,.lisp .hljs-string{color:#93a1a1}.css .hljs-tag,.hljs-addition,.hljs-keyword,.hljs-request,.hljs-status,.hljs-winutils,.method,.nginx .hljs-title{color:#859900}.hljs-command,.hljs-dartdoc,.hljs-hexcolor,.hljs-link_url,.hljs-number,.hljs-phpdoc,.hljs-regexp,.hljs-rules .hljs-value,.hljs-string,.hljs-tag .hljs-value,.tex .hljs-formula{color:#2aa198}.css .hljs-function,.hljs-built_in,.hljs-chunk,.hljs-decorator,.hljs-id,.hljs-identifier,.hljs-localvars,.hljs-title,.vhdl .hljs-literal{color:#268bd2}.hljs-attribute,.hljs-class .hljs-title,.hljs-constant,.hljs-link_reference,.hljs-parent,.hljs-type,.hljs-variable,.lisp .hljs-body,.smalltalk .hljs-number{color:#b58900}.css .hljs-pseudo,.diff 
.hljs-change,.hljs-attr_selector,.hljs-cdata,.hljs-header,.hljs-pragma,.hljs-preprocessor,.hljs-preprocessor .hljs-keyword,.hljs-shebang,.hljs-special,.hljs-subst,.hljs-symbol,.hljs-symbol .hljs-string{color:#cb4b16}.hljs-deletion,.hljs-important{color:#dc322f}.hljs-link_label{color:#6c71c4}.tex .hljs-formula{background:#eee8d5}.MathJax_Hover_Frame{border-radius:.25em;-webkit-border-radius:.25em;-moz-border-radius:.25em;-khtml-border-radius:.25em;box-shadow:0 0 15px #83A;-webkit-box-shadow:0 0 15px #83A;-moz-box-shadow:0 0 15px #83A;-khtml-box-shadow:0 0 15px #83A;border:1px solid #A6D!important;display:inline-block;position:absolute}.MathJax_Hover_Arrow{position:absolute;width:15px;height:11px;cursor:pointer}#MathJax_About{position:fixed;left:50%;width:auto;text-align:center;border:3px outset;padding:1em 2em;background-color:#DDD;color:#000;cursor:default;font-family:message-box;font-size:120%;font-style:normal;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;z-index:201;border-radius:15px;-webkit-border-radius:15px;-moz-border-radius:15px;-khtml-border-radius:15px;box-shadow:0 10px 20px gray;-webkit-box-shadow:0 10px 20px gray;-moz-box-shadow:0 10px 20px gray;-khtml-box-shadow:0 10px 20px gray;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}.MathJax_Menu{position:absolute;background-color:#fff;color:#000;width:auto;padding:2px;border:1px solid #CCC;margin:0;cursor:default;font:menu;text-align:left;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;z-index:201;box-shadow:0 10px 20px gray;-webkit-box-shadow:0 10px 20px gray;-moz-box-shadow:0 10px 20px gray;-khtml-box-shadow:0 10px 20px gray;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}.MathJax_MenuItem{padding:2px 2em;background:0 
0}.MathJax_MenuArrow{position:absolute;right:.5em;color:#666}.MathJax_MenuActive .MathJax_MenuArrow{color:#fff}.MathJax_MenuArrow.RTL{left:.5em;right:auto}.MathJax_MenuCheck{position:absolute;left:.7em}.MathJax_MenuCheck.RTL{right:.7em;left:auto}.MathJax_MenuRadioCheck{position:absolute;left:1em}.MathJax_MenuRadioCheck.RTL{right:1em;left:auto}.MathJax_MenuLabel{padding:2px 2em 4px 1.33em;font-style:italic}.MathJax_MenuRule{border-top:1px solid #CCC;margin:4px 1px 0}.MathJax_MenuDisabled{color:GrayText}.MathJax_MenuActive{background-color:Highlight;color:HighlightText}.MathJax_Menu_Close{position:absolute;width:31px;height:31px;top:-15px;left:-15px}#MathJax_Zoom{position:absolute;background-color:#F0F0F0;overflow:auto;display:block;z-index:301;padding:.5em;border:1px solid #000;margin:0;font-weight:400;font-style:normal;text-align:left;text-indent:0;text-transform:none;line-height:normal;letter-spacing:normal;word-spacing:normal;word-wrap:normal;white-space:nowrap;float:none;box-shadow:5px 5px 15px #AAA;-webkit-box-shadow:5px 5px 15px #AAA;-moz-box-shadow:5px 5px 15px #AAA;-khtml-box-shadow:5px 5px 15px #AAA;filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}#MathJax_ZoomOverlay{position:absolute;left:0;top:0;z-index:300;display:inline-block;width:100%;height:100%;border:0;padding:0;margin:0;background-color:#fff;opacity:0;filter:alpha(opacity=0)}#MathJax_ZoomFrame{position:relative;display:inline-block;height:0;width:0}#MathJax_ZoomEventTrap{position:absolute;left:0;top:0;z-index:302;display:inline-block;border:0;padding:0;margin:0;background-color:#fff;opacity:0;filter:alpha(opacity=0)}.MathJax_Preview{color:#888}#MathJax_Message{position:fixed;left:1px;bottom:2px;background-color:#E6E6E6;border:1px solid #959595;margin:0;padding:2px 
8px;z-index:102;color:#000;font-size:80%;width:auto;white-space:nowrap}#MathJax_MSIE_Frame{position:absolute;top:0;left:0;width:0;z-index:101;border:0;margin:0;padding:0}.MathJax_Error{color:#C00;font-style:italic}footer{position:fixed;font-size:.8em;text-align:right;bottom:0;margin-left:-25px;height:20px;width:100%}</style>
</head>
<body class="markdown haroopad">
<p><strong>Document number: P0567R0</strong><br><strong>Date: 2017-01-30</strong><br><strong>Project: SG1, SG14</strong><br><strong>Authors: Gordon Brown, Ruyman Reyes, Michael Wong</strong><br><strong>Emails: gordon@codeplay.com, ruyman@codeplay.com, michael@codeplay.com</strong><br><strong>Reply to: michael@codeplay.com, gordon@codeplay.com</strong></p><h1 id="asynchronous-managed-pointer-for-heterogeneous-computing"><a name="asynchronous-managed-pointer-for-heterogeneous-computing" href="#asynchronous-managed-pointer-for-heterogeneous-computing"></a>Asynchronous managed pointer for Heterogeneous computing</h1><h2 id="introduction"><a name="introduction" href="#introduction"></a>Introduction</h2><h3 id="summary"><a name="summary" href="#summary"></a>Summary</h3><p>This paper proposes an addition to the C++ standard library to facilitate the management of a memory allocation which can exist consistently across the memory region of the host CPU and the memory region(s) of one or more remote devices. This addition is in the form of the class template <code>managed_ptr</code>; similar to the <code>std::shared_ptr</code> but with the addition that it can share its memory allocation across the memory region of the host CPU and the memory region(s) of one or more remote devices.</p><h3 id="aim"><a name="aim" href="#aim"></a>Aim</h3><p>The aim of this paper is to begin an exploratory work into designing a unified interface for data movement. There are many different data flow models to consider when designing such an interface so it is expected that this paper will serve only as a beginning.</p><p>The approach proposed in this paper does not include all of the use cases that a complete solution would cover.</p><ul>
<li>This approach makes the assumption that there is only a single host CPU device which is capable of performing synchronisation with <strong>execution contexts</strong>.</li><li>This approach does not include an optimised manner of moving data between <strong>execution contexts</strong>.</li><li>This approach does not include support for systems which allow multiple devices to access the same memory concurrently.</li></ul><h2 id="introduction"><a name="introduction" href="#introduction"></a>Introduction</h2><h3 id="motivation"><a name="motivation" href="#motivation"></a>Motivation</h3><p>Non-heterogeneous, non-distributed systems typically have a single device, a host CPU, with a single memory region: an area of addressable memory. In contrast, heterogeneous and distributed systems have multiple devices (including the host CPU), each with its own discrete memory region.</p><p>A device in this respect can be any architecture that exists within a system that is C++ programmable; this can include CPUs, GPUs, APUs, FPGAs, DSPs, NUMA nodes, I/O devices and other forms of accelerators.</p><p>This introduces a complexity in accessing a memory allocation that is not present in current C++: the requirement that said memory allocation be available in one of many memory regions within a system throughout a program’s execution. For the purposes of this paper, we will refer to such a memory allocation as a managed memory allocation, as it is accessible consistently across multiple memory regions. With this requirement comes the possibility that a memory region on a given device may not have the most recently modified copy of a managed memory allocation, therefore requiring synchronisation to move the data.</p><p>The act of dispatching work to a remote device for execution was once a problem only for third-party APIs, and so too was the act of moving data to those devices for computations to be performed on.
However, now that C++ is progressing towards a unified interface for execution [1], the act of moving data to remote devices so that it is accessible for dispatch via executors is a problem for C++ to solve as well, and it calls for a corresponding unified interface for data movement. The act of moving data to a remote device is very tightly coupled with the work being performed on said remote device. This means that this unified interface for data movement must also be tightly coupled with the unified interface for execution.</p><h3 id="influence"><a name="influence" href="#influence"></a>Influence</h3><p>This proposal is influenced by the SYCL specification [2] and the work that was done in defining it. This was largely due to the intention of the SYCL specification to define a standard that was based entirely in C++ and kept as closely in line with the direction of C++ as possible. In order to develop this proposal further, we also seek experience from other programming models including HPX [3], KoKKos [4], Raja [5], and others.</p><p>This approach is also heavily influenced by the proposal for a unified interface for execution [1], as the interface proposed in this paper interacts directly with it.</p><h3 id="scope-of-this-paper"><a name="scope-of-this-paper" href="#scope-of-this-paper"></a>Scope of this Paper</h3><p>There are some additional important considerations when looking at a container for data movement in heterogeneous and distributed systems; however, in order to prevent this paper from becoming too verbose, these have been left outside the scope of the additions proposed here.
It is important to note them in relation to this paper for future work; for further details on these additional considerations, see the future work section.</p><h3 id="naming-considerations"><a name="naming-considerations" href="#naming-considerations"></a>Naming considerations</h3><p>During the development of this paper, many names were considered both for the <code>managed_ptr</code> itself and for its interface.</p><p>Alternative names that were considered for <code>managed_ptr</code> were <code>temporal</code>, as it described a container which gave temporal access to a managed memory allocation; <code>managed_container</code>, as the original design was based on the <code>std::vector</code> container; and <code>managed_array</code>, as the managed memory allocation is statically sized.</p><p>Alternative names for the <code>put()</code> and <code>get()</code> interface were <code>acquire()</code> and <code>release()</code>, as you are effectively acquiring and releasing the managed memory allocation, and <code>send()</code> and <code>receive()</code>, as you are effectively sending and receiving back the managed memory allocation.</p><h2 id="proposed-additions"><a name="proposed-additions" href="#proposed-additions"></a>Proposed Additions</h2><h3 id="requirements-on-execution-context"><a name="requirements-on-execution-context" href="#requirements-on-execution-context"></a>Requirements on Execution Context</h3><p>For the purposes of this paper, it is necessary to introduce requirements on the <strong>execution context</strong>: the object associated with an <strong>executor</strong> which encapsulates the underlying execution resource on which functions are executed.</p><p>The <strong>execution context</strong> must encapsulate both the execution agents and a global memory region that is accessible from those execution agents.
It is required that the memory region of the <strong>execution context</strong> be cache coherent across the execution agents created from the same invocation of an execution function. It is not required that the memory region of the <strong>execution context</strong> be cache coherent across execution agents created from different execution functions or on different <strong>execution contexts</strong>.</p><p><em>Note: some systems are capable of providing cache coherency across execution functions or <strong>execution contexts</strong> through the availability of shared virtual memory addressing or shared physical memory. This feature is discussed in more detail in the future work section.</em></p><h3 id="managed-pointer-class-template"><a name="managed-pointer-class-template" href="#managed-pointer-class-template"></a>Managed Pointer Class Template</h3><p>The proposed addition to the standard library is the <code>managed_ptr</code> class template (Figure 1). The <code>managed_ptr</code> is a smart pointer which has ownership of a contiguous managed allocation of memory that is shared between the host CPU and one or more <strong>execution contexts</strong>. It is important to note here that an <strong>execution context</strong> may be on the host CPU accessing the host CPU memory region; however, the interface remains the same.</p><p>At any one time the managed memory allocation can exist in the memory regions of the host CPU and any number of <strong>execution contexts</strong>. However, the managed memory allocation may only be accessible in one of these memory regions at any given time. This memory region is said to be the accessible memory region.
If the accessible memory region is of the host CPU, the host CPU is said to be accessible, and if the accessible memory region is of an <strong>execution context</strong>, that <strong>execution context</strong> is said to be the accessible <strong>execution context</strong>.</p><p>For the host CPU to be accessible or for an <strong>execution context</strong> to be the accessible <strong>execution context</strong>, if it is not already, a synchronisation operation is required. A synchronisation operation is an implementation-defined asynchronous operation which moves the data from the currently accessible memory region to another memory region. From the point at which a synchronisation operation is triggered, the currently accessible memory region is no longer accessible. Once the synchronisation operation is complete, the memory region to which the data is being moved becomes the accessible memory region.</p><p>Synchronisation operations are coarse-grained, in that they synchronise the entire managed memory allocation of a <code>managed_ptr</code>.</p><p><em>Note: some systems are capable of providing finer-grained synchronisation via atomic operations; however, this is generally only possible when there is shared virtual memory addressing or shared physical memory. This feature is discussed in more detail in the future work section.</em></p><p>There are three ways in which a synchronisation operation can be triggered, each of which will be described in more detail further on. The first is by calling a member function or customisation point of an <strong>executor</strong>, triggering a synchronisation operation to the memory region of said <strong>execution context</strong>. The second is by calling the <code>get()</code> member function on the <code>managed_ptr</code> itself, as this will call the above-mentioned member functions of the <strong>executor</strong>.
The third is by passing a <code>managed_ptr</code> to an <strong>executor</strong> control structure such as <code>async()</code>, as this will implicitly trigger the above-mentioned member functions on the <strong>executor</strong>.</p><p>For any given application, a <code>managed_ptr</code> can have only a single host CPU that can be accessible and that is capable of triggering synchronisation operations.</p><p><em>Note: some systems may wish to have multiple host CPU nodes which are capable of triggering synchronisation operations. This feature is discussed in more detail in the future work section.</em></p><p>If the host CPU is not accessible, the <code>managed_ptr</code> is required to maintain a pointer to the accessible <strong>execution context</strong> in order for it to perform synchronisation operations.</p><p>Memory is only required to be allocated where it is accessible. The managed memory allocation is only required to be allocated on the host CPU memory region if the <code>managed_ptr</code> is constructed with a pointer or if a synchronisation operation is triggered to the host CPU. The managed memory allocation is only required to be allocated on the memory region of an <strong>execution context</strong> if a synchronisation operation is triggered to that <strong>execution context</strong>.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>namespace std {
namespace experimental {
namespace execution {

/* managed_ptr class template */
template &amp;lt;class T&amp;gt;
class managed_ptr {
public:

  /* aliases */
  using value_type      = T;
  using pointer         = value_type *;
  using const_pointer   = const value_type *;
  using reference       = value_type &amp;amp;;
  using const_reference = const value_type &amp;amp;;
  using future_type     = __future_type__;

  /* constructors */
  managed_ptr(size_t); // (1)
  managed_ptr(pointer, size_t); // (2)
  managed_ptr(const_pointer, size_t); // (3)
  template &amp;lt;typename allocatorT&amp;gt;
  managed_ptr(size_t, allocatorT); // (4)

  /* copy/move constructors/operators, destructor */
  managed_ptr(const managed_ptr &amp;amp;);
  managed_ptr(managed_ptr &amp;amp;&amp;amp;);
  managed_ptr &amp;amp;operator=(const managed_ptr &amp;amp;);
  managed_ptr &amp;amp;operator=(managed_ptr &amp;amp;&amp;amp;);
  ~managed_ptr();

  /* synchronisation member functions */
  bool is_accessible() const;
  future_type get() const;

  /* operators */
  reference operator[](int index);
  const_reference operator[](int index) const;

  /* other member functions */
  size_t size() const;
};
}  // namespace execution
}  // namespace experimental
}  // namespace std
</code></pre>"><span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span> {
<span class="hljs-keyword">namespace</span> experimental {
<span class="hljs-keyword">namespace</span> execution {

<span class="hljs-comment">/* managed_ptr class template */</span>
<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">class</span> T&gt;
<span class="hljs-keyword">class</span> managed_ptr {
<span class="hljs-keyword">public</span>:

  <span class="hljs-comment">/* aliases */</span>
  <span class="hljs-keyword">using</span> value_type      = T;
  <span class="hljs-keyword">using</span> pointer         = value_type *;
  <span class="hljs-keyword">using</span> const_pointer   = <span class="hljs-keyword">const</span> value_type *;
  <span class="hljs-keyword">using</span> reference       = value_type &amp;;
  <span class="hljs-keyword">using</span> const_reference = <span class="hljs-keyword">const</span> value_type &amp;;
  <span class="hljs-keyword">using</span> future_type     = __future_type__;

  <span class="hljs-comment">/* constructors */</span>
  managed_ptr(size_t); <span class="hljs-comment">// (1)</span>
  managed_ptr(pointer, size_t); <span class="hljs-comment">// (2)</span>
  managed_ptr(const_pointer, size_t); <span class="hljs-comment">// (3)</span>
  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> allocatorT&gt;
  managed_ptr(size_t, allocatorT); <span class="hljs-comment">// (4)</span>

  <span class="hljs-comment">/* copy/move constructors/operators, destructor */</span>
  managed_ptr(<span class="hljs-keyword">const</span> managed_ptr &amp;);
  managed_ptr(managed_ptr &amp;&amp;);
  managed_ptr &amp;<span class="hljs-keyword">operator</span>=(<span class="hljs-keyword">const</span> managed_ptr &amp;);
  managed_ptr &amp;<span class="hljs-keyword">operator</span>=(managed_ptr &amp;&amp;);
  ~managed_ptr();

  <span class="hljs-comment">/* synchronisation member functions */</span>
  <span class="hljs-function"><span class="hljs-keyword">bool</span> <span class="hljs-title">is_accessible</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
  <span class="hljs-function">future_type <span class="hljs-title">get</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;

  <span class="hljs-comment">/* operators */</span>
  reference <span class="hljs-keyword">operator</span>[](<span class="hljs-keyword">int</span> index);
  const_reference <span class="hljs-keyword">operator</span>[](<span class="hljs-keyword">int</span> index) <span class="hljs-keyword">const</span>;

  <span class="hljs-comment">/* other member functions */</span>
  <span class="hljs-function">size_t <span class="hljs-title">size</span><span class="hljs-params">()</span> <span class="hljs-keyword">const</span></span>;
};
}  <span class="hljs-comment">// namespace execution</span>
}  <span class="hljs-comment">// namespace experimental</span>
}  <span class="hljs-comment">// namespace std</span>
</code></pre><p><em>Figure 1: managed_ptr class template</em></p><p>The <code>managed_ptr</code> class template has a single template parameter, <code>T</code>, specifying the type of the elements.</p><p>A type <code>T</code> satisfies the <code>managed_ptr</code> element type requirements if:</p><ul>
<li><code>T</code> is a standard layout type.</li><li><code>T</code> is copy constructible.</li></ul><p>A  <code>managed_ptr</code> can be constructed in a number of ways:</p><ul>
<li>for constructor (1) the <code>managed_ptr</code> allocates the number of elements specified by the <code>size_t</code> parameter using the default allocator, therefore allocating in the host CPU memory region.</li><li>for constructor (2) the <code>managed_ptr</code> takes ownership of the <code>pointer</code> parameter.</li><li>for constructor (3) the <code>managed_ptr</code> takes ownership of the <code>const_pointer</code> parameter.</li><li>for constructor (4) the <code>managed_ptr</code> allocates the number of elements specified by the <code>size_t</code> parameter using the allocator specified by the <code>allocatorT</code> parameter.</li></ul><p>The default constructor of <code>T</code> is not called for the elements of the <code>managed_ptr</code> during allocation.</p><p>The destructor is not required to perform any synchronisation operations.</p><p>The member function <code>get()</code> triggers a synchronisation operation to the host CPU if the host CPU is not already accessible, and returns a <code>future_type</code> object which can be used to wait on the operation completing. If an error occurs during the synchronisation operation, or the pointer to the accessible <strong>execution context</strong> is no longer valid, then an exception is stored within the <code>future_type</code> that is returned. It is undefined behaviour to call <code>get()</code> within a function executing on an <strong>execution context</strong>.</p><p>The member function <code>is_accessible()</code> returns a boolean specifying whether the host CPU is accessible.</p><p>The subscript operator returns a reference to the element of the managed allocation at the index specified by the <code>index</code> parameter. If the subscript operator is called on the host CPU and the host CPU is not accessible then the return value is undefined. 
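</p><p><em>Note: the following sketch is illustrative only, not the proposed API; all names are invented for exposition.</em> Host-side code is expected to follow the contract just described: check <code>is_accessible()</code> and, if the host CPU is not accessible, trigger a synchronisation operation to the host CPU (as <code>get()</code> does) before performing elementwise access. A toy model of that guard:</p><pre class="cpp hljs"><code class="cpp">/* Illustrative model (not the proposed API) of the host-side access guard. */
struct toy_ptr {
  bool host_accessible;
  float storage[4];

  bool is_accessible() const { return host_accessible; }

  /* Models ptr.get().wait(): synchronise to the host CPU, then block. */
  void get_and_wait() { host_accessible = true; }

  /* Elementwise access is only defined while the host is accessible. */
  float read(int index) const { return storage[index]; }
};

int main() {
  toy_ptr p = { false, { 1.0f, 2.0f, 3.0f, 4.0f } };
  if (!p.is_accessible())
    p.get_and_wait();            /* synchronise before reading on the host */
  if (p.read(2) != 3.0f) return 1;
  return 0;
}
</code></pre><p>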
If the subscript operator is called within a function executing on an <strong>execution context</strong>, and said <strong>execution context</strong> is not the accessible <strong>execution context</strong>, then the return value is undefined.</p><p>The member function <code>size()</code> returns the number of elements of type <code>T</code> stored within the managed allocation.</p><h3 id="extensions-to-executors-and-execution-contexts"><a name="extensions-to-executors-and-execution-contexts" href="#extensions-to-executors-and-execution-contexts"></a>Extensions to Executors and Execution Contexts</h3><p>In order to facilitate the synchronisation operations required to support the <code>managed_ptr</code>, the following extensions are proposed for the unified interface for execution.</p><p>These extensions are in the form of member functions and global customisation point functions which each trigger a synchronisation operation. The <code>put()</code> and <code>then_put()</code> functions trigger a synchronisation operation to an <strong>execution context</strong>. The <code>get()</code> and <code>then_get()</code> functions trigger a synchronisation operation to the host CPU. Each function takes a variadic set of <code>managed_ptr</code>s and triggers a synchronisation operation for each of them; the global customisation point functions additionally take an <strong>executor</strong> parameter. Each function returns a <code>future_type</code> object that can be used to wait on the synchronisation operation. The <code>then_get()</code> and <code>then_put()</code> functions also take a <code>future_type</code> predicate parameter.</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>namespace std {
namespace experimental {
namespace execution {

/* executor classes */
class &amp;lt;executor-class&amp;gt; {
  ...

  template &amp;lt;typename... ManagedPtrTN&amp;gt;
  executor_future_t&amp;lt;&amp;lt;executor-class&amp;gt;, void&amp;gt; put(ManagedPtrTN...);
  template &amp;lt;typename... ManagedPtrTN&amp;gt;
  executor_future_t&amp;lt;&amp;lt;executor-class&amp;gt;, void&amp;gt; get(ManagedPtrTN...);
  template &amp;lt;typename Predicate, typename... ManagedPtrTN&amp;gt;
  executor_future_t&amp;lt;&amp;lt;executor-class&amp;gt;, void&amp;gt; then_put(Predicate pred, ManagedPtrTN...);
  template &amp;lt;typename Predicate, typename... ManagedPtrTN&amp;gt;
  executor_future_t&amp;lt;&amp;lt;executor-class&amp;gt;, void&amp;gt; then_get(Predicate pred, ManagedPtrTN...);

  ...
};

/* customisation points */
template &amp;lt;typename Executor, typename... ManagedPtrTN&amp;gt;
executor_future_t&amp;lt;Executor, void&amp;gt; put(Executor, ManagedPtrTN...);
template &amp;lt;typename Executor, typename... ManagedPtrTN&amp;gt;
executor_future_t&amp;lt;Executor, void&amp;gt; get(Executor, ManagedPtrTN...);
template &amp;lt;typename Predicate, typename Executor, typename... ManagedPtrTN&amp;gt;
executor_future_t&amp;lt;Executor, void&amp;gt; then_put(Predicate pred, Executor, ManagedPtrTN...);
template &amp;lt;typename Predicate, typename Executor, typename... ManagedPtrTN&amp;gt;
executor_future_t&amp;lt;Executor, void&amp;gt; then_get(Predicate pred, Executor, ManagedPtrTN...);

}  // namespace execution
}  // namespace experimental
}  // namespace std
</code></pre>"><span class="hljs-keyword">namespace</span> <span class="hljs-built_in">std</span> {
<span class="hljs-keyword">namespace</span> experimental {
<span class="hljs-keyword">namespace</span> execution {

<span class="hljs-comment">/* executor classes */</span>
<span class="hljs-keyword">class</span> &lt;executor-<span class="hljs-keyword">class</span>&gt; {
  ...

  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
  executor_future_t&lt;&lt;executor-<span class="hljs-keyword">class</span>&gt;, <span class="hljs-keyword">void</span>&gt; put(ManagedPtrTN...);
  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
  executor_future_t&lt;&lt;executor-<span class="hljs-keyword">class</span>&gt;, <span class="hljs-keyword">void</span>&gt; get(ManagedPtrTN...);
  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Predicate, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
  executor_future_t&lt;&lt;executor-<span class="hljs-keyword">class</span>&gt;, <span class="hljs-keyword">void</span>&gt; then_put(Predicate pred, ManagedPtrTN...);
  <span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Predicate, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
  executor_future_t&lt;&lt;executor-<span class="hljs-keyword">class</span>&gt;, <span class="hljs-keyword">void</span>&gt; then_get(Predicate pred, ManagedPtrTN...);

  ...
};

<span class="hljs-comment">/* customisation points */</span>
<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Executor, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
executor_future_t&lt;Executor, <span class="hljs-keyword">void</span>&gt; put(Executor, ManagedPtrTN...);
<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Executor, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
executor_future_t&lt;Executor, <span class="hljs-keyword">void</span>&gt; get(Executor, ManagedPtrTN...);
<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Predicate, <span class="hljs-keyword">typename</span> Executor, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
executor_future_t&lt;Executor, <span class="hljs-keyword">void</span>&gt; then_put(Predicate pred, Executor, ManagedPtrTN...);
<span class="hljs-keyword">template</span> &lt;<span class="hljs-keyword">typename</span> Predicate, <span class="hljs-keyword">typename</span> Executor, <span class="hljs-keyword">typename</span>... ManagedPtrTN&gt;
executor_future_t&lt;Executor, <span class="hljs-keyword">void</span>&gt; then_get(Predicate pred, Executor, ManagedPtrTN...);

}  <span class="hljs-comment">// namespace execution</span>
}  <span class="hljs-comment">// namespace experimental</span>
}  <span class="hljs-comment">// namespace std</span>
</code></pre><p><em>Figure 2: Extensions to unified interface for execution</em></p><p>As described previously, there are three ways in which a synchronisation operation can be triggered, and these are separated into explicit and implicit.</p><p>The first is by calling one of the member functions or global customisation point functions described above. This will trigger a synchronisation operation to the memory region of the <strong>execution context</strong> associated with the <strong>executor</strong>. This method is useful as it allows finer control over the synchronisation operations by chaining continuations.</p><p>The second is by calling the <code>get()</code> member function on the <code>managed_ptr</code> itself, as this will implicitly trigger a call to <code>get()</code> on the currently accessible <strong>execution context</strong> if the host CPU is not currently accessible. This method is useful when you want to synchronise data back to the host CPU in order to use the resulting data in regular sequential code.</p><p>The third is by passing a <code>managed_ptr</code> to an <strong>executor</strong> control structure such as <code>async()</code>, as this will implicitly trigger a call to the <code>put()</code> member function on the <strong>executor</strong>. If the host CPU is not accessible at this point, then this will implicitly trigger a call to <code>get()</code> on the <code>managed_ptr</code>, which will subsequently implicitly trigger a call to <code>get()</code> on the currently accessible <strong>execution context</strong>. 
This method is useful as it allows you to simply pass a <code>managed_ptr</code> directly to a control structure without requiring an explicit call to <code>put()</code>.</p><h2 id="examples"><a name="examples" href="#examples"></a>Examples</h2><p>There is a wide range of use cases that affect the design of a potential interface when it comes to asynchronous tasks and the movement of the data that those tasks require, however, for the purposes of this paper, the following examples should provide a base requirement for the features this paper proposes.</p><p>The following examples show how the features presented in this paper can be used to move data from the host CPU memory region to the memory region of a GPU in order to perform an operation on said data via the <strong>executor</strong> interface.</p><p>The first example shows how this would look using the explicit interface for synchronisation operations (Figure 3) and the second shows how this would look using the implicit interface for synchronisation operations (Figure 4).</p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>/* Construct a context and executor for executing work on the GPU */
gpu_execution_context gpuContext;
auto gpuExecutor = gpuContext.executor();

/* Retrieve gpu allocator */
auto gpuAllocator = gpuContext.allocator();

/* Construct a managed_ptr ptrA allocated on the host CPU */
std::experimental::execution::managed_ptr&amp;lt;float&amp;gt; ptrA(1024);

/* Construct a managed_ptr ptrB allocated on the GPU execution context */
std::experimental::execution::managed_ptr&amp;lt;float&amp;gt; ptrB(1024, gpuAllocator);

/* Populate ptrA */
populate(ptrA);

/* Construct a series of compute and data operations */
auto fut =
  std::experimental::execution::put(gpuExecutor, ptrA)
      .then_put(gpuExecutor, ptrB)
          .then_async(gpuExecutor, [=](auto _ptrA, auto _ptrB) { /* ... */ }, ptrA, ptrB)
            .then_get(gpuExecutor, ptrA, ptrB);

/* Wait on the operations to execute */
fut.wait();

/* Print the result */
print(ptrB);
</code></pre>"><span class="hljs-comment">/* Construct a context and executor for executing work on the GPU */</span>
gpu_execution_context gpuContext;
<span class="hljs-keyword">auto</span> gpuExecutor = gpuContext.executor();

<span class="hljs-comment">/* Retrieve gpu allocator */</span>
<span class="hljs-keyword">auto</span> gpuAllocator = gpuContext.allocator();

<span class="hljs-comment">/* Construct a managed_ptr ptrA allocated on the host CPU */</span>
<span class="hljs-built_in">std</span>::experimental::execution::managed_ptr&lt;<span class="hljs-keyword">float</span>&gt; ptrA(<span class="hljs-number">1024</span>);

<span class="hljs-comment">/* Construct a managed_ptr ptrB allocated on the GPU execution context */</span>
<span class="hljs-built_in">std</span>::experimental::execution::managed_ptr&lt;<span class="hljs-keyword">float</span>&gt; ptrB(<span class="hljs-number">1024</span>, gpuAllocator);

<span class="hljs-comment">/* Populate ptrA */</span>
populate(ptrA);

<span class="hljs-comment">/* Construct a series of compute and data operations */</span>
<span class="hljs-keyword">auto</span> fut =
  <span class="hljs-built_in">std</span>::experimental::execution::put(gpuExecutor, ptrA)
      .then_put(gpuExecutor, ptrB)
          .then_async(gpuExecutor, [=](<span class="hljs-keyword">auto</span> _ptrA, <span class="hljs-keyword">auto</span> _ptrB) { <span class="hljs-comment">/* ... */</span> }, ptrA, ptrB)
            .then_get(gpuExecutor, ptrA, ptrB);

<span class="hljs-comment">/* Wait on the operations to execute */</span>
fut.wait();

<span class="hljs-comment">/* Print the result */</span>
print(ptrB);
</code></pre><p><em>Figure 3: Example of explicit interface</em></p><pre class="cpp hljs"><code class="cpp" data-origin="<pre><code class=&quot;cpp&quot;>/* Construct a context and executor for executing work on the GPU */
gpu_execution_context gpuContext;
auto gpuExecutor = gpuContext.executor();

/* Retrieve gpu allocator */
auto gpuAllocator = gpuContext.allocator();

/* Construct a managed_ptr ptrA allocated on the host CPU */
std::experimental::execution::managed_ptr&amp;lt;float&amp;gt; ptrA(1024);

/* Construct a managed_ptr ptrB allocated on the GPU execution context */
std::experimental::execution::managed_ptr&amp;lt;float&amp;gt; ptrB(1024, gpuAllocator);

/* Populate ptrA */
populate(ptrA);

/* Perform the compute operation with implicit synchronisation operations to the GPU execution context */
std::experimental::execution::async(gpuExecutor, [=](auto _ptrA, auto _ptrB) { /* ... */ }, ptrA, ptrB).wait();

/* Perform a synchronisation operation to the host CPU */
ptrB.get().wait();

/* Print the result */
print(ptrB);
</code></pre>"><span class="hljs-comment">/* Construct a context and executor for executing work on the GPU */</span>
gpu_execution_context gpuContext;
<span class="hljs-keyword">auto</span> gpuExecutor = gpuContext.executor();

<span class="hljs-comment">/* Retrieve gpu allocator */</span>
<span class="hljs-keyword">auto</span> gpuAllocator = gpuContext.allocator();

<span class="hljs-comment">/* Construct a managed_ptr ptrA allocated on the host CPU */</span>
<span class="hljs-built_in">std</span>::experimental::execution::managed_ptr&lt;<span class="hljs-keyword">float</span>&gt; ptrA(<span class="hljs-number">1024</span>);

<span class="hljs-comment">/* Construct a managed_ptr ptrB allocated on the GPU execution context */</span>
<span class="hljs-built_in">std</span>::experimental::execution::managed_ptr&lt;<span class="hljs-keyword">float</span>&gt; ptrB(<span class="hljs-number">1024</span>, gpuAllocator);

<span class="hljs-comment">/* Populate ptrA */</span>
populate(ptrA);

<span class="hljs-comment">/* Perform the compute operation with implicit synchronisation operations to the GPU execution context */</span>
<span class="hljs-built_in">std</span>::experimental::execution::async(gpuExecutor, [=](<span class="hljs-keyword">auto</span> _ptrA, <span class="hljs-keyword">auto</span> _ptrB) { <span class="hljs-comment">/* ... */</span> }, ptrA, ptrB).wait();

<span class="hljs-comment">/* Perform a synchronisation operation to the host CPU */</span>
ptrB.get().wait();

<span class="hljs-comment">/* Print the result */</span>
print(ptrB);
</code></pre><p><em>Figure 4: Example of implicit interface</em></p><h2 id="future-work"><a name="future-work" href="#future-work"></a>Future Work</h2><p>There are many other considerations to make when looking at a model for data movement for heterogeneous and distributed systems, however, this paper aims to establish a foundation which can be extended to include other paradigms in the future.</p><h3 id="more-complex-data-movement-policies"><a name="more-complex-data-movement-policies" href="#more-complex-data-movement-policies"></a>More Complex Data Movement Policies</h3><p>We may wish to introduce more high-level control over the way in which data is moved between devices and over the consistency between execution contexts. This could be done by having static type traits associated with executor classes which describe the data movement and consistency properties of the execution context associated with the executor.</p><h3 id="more-complex-execution-ordering-policies"><a name="more-complex-execution-ordering-policies" href="#more-complex-execution-ordering-policies"></a>More Complex Execution Ordering Policies</h3><p>We may wish to introduce more control over the order in which operations (both data movement and compute) are executed. This could be done by having static type traits associated with executor classes which describe the ordering guarantees between operations, so rather than always having a strict sequential ordering, an implementation may want to relax the requirements based on access dependencies to allow for optimisations.</p><h3 id="synchronisation-operations-from-multiple-host-cpu-devices"><a name="synchronisation-operations-from-multiple-host-cpu-devices" href="#synchronisation-operations-from-multiple-host-cpu-devices"></a>Synchronisation Operations From Multiple Host CPU Devices</h3><p>We may wish to introduce the ability for a <code>managed_ptr</code> to trigger synchronisation operations from multiple host CPU nodes. 
This could be done by having an additional concept which allows an execution context to be a host node; that is, another device capable of triggering synchronisation operations. The <code>put()</code> and <code>get()</code> in this case would be relative to the current host node. So, for example, a <code>put()</code> on node A to node B would be equivalent to a <code>get()</code> on node B from node A.</p><h3 id="additional-containers"><a name="additional-containers" href="#additional-containers"></a>Additional Containers</h3><p>We may wish to extend this principle of a managed pointer to other containers that would be useful to share across heterogeneous and distributed systems, such as vectors or arrays. This could be done by having containers such as <code>managed_vector</code> or <code>managed_array</code> that would have similar requirements to the standard containers of the same names in terms of storage and access, yet would be extended to support access from remote devices as with the <code>managed_ptr</code>.</p><h3 id="data-movement-customisation-points"><a name="data-movement-customisation-points" href="#data-movement-customisation-points"></a>Data Movement Customisation Points</h3><p>We may wish to add customisation points to optimise data movement. This could be done by introducing data movement channels, which can be implemented to optimise data movement between specific execution contexts or between an input or output stream. These could be static types to allow for compile-time data movement optimisation for compile-time embedded DSELs.</p><h3 id="implicit-data-movement"><a name="implicit-data-movement" href="#implicit-data-movement"></a>Implicit Data Movement</h3><p>We may wish to introduce a way of implicitly moving data between execution contexts without the need for explicitly acquiring and releasing the data. 
This could be done by having control structures such as <code>async</code> perform the put and get implicitly, though this removes any ability to create continuations on data movement operations.</p><h3 id="hierarchical-memory-structures"><a name="hierarchical-memory-structures" href="#hierarchical-memory-structures"></a>Hierarchical Memory Structures</h3><p>While CPUs have a single flat memory region with a single address space, most heterogeneous devices have a more complex hierarchy of memory regions, each with its own distinct address space. Each of these memory regions has its own access scope, semantics, and latency. Some heterogeneous programming models provide a unified or shared memory address space to allow more generic programming, such as OpenCL 2.x [6], HSA [7], and CUDA [8]; however, this will not always result in the most efficient memory access. This can be supported either in hardware, where the host CPU and remote devices share the same physical memory, or in software, where a cross-device cache coherency system is in place, and there are various levels at which this feature can be supported. 
In general, this means that pointers that are allocated in the host CPU memory region can be used directly in the memory regions of remote devices, though this sometimes requires mapping operations to be performed.</p><p>We may wish to investigate this feature further to incorporate support for these kinds of systems, ensuring that the <code>managed_ptr</code> can fully utilise the memory regions on these systems.</p><h2 id="references"><a name="references" href="#references"></a>References</h2><p>[1] P0443R0 A Unified Executors Proposal for C++:<br><a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0443r0.html">https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0443r0.html</a></p><p>[2] SYCL 1.2 Specification:<br><a href="https://www.khronos.org/registry/sycl/specs/sycl-1.2.pdf">https://www.khronos.org/registry/sycl/specs/sycl-1.2.pdf</a></p><p>[3] STEllAR-GROUP HPX Project:<br><a href="https://github.com/STEllAR-GROUP/hpx">https://github.com/STEllAR-GROUP/hpx</a></p><p>[4] KoKKos Project:<br><a href="https://github.com/kokkos">https://github.com/kokkos</a></p><p>[5] Raja Project:<br><a href="http://software.llnl.gov/RAJA/">http://software.llnl.gov/RAJA/</a></p><p>[6] OpenCL 2.2 Specification:<br><a href="https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf">https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf</a></p><p>[7] HSA Specification:<br><a href="http://www.hsafoundation.com/standards/">http://www.hsafoundation.com/standards/</a></p><p>[8] CUDA Unified Memory:<br><a href="https://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/">https://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/</a></p>

</body>
</html>
