<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML 
lang=en-us><HEAD><TITLE>N3209: Progress guarantees for C++0x (revised)</TITLE>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.7600.16625"></HEAD>
<BODY>
<TABLE summary="Identifying information for this document.">
  <TBODY>
  <TR>
    <TH>Doc. No.:</TH>
    <TD>WG21/N3209<BR>J16/10-0199</TD></TR>
  <TR>
    <TH>Date:</TH>
    <TD>2010-11-11</TD></TR>
  <TR>
    <TH>Reply to:</TH>
    <TD>Hans-J. Boehm, Pablo Halpern</TD></TR>
  <TR>
    <TH>Phone:</TH>
    <TD>+1-650-857-3406</TD></TR>
  <TR>
    <TH>Email:</TH>
    <TD><A href="mailto:Hans.Boehm@hp.com">Hans.Boehm@hp.com</A>, <A 
      href="mailto:phalpern@halpernwightsoftware.com">phalpern@halpernwightsoftware.com</A></TD></TR></TBODY></TABLE>
<H1>N3209: Progress guarantees for C++0x (US 3 and US 186) (revised)</H1>
<P>National body comment US 3 points out that the FCD makes no meaningful 
progress guarantees for multithreaded C++ programs. US 186 points out that the 
current <TT>try_lock</TT> specification allows the function to always spuriously fail, 
again making it impossible for programmers to provide useful progress 
guarantees. 
<P>The complaints are clearly correct; we say nothing about progress guarantees. 
Many real applications will need to make some assumptions beyond the standard. 
The question is whether we want to address this directly, or view it in the same 
way that we currently view performance guarantees: as a quality of 
implementation issue. The correct answer is not clear. Here we argue for a 
fairly conservative solution that adds at most "normative encouragement" to the 
specification. The arguments in favor of such a minimalist solution are 
essentially: 
<OL>
  <LI>It's late in the game. 
  <LI>We are not aware of any approach to specifying progress guarantees that is 
  painlessly applicable across the variety of platforms that are likely to want 
  to support C++0x. Any attempt to do something more substantial is likely to be 
  controversial. 
  <LI>It appears likely that providing a meaningful guarantee would necessitate 
  other changes in the library specification. </LI></OL>
<P>Our proposal implements the "normative encouragement" consensus that appeared 
in Rapperswil, though one of us is mildly concerned that even this may be 
stronger than appropriate. 
<P>Here we outline the difficulties in doing anything more substantial. 
<H2>Difficulties in specifying progress guarantees</H2>Here we list our concerns 
with adding a hard progress guarantee specification. Essentially all of these 
were previously discussed on the reflector. They also point to some of the 
difficulties with providing very precise "normative encouragement". 
<OL>
  <LI>Java previously considered this issue, and made only very minimal 
  guarantees. We believe Java guarantees that if we write (<I>warning: 
  pseudocode!</I>), with <TT>x</TT> initially false and atomic (i.e. Java 
  volatile), 
  <TABLE border=1 align=center>
    <TBODY>
    <TR>
      <TD>Thread 1 </TD>
      <TD>Thread 2 </TD></TR>
    <TR>
      <TD>x = true;<BR>print "hello";<BR>...</TD>
      <TD>while (!x) {}</TD></TR></TBODY></TABLE>and "hello" is 
  printed, then thread 2 terminates, i.e. the update to <TT>x</TT> can't be in 
  flight forever. We believe it makes no guarantee that "hello" is ever printed, 
  or that the loop terminates if it is not. This was intentional. Everyone agreed that 
  a general purpose JVM should provide stronger guarantees. But it was argued 
  that it was fine for something like the Oracle embedded JVM that processes 
  stored procedures not to use fully preemptive scheduling, and thus this should 
  not be required in all contexts. We believe similar considerations apply for 
  C++ use in embedded systems. Some of us believe that the resulting formulation 
  at <A 
  href="http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.9">http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.9</A> 
  is still a more complicated specification than we would like in the C++ 
  standard. 
  <LI>The Java guarantee requires that atomic updates become visible to other 
  threads in a finite amount of time. This would at least be far easier to 
  implement if hardware provided a similar guarantee, i.e. that values written 
  to a processor store buffer eventually become visible to other threads (even 
  if the processor then goes into a tight infinite loop, touching little 
  memory). To our current knowledge, not all major processor vendors have 
  committed to this, making it unclear what an implementation needs to look like 
  in order to absolutely guarantee this. 
  <LI>US 3 really wants a stronger guarantee than (our interpretation of) Java, 
  namely that the loop above actually terminates no matter what, i.e. that 
  thread 1 is scheduled and the new value of <TT>x</TT> eventually becomes 
  visible to thread 2. This is not going to be enforceable on systems with 
  strict priorities in which there might be a single processor, and thread 2 
  might always have higher priority than thread 1. Since C++0x does not provide 
  priorities, this is not a show-stopper. But it does mean that progress 
  guarantees would have to be weakened if we added priorities in the future. And 
  it means that we would be providing a guarantee that cannot be assumed by 
  library writers on most current systems. For example, a library intended to be 
  callable from an arbitrary pthreads thread should not spin-wait on a helper 
  thread with normal priority, since the helper thread may never be scheduled. 
  We would be guaranteeing that this works. 
  <LI>We suspect that an MxN thread implementation along the lines that Solaris 
  traditionally provided wouldn't satisfy the requested guarantee either in all 
  contexts. But we're not sure about this. 
  <LI>If we want to guarantee progress for all unblocked threads, what does this 
  mean for a thread that occasionally acquires a lock, perhaps as an 
  implementation detail in some library routine? Is it required to eventually 
  acquire the lock if it is available infinitely often? This would effectively 
  require some level of fairness for locks, which may be at odds with 
  performance. Any progress guarantee without such a guarantee seems nearly 
  vacuous, unless we restrict which library routines may acquire locks. 
  <P>Lock implementations sometimes use a briefly held spin lock to protect the 
  internal data structures of the lock. Any real notion of fairness seems 
  unattainable if an unlucky thread can repeatedly fail to acquire this spin 
  lock, as appears likely in such a setting. </P></LI></OL>
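<P>For concreteness, the pseudocode in item 1 can be rendered in C++0x terms with 
<TT>std::atomic</TT> and <TT>std::thread</TT>. This is an illustrative sketch, not 
proposed wording; whether the spin loop is guaranteed to terminate is precisely the 
question discussed in items 1&ndash;3, though on a typical preemptive implementation 
it does terminate:</P>

```cpp
#include <atomic>
#include <thread>
#include <cstdio>

// C++0x rendering of the Java pseudocode above: x is atomic, and
// thread 2 spin-waits for thread 1's store to become visible.
std::atomic<bool> x(false);

void thread1() {
    x.store(true);
    std::printf("hello\n");
}

void thread2() {
    while (!x.load()) {}   // terminates only if the store becomes visible
}

bool run_example() {
    std::thread t2(thread2);
    std::thread t1(thread1);
    t1.join();
    t2.join();             // returns only if thread 2's loop terminated
    return true;
}
```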
<P>Thus imposing an absolute requirement here is tricky, and may be 
inappropriate for some settings. The normative encouragement option suggested by 
US 16 as a possible alternative seems much more palatable. 
<H2>Try_lock spurious failures</H2>
<P>If we choose not to provide hard progress guarantees for C++ threads in 
general, there are fewer reasons to say anything about <TT>try_lock()</TT> 
spurious failures. 
<P>It can still be argued that some legitimate algorithms could rely on 
<TT>try_lock()</TT> failures to obtain useful information. One example, due to 
the authors of US 186, is that a graph traversal algorithm might legitimately 
conclude that if a lock on a node is held by another traversal thread, this node 
is already being visited by another thread, and hence no further work needs to 
be done. 
<P>However, it appears to us that such algorithms can almost always be more 
simply rewritten using the atomics library. A graph traversal algorithm 
typically already needs a flag to indicate whether a node has been visited. If 
that flag is made atomic, rather than protected by a lock, it can also easily be 
used to detect an in-progress traversal and ensure that exactly one thread 
processes each node, if that is desired. 
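<P>The suggested rewrite can be sketched as follows; <TT>Node</TT> and 
<TT>visit_all</TT> are illustrative names, not part of any proposal. 
<TT>exchange()</TT> returns the flag's previous value, so exactly one thread 
observes <TT>false</TT> for each node:</P>

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative node type: the "visited" flag is atomic rather than
// protected by a per-node lock.
struct Node {
    std::atomic<bool> visited;
    Node() : visited(false) {}
};

// Every thread attempts to process every node; exchange(true) returns
// the previous flag value, so exactly one thread processes each node.
int visit_all(std::vector<Node>& nodes, int nthreads) {
    std::atomic<int> processed(0);
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.push_back(std::thread([&] {
            for (auto& n : nodes)
                if (!n.visited.exchange(true))
                    ++processed;   // this thread "visits" the node
        }));
    for (auto& th : ts) th.join();
    return processed.load();
}
```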
<P>On the other hand, there are strong reasons to require that programs be 
written to tolerate spurious <TT>try_lock()</TT> failures: 
<OL>
  <LI>As pointed out in <A 
  href="http://portal.acm.org/citation.cfm?id=1375581.1375591">Boehm, Adve, 
  "Foundations of the C++ Concurrency Memory Model", PLDI 08</A>, enforcing 
  sequential consistency for data-race-free programs without spurious 
  <TT>try_lock()</TT> failures requires significantly stronger memory ordering 
  for <TT>lock()</TT> operations on <TT>try_lock()</TT>-compatible mutex types. 
  On some architectures that significantly increases the cost of uncontended 
  mutex acquisitions. This cost appears to greatly outweigh any benefit from 
  prohibiting spurious <TT>try_lock()</TT> failures. 
  <LI>It allows a user-written <TT>try_lock()</TT> to fail if, for example, the 
  implementation fails to acquire a low-level lock used to protect the mutex 
  data structure. Or it allows such an operation to be written directly in terms 
  of <TT>compare_exchange_weak</TT>. 
  <LI>It ensures that client code remains correct when, for example, a debugging 
  thread is introduced that occasionally acquires locks in order to be able to 
  read consistent values from a data structure being checked or examined. Any 
  code that obtains information from <TT>try_lock()</TT> failure would break 
  with the introduction of another thread that purely locks and reads the data 
  structure. </LI></OL>
<P>(All of these were brought out in the reflector discussion, which also included 
Herb Sutter, Bronek Kozicki, and Lawrence Crowl.) 
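<P>Point 2 can be illustrated with a hypothetical user-written mutex whose 
<TT>try_lock()</TT> is a single <TT>compare_exchange_weak</TT> call, and which may 
therefore fail spuriously even when the mutex is free. Under the proposed wording 
such an implementation remains permissible, provided it does not fail 
consistently:</P>

```cpp
#include <atomic>

// Hypothetical user-written mutex, for illustration only.
class spin_mutex {
    std::atomic<bool> locked;
public:
    spin_mutex() : locked(false) {}
    bool try_lock() {
        bool expected = false;
        // May return false spuriously on some architectures (e.g. LL/SC).
        return locked.compare_exchange_weak(expected, true);
    }
    void lock()   { while (!try_lock()) {} }  // the retry loop absorbs spurious failures
    void unlock() { locked.store(false); }
};
```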
<P>On the other hand, we would like to discourage implementations from 
generating actual spurious failures of <TT>try_lock()</TT> when they could be 
avoided, since those introduce performance problems. And we want to make it 
clear that a <TT>try_lock()</TT> implementation that always fails is not useful. 
Thus we propose either normative encouragement, or a note, to explicitly 
discourage such implementations. 
<H2>Proposed resolution:</H2>We propose wording that provides normative 
encouragement to address both issues. We are not sure (with some mild 
disagreement among the authors) whether this requirement can even be stated 
precisely enough for normative encouragement. An alternative is to move all or 
some of the normative text into notes. We propose to add after 1.10p1 
[intro.multithread]: 
<BLOCKQUOTE><INS>Implementations should ensure that all unblocked threads 
  eventually make forward progress. [Note: Standard library functions may 
  silently block on I/O or locks. Factors in the execution environment, 
  including externally-imposed thread priorities, may prevent an implementation 
  from making certain guarantees of forward progress. -- end note]</INS> 
</BLOCKQUOTE>We also propose to add at the end of 1.10 [intro.multithread]: 
<BLOCKQUOTE><INS>An implementation should ensure that the last value (in 
  modification order) assigned by an atomic or synchronization operation will 
  become visible to all other threads in a finite period of time.</INS> 
</BLOCKQUOTE>Recall that it is currently unclear whether existing hardware 
easily supports this as a hard guarantee. But a lot of existing software would 
not work well if the guarantee were frequently violated. (Note that the 
corresponding property for ordinary objects is not observable without 
introducing a data race.) 
<P>And finally, add at the end of 30.4.1p14 [thread.mutex.requirements]: 
<BLOCKQUOTE><INS>An implementation should ensure that try_lock() does not 
  consistently return false in the absence of contending mutex 
  acquisitions.</INS> </BLOCKQUOTE>
<P>(Does "should" make sense in a requirements table like this? We think it is OK 
to apply to user-defined implementations, as well as the built-in mutex types. 
It will in fact apply only to the built-in mutex types if the lockable 
requirements paper goes in, and to both if it doesn't. This is too vague and 
untestable to be a "shall" requirement.) 
<P>For consistency, add to the <TT>compare_exchange_weak</TT> specification 
[atomics.types.operations] 29.6p23, just before the note: 
<BLOCKQUOTE><INS>Implementations should ensure that weak compare-and-exchange 
  operations do not consistently return false unless either the atomic object 
  has a value different from <TT>expected</TT>, or there are concurrent 
  modifications to the atomic object.</INS> </BLOCKQUOTE>
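<P>The usual calling idiom already tolerates the spurious failures this wording 
permits: <TT>compare_exchange_weak</TT> is invoked in a loop, and a spurious 
failure simply causes one more iteration. A sketch (the function name is 
illustrative):</P>

```cpp
#include <atomic>

// Illustrative only: a read-modify-write operation built from
// compare_exchange_weak.  A spurious failure just repeats the loop;
// the loop fails to make progress only if compare_exchange_weak fails
// *consistently*, which the proposed wording discourages.
int fetch_multiply(std::atomic<int>& a, int factor) {
    int old = a.load();
    // On failure (spurious or real), 'old' is updated to the current
    // value and the operation is retried.
    while (!a.compare_exchange_weak(old, old * factor)) {}
    return old;  // value before the multiplication
}
```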
<P>By stating these explicitly, we preserve our options, but potentially avoid 
some misunderstandings. </P></BODY></HTML>
