Jakob Østergaard Hegelund

Tech stuff of all kinds

Useful return from short-circuit or

2016-10-05

This could have been a bootnote on the next post but time flew and now it will have to be a post on its own. Consider two solutions to the same problem:

    // C++
    // a(), b() and c() return std::optional
    std::string pick()
    {
      return a().value_or(b().value_or(c().value_or("-default-")));
    }

    ;; LISP
    ;; (a), (b) and (c) return nil or string
    (defun pick ()
      (or (a) (b) (c) "-default-"))

My point earlier was that having or return not a boolean, but the actual first non-false argument led to some simple beautiful constructs. I stand by that, but there's another important difference between the two solutions. Consider first the actual sequence of evaluations in the C++ example:

    // First, as we evaluate the arguments to the value_or functions, we
    // must call all three functions
    tmp0 = a();
    tmp1 = b();
    tmp2 = c();
    tmp3 = std::string("-default-");
    // Now, with the arguments evaluated, we can "fold" back to get
    // the result
    tmp4 = tmp2.value_or(tmp3);
    tmp5 = tmp1.value_or(tmp4);
    tmp6 = tmp0.value_or(tmp5);
    // Our result is in tmp6

Since value_or is a function, its argument must be evaluated before the function can be called. It follows that all three functions must be evaluated before we can decide which result to use - even though all we want is the first value. Compare that to a short-circuit or:

    (let ((tmp0 (a)))
      (if tmp0
          tmp0
          (let ((tmp1 (b)))
            (if tmp1
                tmp1
                (let ((tmp2 (c)))
                  (if tmp2
                      tmp2
                      "-default-"))))))

The important point here is that if (a) evaluates to non-nil, it is returned and nothing else is evaluated. Only if it is nil do we evaluate (b), and so forth.
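The difference can be made visible in C++ too. What follows is only a sketch, assuming C++17: the helper first_or, the call counter and the bodies of a(), b() and c() are all made up for illustration. first_or takes callables rather than values, so each candidate is only invoked if the earlier ones came up empty; the last argument is the plain fallback.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: tries each callable in order and stops at the first
// one that yields a value. The last argument is the plain fallback value.
template <typename First, typename... Rest>
std::string first_or(First&& first, Rest&&... rest) {
    if constexpr (sizeof...(Rest) == 0) {
        // Only the fallback is left.
        return std::string(std::forward<First>(first));
    } else {
        if (auto value = first()) return *value;   // found one: stop here
        return first_or(std::forward<Rest>(rest)...);
    }
}

// Illustrative stand-ins for a(), b() and c(); `calls` counts invocations.
int calls = 0;
std::optional<std::string> a() { ++calls; return "from-a"; }
std::optional<std::string> b() { ++calls; return std::nullopt; }
std::optional<std::string> c() { ++calls; return "from-c"; }

// Eager: every argument to value_or is evaluated before the call.
std::string pick_eager() {
    return a().value_or(b().value_or(c().value_or("-default-")));
}

// Lazy: the functions themselves are passed, not their results.
std::string pick_lazy() {
    return first_or(a, b, c, "-default-");
}
```

With these stand-ins both functions return "from-a", but pick_eager() bumps the counter three times while pick_lazy() bumps it once.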

So in conclusion: not only is a short-circuit or operator that actually returns the first non-false value useful for elegant programming constructs, it is also the most efficient solution to this simple problem.

In programming, if you find that your solution to a simple problem is not simple, you are in trouble. The short-circuit or is indeed a simple solution to a simple problem.

Useful return from logical or

2016-09-29

This is just one of these little cool things that everyone should know about, but which are too small to really mention. So now I'm going to mention it anyway.

Picking the first useful value

A problem that I run across every now and then is that I'm given a couple (or more) optional variables (either std::optional, home-brewed Optional or even pointers to values that can either be 0 or actually point to a valid value - conceptually that's the same for the purpose of this post), and I need to pick the first one that actually holds a value.

    std::string printable_customer_name(...)
    {
      // Retrieve contact details on customer
      std::string *fullname = ...;
      std::string *email = ...;
      // Now return the fullname if we have it, the e-mail otherwise,
      // or "-unknown-" if we don't have either
      if (fullname) return *fullname;
      if (email) return *email;
      return "-unknown-";
    }

Now this gets the job done. We could have used std::optional instead; there could be all sorts of reasons why we would want that - but actually this would give us access to the .value_or method which seeks to assist us in solving exactly this type of problem. Lo and behold.

    std::string printable_customer_name(...)
    {
      // Retrieve contact details on customer
      std::optional<std::string> fullname = ...;
      std::optional<std::string> email = ...;
      // Now return the fullname if we have it, the e-mail otherwise,
      // or "-unknown-" if we don't have either
      return fullname.value_or(email.value_or("-unknown-"));
    }

While shorter, this is hardly elegant. Imagine having to select between five different values; that's a lot of typing .value_or(). Could we imagine a better way? Well conceptually, we're doing a logical or returning the first value that does not evaluate to logical false. Could we exploit that?

    std::string printable_customer_name(...)
    {
      // Retrieve contact details on customer
      std::optional<std::string> fullname = ...;
      std::optional<std::string> email = ...;
      // Now return the fullname if we have it, the e-mail otherwise,
      // or "-unknown-" if we don't have either
      return fullname || email || "-unknown-";
    }

Even if the STL had an overloaded operator|| for std::optional and associated trickery to make such a construct possible, this would not generalize to our pointer example earlier. The return value of logical or is a boolean and that's the end of it (at least right now).
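What today's C++ does allow is an ordinary variadic function. The following is a minimal sketch, assuming C++17; first_value is a made-up name, not a standard library facility. It accepts anything that can be tested in an if and dereferenced with *, so it covers std::optional and raw pointers alike, and the last argument is the fallback string.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the value of the first argument that is
// "set" (tests true and can be dereferenced); the last argument is the
// fallback and is returned as-is when nothing else holds a value.
template <typename First, typename... Rest>
std::string first_value(const First& first, const Rest&... rest) {
    if constexpr (sizeof...(Rest) == 0) {
        return std::string(first);          // the fallback
    } else {
        if (first) return *first;           // optional or pointer with a value
        return first_value(rest...);        // otherwise try the next one
    }
}
```

A call then reads first_value(fullname, email, "-unknown-"). Unlike the LISP or, all arguments are evaluated before the call, which is harmless as long as they are plain variables.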

But it isn't like that everywhere - and this is what triggered this post. I recently wrote this code in a LISP project for solving this exact problem:

    (defun get-printable-name-from (guid)
      "Returns fullname for account if one exists, otherwise email address, otherwise '-unknown-"
      (declare (string guid))
      (let ((doc (api-get ...)))
        (car (or (sexp-path doc "contact" "fullname")
                 (sexp-path doc "contact" "email")
                 '("-unknown-")))))

While the above is actual production code, if we re-write it to match the style of our C++ examples, it would look more like:

    (defun get-printable-name-from (guid)
      "..."
      (let ((fullname ...)
            (email ...))
        (car (or fullname email '("-unknown-")))))

The (or takes the three arguments fullname, email and '("-unknown-"). The two first are variables that either hold the empty list (nil) or a list of one string element. The '("-unknown-") is a list literal, a single element list holding just the string "-unknown-".

What (or does is return the first argument which does not evaluate to false (nil). It does not simply return true or false. This of course has interesting implications when the arguments to or are not of the same type. In our case, it returns the first variable which does not hold the empty list or, failing that, the non-empty list literal. The role of (car is simply to pick the first element of the list that is returned by (or; and we can immediately see that we are guaranteed to always get a non-empty list back from (or.

And that's actually it. It's not much, but having or return the first non-false element instead of simply returning true or false is really convenient.

Bitten by quotes

2016-08-25

I got bitten by quotes twice in a day recently. When "obviously correct" code doesn't work, debugging quickly gets frustrating. In both cases I had super simple obviously correct code that malfunctioned.

Naturally, in both instances, my code was obviously wrong once I got to take a step back.

Modifying literals

So I was writing test cases for a newly written procedure which destructively modifies a parameter passed to it. This itself is an interesting concept in LISP and it's one of these pretty big "little differences" to C++ where my experience really is.

Let's digress. In C++ you can pass arguments by value or by reference; you can also choose to pass a pointer to an object - this is technically passing by value (you pass the pointer by value) but the code that gets generated by the compiler is more like when you pass by reference (which passes a pointer). However, in C++ you would stick a const on every argument your function shouldn't modify (which can cause confusion when you stick a const on a pointer-type value but the function then modifies the non-const data that the constant-valued pointer is pointing at; but poor code and lack of insight can cause mistakes in any language of course).
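The const confusion mentioned in the parenthesis comes down to where the const sits. A small sketch with made-up function names:

```cpp
#include <cstddef>
#include <string>

// The pointer itself is const, the string it points at is not: this
// function cannot re-point s, but it is free to modify the caller's string.
void append_bang(std::string* const s) {
    s->push_back('!');
}

// Pointer to const data: the function can read the string but any attempt
// to modify it through s is a compile error.
std::size_t length_of(const std::string* s) {
    return s->size();
}
```

Calling append_bang(&text) changes text in the caller, even though the parameter declaration has a const in it.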

In LISP on the other hand, you always pass references to data (except when you pass literals) and there's no const equivalent. Effectively that means any procedure can modify the arguments it's passed. Of course you don't generally want that, and the standard library has conventions on procedure names that modify versus those that don't (e.g. subst vs. nsubst) - so this is not really a problem in the real world.

Anyway, I wrote a function that modifies a subset of an s-expression representing an XML document; it modifies the given document in-place and returns the modified document. The implementation is not important but the snippet below shows that in one situation we modify the cdr of the document (sexp) given.

    (defun sexp-add-parameter (sexp key value)
      " ... "
      (declare (type list sexp)
               (type string key)
               (type string value))
      ...
          ;; We don't have an arg list; insert one
          (setf (cdr sexp) (cons (list :@ (cons key value))
                                 (cdr sexp)))))
      sexp)

My test code uses the 5am framework and I had two tests that looked like:

    ;; Add parameter to empty element with non-empty parameter list
    (5am:is (equal '("foo" (:@ ("a" . "b") ("key" . "value")))
                   (sexp-add-parameter '("foo" '(:@ ("a" . "b")))
                                       "key" "value")))
    ;; Add parameter to non-empty element with empty parameter list
    (5am:is (equal '("foo" (:@ ("key" . "value")) "baz")
                   (sexp-add-parameter '("foo" (:@) "baz")
                                       "key" "value")))

So in short these two tests supply S-expressions that represent simple XML documents and the sexp-add-parameter procedure is called to insert (or alter) a given parameter (or attribute if you will) on the top-level element.

On my developer machine, executing this code from the editor (C-x C-e from Emacs using SLIME integration to the running SBCL LISP system) just worked. But the CI system failed the build! Re-building locally with the build-system optimization settings also failed the test.

That just sucks; obviously correct code that works on one optimization setting and fails on another. Argh! I trust the SBCL implementation, it is my general impression that it is quite mature, so I didn't want to believe this was an optimizer error. But debugging stuff like this... Argh... The debugger stack traces showed some really strange things like my simple quoted parameters being different from what the code said. So for example, evaluating the call (sexp-add-parameter '("foo" (:@) "baz")) would result in a debugger stack trace with a top-level call being (sexp-add-parameter '("foo" (:@ ("a" . "b")) "baz")) for example. Super super strange - I mean, you do the most simple thing; you execute a call - then your debugger claims the top-level is different from the call you just evaluated. Ouch.

Anyway, after a bit of hair pulling, I eventually ended up locating this little tidbit in the CLHS about the quote operator. It says: "The consequences are undefined if literal objects (including quoted objects) are destructively modified." Well d'oh. What was I thinking? This simple change fixed everything:

    ;; Add parameter to empty element with non-empty parameter list
    (5am:is (equal '("foo" (:@ ("a" . "b") ("key" . "value")))
                   (sexp-add-parameter (list "foo" (list :@ '("a" . "b")))
                                       "key" "value")))
    ;; Add parameter to non-empty element with empty parameter list
    (5am:is (equal '("foo" (:@ ("key" . "value")) "baz")
                   (sexp-add-parameter (list "foo" (list :@) "baz")
                                       "key" "value")))

Ever since I learned to use the quote operator, I just saw it as a shorthand for list (and cons). It never once occurred to me that list returns a freshly consed (freshly allocated) list, whereas the quote operator produces a list literal. And in most situations I could get away with my ignorance because "usually" I don't provide literal lists to procedures that modify their arguments. Lesson learned. More importantly I think I need to rename this procedure to make it clear that it modifies its argument.
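For readers coming from C++, the closest analogy is probably the string literal: writing to one is undefined behaviour as well, and the cure is the same, namely copying it into fresh storage before handing it to anything destructive. This is only an analogy, sketched with made-up function names and assuming C++17:

```cpp
#include <cctype>
#include <string>

// Destructively upper-cases the buffer it is handed.
void upcase_in_place(char* s) {
    for (; *s != '\0'; ++s)
        *s = static_cast<char>(std::toupper(static_cast<unsigned char>(*s)));
}

// Safe caller: copy the literal into freshly allocated storage first,
// much like building the argument with (list ...) instead of quoting it.
std::string upcased(const char* literal) {
    std::string fresh(literal);
    upcase_in_place(fresh.data());   // non-const data() needs C++17
    return fresh;
}
```

The difference is that C++ types the literal as const, so passing it straight to upcase_in_place is rejected by the compiler; LISP gives no such warning.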

Redefining procedures

When working in the LISP REPL you quickly get used to redefining procedures and re-evaluating calls to them - this is probably the primary reason I like working in a LISP environment; the escape from the old edit-compile-run cycle. But you quickly get spoiled - I got used to redefining my procedures and calling them and expecting them to actually be redefined. Well... It turns out that this does not always work exactly like I had imagined.

Let's say we have a report generator that can generate a number of different reports. Let's further entertain the idea that it holds a data structure that describes the report generators and also refers to the actual report generator procedures (for easy invocation from some "driver" or scheduling procedure). It could look like this:

    (defparameter *reporters*
      `(,(make-repgen ... :generator #'repgen/accounts ...)
        ,(make-repgen ... :generator #'repgen/accounts-full ...)))

So in this example I have a list that contains two repgen structure instances. Each instance has a number of members, one of them being generator which is initialized to one of two report generator functions; repgen/accounts and repgen/accounts-full respectively.

While working on this system, I did some corrections to my repgen/accounts procedure, redefined it, and then called the report generator. Much to my surprise my changes did not take effect. Of course, I tried this a number of times with variations but my changes kept not taking effect. Again, this was a case of the simplest thing not working - very difficult to debug.

The following example may shed some light on this; a simple REPL session:

    * (defun foo () "hi")
    FOO
    * (defparameter fun #'foo)
    FUN
    * (funcall fun)
    "hi"
    * (defun foo () "there")
    STYLE-WARNING: redefining COMMON-LISP-USER::FOO in DEFUN
    FOO
    * (funcall fun)
    "hi"
    *

I would have expected the second call to return "there" of course. Again, the CLHS to the rescue; the pages about Sharpsign Single-Quote and the function operator. It turns out that #'foo evaluates to the function object that foo names at that moment, not a "dynamic link" to whatever definition the name refers to later. In other words, it's akin to static linking.
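The same distinction can be mimicked in C++ with std::function, which may make it more familiar. This is a rough analogy only, with made-up names: copying the function object takes a snapshot, as #'foo does, while a wrapper that looks the variable up on every call follows later redefinitions. (In Common Lisp, funcall also accepts the symbol 'foo and then looks up the current global definition at call time.)

```cpp
#include <functional>
#include <string>
#include <utility>

using Generator = std::function<std::string()>;

// Returns {result via snapshot, result via late lookup} after "redefining" foo.
std::pair<std::string, std::string> redefine_demo() {
    Generator foo = [] { return std::string("hi"); };

    Generator snapshot = foo;                       // copies the current definition
    Generator by_name = [&foo] { return foo(); };   // looks foo up on each call

    foo = [] { return std::string("there"); };      // the "redefinition"

    return {snapshot(), by_name()};
}
```

Here the snapshot still answers "hi" while the late lookup answers "there", which is the behaviour seen in the REPL session.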

Of course when you know it, it's simple enough. I just re-evaluate my defparameter as well, after redefining the procedures it references.

What to take away...

I didn't read the HyperSpec from start to finish before I started programming LISP. And if I had, I wouldn't remember it anyway. In general I think it has been pretty smooth sailing - the language being so syntactically small makes it relatively easy to get started and to do relatively complex things (simple macros are not syntactically hard - they can be hard to wrap your mind around, but that is because of the concept, not so much because of the syntax). Of course I get bitten by my own misunderstandings and lack of insight at times, but frankly I'm surprised at how well it goes in general. This day was special - getting bitten by the simplest things, twice. Special enough to get space here.