What are the statements to modify the metadata of variables?

The ATTRIB, LABEL, LENGTH, FORMAT, and INFORMAT statements. All of these are designed to modify the metadata of variables that we're processing in a DATA step. In DS2, we have none of those, but we do have the DECLARE statement. Remember, the DECLARE statement can use the HAVING clause to change formats, informats, labels, that type of thing. And so while the LENGTH, LABEL, ATTRIB, FORMAT, and INFORMAT statements all have differing syntaxes, in DS2 we only have one statement, the DECLARE statement, and the syntax is always the same.

The next thing I discovered missing in DS2 was the ARRAY statement. I use arrays extensively in my DATA step programming. And I was very disappointed to think that I would not have arrays available to me in DS2, but it's not true. If you think about things, in Base SAS, we do two significantly different things with the same statement.
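As a rough illustration of those two uses of the Base SAS ARRAY statement, here is a minimal sketch. The table name work.demo, the array names n and c, and the initial values are assumptions made up for this example, not taken from the course materials.

```sas
data work.demo;
  /* Array of PDV variables: n1-n5 are created in the program data vector
     and written to work.demo */
  array n[5] n1-n5;

  /* Temporary array: five one-character elements held in system space,
     never part of the PDV or the output table */
  array c[5] $ 1 _temporary_ ('a' 'b' 'c' 'd' 'e');

  do i = 1 to 5;
    n[i] = i * 10;
  end;
  drop i;
run;
```

Same statement, two quite different results: the first creates output variables, the second only reserves working storage.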

This array is an array of PDV variables. It's pointing at a series of variables in the program data vector named N1, N2, N3, N4, and so on. Now, this array is a temporary array, and it's not associated with the program data vector at all. It's just using system space to store some information for us in an array kind of format. In DS2, we handle these with two different approaches. The first one, an array of PDV variables, is implemented in DS2 with the VARARRAY statement. Notice that, really, the only difference between this and the ARRAY statement is the requirement to specify a data type. So VARARRAY DOUBLE and then N[5]. This will produce a series of variables in the program data vector, N1, N2, N3, N4, N5. And it'll work just like an array worked in a traditional DATA step.
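A minimal sketch of the DS2 version, assuming a hypothetical output table work.demo_ds2 and a made-up variable named total. The total variable is there only to show the DECLARE statement with a HAVING clause next to the VARARRAY statement.

```sas
proc ds2;
  data work.demo_ds2 (overwrite=yes);
    /* Array of PDV variables: the data type is required. Creates n1-n5. */
    vararray double n[5];

    /* DECLARE with HAVING sets the type, label, and format in one statement */
    dcl double total having label 'Sum of n1-n5' format 8.2;

    method init();
      dcl int i;              /* local to the method, so not in the output */
      total = 0;
      do i = 1 to 5;
        n[i] = i * 10;
        total = total + n[i];
      end;
      output;                 /* write one row containing n1-n5 and total */
    endmethod;
  enddata;
run;
quit;
```

The output table should contain n1 through n5 plus total, while the loop counter i stays out of it because it was declared inside the method.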

For the temporary array, those arrays whose elements are not in the program data vector, if you remember, in the traditional DATA step we had to specify _TEMPORARY_, and then the system took care of putting that out in system space. In DS2, you just declare a multidimensional structure. So here I'm declaring the C array as a five-element, one-dimensional array. Its type is CHAR(1), and I get five elements stored in system space, just like a temporary array in the traditional DATA step. And like the temporary array in the traditional DATA step, whose elements did not appear in the program data vector because they are not associated with variables, arrays declared with a DECLARE statement in DS2, whether they're global or local, never have their elements associated with PDV variables, so they don't appear in the PDV either. And that's quite handy when you're using arrays as look-up tables.
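Here is a small sketch of that declaration, used as a tiny look-up table. The values stored in c and the local variables i and code are assumptions for illustration only.

```sas
proc ds2;
  data _null_;
    /* Five CHAR(1) elements in system space; none of them are PDV variables */
    dcl char(1) c[5];

    method init();
      dcl int i;
      dcl char(1) code;
      /* Load the look-up values */
      c[1] = 'a';
      c[2] = 'b';
      c[3] = 'c';
      c[4] = 'd';
      c[5] = 'e';
      /* Look up each element by position and write it to the log */
      do i = 1 to 5;
        code = c[i];
        put i= code=;
      end;
    endmethod;
  enddata;
run;
quit;
```

No _TEMPORARY_ keyword is needed here: declaring the array with DECLARE is what keeps its elements out of the PDV.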

Now, some things from the DATA step are missing in DS2. These are all statements associated with processing text files. As of this course writing, DS2 still only reads from and writes to tables. It does not read text files, and that's why those statements are not in the DS2 documentation. Then there's another series of statements that we use in Base SAS to control the SAS session itself. Rather than the program, we're feeding back to and controlling the SAS session. We can actually turn the SAS session off, affect the color of the log with Display Manager commands, those types of things. We can also execute operating system commands from the DATA step. DS2 actually runs in a separate process from the SAS process that spawned it. As a result, it gets information from the SAS system, and it sends back your log. But it executes in a completely separate process, so it doesn't have any hooks to control the SAS session that spawned it or to interact with the operating system that SAS is operating in.

We're going to see later on that the DS2 process that SAS started might not even be running on the same platform. It might be running up in CAS, or it might be running in Hadoop or Teradata. And so I don't expect to see any of this type of functionality ever built into DS2.

Hi, it's Mark again. In lesson three of High-Performance Data Manipulation with SAS DS2, we'll take a look at program structure in more detail. Also, we'll take a deeper dive into the different data types that DS2 can process. And finally, we'll look at some interesting functions that can be useful in our DS2 programs. So let's get going.

Before we dive too deeply into the program blocks themselves, I'd like to talk about a particular global statement for the DS2 procedure. Now, the DS2 procedure supports very few global statements that go outside of program blocks. And they're not global in the way we think about a global statement in the SAS system, for instance. A global statement in PROC DS2 goes before a program block and affects the behavior of processing for that block, and that block alone. As soon as the subsequent program block has finished processing, the option goes back to its default behavior.

We're going to look at the DS2_OPTIONS statement, which lets us change some settings that are helpful when we're processing in DS2, for overcoming things like division by zero and for troubleshooting. So the first one is the DIVBYZERO= option. Now, the default behavior for DIVBYZERO= is an error. In the old days, in a traditional SAS DATA step, when you were running your program and everything was fine, and then all of a sudden a variable came into play with a zero in it that was the divisor in a problem, the program used to throw an error and stop processing. We called these data errors. Subsequent versions of SAS throw a note in the log but continue to process, and just produce a missing value.

Now, DS2 is very ANSI compliant. And ANSI systems don't allow division by zero. So normally, if you get into a situation where the divisor is zero, DS2 is going to throw an error and stop processing. If you wish to make this more like a traditional SAS DATA step, then you can set the DS2_OPTIONS option DIVBYZERO=IGNORE, in which case you won't even get a note in the log. A null or missing value will be produced, and processing will continue.

But the most useful one is TYPEWARN. As you can imagine, we'd said that DS2 supported 17 different data types. And those data types are invariably going to be used in multiple expressions that cause data type conversions to happen automatically in the background all the time.
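To make the DIVBYZERO= discussion concrete, here is a minimal sketch of the DS2_OPTIONS statement placed ahead of a program block. The variable names num, denom, and ratio are made up for this example.

```sas
proc ds2;
  /* Goes before the program block and applies to that block only */
  ds2_options divbyzero=ignore;

  data _null_;
    method init();
      dcl double num;
      dcl double denom;
      dcl double ratio;
      num = 10;
      denom = 0;
      /* With DIVBYZERO=IGNORE this yields a null or missing value
         instead of stopping with an error */
      ratio = num / denom;
      put ratio=;
    endmethod;
  enddata;
run;
quit;
```

Removing the DS2_OPTIONS line should restore the default behavior described above, where the division by zero is treated as an error.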
