Homework 1 – ECS 120, Winter 2024
1 Auto-graded problems
These problems are not randomized, so there is no need to first submit a file named req. Each problem below appears as a separate “Assignment” in Gradescope, beginning with “HW1:” .
1.1 DFAs
For each problem submit to Gradescope a .dfa file describing a DFA deciding the given language. Make sure that it is a plain text file that ends in .dfa (not .txt).
Use the finite automata simulator to test the DFAs: http://web.cs.ucdavis.edu/~doty/ automata/. Documentation is available at the help link at the top of that web page.
Do not just submit to Gradescope without testing on the simulator. The purpose of this homework is to develop intuition. Gradescope will tell you when your DFA gets an answer wrong, but it will not tell you why it was wrong. You’ll develop more intuition by running the DFA in the simulator, trying to come up with some of your own examples and seeing where they fail, than you will by just using the Gradescope autograder as a black box. Once you think your solution works, submit to Gradescope. If you fail any test cases, go back to the simulator and use it to see why those cases fail. During an exam, there’s no autograder to help you figure out if your answer is correct. Practice right now how to determine for yourself whether it is correct.
Gradescope may give strange errors if your file is not formatted properly. If your file is not formatted properly, the simulator will tell you this with more user-friendly errors. Also, if you lose points on a Gradescope test case, try that test case in the simulator to ensure that your DFA is behaving as you expect.
begin and end: {w ∈ {0, 1}* | w begins with 010 and ends with a 0 }
at most three 1s: {w ∈ {0, 1}* | w contains at most three 1’s}.
no substring: {w ∈ {a, b, c}* | w does not contain the substring acab}.
even odd: {w ∈ {a, b}* | w starts with a and has even length, or w starts with b and has odd length }.
mod: {w ∈ {0, 1}* | w is the binary expansion of n ∈ N and n ≡ 3 mod 5}. Assume ε represents 0 and that leading 0’s are allowed. A number n ∈ N is congruent to 3 mod 5 (written n ≡ 3 mod 5) if n is 3 greater than a multiple of 5, i.e., n = 5k + 3 for some k ∈ N. For instance, 3, 8, and 13 are congruent to 3 mod 5.
1.2 Regular expressions
For each problem submit to Gradescope a.regex file with a regular expression deciding the given language.Use the regular expression evaluator to test each regex: http://web.cs.ucdavis.edu/~doty/automata/. Do not test them using the regular expression library of a programming language; typically these are more powerful and have many more features that are not available in the mathematical definition of regular expressions from the textbook. Only the special symbols ( ) * + | are allowed, as well as “input alphabet” symbols: alphanumeric, and . and @.
Note on subexpressions: You may want to use the ability of the regex simulator to define subexpressions that can be used in the main regex. (See example that loads when you click “Load Default”).But it is crucial to use variable names for the subexpressions that are not themselves symbols in the input alphabet; e.g., if you write something like A = (A|B|C);, then later when you write A, it’s not clear whether it refers to the symbol A or the subexpression (A|B|C). Instead try something like alphabet = (A|B|C); and use alphabet in subsequent expressions,or X = (A|B|C); if X is not in the input alphabet.
Note on nested stars: Regex algorithms can take a long time to run when the number of nested stars is large. The number of nested stars is the maximum number of* ’s (or+ ’s) that appear on any root-to-leaf path in the parse tree of the regex. a*b* has one nested star, (a* )*b* has two nested stars, and ((a* )*b* )+ has three nested stars. Note that some of these are unnecessary; for instance (a* )*b* is equivalent to a*b* None of the problems below require more than two nested stars; if you have a regex with more, see if it can be simplified by removing redundant stars such as (a* )* .
even/odd/substring:
{x ∈ {a,b}* ' x c(x h)o(a)n(s)tains bo(an even)th(n)t(u)h(m)e(b)subs(er of)tr(a)o()rba(x)bb and(has a)naaba(odd)anumber of b’s, or }
first appears more:
{x ∈ {0, 1}* | |x| ≥ 3 and the first symbol of x appears at least three times total in x}
repeat near end: {x ∈ {0, 1}* | x[|x| − 5] = x[|x| − 3] }
Assume we start indexing at 1, so that x[|x|] is the last symbol in x, and x[1] is the first.
email: {x ∈ Σ* | x is a syntactically valid email address}
Definition of “syntactically valid email address”: Let Σ = {., @, a, b } contain the alphabetic symbols a and b,1 as well as the symbols for period . and “at” @. Syntactically valid emails are of the form username@host . domain where username and host are nonempty and may contain alphabetic symbols or ., but never two .’s in a row, nor can either of them begin or end with a ., and domain must be of length 2 or 3 and contain only alphabetic symbols. For example, [email protected] and [email protected] are valid email addresses, but aaabb.aba is not (no @ symbol), nor is [email protected] (username starts with a .), nor is
[email protected] (domain is too long), nor is [email protected] or aaba@aaabb. (domain is too short), nor is [email protected] (two periods in a row), nor is ab.ba@[email protected] (too many @ symbols).
sequence design for DNA nanotechnology: We once designed some synthetic DNA strands
that self-assembled to execute Boolean circuits: https://web.cs.ucdavis.edu/~doty/papers/ #drmaurdsa. We had to be careful designing the DNA sequences to ensure they behaved as
we wanted. Among other constraints, every sequence needed to obey all of the following rules:
. starts with a G or C and ends with a G or a C,
. has an A or T within two indices of each end (i.e., the first, second, or third symbol is an A or T, and also the last, second-to-last, or third-to-last symbol is an A or T),
. has at most one appearance of C,
. does not have four G’s in a row; this would form something we didn’t want, called a G-tetrad or G-tetraplex: https://tinyurl.com/yzkq3tzw
Write a regex indicating strings that violate any of the rules above,i.e., it decides the following language: {x ∈ {A, C, G, T}* | x violates at least one of the rules}.
1.3 CFGs
For each problem submit to Gradescope a .cfg file with a context-free grammar deciding the given language.
mod length: {x ∈ {a,b}* | |x| ≡ 3 mod 5}
substring: {x ∈ {a,b}* | x contains the substring abba}
equal 0 and 1: {x ∈ {0, 1}* | #(0, x) = #(1, x)}
palindrome: {x ∈ {0, 1}* | x = xR }
Recall that xR is the reverse of x.
first or last: {0i1j 0k | i,j, k ∈ N and (i = j or j = k)}
integers: The set of strings that look like nonnegative decimal integers with no leading 0’s. For example: 0, 1, 2, 3, 10, 11, 12, 21, 100, 99999
expressions: The set of strings that look like arithmetic expressions using nonnegative integers and the operations +, -, *, /, and parentheses to group terms.
For example, the following are properly formatted arithmetic expressions: 0, 2, 2+30, 2+30*401, (2+30)*401/(23+0), (((1+2)/3-4)*5+6)*7
The following are not: 02, (2+30, 2+30*401+, (2+30)*401), -4, 2++3, (), 2*(), ((((1+2)*3-4)*5+6)*7
2 Written problems
Please complete the written portion of this homework on Gradescope, in the assignment titled “HW1 written”. There, you will find the problem statements for the written portion. Please type solutions directly into Gradescope, using appropriate mathematical notation when appropriate, by typing LATEX in double dollar signs. For example, type $$D = (Q,\Sigma,\delta,s,F)$$ to display D = (Q,Σ,δ,s, F). By clicking outside the text entry field, you can see a preview of how the mathematics will render. See the second half of this page for examples: https://hackmd.io/ cmThXieERK2AX_VJDqR3IQ?both#Gradescope-MarkdownLatex
Your written solutions will be checked for completeness but not for correctness. To receive credit, you must make a serious attempt at all problems.
3 Optional challenge problems
Please read the syllabus for a discussion of optional challenge problems. Briefly, you don’t have to submit a solution to these, and they aren’t worth any points. But, if you find any interesting, and if you think you have a solution, please email it directly to me: [email protected].
1. You showed by a simple counting argument that some language A ⊂ {0, 1}≤5 cannot be decided by any DFA with fewer than 9 states. In this problem, we will see how far this can be pushed.
Step 1 (easy): Devise a single DFA D that can decide any language A ⊂ {0, 1}≤5 by setting accept states appropriately. In other words, give Q, s ∈ Q, and δ : Q × {0, 1} → Q so that, for every A ⊂ {0, 1}≤5, there is FA ⊆ Q such that, letting DA = (Q,{0, 1},δ,s, FA ) be a DFA, we have L(DA) = A. How large is |Q|?
Step 2 (moderate): If you are allowed to modify both the set of accept states and the transitions, can you make the number of states of D less than 30? In other words, show that for every language A ⊂ {0, 1}≤5, some DFA with at most 30 states decides A.
Step 3 (difficult): What is the smallest number of states needed to decide any language A ⊂ {0, 1}≤5? More precisely, if s(A) is the number of states in the smallest DFA
deciding A, what is max s(A)? For this, you might find the Myhill-Nerode Theorem A己{0,1}≤5
useful: https://en.wikipedia.org/wiki/Myhill%E2%80%93Nerode_theorem