File I/O
Objectives
This programming assignment is designed to deepen your understanding of file I/O operations by focusing on the system calls commonly used for Unix I/O.
Introduction
The goal of this assignment is to develop a program that reads a movie script file, "500-Days-of-Summer", provided via a shell command, and performs specific search functionalities. Once the program starts, it waits for the user to input a keyword (up to 512 bytes) and performs different types of word searches based on the entered keyword.
Handout Instructions
Use the following commands to retrieve the contents of this assignment.
Linux> cd ~
Linux> tar xvf /root/cs_pa5.tar
Inside the cs_pa5 directory, you will find the pa5.c, makefile, and 500-Days-of-Summer_s.txt files. You will only need to modify the pa5.c file!
Compilation and Execution
Simply execute "make" to generate one executable, "pa5". pa5 takes one argument: the input file.
For example, you can run the program, with the following command:
Linux> cd cs_pa5/
Linux> make
Linux> ./pa5 500-Days-of-Summer_s.txt
Case 1: Searching Single Word Locations
When the program receives a single word as input, it searches for the word's location(s) and prints the location(s) to STDOUT in the following format:
[line number]:[start index of the word]
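A minimal sketch of the per-line search for Case 1, assuming the line has already been read into memory. The names `find_word`, `lower_ch`, and `is_space` are hypothetical, and the sketch assumes a match must be a whole whitespace-delimited token compared case-insensitively. It uses only `stdlib.h`, in keeping with the header restriction.

```c
#include <stdlib.h>  /* size_t */

/* Lowercase one ASCII byte without <ctype.h>, which is not an allowed header. */
static char lower_ch(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Whitespace as defined by the assignment: space, tab, newline. */
static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: store the start index of every whole-token,
 * case-insensitive match of word[0..wlen) in line[0..len) into out[],
 * up to max entries. Returns the number of indices stored.
 */
size_t find_word(const char *line, size_t len, const char *word, size_t wlen,
                 size_t *out, size_t max) {
    size_t count = 0, i = 0;
    if (wlen == 0) return 0;
    while (i < len && count < max) {
        size_t start, k;
        while (i < len && is_space(line[i])) i++;   /* skip to next token */
        start = i;
        while (i < len && !is_space(line[i])) i++;  /* consume the token */
        if (i - start != wlen) continue;            /* length must match exactly */
        for (k = 0; k < wlen; k++)
            if (lower_ch(line[start + k]) != lower_ch(word[k])) break;
        if (k == wlen) out[count++] = start;
    }
    return count;
}
```

Because a token must match the keyword over its whole length, searching for "he" skips "her" and "he's".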
Case 2: Searching Several Words Locations
When the program receives multiple words (separated by a single space), it searches for line(s) that contain all the entered words and prints the line(s) to STDOUT in the following format:
[line number]
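The Case 2 check can be sketched as a per-line predicate: split the query on single spaces and require every query word to appear as a whole token in the line. The function names `has_word` and `has_all_words` are hypothetical, and whole-token, case-insensitive matching is an assumption drawn from the definition of a word in this handout.

```c
#include <stdlib.h>  /* size_t */

static char lower_ch(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Returns 1 if w[0..wlen) occurs in the line as a whole token, else 0. */
static int has_word(const char *line, size_t len, const char *w, size_t wlen) {
    size_t i = 0;
    while (i < len) {
        size_t start, k;
        while (i < len && is_space(line[i])) i++;
        start = i;
        while (i < len && !is_space(line[i])) i++;
        if (i == start || i - start != wlen) continue;
        for (k = 0; k < wlen; k++)
            if (lower_ch(line[start + k]) != lower_ch(w[k])) break;
        if (k == wlen) return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: returns 1 if every space-separated word of
 * query[0..qlen) is present in line[0..len), else 0.
 */
int has_all_words(const char *line, size_t len, const char *query, size_t qlen) {
    size_t i = 0;
    while (i < qlen) {
        size_t start = i;
        while (i < qlen && query[i] != ' ') i++;
        if (i > start && !has_word(line, len, query + start, i - start))
            return 0;
        i++;  /* step over the single separating space */
    }
    return 1;
}
```

The caller would print the line number once when the predicate returns 1, which also satisfies the rule that a line is printed only once for Case 2.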
Case 3: Searching Several Consecutive Words Locations
When the program receives a phrase enclosed in double quotes (""), it searches for the location(s) of the phrase.
- The phrase does not contain newline characters (e.g., \n).
- The phrase may include consecutive spaces or tabs (\t).
- The program prints the location(s) to STDOUT in the following format:
[line number]:[start index of the phrase]
If multiple occurrences of the phrase exist within a single line, the program prints all their locations, as shown in the example below.
Example Input File Line: HA HA HA HA (first line of the input file)
Input: "HA HA"
Output: 1:0 1:3 1:6
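The overlapping matches in this example fall out naturally if the search advances one byte at a time instead of jumping past each match. The sketch below assumes the surrounding double quotes have already been stripped from the input. It also assumes a phrase must begin and end on a word boundary, which the handout does not state explicitly. `find_phrase` is a hypothetical name.

```c
#include <stdlib.h>  /* size_t */

static char lower_ch(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: store every start index at which ph[0..plen)
 * matches line[0..len) case-insensitively, up to max entries.
 * Assumption: a match must start and end on a word boundary.
 */
size_t find_phrase(const char *line, size_t len, const char *ph, size_t plen,
                   size_t *out, size_t max) {
    size_t count = 0, s, k;
    if (plen == 0 || plen > len) return 0;
    for (s = 0; s + plen <= len && count < max; s++) {
        if (s > 0 && !is_space(line[s - 1])) continue;          /* left boundary */
        if (s + plen < len && !is_space(line[s + plen])) continue; /* right boundary */
        for (k = 0; k < plen; k++)
            if (lower_ch(line[s + k]) != lower_ch(ph[k])) break;
        if (k == plen) out[count++] = s;  /* advance by one byte, so overlaps are kept */
    }
    return count;
}
```

Spaces and tabs inside the phrase are compared byte for byte, so a phrase with two consecutive spaces only matches text with the same two spaces.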
Case 4: Searching Simple Regular Expression Keyword Locations
When the program receives a string in the format [word1]*[word2] (two words separated by an *), it searches for line(s) where [word1] and [word2] are separated by one or more characters.
- There must be no spaces between [word1] and *, or between * and [word2].
- The case [word2]*[word1] is not considered.
- The program prints the line(s) to STDOUT in the following format:
[line number]
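One possible reading of Case 4 is sketched below: a line matches when [word1] appears as a whole token and [word2] appears as a whole token somewhere later in the same line. Under this reading the whitespace between two tokens already counts as "one or more characters"; if the grader expects at least one intervening word, the condition would need tightening. `matches_star` and `tok_eq` are hypothetical names.

```c
#include <stdlib.h>  /* size_t */

static char lower_ch(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Case-insensitive equality of two byte ranges. */
static int tok_eq(const char *a, size_t alen, const char *b, size_t blen) {
    size_t k;
    if (alen != blen) return 0;
    for (k = 0; k < alen; k++)
        if (lower_ch(a[k]) != lower_ch(b[k])) return 0;
    return 1;
}

/*
 * Hypothetical helper: pat is "word1*word2". Returns 1 if the line has
 * word1 as a token followed later by word2 as a token, else 0.
 */
int matches_star(const char *line, size_t len, const char *pat, size_t plen) {
    size_t star = 0, i = 0, n1, n2;
    const char *w2;
    int seen1 = 0;
    while (star < plen && pat[star] != '*') star++;
    if (star == 0 || star + 1 >= plen) return 0;  /* both words are required */
    n1 = star;
    w2 = pat + star + 1;
    n2 = plen - star - 1;
    while (i < len) {
        size_t start;
        while (i < len && is_space(line[i])) i++;
        start = i;
        while (i < len && !is_space(line[i])) i++;
        if (i == start) continue;
        /* Test word2 first so that word1 and word2 are different tokens. */
        if (seen1 && tok_eq(line + start, i - start, w2, n2)) return 1;
        if (tok_eq(line + start, i - start, pat, n1)) seen1 = 1;
    }
    return 0;
}
```

Since the reversed order [word2]*[word1] is not considered, the sketch only reports a line when word1 comes first.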
Restriction
This assignment is designed to be implemented in the Linux server environment (e.g., x86_64).
The use of header files is restricted to the following. Using any other header files will result in a score of zero:
- Custom header files implemented by a student.
- unistd.h, fcntl.h, stdlib.h, sys/types.h, sys/stat.h, and errno.h.
The binary executable generated through the Makefile must be named pa5.
Word: A string separated by spaces, which is case-insensitive.
Examples of words that can be used for word searches include: eg, god, and, adam, brother's, priests', kirjath-arba, sons', score: !
Phrase: A string separated by newlines, which is case-insensitive.
Whitespace: Includes tabs, spaces, and newlines.
Words containing writing symbols are treated as distinct words. For example, searching for 'he' should only return 'he' in the results and not 'her' or 'he's'.
Supplementary explanations
Line number: The line number starts from 1 and includes rows with no content, such as empty lines, as well.
Index: The starting position of a word (or phrase) in a line is numbered, beginning from 0 and increasing by 1 for each 1-byte character present in the ASCII table. This includes spaces and writing symbols.
If the given word appears multiple times in the movie script, all corresponding lines must be printed. For Case 1 and Case 3, all positions must be printed even if they are in the same line. For Case 2 and Case 4, each line with duplicates should only be printed once. When there are multiple search results, a space must be added after each result position. At the end of the last search result, a newline character should be used instead of a space. The example below illustrates this:
Input: "he is"
Output: 1955:33 2315:42 5885:22(New Line)
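Since stdio.h is not among the allowed headers, the output has to be formatted by hand and sent with write(). The following is a sketch under that assumption; `put_uint`, `format_hit`, and `write_all` are hypothetical names, and the caller is assumed to know whether a result is the last one (for example by holding back each result until the next one is found).

```c
#include <unistd.h>     /* write, STDOUT_FILENO */
#include <stdlib.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */
#include <errno.h>      /* errno, EINTR */

/* Write the decimal digits of v into dst (no terminator); return digit count. */
static size_t put_uint(char *dst, unsigned long v) {
    char tmp[20];  /* enough for a 64-bit unsigned value */
    size_t n = 0, i;
    do { tmp[n++] = (char)('0' + v % 10); v /= 10; } while (v > 0);
    for (i = 0; i < n; i++) dst[i] = tmp[n - 1 - i];
    return n;
}

/*
 * Hypothetical helper: format "line:index" followed by a space, or by a
 * newline when is_last is nonzero. dst needs room for at least 42 bytes.
 */
size_t format_hit(char *dst, unsigned long line_no, unsigned long index,
                  int is_last) {
    size_t n = put_uint(dst, line_no);
    dst[n++] = ':';
    n += put_uint(dst + n, index);
    dst[n++] = is_last ? '\n' : ' ';
    return n;
}

/* Keep calling write() until all n bytes are out; returns 0 or -1. */
int write_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR) continue;  /* interrupted, retry */
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}
```

A caller might use it as `n = format_hit(buf, 1955, 33, 0); write_all(STDOUT_FILENO, buf, n);`. For Case 2 and Case 4 the same digit routine could emit the line number alone.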
In the input, the characters " and * are classified as follows:
- The input will not contain both " and * simultaneously, so such cases are not tested.
- If the input contains the " character, it corresponds to Case 3.
- If the input contains the * character, it corresponds to Case 4.
- If the input consists of two or more words without the " or * characters, it corresponds to Case 2.
- If the input consists of a single word without the " or * characters, it corresponds to Case 1.
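The classification rules above translate into a single scan over the input. This is a sketch with a hypothetical function name, `classify_query`, and it assumes the trailing newline from the terminal has already been removed from the input.

```c
#include <stdlib.h>  /* size_t */

/*
 * Hypothetical helper: return the case number (1 to 4) for the
 * query q[0..len), following the rules listed in the handout.
 */
int classify_query(const char *q, size_t len) {
    size_t i;
    int has_space = 0;
    for (i = 0; i < len; i++) {
        if (q[i] == '"') return 3;   /* quoted phrase */
        if (q[i] == '*') return 4;   /* word1*word2 pattern */
        if (q[i] == ' ') has_space = 1;
    }
    return has_space ? 2 : 1;        /* several words, or a single word */
}
```

The quote and asterisk checks take priority over the space check because a quoted phrase usually contains spaces as well.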
Grading Policy
Points allocated for each test case will be added to the total score if the test case is passed. If a test case is not passed, the allocated points will not be included in the total score. There is no partial scoring for individual test cases.
There are a total of 18 test cases, each worth 5 points. Additionally, the report is worth 10 points.
If the execution time for a test case exceeds 2 minutes on the grading server, no points will be awarded for that test case. Only test cases with an execution time of 10 seconds or less will be used for grading. The grading server is identical to the server used for the assignment.
The input file is assumed to be too large to fit entirely into the system's memory. However, the longest line within the file is small enough to be processed in memory. This means the program needs to handle the file in smaller chunks (i.e., line by line) rather than loading it all at once. For each input, the program must read and process each line from the beginning of the file.
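One way to meet this constraint is to read fixed-size chunks with read() and assemble one line at a time in a growable buffer. The sketch below is an illustration under assumptions, not a required design: the `line_reader` type, the function names, and the 4096-byte chunk size are all hypothetical choices.

```c
#include <unistd.h>     /* read, lseek */
#include <stdlib.h>     /* realloc, size_t */
#include <sys/types.h>  /* ssize_t, off_t */
#include <errno.h>      /* errno, EINTR */

#define CHUNK 4096      /* assumed chunk size; any modest value works */

typedef struct {
    int fd;
    char buf[CHUNK];
    ssize_t pos, end;   /* unread bytes are buf[pos..end) */
    char *line;         /* current line, without the '\n' */
    size_t cap;         /* allocated size of line */
} line_reader;

void reader_init(line_reader *r, int fd) {
    r->fd = fd;
    r->pos = r->end = 0;
    r->line = NULL;
    r->cap = 0;
}

/* Restart from the beginning of the file for the next query. */
int reader_rewind(line_reader *r) {
    r->pos = r->end = 0;
    return lseek(r->fd, 0, SEEK_SET) == (off_t)-1 ? -1 : 0;
}

/*
 * Read the next line into r->line. Returns its length (0 for an empty
 * line), -1 at end of file, or -2 on a read or allocation error.
 */
ssize_t next_line(line_reader *r) {
    size_t len = 0;
    int got_any = 0;
    for (;;) {
        char c;
        if (r->pos == r->end) {               /* chunk exhausted, refill */
            ssize_t n;
            do {
                n = read(r->fd, r->buf, CHUNK);
            } while (n < 0 && errno == EINTR);
            if (n < 0) return -2;
            if (n == 0) return got_any ? (ssize_t)len : -1;
            r->pos = 0;
            r->end = n;
        }
        c = r->buf[r->pos++];
        got_any = 1;
        if (c == '\n') return (ssize_t)len;
        if (len == r->cap) {                  /* grow the line buffer */
            size_t ncap = r->cap ? r->cap * 2 : 256;
            char *p = realloc(r->line, ncap);
            if (p == NULL) return -2;
            r->line = p;
            r->cap = ncap;
        }
        r->line[len++] = c;
    }
}
```

Memory use is bounded by the chunk size plus the longest line, and the caller can count calls to `next_line` to obtain 1-based line numbers, including empty lines. The file descriptor would come from open() with O_RDONLY, declared in fcntl.h.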
There is a 10% penalty per day after the submission deadline. You earn no points once 5 days have passed after the deadline. Grading will be based on whether each test case is passed.
Hand-in Instructions
When submitting the assignment, compress the files into "Student_ID.tar" and upload it to LearnUs with your report in PDF format.
The compressed file should not contain directories, only the following files:
- pa5.c (C code files)
- pa5.h (Header files)
- report (a PDF file)