CSI2107-01: Computer System




File I/O

Programming Assignment #5
(due on Sunday, Dec 22, 2024 by 11:59 pm)

Objectives

This programming assignment is designed to deepen your understanding of File I/O operations by focusing on the use of system calls commonly employed in Unix I/O.

Introduction

The goal of this assignment is to develop a program that reads a movie script file, "500-Days-of-Summer", passed as a command-line argument, and performs specific search functions on it. Once the program starts, it waits for the user to input a keyword (up to 512 bytes) and performs a different type of word search depending on the entered keyword.

Handout Instructions

Use the following command to retrieve the contents of this assignment.
Linux> cd ~
Linux> tar xvf /root/cs_pa5.tar
Inside the cs_pa5 directory, you will find the files pa5.c, makefile, and 500-Days-of-Summer_s.txt. You will only need to modify the pa5.c file!

Compilation and Execution

Simply execute "make" to generate one executable, "pa5". pa5 takes one argument, the input file.

For example, you can run the program, with the following command:

Linux> cd cs_pa5/
Linux> make
Linux> ./pa5 500-Days-of-Summer_s.txt


Figure 1: Example output
Figure 1 shows an example output from execution. The output shows the location(s) of the input text. There are four cases of input and output types:


Case 1: Searching Single Word Locations

When the program receives a single word as input, it searches for the word's location(s) and prints the location(s) to STDOUT in the following format:

[line number]:[start index of the word]
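As a rough illustration of Case 1, the sketch below scans one line and records the start index of every matching word. It assumes a word is a run of bytes delimited by spaces, tabs, or newlines and that only ASCII letters need case folding. The names find_word, to_lower, and is_space are hypothetical helpers, not part of the handout, and the line is assumed to be NUL-terminated by the caller.

```c
#include <stdlib.h>  /* size_t */

/* Lowercase one ASCII byte (ctype.h is not on the allowed header list). */
static char to_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Assumption: spaces, tabs, and newlines delimit words. */
static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Store the start index of every whitespace-delimited token of `line`
 * that equals `word` (ignoring ASCII case) in `out`, up to `max` entries.
 * Returns the number of indices stored. */
size_t find_word(const char *line, const char *word, size_t *out, size_t max) {
    size_t count = 0, i = 0;
    while (line[i] != '\0' && count < max) {
        while (is_space(line[i])) i++;            /* skip separators */
        if (line[i] == '\0') break;
        size_t start = i, k = 0;
        int same = 1;
        while (line[i] != '\0' && !is_space(line[i])) {
            /* A token longer than the word, or a differing byte, is a mismatch. */
            if (same && (word[k] == '\0' || to_lower(line[i]) != to_lower(word[k])))
                same = 0;
            i++;
            k++;
        }
        /* The token matches only if the word was consumed completely. */
        if (same && word[k] == '\0') out[count++] = start;
    }
    return count;
}
```

Because whole tokens are compared, a search for "he" in this sketch skips "her" and "he's", in line with the word definition given under Restriction.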


Case 2: Searching Several Words Locations

When the program receives multiple words (separated by a single space), it searches for line(s) that contain all the entered words and prints the line(s) to STDOUT in the following format:
[line number]
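A minimal sketch of the Case 2 test for a single line follows. It assumes each keyword must match a whole whitespace-delimited word (the same rule as Case 1) and that keywords are separated by exactly one space. The names line_has_all and has_token are hypothetical.

```c
#include <stdlib.h>  /* size_t */

static char to_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Return 1 if the n-byte keyword `w` equals some whitespace-delimited
 * token of `line`, ignoring ASCII case; otherwise return 0. */
static int has_token(const char *line, const char *w, size_t n) {
    size_t i = 0;
    while (line[i] != '\0') {
        while (is_space(line[i])) i++;
        size_t start = i;
        while (line[i] != '\0' && !is_space(line[i])) i++;
        if (n > 0 && i - start == n) {
            size_t k = 0;
            while (k < n && to_lower(line[start + k]) == to_lower(w[k])) k++;
            if (k == n) return 1;
        }
    }
    return 0;
}

/* Return 1 if every space-separated keyword occurs as a word in `line`. */
int line_has_all(const char *line, const char *keywords) {
    size_t i = 0;
    while (keywords[i] != '\0') {
        size_t start = i;
        while (keywords[i] != '\0' && keywords[i] != ' ') i++;
        if (!has_token(line, keywords + start, i - start)) return 0;
        if (keywords[i] == ' ') i++;              /* step over the separator */
    }
    return 1;
}
```

The caller would print the line number once for each line where line_has_all returns 1, regardless of how many times the keywords occur in it.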

Case 3: Searching Several Consecutive Words Locations

When the program receives a phrase enclosed in double quotes (""), it searches for the location(s) of the phrase.

  • The phrase does not contain newline characters (e.g., \n).
  • The phrase may include consecutive spaces or tabs (\t).
  • The program prints the location(s) to STDOUT in the following format:
[line number]:[start index of the phrase]

If multiple occurrences of the phrase exist within a single line, the program prints all their locations, as shown in the example below.

Example Input File Line: HA HA HA HA (first line of the input file)

Input: "HA HA"


Output: 1:0 1:3 1:6
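The overlapping matches in the example above come out naturally if the search restarts one byte after each match start instead of skipping past the match. The sketch below assumes a plain case-insensitive substring comparison with no word-boundary check, which is an assumption rather than something the handout states, and it expects the caller to strip the surrounding double quotes first. The name find_phrase is hypothetical.

```c
#include <stdlib.h>  /* size_t */

static char to_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Store the start index of every occurrence of `phrase` in `line`
 * (ignoring ASCII case, overlaps included) in `out`, up to `max` entries.
 * Returns the number of indices stored. */
size_t find_phrase(const char *line, const char *phrase, size_t *out, size_t max) {
    size_t count = 0;
    if (phrase[0] == '\0') return 0;              /* empty phrase: no match */
    for (size_t i = 0; line[i] != '\0' && count < max; i++) {
        size_t k = 0;
        while (phrase[k] != '\0' && line[i + k] != '\0' &&
               to_lower(line[i + k]) == to_lower(phrase[k]))
            k++;
        if (phrase[k] == '\0') out[count++] = i;  /* whole phrase matched at i */
    }
    return count;
}
```

For the line HA HA HA HA and the phrase HA HA, this returns the indices 0, 3, and 6, matching the expected output 1:0 1:3 1:6 once the line number is prepended.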

Case 4: Searching Simple Regular Expression Keyword Locations
When the program receives a string in the format [word1]*[word2] (two words separated by an *), it searches for line(s) where [word1] and [word2] are separated by one or more characters.


  • There must be no spaces between [word1] and *, or between * and [word2].
  • The case [word2]*[word1] is not considered.
  • The program prints the line(s) to STDOUT in the following format:



[line number]
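One possible reading of Case 4 is sketched below: a line matches when [word1] occurs as a whole word and [word2] occurs as a whole word somewhere after it. Since words are delimited by whitespace, at least one character then lies between them. This interpretation is an assumption, and the names line_matches_star and token_eq are hypothetical.

```c
#include <stdlib.h>  /* size_t */

static char to_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

static int is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Compare a token of length tn with a keyword of length wn, ignoring case. */
static int token_eq(const char *tok, size_t tn, const char *w, size_t wn) {
    if (wn == 0 || tn != wn) return 0;
    for (size_t k = 0; k < wn; k++)
        if (to_lower(tok[k]) != to_lower(w[k])) return 0;
    return 1;
}

/* Return 1 if `line` contains word1 followed later by word2, where
 * `pattern` has the form word1*word2; otherwise return 0. */
int line_matches_star(const char *line, const char *pattern) {
    size_t star = 0;
    while (pattern[star] != '\0' && pattern[star] != '*') star++;
    if (pattern[star] != '*') return 0;           /* not a Case 4 pattern */

    const char *w1 = pattern, *w2 = pattern + star + 1;
    size_t n1 = star, n2 = 0;
    while (w2[n2] != '\0') n2++;

    int seen_first = 0;
    size_t i = 0;
    while (line[i] != '\0') {
        while (is_space(line[i])) i++;
        size_t start = i;
        while (line[i] != '\0' && !is_space(line[i])) i++;
        /* Check word2 before updating seen_first so that a single token
         * cannot serve as both word1 and word2. */
        if (seen_first && token_eq(line + start, i - start, w2, n2)) return 1;
        if (token_eq(line + start, i - start, w1, n1)) seen_first = 1;
    }
    return 0;
}
```

Tracking only the earliest [word1] is enough here, because any later [word2] that follows some [word1] also follows the first one.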

Restriction

This assignment is designed to be implemented in the Linux server environment (e.g., x86_64). Only the following header files may be used. Using any other header file will result in a score of zero:
  • Custom header files implemented by a student.
  • unistd.h, fcntl.h, stdlib.h, sys/types.h, sys/stat.h, and errno.h.


The binary executable generated through the Makefile must be named pa5.

Word: A string separated by spaces, which is case-insensitive.

Examples of words that can be used for word searches include: eg, god, and, adam, brother's, priests', kirjath-arba, sons', score: !

Phrase: A string separated by newlines, which is case-insensitive.

Whitespace: Includes tabs, spaces, and newlines.

Words containing writing symbols are treated as distinct words. For example, searching for 'he' should only return 'he' in the results and not 'her' or 'he's'.

Supplementary explanations

Line number: The line number starts from 1 and counts rows with no content, such as empty lines, as well.


Index: The starting position of a word (or phrase) in a line is numbered, beginning from 0 and increasing by 1 for each 1-byte character present in the ASCII table. This includes spaces and writing symbols.

If the given word appears multiple times in a movie script, all corresponding lines must be printed. For Case 1 and Case 3, all positions must be printed even if they are in the same line. For Case 2 and Case 4, each line with duplicates should only be printed once. When there are multiple search results, a space must be added after each result position. At the end of the last search result, a newline character should be used instead of a space. The example below illustrates this:

Input: "he is"
Output: 1955:33 2315:42 5885:22(New Line)
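Since stdio.h is not among the allowed headers, the output has to be assembled by hand and sent with write(). The sketch below shows one way to build the "line:index" list with a space between results and a newline after the last one. The names put_uint, format_hits, and write_all are hypothetical, and the caller is assumed to provide a buffer of at least 42 bytes per result.

```c
#include <errno.h>
#include <stdlib.h>  /* size_t */
#include <unistd.h>  /* write, ssize_t */

/* Append the decimal digits of v at buf[pos]; returns the new position. */
static size_t put_uint(char *buf, size_t pos, unsigned long v) {
    char tmp[20];                                 /* enough for a 64-bit value */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);          /* digits come out reversed */
        v /= 10;
    } while (v != 0);
    while (n > 0) buf[pos++] = tmp[--n];
    return pos;
}

/* Format n results as "line:col line:col ... line:col\n" into buf.
 * Returns the number of bytes produced (0 when n is 0). */
size_t format_hits(const unsigned long *lines, const unsigned long *cols,
                   size_t n, char *buf) {
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        pos = put_uint(buf, pos, lines[i]);
        buf[pos++] = ':';
        pos = put_uint(buf, pos, cols[i]);
        buf[pos++] = (i + 1 < n) ? ' ' : '\n';    /* newline after the last */
    }
    return pos;
}

/* write() may transfer fewer bytes than requested, so loop until done. */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR) continue;         /* interrupted: retry */
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}
```

A caller would pass the formatted bytes to write_all(STDOUT_FILENO, buf, len). For Case 2 and Case 4, the same idea applies with the ':' and index omitted.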

In the input, the characters " and * are classified as follows:

  • The input will not contain both " and * simultaneously, so such cases are not tested.
  • If the input contains the " character, it corresponds to Case 3.
  • If the input contains the * character, it corresponds to Case 4.
  • If the input consists of two or more words without the " or * characters, it corresponds to Case 2.
  • If the input consists of a single word without the " or * characters, it corresponds to Case 1.

Search results should be printed in the order they appear, starting from the first row and column of the input file. Both the input file and the keywords are limited to ASCII characters. The strings in the input file may include multiple consecutive spaces, but the file itself does not start with a space. Similarly, keywords do not have spaces at the beginning or the end. The program terminates when it receives 'PA5EXIT' as an input.
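The classification rules listed above can be sketched as a single pass over the keyword. The enum and the function name classify are hypothetical. Handling of the exit keyword is left to the caller, and the keyword is assumed to be NUL-terminated with any trailing newline already removed.

```c
#include <stdlib.h>  /* size_t */

enum search_case {
    SINGLE_WORD = 1,   /* Case 1 */
    ALL_WORDS = 2,     /* Case 2 */
    PHRASE = 3,        /* Case 3 */
    STAR_PATTERN = 4   /* Case 4 */
};

/* Decide which search case a keyword belongs to. Quotes and '*' are
 * checked before spaces because a quoted phrase may itself contain spaces. */
enum search_case classify(const char *kw) {
    int has_space = 0;
    for (size_t i = 0; kw[i] != '\0'; i++) {
        if (kw[i] == '"') return PHRASE;
        if (kw[i] == '*') return STAR_PATTERN;
        if (kw[i] == ' ') has_space = 1;
    }
    return has_space ? ALL_WORDS : SINGLE_WORD;
}
```

Returning on the first " or * is safe here only because the handout guarantees the two characters never appear in the same input.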

Grading Policy

Points allocated for each test case will be added to the total score if the test case is passed. If a test case is not passed, the allocated points will not be included in the total score. There is no partial scoring for individual test cases.


There are a total of 18 test cases, each worth 5 points. Additionally, the report is worth 10 points.

If the execution time for a test case exceeds 2 minutes on the grading server, no points will be awarded for that test case. Only test cases with an execution time of 10 seconds or less will be used for grading. The grading server is identical to the server used for the assignment.

The input file is assumed to be too large to fit entirely into the system's memory. However, the longest line within the file is small enough to be processed in memory. This means the program needs to handle the file in smaller chunks (i.e., line by line) rather than loading it all at once. For each input, the program must read and process each line from the beginning of the file.
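One way to meet this constraint with only read() is a small buffered reader that refills a fixed-size chunk and hands out one line at a time. The sketch below is a minimal version under stated assumptions: struct reader, next_line, and the 4096-byte CHUNK size are hypothetical choices, and each returned line is heap-allocated and must be freed by the caller.

```c
#include <errno.h>
#include <stdlib.h>  /* malloc, realloc, free, size_t */
#include <unistd.h>  /* read, ssize_t */

#define CHUNK 4096   /* hypothetical buffer size */

struct reader {
    int fd;
    char buf[CHUNK];
    size_t pos, len;   /* unread bytes are buf[pos] .. buf[len - 1] */
};

/* Read the next line (without its '\n') into a malloc'ed, NUL-terminated
 * string. Returns 1 on success, 0 at end of file, -1 on error. */
int next_line(struct reader *r, char **line, size_t *n) {
    size_t cap = 128, used = 0;
    char *out = malloc(cap);
    if (out == NULL) return -1;
    for (;;) {
        if (r->pos == r->len) {                   /* buffer drained: refill */
            ssize_t got = read(r->fd, r->buf, CHUNK);
            if (got < 0) {
                if (errno == EINTR) continue;     /* interrupted: retry */
                free(out);
                return -1;
            }
            if (got == 0) {                       /* end of file */
                if (used == 0) { free(out); return 0; }
                break;                            /* last line had no '\n' */
            }
            r->pos = 0;
            r->len = (size_t)got;
        }
        char c = r->buf[r->pos++];
        if (c == '\n') break;
        if (used + 1 == cap) {                    /* keep room for the NUL */
            char *bigger = realloc(out, cap * 2);
            if (bigger == NULL) { free(out); return -1; }
            out = bigger;
            cap *= 2;
        }
        out[used++] = c;
    }
    out[used] = '\0';
    *line = out;
    *n = used;
    return 1;
}
```

Before each new keyword, a caller would typically rewind with lseek(fd, 0, SEEK_SET) and reset pos and len to 0, so that every search starts from the first line. Empty lines are returned as zero-length strings, so they still advance the line counter.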

There is a 10% penalty per day after the submission deadline. You earn no points once 5 days have passed since the deadline. Grading will be based on whether each test case is passed.


Hand-in Instructions

When submitting the assignment, compress the files into "Student_ID.tar" and upload it to LearnUs together with your report in PDF format.


The compressed file should not contain directories, only the following files:


  • pa5.c (C code files)
  • pa5.h (Header files)
  • report (a PDF file)

