Thursday, May 13, 2010

The course grades have been submitted

Excellence in Teaching Award for Demian Lessa

Later this year, I'd like to nominate Demian Lessa for the UB Graduate Student Excellence in Teaching Award. In my opinion, he did a tremendous job as a TA in CSE 462.

If you support this nomination, please send me email, if possible together with a testimonial.


--jc

Monday, May 3, 2010

Solutions to Exam #1 problems

http://www.cse.buffalo.edu/~chomicki/462/t1-sols.pdf

Wednesday, April 28, 2010

Solutions to Exam #2 problems

http://www.cse.buffalo.edu/~chomicki/462/t2-sols.pdf

Monday, April 26, 2010

Project #2 deadline

The deadline is Tuesday, April 27, 8:00am.

Friday, April 23, 2010

Recitation #13 Slides

The slides for the thirteenth (rD) recitation are available through the link below.

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/rD.pdf

Monday, April 19, 2010

Project #2 Extra Credit: Relational Schema

If you are providing a solution for the extra credit problem, use this script to create your schema in Oracle. Note: since we are providing the schema creation script, you no longer need to include it in your submission!


CREATE TABLE ROOT(
id INT NOT NULL PRIMARY KEY
);

CREATE TABLE NODE(
id INT NOT NULL PRIMARY KEY,
label VARCHAR(200) NOT NULL,
left INT NOT NULL,
right INT NOT NULL,
CONSTRAINT ck_left_right CHECK (left <= right)
);

CREATE TABLE TEXT(
id INT NOT NULL PRIMARY KEY REFERENCES NODE(id),
textValue VARCHAR(2000)
);

CREATE TABLE ATTRIBUTE(
id INT NOT NULL REFERENCES NODE(id),
attribute VARCHAR(200) NOT NULL,
textValue VARCHAR(2000),
CONSTRAINT pk_attribute PRIMARY KEY(id, attribute)
);

Sunday, April 18, 2010

Project #2 Submission Instructions

You should submit a single file named project2.zip using the submit_cse462 command. The structure of the compressed file should be as follows:
1) problem1 subfolder for Problem 1 containing:
  • XQuery files: Q1.xq, Q2.x1, Q3.xq, and Q4.xq
  • XML result files: Q1.xml, Q2.xml, Q3.xml, and Q4.xml
  • EITHER phd.xml file with embedded DTD
  • OR phd.xml and phd.dtd
2) optional problem2 subfolder for Problem 2 containing:
  • Java file: XMLToRelational.java
  • Inserts file: insert.sql
  • SQL files: Q1.sql, Q2.sql, Q3.sql, and Q4.sql
  • Result files: Q1.txt, Q2.txt, Q3.txt, and Q4.txt
A sample zip file with this structure is available at:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/project2.zip

Friday, April 16, 2010

Recitation #12 Slides

The slides for the twelfth (rC) recitation are available through the link below.

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/rC.pdf

Wednesday, April 14, 2010

PhD dissertation defence

 If you are interested in database research in our department, you may want to attend the following:

PhD Dissertation Defense

Xi Zhang

224 Bell Hall
Friday, April 16, 2010, 10am

Committee
Professor Jan Chomicki (Chair)
Professor Hung Q. Ngo
Professor Michalis Petropoulos


Title: Probabilities and Sets in Preference Querying

Abstract:

User preferences in databases are attracting increasing
interests with the boom of information systems and the trend of
personalization. In the literature, there are two different frameworks
dealing with this topic, namely quantitative approaches and qualitative
approaches. The former assume the availability of a scoring function,
while the latter do not. Instead, in qualitative approaches,
preferences are expressed using preference formulas. We investigate
three advanced topics on preferences stemming from those frameworks.

First, we study top-k queries over uncertain data in the quantitative
framework. We formulate three intuitive semantic properties for top-k
queries in probabilistic databases, and propose Global-Topk query
semantics which satisfies them to a great degree. We also design
efficient dynamic programming algorithms for query evaluation.

Second, we observe that so far all work on top-k queries in
probabilistic database focus on ordinal scores, while there are
applications where cardinal scores are more appropriate. This
motivates our work on preference strength, where we consider the
magnitude of score in addition to the order it establishes over
tuples.

Finally, as a counterpart to the top-k query in the quantitative
framework, we explore the set preference problem in the qualitative
framework. Observing the fact that preferences can also be collective
in this case, our goal is to tackle this second-order problem with
first-order tools. We propose a logical framework for set
preferences. Candidate sets are represented using profiles consisting
of scalar features. This reduces set preferences to tuple preferences
over set profiles.  We also propose algorithms to compute the "best"
sets effectively.

New handout: Integrity and Security

http://www.cse.buffalo.edu/~chomicki/462/handout-constraints.pdf

Monday, April 12, 2010

Saxon on nickelback/timberlake

The Saxon library is now available from the CSE systems:

/util/saxonhe/saxon9he.jar

In the examples given below, just change the -cp parameter in the call to Java to reflect the location of the library.

Project #2: changes

The relational representation of XML documents has been modified by introducing separate
relations for text and attribute values.
Please see the updated project description:

http://www.cse.buffalo.edu/~chomicki/462/proj2.pdf

Saturday, April 10, 2010

Project #2 Extra Credit: DOM API

For the extra credit problem in project #2, you MUST use the Apache Xerces2 library. The pages of interest are:

Home Page: http://xerces.apache.org/xerces2-j/
Java Docs: http://xerces.apache.org/xerces2-j/javadocs/api/
Download: http://archive.apache.org/dist/xml/xerces-j/

Make sure you download the 2.9.0 binary version of Xerces2 for Java: Xerces-J-bin.2.9.0.zip (or tar.gz). The library comes with a samples folder containing a number of examples for using DOM. I strongly suggest that you look at the code under both samples/dom and samples/ui. The distribution also includes a copy of the java docs.

I'm also linking to an example showing you how to traverse and print a DOM tree:

http://idevelopment.info/data/Programming/java/xml/XercesXmlDomExample.java

If you want to use the original DatabaseInventory.xml file, you can download it from here:

http://www.idevelopment.info/data/Programming/java/xml/ExampleXMLandDTDFile.html

Project #2 XML File

The phd.xml file for project #2 is available for download:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/phd.xml

Friday, April 9, 2010

Recitation #11 Slides

The slides for the eleventh (rB) recitation are available through the link below.

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/rB.pdf

The files used in the recitation are also available for download:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/xquery-files.zip

The Saxon library used in the recitation is available for download from:

http://saxon.sourceforge.net/#F9.2HE

Make sure you chose the Download for Java link. All you need from the download is the saxon9he.jar file. To run an XQuery file named Q1.xq using Saxon, use the following command:

> java -cp ./saxon9he.jar net.sf.saxon.Query Q1.xq

This assumes that the saxon9he.jar file is in the current directory. If the library is in a folder called /some/folder, then you would use:

> java -cp /some/folder/saxon9he.jar net.sf.saxon.Query Q1.xq

This works under Unix/Linux. For other systems, it's a simple matter of adapting the cp parameter.

Thursday, April 8, 2010

Recitation #9 and #10 Slides

The slides for the ninth (r9) and tenth (rA) recitations are finally available through the links below. Sorry for the delay.

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r9.pdf
http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/rA.pdf

Monday, April 5, 2010

Project #2 (due April 26, 2010)

http://www.cse.buffalo.edu/~chomicki/462/proj2.pdf

No deadline extensions.

Thursday, March 25, 2010

New material: database design problems

http://www.cse.buffalo.edu/~chomicki/462/solutions-design.pdf

Monday, March 22, 2010

New material: XML, E-R

XML databases:
http://www.cse.buffalo.edu/~chomicki/462/handout-xml.pdf

E-R example schema (courtesy of Demian Lessa):
http://www.cse.buffalo.edu/~chomicki/462/demian-er.pdf

Recitation #8 Slides

The slides for the eighth recitation (r8) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r8.pdf

Wednesday, March 17, 2010

Recitation #7 Slides

The slides for the seventh recitation (r7) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r7.pdf

Wednesday, March 3, 2010

Hint for Question 1.4 of Project 1

I am providing the results for a query similar to query 1.4. This may help you check whether you got the query right:

List the authors having at least 20 papers, each of
which is directly cited by at least 20 other papers:

Michael Stonebraker
Jeffrey D. Ullman
David J. DeWitt
Philip A. Bernstein
Won Kim
Rakesh Agrawal
Catriel Beeri
Michael J. Carey
Yehoshua Sagiv

Monday, March 1, 2010

Deadline extended

The deadline for Project #1 has been extended to Monday, March 8.

Sunday, February 28, 2010

Recitation #6 Slides

The slides for the sixth recitation (r6) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r6.pdf

Thursday, February 25, 2010

UPDATE: DBLP Practice Queries

The DBLP practice queries have been updated with minor syntax fixes to support the Oracle SQL dialect. In particular, because of the two non-standard features below:

1. Oracle does not support the keyword AS in FROM for aliasing, but you can specify aliases without AS.


2. The set difference relational operator in Oracle is MINUS rather than EXCEPT.

For other non-standard features in Oracle see:

http://infolab.stanford.edu/~ullman/fcdb/oracle/or-nonstandard.html

New slides: Conceptual Database Design

Wednesday, February 24, 2010

Clarification: Paper and Publication

Guys,

For the purposes of Project #1, paper and publication are synonyms and correspond to an entity in the DBLP_ENTITY table. For question 1.1, you should restrict your search to entities of type inproceedings, as explained in the last recitation (r5).

Demian

Monday, February 22, 2010

Top-K

The Java library file for the first project (p5) is available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/topK.jar


About the library:

  • TopKImpl is the skeleton class that you must implement. It is a stub that implements the TopK interface. You may create any number of helper methods in this class.
  • ScoringFunction is the interface that defines the signature of a scoring function. A sample implementation of scoring function, ScoringFunctionImpl, is provided for you to test your implementation.
  • Tuple is a simple class that extracts and encapsulates a single tuple from a ResultSet object.
  • OraConnectionProxy is a helper class for you to connect to Oracle. Use if you like it. First, you must modify the user name and password in order to connect to Oracle.
  • Launcher has a main method that you can use to test your implementation. Two queries are provided with code calling your Top-K implementation. Test data is available to help you check if your algorithm is running correctly (the data has been commented out).

If you have any questions, feel free to send me an e-mail.

Recitation #5 Slides

The slides for the fifth recitation (r5) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r5.pdf


Observation about query 2.1 in the project.

I want to make this absolutely clear. Your query may return any number of rows satisfying the given criteria (it doesn't matter how many rows the query returns). However, the third row in the result must be the one containing the "author who published the THIRD smallest number of papers amongst the authors with more than 400 publications." This can only be achieved using the ORDER BY clause.

Thursday, February 18, 2010

Monday, February 15, 2010

Recitation #4 Slides

The slides for the fourth recitation (r4) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r4.pdf

The practice queries for DBLP are also available here:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/dblp-practice.sql

Before running the queries against the database don't forget to prepend all table names with my schema name, DLESSA, as explained in the slides. If you have problems connecting to Oracle or accessing the tables under my schema, please drop me an e-mail ASAP.

Thursday, February 11, 2010

Monday, February 8, 2010

DISTINCT in SQL

Some clarification is needed after the discussion of DISTINCT in the class today:

1. SELECT DISTINCT A1,...,An FROM...WHERE...GROUP BY...HAVING...
results in eliminating duplicates from the result of the query
SELECT A1,...,An FROM...WHERE...GROUP BY...HAVING...
I.e., duplicate elimination is the last step in SELECT query evaluation.

2. SELECT COUNT(DISTINCT A)...eliminates duplicates from the single column A before applying COUNT (the same for other aggregate functions except COUNT(*)).

3. DISTINCT cannot be used with COUNT(*). So if one wants to count the number of distinct tuples in a relation, one needs to define a view using SELECT DISTINCT and then compute COUNT(*) over the view.

Additional office hours before the test

I will be in my office tomorrow from 11am to 3pm, and Wednesday from 1:30 to 2:30pm.

--Jan Chomicki

Friday, February 5, 2010

Recitation #3 Slides

The slides for the third recitation (r3) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r3.pdf

Please, go over the queries in the slides and make sure you understand them. Make sure you also understand the solutions to the Advanced SQL Queries posted by Dr. Chomicki.

If you have questions, you may stop by any time during my office hours. If you cannot make to my office hours (or the professor's), shoot me an e-mail and I'll try to meet you some other time.

Monday, February 1, 2010

Java References

For those of you who asked me for some Java references, here they go:

Thinking in Java, 3rd edition, by Bruce Eckel. This is an excellent free online book. I strongly recommend:

http://www.mindview.net/Books/TIJ/

For a number of good video tutorials on both Java and Eclipse, visit:

http://eclipsetutorial.sourceforge.net/

Hope this helps!

Demian

Advanced SQL queries

Saturday, January 30, 2010

Recitation #2 Slides

The slides for the second recitation (r2) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r2.pdf

The two sample source files can be downloaded from here:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/DBTest.java
http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/JDBCTest02.java

From nickelback (timberlake), you can compile and run these files as follows:

1. Configure the environment:

> source /util/oracle/coraenv.sh

2. Compile the files:

> javac DBTest.java
> javac JDBCTest02.java

3. Run the files:

> java DBTest
> java JDBCTest02

Regards,

Demian Lessa

Monday, January 25, 2010

Recitation #1 Slides

The slides for the first recitation (r1) are available through the link below:

http://www-student.cse.buffalo.edu/~dlessa/cse462-SP10/r1.pdf

A few slides were added with additional information about the various constraint types. If you have any questions, be sure to e-mail me or stop by my office hours.

Regards,

Demian Lessa

Monday, January 18, 2010

Connecting to Oracle

Instructions: Connecting to the Course's Oracle Database (courtesy of Demian Lessa)

In this course, you will use the SQL*Plus command line client to
interact with an Oracle 11g database. In order to connect to the
database, follow the instructions below.

1. Connect to nickelback (if you are on campus) or timberlake (if you
are off campus). For instance:

> ssh dlessa@timberlake.cse.buffalo.edu
password:

timberlake:~


2. Define the environment variables ORACLE_HOME and TWO_TASK:

timberlake:~ setenv ORACLE_HOME /util/oracle/product/current/
client_1
timberlake:~ setenv TWO_TASK aos.buffalo.edu


3. Start SQL*Plus. You will be prompted for credentials. Use your UBIT
user name (the same you used to log on in step 1) for user-name and for
your password, choose either cse462r1 or cse462r2, according to the
recitation which you registered for.

timberlake:~ sqlplus

SQL*Plus: Release 11.1.0.6.0 - Production on Mon Jan 18 12:58:18 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.

Enter user-name: dlessa
Enter password:

Connected to: Oracle Database 11g Enterprise Edition Release
11.1.0.7.0 - 64bit Production with the Partitioning, OLAP,
Data Mining and Real Application Testing options

SQL>


4. Because all passwords are standard, you should change your password
as soon as you login. Simply type 'password' on the command prompt and
provide the old and new passwords as requested:

SQL> password
Changing password for DLESSA
Old password:
New password:
Retype new password:
Password changed

SQL>


5. In order to close your SQL*Plus session, type 'quit' on the command
prompt:

SQL> quit

Disconnected from Oracle Database 11g Enterprise Edition Release
11.1.0.7.0 - 64bit Production with the Partitioning, OLAP, Data
Mining and Real Application Testing options

timberlake:~


For further information, refer to the CSE wiki Oracle page:
If you still cannot get it to work, stop by my office hours.

Regards,

Demian Lessa

Sunday, January 10, 2010

Welcome to CSE 462!

This is the official course blog of CSE 462 Database Concepts. It will be used for course announcements and discussions.