diff options
Diffstat (limited to 'Documentation')
44 files changed, 2475 insertions, 371 deletions
diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy new file mode 100644 index 000000000000..6434f0df012e --- /dev/null +++ b/Documentation/ABI/testing/ima_policy @@ -0,0 +1,61 @@ +What: security/ima/policy +Date: May 2008 +Contact: Mimi Zohar <zohar@us.ibm.com> +Description: + The Trusted Computing Group(TCG) runtime Integrity + Measurement Architecture(IMA) maintains a list of hash + values of executables and other sensitive system files + loaded into the run-time of this system. At runtime, + the policy can be constrained based on LSM specific data. + Policies are loaded into the securityfs file ima/policy + by opening the file, writing the rules one at a time and + then closing the file. The new policy takes effect after + the file ima/policy is closed. + + rule format: action [condition ...] + + action: measure | dont_measure + condition:= base | lsm + base: [[func=] [mask=] [fsmagic=] [uid=]] + lsm: [[subj_user=] [subj_role=] [subj_type=] + [obj_user=] [obj_role=] [obj_type=]] + + base: func:= [BPRM_CHECK][FILE_MMAP][INODE_PERMISSION] + mask:= [MAY_READ] [MAY_WRITE] [MAY_APPEND] [MAY_EXEC] + fsmagic:= hex value + uid:= decimal value + lsm: are LSM specific + + default policy: + # PROC_SUPER_MAGIC + dont_measure fsmagic=0x9fa0 + # SYSFS_MAGIC + dont_measure fsmagic=0x62656572 + # DEBUGFS_MAGIC + dont_measure fsmagic=0x64626720 + # TMPFS_MAGIC + dont_measure fsmagic=0x01021994 + # SECURITYFS_MAGIC + dont_measure fsmagic=0x73636673 + + measure func=BPRM_CHECK + measure func=FILE_MMAP mask=MAY_EXEC + measure func=INODE_PERM mask=MAY_READ uid=0 + + The default policy measures all executables in bprm_check, + all files mmapped executable in file_mmap, and all files + open for read by root in inode_permission. + + Examples of LSM specific definitions: + + SELinux: + # SELINUX_MAGIC + dont_measure fsmagic=0xF97CFF8C + + dont_measure obj_type=var_log_t + dont_measure obj_type=auditd_log_t + measure subj_user=system_u func=INODE_PERM mask=MAY_READ + measure subj_role=system_r func=INODE_PERM mask=MAY_READ + + Smack: + measure subj_user=_ func=INODE_PERM mask=MAY_READ diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 1462ed86d40a..a3a83d38f96f 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile @@ -12,7 +12,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \ kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ - mac80211.xml debugobjects.xml sh.xml regulator.xml + mac80211.xml debugobjects.xml sh.xml regulator.xml \ + alsa-driver-api.xml writing-an-alsa-driver.xml ### # The build process is as follows (targets): diff --git a/Documentation/sound/alsa/DocBook/alsa-driver-api.tmpl b/Documentation/DocBook/alsa-driver-api.tmpl index 9d644f7e241e..0230a96f0564 100644 --- a/Documentation/sound/alsa/DocBook/alsa-driver-api.tmpl +++ b/Documentation/DocBook/alsa-driver-api.tmpl @@ -1,11 +1,11 @@ -<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> - -<book> -<?dbhtml filename="index.html"> +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" + "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> <!-- ****************************************************** --> <!-- Header --> <!-- ****************************************************** --> +<book id="ALSA-Driver-API"> <bookinfo> <title>The ALSA Driver API</title> @@ -35,6 +35,8 @@ </bookinfo> +<toc></toc> + <chapter><title>Management of Cards and Devices</title> <sect1><title>Card Management</title> !Esound/core/init.c @@ -71,6 +73,10 @@ !Esound/pci/ac97/ac97_codec.c !Esound/pci/ac97/ac97_pcm.c </sect1> + <sect1><title>Virtual Master Control API</title> +!Esound/core/vmaster.c +!Iinclude/sound/control.h + </sect1> </chapter> <chapter><title>MIDI API</title> <sect1><title>Raw MIDI API</title> @@ -89,6 +95,9 @@ <sect1><title>Hardware-Dependent Devices API</title> !Esound/core/hwdep.c </sect1> + <sect1><title>Jack Abstraction Layer API</title> +!Esound/core/jack.c + </sect1> <sect1><title>ISA DMA Helpers</title> !Esound/core/isadma.c </sect1> diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl index 3a882d9a90a9..c671a0168096 100644 --- a/Documentation/DocBook/genericirq.tmpl +++ b/Documentation/DocBook/genericirq.tmpl @@ -440,6 +440,7 @@ desc->chip->end(); used in the generic IRQ layer. </para> !Iinclude/linux/irq.h +!Iinclude/linux/interrupt.h </chapter> <chapter id="pubfunctions"> diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl index 77c3c202991b..fbeaffc1dcc3 100644 --- a/Documentation/DocBook/mac80211.tmpl +++ b/Documentation/DocBook/mac80211.tmpl @@ -17,8 +17,7 @@ </authorgroup> <copyright> - <year>2007</year> - <year>2008</year> + <year>2007-2009</year> <holder>Johannes Berg</holder> </copyright> @@ -165,8 +164,8 @@ usage should require reading the full document. !Pinclude/net/mac80211.h Frame format </sect1> <sect1> - <title>Alignment issues</title> - <para>TBD</para> + <title>Packet alignment</title> +!Pnet/mac80211/rx.c Packet alignment </sect1> <sect1> <title>Calling into mac80211 from interrupts</title> @@ -223,6 +222,17 @@ usage should require reading the full document. !Finclude/net/mac80211.h ieee80211_key_flags </chapter> + <chapter id="powersave"> + <title>Powersave support</title> +!Pinclude/net/mac80211.h Powersave support + </chapter> + + <chapter id="beacon-filter"> + <title>Beacon filter support</title> +!Pinclude/net/mac80211.h Beacon filter support +!Finclude/net/mac80211.h ieee80211_beacon_loss + </chapter> + <chapter id="qos"> <title>Multiple queues and QoS support</title> <para>TBD</para> diff --git a/Documentation/DocBook/uio-howto.tmpl b/Documentation/DocBook/uio-howto.tmpl index 52e1b79ce0e6..8f6e3b2403c7 100644 --- a/Documentation/DocBook/uio-howto.tmpl +++ b/Documentation/DocBook/uio-howto.tmpl @@ -42,6 +42,13 @@ GPL version 2. <revhistory> <revision> + <revnumber>0.8</revnumber> + <date>2008-12-24</date> + <authorinitials>hjk</authorinitials> + <revremark>Added name attributes in mem and portio sysfs directories. + </revremark> + </revision> + <revision> <revnumber>0.7</revnumber> <date>2008-12-23</date> <authorinitials>hjk</authorinitials> @@ -303,12 +310,19 @@ interested in translating it, please email me appear if the size of the mapping is not 0. </para> <para> - Each <filename>mapX/</filename> directory contains two read-only files - that show start address and size of the memory: + Each <filename>mapX/</filename> directory contains four read-only files + that show attributes of the memory: </para> <itemizedlist> <listitem> <para> + <filename>name</filename>: A string identifier for this mapping. This + is optional, the string can be empty. Drivers can set this to make it + easier for userspace to find the correct mapping. + </para> +</listitem> +<listitem> + <para> <filename>addr</filename>: The address of memory that can be mapped. </para> </listitem> @@ -366,12 +380,19 @@ offset = N * getpagesize(); <filename>/sys/class/uio/uioX/portio/</filename>. </para> <para> - Each <filename>portX/</filename> directory contains three read-only - files that show start, size, and type of the port region: + Each <filename>portX/</filename> directory contains four read-only + files that show name, start, size, and type of the port region: </para> <itemizedlist> <listitem> <para> + <filename>name</filename>: A string identifier for this port region. + The string is optional and can be empty. Drivers can set it to make it + easier for userspace to find a certain port region. + </para> +</listitem> +<listitem> + <para> <filename>start</filename>: The first port of this region. </para> </listitem> diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/DocBook/writing-an-alsa-driver.tmpl index 87a7c07ab658..46b08fef3744 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/DocBook/writing-an-alsa-driver.tmpl @@ -1,11 +1,11 @@ -<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> - -<book> -<?dbhtml filename="index.html"> +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" + "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> <!-- ****************************************************** --> <!-- Header --> <!-- ****************************************************** --> +<book id="Writing-an-ALSA-Driver"> <bookinfo> <title>Writing an ALSA Driver</title> <author> @@ -492,9 +492,9 @@ } /* (2) */ - card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); - if (card == NULL) - return -ENOMEM; + err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card); + if (err < 0) + return err; /* (3) */ err = snd_mychip_create(card, pci, &chip); @@ -590,8 +590,9 @@ <programlisting> <![CDATA[ struct snd_card *card; + int err; .... - card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0); + err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card); ]]> </programlisting> </informalexample> @@ -809,26 +810,28 @@ <para> As mentioned above, to create a card instance, call - <function>snd_card_new()</function>. + <function>snd_card_create()</function>. <informalexample> <programlisting> <![CDATA[ struct snd_card *card; - card = snd_card_new(index, id, module, extra_size); + int err; + err = snd_card_create(index, id, module, extra_size, &card); ]]> </programlisting> </informalexample> </para> <para> - The function takes four arguments, the card-index number, the + The function takes five arguments, the card-index number, the id string, the module pointer (usually <constant>THIS_MODULE</constant>), - and the size of extra-data space. The last argument is used to + the size of extra-data space, and the pointer to return the + card instance. The extra_size argument is used to allocate card->private_data for the chip-specific data. Note that these data - are allocated by <function>snd_card_new()</function>. + are allocated by <function>snd_card_create()</function>. </para> </section> @@ -915,15 +918,16 @@ </para> <section id="card-management-chip-specific-snd-card-new"> - <title>1. Allocating via <function>snd_card_new()</function>.</title> + <title>1. Allocating via <function>snd_card_create()</function>.</title> <para> As mentioned above, you can pass the extra-data-length - to the 4th argument of <function>snd_card_new()</function>, i.e. + to the 4th argument of <function>snd_card_create()</function>, i.e. <informalexample> <programlisting> <![CDATA[ - card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(struct mychip)); + err = snd_card_create(index[dev], id[dev], THIS_MODULE, + sizeof(struct mychip), &card); ]]> </programlisting> </informalexample> @@ -952,8 +956,8 @@ <para> After allocating a card instance via - <function>snd_card_new()</function> (with - <constant>NULL</constant> on the 4th arg), call + <function>snd_card_create()</function> (with + <constant>0</constant> on the 4th arg), call <function>kzalloc()</function>. <informalexample> @@ -961,7 +965,7 @@ <![CDATA[ struct snd_card *card; struct mychip *chip; - card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL); + err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card); ..... chip = kzalloc(sizeof(*chip), GFP_KERNEL); ]]> @@ -5750,8 +5754,9 @@ struct _snd_pcm_runtime { .... struct snd_card *card; struct mychip *chip; + int err; .... - card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL); + err = snd_card_create(index[dev], id[dev], THIS_MODULE, 0, &card); .... chip = kzalloc(sizeof(*chip), GFP_KERNEL); .... @@ -5763,7 +5768,7 @@ struct _snd_pcm_runtime { </informalexample> When you created the chip data with - <function>snd_card_new()</function>, it's anyway accessible + <function>snd_card_create()</function>, it's anyway accessible via <structfield>private_data</structfield> field. <informalexample> @@ -5775,9 +5780,10 @@ struct _snd_pcm_runtime { .... struct snd_card *card; struct mychip *chip; + int err; .... - card = snd_card_new(index[dev], id[dev], THIS_MODULE, - sizeof(struct mychip)); + err = snd_card_create(index[dev], id[dev], THIS_MODULE, + sizeof(struct mychip), &card); .... chip = card->private_data; .... diff --git a/Documentation/Smack.txt b/Documentation/Smack.txt index 989c2fcd8111..629c92e99783 100644 --- a/Documentation/Smack.txt +++ b/Documentation/Smack.txt @@ -184,14 +184,16 @@ length. Single character labels using special characters, that being anything other than a letter or digit, are reserved for use by the Smack development team. Smack labels are unstructured, case sensitive, and the only operation ever performed on them is comparison for equality. Smack labels cannot -contain unprintable characters or the "/" (slash) character. +contain unprintable characters or the "/" (slash) character. Smack labels +cannot begin with a '-', which is reserved for special options. There are some predefined labels: - _ Pronounced "floor", a single underscore character. - ^ Pronounced "hat", a single circumflex character. - * Pronounced "star", a single asterisk character. - ? Pronounced "huh", a single question mark character. + _ Pronounced "floor", a single underscore character. + ^ Pronounced "hat", a single circumflex character. + * Pronounced "star", a single asterisk character. + ? Pronounced "huh", a single question mark character. + @ Pronounced "Internet", a single at sign character. Every task on a Smack system is assigned a label. System tasks, such as init(8) and systems daemons, are run with the floor ("_") label. User tasks @@ -412,6 +414,36 @@ sockets. A privileged program may set this to match the label of another task with which it hopes to communicate. +Smack Netlabel Exceptions + +You will often find that your labeled application has to talk to the outside, +unlabeled world. To do this there's a special file /smack/netlabel where you can +add some exceptions in the form of : +@IP1 LABEL1 or +@IP2/MASK LABEL2 + +It means that your application will have unlabeled access to @IP1 if it has +write access on LABEL1, and access to the subnet @IP2/MASK if it has write +access on LABEL2. + +Entries in the /smack/netlabel file are matched by longest mask first, like in +classless IPv4 routing. + +A special label '@' and an option '-CIPSO' can be used there : +@ means Internet, any application with any label has access to it +-CIPSO means standard CIPSO networking + +If you don't know what CIPSO is and don't plan to use it, you can just do : +echo 127.0.0.1 -CIPSO > /smack/netlabel +echo 0.0.0.0/0 @ > /smack/netlabel + +If you use CIPSO on your 192.168.0.0/16 local network and need also unlabeled +Internet access, you can have : +echo 127.0.0.1 -CIPSO > /smack/netlabel +echo 192.168.0.0/16 -CIPSO > /smack/netlabel +echo 0.0.0.0/0 @ > /smack/netlabel + + Writing Applications for Smack There are three sorts of applications that will run on a Smack system. How an diff --git a/Documentation/arm/Samsung-S3C24XX/Suspend.txt b/Documentation/arm/Samsung-S3C24XX/Suspend.txt index 0dab6e32c130..a30fe510572b 100644 --- a/Documentation/arm/Samsung-S3C24XX/Suspend.txt +++ b/Documentation/arm/Samsung-S3C24XX/Suspend.txt @@ -40,13 +40,13 @@ Resuming Machine Support --------------- - The machine specific functions must call the s3c2410_pm_init() function + The machine specific functions must call the s3c_pm_init() function to say that its bootloader is capable of resuming. This can be as simple as adding the following to the machine's definition: - INITMACHINE(s3c2410_pm_init) + INITMACHINE(s3c_pm_init) - A board can do its own setup before calling s3c2410_pm_init, if it + A board can do its own setup before calling s3c_pm_init, if it needs to setup anything else for power management support. There is currently no support for over-riding the default method of @@ -74,7 +74,7 @@ statuc void __init machine_init(void) enable_irq_wake(IRQ_EINT0); - s3c2410_pm_init(); + s3c_pm_init(); } diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index dc6045577a8b..43cb1004d35f 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt @@ -29,7 +29,14 @@ ffff0000 ffff0fff CPU vector page. CPU supports vector relocation (control register V bit.) -ffc00000 fffeffff DMA memory mapping region. Memory returned +fffe0000 fffeffff XScale cache flush area. This is used + in proc-xscale.S to flush the whole data + cache. Free for other usage on non-XScale. + +fff00000 fffdffff Fixmap mapping region. Addresses provided + by fix_to_virt() will be located here. + +ffc00000 ffefffff DMA memory mapping region. Memory returned by the dma_alloc_xxx functions will be dynamically mapped here. diff --git a/Documentation/block/switching-sched.txt b/Documentation/block/switching-sched.txt index 634c952e1964..d5af3f630814 100644 --- a/Documentation/block/switching-sched.txt +++ b/Documentation/block/switching-sched.txt @@ -35,9 +35,3 @@ noop anticipatory deadline [cfq] # echo anticipatory > /sys/block/hda/queue/scheduler # cat /sys/block/hda/queue/scheduler noop [anticipatory] deadline cfq - -Each io queue has a set of io scheduler tunables associated with it. These -tunables control how the io scheduler works. You can find these entries -in: - -/sys/block/<device>/queue/iosched diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index 5b0cfa67aff9..ce73f3eb5ddb 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -117,10 +117,28 @@ accessible parameters: sampling_rate: measured in uS (10^-6 seconds), this is how often you want the kernel to look at the CPU usage and to make decisions on what to do about the frequency. Typically this is set to values of -around '10000' or more. - -show_sampling_rate_(min|max): the minimum and maximum sampling rates -available that you may set 'sampling_rate' to. +around '10000' or more. It's default value is (cmp. with users-guide.txt): +transition_latency * 1000 +The lowest value you can set is: +transition_latency * 100 or it may get restricted to a value where it +makes not sense for the kernel anymore to poll that often which depends +on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000). +Be aware that transition latency is in ns and sampling_rate is in us, so you +get the same sysfs value by default. +Sampling rate should always get adjusted considering the transition latency +To set the sampling rate 750 times as high as the transition latency +in the bash (as said, 1000 is default), do: +echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \ + >ondemand/sampling_rate + +show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT. +You can use wider ranges now and the general +cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be +used to obtain exactly the same info: +show_sampling_rate_min = transtition_latency * 500 / 1000 +show_sampling_rate_max = transtition_latency * 500000 / 1000 +(divided by 1000 is to illustrate that sampling rate is in us and +transition latency is exported ns). up_threshold: defines what the average CPU usage between the samplings of 'sampling_rate' needs to be for the kernel to make a decision on diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 917918f84fc7..75f41193f3e1 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -152,6 +152,18 @@ cpuinfo_min_freq : this file shows the minimum operating frequency the processor can run at(in kHz) cpuinfo_max_freq : this file shows the maximum operating frequency the processor can run at(in kHz) +cpuinfo_transition_latency The time it takes on this CPU to + switch between two frequencies in nano + seconds. If unknown or known to be + that high that the driver does not + work with the ondemand governor, -1 + (CPUFREQ_ETERNAL) will be returned. + Using this information can be useful + to choose an appropriate polling + frequency for a kernel governor or + userspace daemon. Make sure to not + switch the frequency too often + resulting in performance loss. scaling_driver : this file shows what cpufreq driver is used to set the frequency on this CPU diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt index 45932ec21cee..b41f3e58aefa 100644 --- a/Documentation/cputopology.txt +++ b/Documentation/cputopology.txt @@ -18,11 +18,11 @@ For an architecture to support this feature, it must define some of these macros in include/asm-XXX/topology.h: #define topology_physical_package_id(cpu) #define topology_core_id(cpu) -#define topology_thread_siblings(cpu) -#define topology_core_siblings(cpu) +#define topology_thread_cpumask(cpu) +#define topology_core_cpumask(cpu) The type of **_id is int. -The type of siblings is cpumask_t. +The type of siblings is (const) struct cpumask *. To be consistent on all architectures, include/linux/topology.h provides default definitions for any of the above macros that are diff --git a/Documentation/devices.txt b/Documentation/devices.txt index 2be08240ee80..62254d4510c6 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt @@ -3145,6 +3145,12 @@ Your cooperation is appreciated. 1 = /dev/blockrom1 Second ROM card's translation layer interface ... +260 char OSD (Object-based-device) SCSI Device + 0 = /dev/osd0 First OSD Device + 1 = /dev/osd1 Second OSD Device + ... + 255 = /dev/osd255 256th OSD Device + **** ADDITIONAL /dev DIRECTORY ENTRIES This section details additional entries that should or may exist in diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 1e89a51ea49b..88519daab6e9 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff @@ -62,7 +62,6 @@ aic7*reg_print.c* aic7*seq.h* aicasm aicdb.h* -asm asm-offsets.h asm_offsets.h autoconf.h* diff --git a/Documentation/dynamic-debug-howto.txt b/Documentation/dynamic-debug-howto.txt new file mode 100644 index 000000000000..674c5663d346 --- /dev/null +++ b/Documentation/dynamic-debug-howto.txt @@ -0,0 +1,240 @@ + +Introduction +============ + +This document describes how to use the dynamic debug (ddebug) feature. + +Dynamic debug is designed to allow you to dynamically enable/disable kernel +code to obtain additional kernel information. Currently, if +CONFIG_DYNAMIC_DEBUG is set, then all pr_debug()/dev_debug() calls can be +dynamically enabled per-callsite. + +Dynamic debug has even more useful features: + + * Simple query language allows turning on and off debugging statements by + matching any combination of: + + - source filename + - function name + - line number (including ranges of line numbers) + - module name + - format string + + * Provides a debugfs control file: <debugfs>/dynamic_debug/control which can be + read to display the complete list of known debug statements, to help guide you + +Controlling dynamic debug Behaviour +=============================== + +The behaviour of pr_debug()/dev_debug()s are controlled via writing to a +control file in the 'debugfs' filesystem. Thus, you must first mount the debugfs +filesystem, in order to make use of this feature. Subsequently, we refer to the +control file as: <debugfs>/dynamic_debug/control. For example, if you want to +enable printing from source file 'svcsock.c', line 1603 you simply do: + +nullarbor:~ # echo 'file svcsock.c line 1603 +p' > + <debugfs>/dynamic_debug/control + +If you make a mistake with the syntax, the write will fail thus: + +nullarbor:~ # echo 'file svcsock.c wtf 1 +p' > + <debugfs>/dynamic_debug/control +-bash: echo: write error: Invalid argument + +Viewing Dynamic Debug Behaviour +=========================== + +You can view the currently configured behaviour of all the debug statements +via: + +nullarbor:~ # cat <debugfs>/dynamic_debug/control +# filename:lineno [module]function flags format +/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup - "SVCRDMA Module Removed, deregister RPC RDMA transport\012" +/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init - "\011max_inline : %d\012" +/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init - "\011sq_depth : %d\012" +/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init - "\011max_requests : %d\012" +... + + +You can also apply standard Unix text manipulation filters to this +data, e.g. + +nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l +62 + +nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l +42 + +Note in particular that the third column shows the enabled behaviour +flags for each debug statement callsite (see below for definitions of the +flags). The default value, no extra behaviour enabled, is "-". So +you can view all the debug statement callsites with any non-default flags: + +nullarbor:~ # awk '$3 != "-"' <debugfs>/dynamic_debug/control +# filename:lineno [module]function flags format +/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012" + + +Command Language Reference +========================== + +At the lexical level, a command comprises a sequence of words separated +by whitespace characters. Note that newlines are treated as word +separators and do *not* end a command or allow multiple commands to +be done together. So these are all equivalent: + +nullarbor:~ # echo -c 'file svcsock.c line 1603 +p' > + <debugfs>/dynamic_debug/control +nullarbor:~ # echo -c ' file svcsock.c line 1603 +p ' > + <debugfs>/dynamic_debug/control +nullarbor:~ # echo -c 'file svcsock.c\nline 1603 +p' > + <debugfs>/dynamic_debug/control +nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > + <debugfs>/dynamic_debug/control + +Commands are bounded by a write() system call. If you want to do +multiple commands you need to do a separate "echo" for each, like: + +nullarbor:~ # echo 'file svcsock.c line 1603 +p' > /proc/dprintk ;\ +> echo 'file svcsock.c line 1563 +p' > /proc/dprintk + +or even like: + +nullarbor:~ # ( +> echo 'file svcsock.c line 1603 +p' ;\ +> echo 'file svcsock.c line 1563 +p' ;\ +> ) > /proc/dprintk + +At the syntactical level, a command comprises a sequence of match +specifications, followed by a flags change specification. + +command ::= match-spec* flags-spec + +The match-spec's are used to choose a subset of the known dprintk() +callsites to which to apply the flags-spec. Think of them as a query +with implicit ANDs between each pair. Note that an empty list of +match-specs is possible, but is not very useful because it will not +match any debug statement callsites. + +A match specification comprises a keyword, which controls the attribute +of the callsite to be compared, and a value to compare against. Possible +keywords are: + +match-spec ::= 'func' string | + 'file' string | + 'module' string | + 'format' string | + 'line' line-range + +line-range ::= lineno | + '-'lineno | + lineno'-' | + lineno'-'lineno +// Note: line-range cannot contain space, e.g. +// "1-30" is valid range but "1 - 30" is not. + +lineno ::= unsigned-int + +The meanings of each keyword are: + +func + The given string is compared against the function name + of each callsite. Example: + + func svc_tcp_accept + +file + The given string is compared against either the full + pathname or the basename of the source file of each + callsite. Examples: + + file svcsock.c + file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c + +module + The given string is compared against the module name + of each callsite. The module name is the string as + seen in "lsmod", i.e. without the directory or the .ko + suffix and with '-' changed to '_'. Examples: + + module sunrpc + module nfsd + +format + The given string is searched for in the dynamic debug format + string. Note that the string does not need to match the + entire format, only some part. Whitespace and other + special characters can be escaped using C octal character + escape \ooo notation, e.g. the space character is \040. + Alternatively, the string can be enclosed in double quote + characters (") or single quote characters ('). + Examples: + + format svcrdma: // many of the NFS/RDMA server dprintks + format readahead // some dprintks in the readahead cache + format nfsd:\040SETATTR // one way to match a format with whitespace + format "nfsd: SETATTR" // a neater way to match a format with whitespace + format 'nfsd: SETATTR' // yet another way to match a format with whitespace + +line + The given line number or range of line numbers is compared + against the line number of each dprintk() callsite. A single + line number matches the callsite line number exactly. A + range of line numbers matches any callsite between the first + and last line number inclusive. An empty first number means + the first line in the file, an empty line number means the + last number in the file. Examples: + + line 1603 // exactly line 1603 + line 1600-1605 // the six lines from line 1600 to line 1605 + line -1605 // the 1605 lines from line 1 to line 1605 + line 1600- // all lines from line 1600 to the end of the file + +The flags specification comprises a change operation followed +by one or more flag characters. The change operation is one +of the characters: + +- + remove the given flags + ++ + add the given flags + += + set the flags to the given flags + +The flags are: + +p + Causes a printk() message to be emitted to dmesg + +Note the regexp ^[-+=][scp]+$ matches a flags specification. +Note also that there is no convenient syntax to remove all +the flags at once, you need to use "-psc". + +Examples +======== + +// enable the message at line 1603 of file svcsock.c +nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' > + <debugfs>/dynamic_debug/control + +// enable all the messages in file svcsock.c +nullarbor:~ # echo -n 'file svcsock.c +p' > + <debugfs>/dynamic_debug/control + +// enable all the messages in the NFS server module +nullarbor:~ # echo -n 'module nfsd +p' > + <debugfs>/dynamic_debug/control + +// enable all 12 messages in the function svc_process() +nullarbor:~ # echo -n 'func svc_process +p' > + <debugfs>/dynamic_debug/control + +// disable all 12 messages in the function svc_process() +nullarbor:~ # echo -n 'func svc_process -p' > + <debugfs>/dynamic_debug/control + +// enable messages for NFS calls READ, READLINK, READDIR and READDIR+. +nullarbor:~ # echo -n 'format "nfsd: READ" +p' > + <debugfs>/dynamic_debug/control diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 20d3b94703a4..1135996bec8b 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -6,20 +6,47 @@ be removed from this file. --------------------------- -What: old static regulatory information and ieee80211_regdom module parameter -When: 2.6.29 +What: The ieee80211_regdom module parameter +When: March 2010 / desktop catchup + +Why: This was inherited by the CONFIG_WIRELESS_OLD_REGULATORY code, + and currently serves as an option for users to define an + ISO / IEC 3166 alpha2 code for the country they are currently + present in. Although there are userspace API replacements for this + through nl80211 distributions haven't yet caught up with implementing + decent alternatives through standard GUIs. Although available as an + option through iw or wpa_supplicant its just a matter of time before + distributions pick up good GUI options for this. The ideal solution + would actually consist of intelligent designs which would do this for + the user automatically even when travelling through different countries. + Until then we leave this module parameter as a compromise. + + When userspace improves with reasonable widely-available alternatives for + this we will no longer need this module parameter. This entry hopes that + by the super-futuristically looking date of "March 2010" we will have + such replacements widely available. + +Who: Luis R. Rodriguez <lrodriguez@atheros.com> + +--------------------------- + +What: CONFIG_WIRELESS_OLD_REGULATORY - old static regulatory information +When: March 2010 / desktop catchup + Why: The old regulatory infrastructure has been replaced with a new one which does not require statically defined regulatory domains. We do not want to keep static regulatory domains in the kernel due to the the dynamic nature of regulatory law and localization. We kept around the old static definitions for the regulatory domains of: + * US * JP * EU + and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was - set. We also kept around the ieee80211_regdom module parameter in case - some applications were relying on it. Changing regulatory domains - can now be done instead by using nl80211, as is done with iw. + set. We will remove this option once the standard Linux desktop catches + up with the new userspace APIs we have implemented. + Who: Luis R. Rodriguez <lrodriguez@atheros.com> --------------------------- @@ -229,7 +256,9 @@ Who: Jan Engelhardt <jengelh@computergmbh.de> --------------------------- What: b43 support for firmware revision < 410 -When: July 2008 +When: The schedule was July 2008, but it was decided that we are going to keep the + code as long as there are no major maintanance headaches. + So it _could_ be removed _any_ time now, if it conflicts with something new. Why: The support code for the old firmware hurts code readability/maintainability and slightly hurts runtime performance. Bugfixes for the old firmware are not provided by Broadcom anymore. @@ -311,7 +340,8 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl> --------------------------- What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client() -When: 2.6.29 (ideally) or 2.6.30 (more likely) +When: 2.6.30 +Check: i2c_attach_client i2c_detach_client Why: Deprecated by the new (standard) device driver binding model. Use i2c_driver->probe() and ->remove() instead. Who: Jean Delvare <khali@linux-fr.org> @@ -326,17 +356,6 @@ Who: Hans de Goede <hdegoede@redhat.com> --------------------------- -What: SELinux "compat_net" functionality -When: 2.6.30 at the earliest -Why: In 2.6.18 the Secmark concept was introduced to replace the "compat_net" - network access control functionality of SELinux. Secmark offers both - better performance and greater flexibility than the "compat_net" - mechanism. Now that the major Linux distributions have moved to - Secmark, it is time to deprecate the older mechanism and start the - process of removing the old code. -Who: Paul Moore <paul.moore@hp.com> ---------------------------- - What: sysfs ui for changing p4-clockmod parameters When: September 2009 Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and @@ -344,3 +363,20 @@ Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and Removal is subject to fixing any remaining bugs in ACPI which may cause the thermal throttling not to happen at the right time. Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com> + +----------------------------- + +What: __do_IRQ all in one fits nothing interrupt handler +When: 2.6.32 +Why: __do_IRQ was kept for easy migration to the type flow handlers. + More than two years of migration time is enough. +Who: Thomas Gleixner <tglx@linutronix.de> + +----------------------------- + +What: obsolete generic irq defines and typedefs +When: 2.6.30 +Why: The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) + have been kept around for migration reasons. After more than two years + it's time to remove them finally +Who: Thomas Gleixner <tglx@linutronix.de> diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index ec6a9392a173..4e78ce677843 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -437,8 +437,11 @@ grab BKL for cases when we close a file that had been opened r/w, but that can and should be done using the internal locking with smaller critical areas). Current worst offender is ext2_get_block()... -->fasync() is a mess. This area needs a big cleanup and that will probably -affect locking. +->fasync() is called without BKL protection, and is responsible for +maintaining the FASYNC bit in filp->f_flags. Most instances call +fasync_helper(), which does that maintenance, so it's not normally +something one needs to worry about. Return values > 0 will be mapped to +zero in the VFS layer. ->readdir() and ->ioctl() on directories must be changed. Ideally we would move ->readdir() to inode_operations and use a separate method for directory diff --git a/Documentation/i2c/busses/i2c-nforce2 b/Documentation/i2c/busses/i2c-nforce2 index fae3495bcbaf..9698c396b830 100644 --- a/Documentation/i2c/busses/i2c-nforce2 +++ b/Documentation/i2c/busses/i2c-nforce2 @@ -7,10 +7,14 @@ Supported adapters: * nForce3 250Gb MCP 10de:00E4 * nForce4 MCP 10de:0052 * nForce4 MCP-04 10de:0034 - * nForce4 MCP51 10de:0264 - * nForce4 MCP55 10de:0368 - * nForce4 MCP61 10de:03EB - * nForce4 MCP65 10de:0446 + * nForce MCP51 10de:0264 + * nForce MCP55 10de:0368 + * nForce MCP61 10de:03EB + * nForce MCP65 10de:0446 + * nForce MCP67 10de:0542 + * nForce MCP73 10de:07D8 + * nForce MCP78S 10de:0752 + * nForce MCP79 10de:0AA2 Datasheet: not publicly available, but seems to be similar to the AMD-8111 SMBus 2.0 adapter. diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index ef1efa79b1df..f889481762b5 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 @@ -4,7 +4,7 @@ Supported adapters: * Intel 82371AB PIIX4 and PIIX4E * Intel 82443MX (440MX) Datasheet: Publicly available at the Intel website - * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges + * ServerWorks OSB4, CSB5, CSB6, HT-1000 and HT-1100 southbridges Datasheet: Only available via NDA from ServerWorks * ATI IXP200, IXP300, IXP400, SB600, SB700 and SB800 southbridges Datasheet: Not publicly available diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices new file mode 100644 index 000000000000..b55ce57a84db --- /dev/null +++ b/Documentation/i2c/instantiating-devices @@ -0,0 +1,167 @@ +How to instantiate I2C devices +============================== + +Unlike PCI or USB devices, I2C devices are not enumerated at the hardware +level. Instead, the software must know which devices are connected on each +I2C bus segment, and what address these devices are using. For this +reason, the kernel code must instantiate I2C devices explicitly. There are +several ways to achieve this, depending on the context and requirements. + + +Method 1: Declare the I2C devices by bus number +----------------------------------------------- + +This method is appropriate when the I2C bus is a system bus as is the case +for many embedded systems. On such systems, each I2C bus has a number +which is known in advance. It is thus possible to pre-declare the I2C +devices which live on this bus. This is done with an array of struct +i2c_board_info which is registered by calling i2c_register_board_info(). + +Example (from omap2 h4): + +static struct i2c_board_info __initdata h4_i2c_board_info[] = { + { + I2C_BOARD_INFO("isp1301_omap", 0x2d), + .irq = OMAP_GPIO_IRQ(125), + }, + { /* EEPROM on mainboard */ + I2C_BOARD_INFO("24c01", 0x52), + .platform_data = &m24c01, + }, + { /* EEPROM on cpu card */ + I2C_BOARD_INFO("24c01", 0x57), + .platform_data = &m24c01, + }, +}; + +static void __init omap_h4_init(void) +{ + (...) + i2c_register_board_info(1, h4_i2c_board_info, + ARRAY_SIZE(h4_i2c_board_info)); + (...) +} + +The above code declares 3 devices on I2C bus 1, including their respective +addresses and custom data needed by their drivers. When the I2C bus in +question is registered, the I2C devices will be instantiated automatically +by i2c-core. + +The devices will be automatically unbound and destroyed when the I2C bus +they sit on goes away (if ever.) + + +Method 2: Instantiate the devices explicitly +-------------------------------------------- + +This method is appropriate when a larger device uses an I2C bus for +internal communication. A typical case is TV adapters. These can have a +tuner, a video decoder, an audio decoder, etc. usually connected to the +main chip by the means of an I2C bus. You won't know the number of the I2C +bus in advance, so the method 1 described above can't be used. Instead, +you can instantiate your I2C devices explicitly. This is done by filling +a struct i2c_board_info and calling i2c_new_device(). + +Example (from the sfe4001 network driver): + +static struct i2c_board_info sfe4001_hwmon_info = { + I2C_BOARD_INFO("max6647", 0x4e), +}; + +int sfe4001_init(struct efx_nic *efx) +{ + (...) + efx->board_info.hwmon_client = + i2c_new_device(&efx->i2c_adap, &sfe4001_hwmon_info); + + (...) +} + +The above code instantiates 1 I2C device on the I2C bus which is on the +network adapter in question. + +A variant of this is when you don't know for sure if an I2C device is +present or not (for example for an optional feature which is not present +on cheap variants of a board but you have no way to tell them apart), or +it may have different addresses from one board to the next (manufacturer +changing its design without notice). In this case, you can call +i2c_new_probed_device() instead of i2c_new_device(). + +Example (from the pnx4008 OHCI driver): + +static const unsigned short normal_i2c[] = { 0x2c, 0x2d, I2C_CLIENT_END }; + +static int __devinit usb_hcd_pnx4008_probe(struct platform_device *pdev) +{ + (...) + struct i2c_adapter *i2c_adap; + struct i2c_board_info i2c_info; + + (...) + i2c_adap = i2c_get_adapter(2); + memset(&i2c_info, 0, sizeof(struct i2c_board_info)); + strlcpy(i2c_info.name, "isp1301_pnx", I2C_NAME_SIZE); + isp1301_i2c_client = i2c_new_probed_device(i2c_adap, &i2c_info, + normal_i2c); + i2c_put_adapter(i2c_adap); + (...) +} + +The above code instantiates up to 1 I2C device on the I2C bus which is on +the OHCI adapter in question. It first tries at address 0x2c, if nothing +is found there it tries address 0x2d, and if still nothing is found, it +simply gives up. + +The driver which instantiated the I2C device is responsible for destroying +it on cleanup. This is done by calling i2c_unregister_device() on the +pointer that was earlier returned by i2c_new_device() or +i2c_new_probed_device(). + + +Method 3: Probe an I2C bus for certain devices +---------------------------------------------- + +Sometimes you do not have enough information about an I2C device, not even +to call i2c_new_probed_device(). The typical case is hardware monitoring +chips on PC mainboards. There are several dozen models, which can live +at 25 different addresses. Given the huge number of mainboards out there, +it is next to impossible to build an exhaustive list of the hardware +monitoring chips being used. Fortunately, most of these chips have +manufacturer and device ID registers, so they can be identified by +probing. + +In that case, I2C devices are neither declared nor instantiated +explicitly. Instead, i2c-core will probe for such devices as soon as their +drivers are loaded, and if any is found, an I2C device will be +instantiated automatically. In order to prevent any misbehavior of this +mechanism, the following restrictions apply: +* The I2C device driver must implement the detect() method, which + identifies a supported device by reading from arbitrary registers. +* Only buses which are likely to have a supported device and agree to be + probed, will be probed. For example this avoids probing for hardware + monitoring chips on a TV adapter. + +Example: +See lm90_driver and lm90_detect() in drivers/hwmon/lm90.c + +I2C devices instantiated as a result of such a successful probe will be +destroyed automatically when the driver which detected them is removed, +or when the underlying I2C bus is itself destroyed, whichever happens +first. + +Those of you familiar with the i2c subsystem of 2.4 kernels and early 2.6 +kernels will find out that this method 3 is essentially similar to what +was done there. Two significant differences are: +* Probing is only one way to instantiate I2C devices now, while it was the + only way back then. Where possible, methods 1 and 2 should be preferred. + Method 3 should only be used when there is no other way, as it can have + undesirable side effects. +* I2C buses must now explicitly say which I2C driver classes can probe + them (by the means of the class bitfield), while all I2C buses were + probed by default back then. The default is an empty class which means + that no probing happens. The purpose of the class bitfield is to limit + the aforementioned undesirable side effects. + +Once again, method 3 should be avoided wherever possible. Explicit device +instantiation (methods 1 and 2) is much preferred for it is safer and +faster. diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 6b9af7d479c2..c1a06f989cf7 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -207,15 +207,26 @@ You simply have to define a detect callback which will attempt to identify supported devices (returning 0 for supported ones and -ENODEV for unsupported ones), a list of addresses to probe, and a device type (or class) so that only I2C buses which may have that type of device -connected (and not otherwise enumerated) will be probed. The i2c -core will then call you back as needed and will instantiate a device -for you for every successful detection. +connected (and not otherwise enumerated) will be probed. For example, +a driver for a hardware monitoring chip for which auto-detection is +needed would set its class to I2C_CLASS_HWMON, and only I2C adapters +with a class including I2C_CLASS_HWMON would be probed by this driver. +Note that the absence of matching classes does not prevent the use of +a device of that type on the given I2C adapter. All it prevents is +auto-detection; explicit instantiation of devices is still possible. Note that this mechanism is purely optional and not suitable for all devices. You need some reliable way to identify the supported devices (typically using device-specific, dedicated identification registers), otherwise misdetections are likely to occur and things can get wrong -quickly. +quickly. Keep in mind that the I2C protocol doesn't include any +standard way to detect the presence of a chip at a given address, let +alone a standard way to identify devices. Even worse is the lack of +semantics associated to bus transfers, which means that the same +transfer can be seen as a read operation by a chip and as a write +operation by another chip. For these reasons, explicit device +instantiation should always be preferred to auto-detection where +possible. Device Deletion diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 54f21a5c262b..be3bde51b564 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -44,6 +44,7 @@ parameter is applicable: FB The frame buffer device is enabled. HW Appropriate hardware is enabled. IA-64 IA-64 architecture is enabled. + IMA Integrity measurement architecture is enabled. IOSCHED More than one I/O scheduler is enabled. IP_PNP IP DHCP, BOOTP, or RARP is enabled. ISAPNP ISA PnP code is enabled. @@ -492,10 +493,12 @@ and is between 256 and 4096 characters. It is defined in the file Default: 64 hpet= [X86-32,HPET] option to control HPET usage - Format: { enable (default) | disable | force } + Format: { enable (default) | disable | force | + verbose } disable: disable HPET and use PIT instead force: allow force enabled of undocumented chips (ICH4, VIA, nVidia) + verbose: show contents of HPET registers during setup com20020= [HW,NET] ARCnet - COM20020 chipset Format: @@ -829,6 +832,15 @@ and is between 256 and 4096 characters. It is defined in the file hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC) terminal devices. Valid values: 0..8 + hvc_iucv_allow= [S390] Comma-separated list of z/VM user IDs. + If specified, z/VM IUCV HVC accepts connections + from listed z/VM user IDs only. + + i2c_bus= [HW] Override the default board specific I2C bus speed + or register an additional I2C bus that is not + registered from board initialization code. + Format: + <bus_id>,<clkrate> i8042.debug [HW] Toggle i8042 debug mode i8042.direct [HW] Put keyboard port into non-translated mode @@ -902,6 +914,15 @@ and is between 256 and 4096 characters. It is defined in the file ihash_entries= [KNL] Set number of hash buckets for inode cache. + ima_audit= [IMA] + Format: { "0" | "1" } + 0 -- integrity auditing messages. (Default) + 1 -- enable informational integrity auditing messages. + + ima_hash= [IMA] + Formt: { "sha1" | "md5" } + default: "sha1" + in2000= [HW,SCSI] See header of drivers/scsi/in2000.c. @@ -1310,8 +1331,13 @@ and is between 256 and 4096 characters. It is defined in the file memtest= [KNL,X86] Enable memtest Format: <integer> - range: 0,4 : pattern number default : 0 <disable> + Specifies the number of memtest passes to be + performed. Each pass selects another test + pattern from a given set of patterns. Memtest + fills the memory with this pattern, validates + memory contents and reserves bad memory + regions that are detected. meye.*= [HW] Set MotionEye Camera parameters See Documentation/video4linux/meye.txt. @@ -1816,11 +1842,6 @@ and is between 256 and 4096 characters. It is defined in the file autoconfiguration. Ranges are in pairs (memory base and size). - dynamic_printk Enables pr_debug()/dev_dbg() calls if - CONFIG_DYNAMIC_PRINTK_DEBUG has been enabled. - These can also be switched on/off via - <debugfs>/dynamic_printk/modules - print-fatal-signals= [KNL] debug: print fatal signals print-fatal-signals=1: print segfault info to @@ -2009,15 +2030,6 @@ and is between 256 and 4096 characters. It is defined in the file If enabled at boot time, /selinux/disable can be used later to disable prior to initial policy load. - selinux_compat_net = - [SELINUX] Set initial selinux_compat_net flag value. - Format: { "0" | "1" } - 0 -- use new secmark-based packet controls - 1 -- use legacy packet controls - Default value is 0 (preferred). - Value can be changed at runtime via - /selinux/compat_net. - serialnumber [BUGS=X86-32] shapers= [NET] diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index 7a3bb1abb830..b132e4a3cf0f 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt @@ -141,7 +141,8 @@ rx_ccid = 2 Default CCID for the receiver-sender half-connection; see tx_ccid. seq_window = 100 - The initial sequence window (sec. 7.5.2). + The initial sequence window (sec. 7.5.2) of the sender. This influences + the local ackno validity and the remote seqno validity windows (7.5.1). tx_qlen = 5 The size of the transmit buffer in packets. A value of 0 corresponds diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index c7712787933c..ec5de02f543f 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -2,7 +2,7 @@ ip_forward - BOOLEAN 0 - disabled (default) - not 0 - enabled + not 0 - enabled Forward Packets between interfaces. @@ -36,49 +36,49 @@ rt_cache_rebuild_count - INTEGER IP Fragmentation: ipfrag_high_thresh - INTEGER - Maximum memory used to reassemble IP fragments. When + Maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory is allocated for this purpose, the fragment handler will toss packets until ipfrag_low_thresh is reached. - + ipfrag_low_thresh - INTEGER - See ipfrag_high_thresh + See ipfrag_high_thresh ipfrag_time - INTEGER - Time in seconds to keep an IP fragment in memory. + Time in seconds to keep an IP fragment in memory. ipfrag_secret_interval - INTEGER - Regeneration interval (in seconds) of the hash secret (or lifetime + Regeneration interval (in seconds) of the hash secret (or lifetime for the hash secret) for IP fragments. Default: 600 ipfrag_max_dist - INTEGER - ipfrag_max_dist is a non-negative integer value which defines the - maximum "disorder" which is allowed among fragments which share a - common IP source address. Note that reordering of packets is - not unusual, but if a large number of fragments arrive from a source - IP address while a particular fragment queue remains incomplete, it - probably indicates that one or more fragments belonging to that queue - have been lost. When ipfrag_max_dist is positive, an additional check - is done on fragments before they are added to a reassembly queue - if - ipfrag_max_dist (or more) fragments have arrived from a particular IP - address between additions to any IP fragment queue using that source - address, it's presumed that one or more fragments in the queue are - lost. The existing fragment queue will be dropped, and a new one + ipfrag_max_dist is a non-negative integer value which defines the + maximum "disorder" which is allowed among fragments which share a + common IP source address. Note that reordering of packets is + not unusual, but if a large number of fragments arrive from a source + IP address while a particular fragment queue remains incomplete, it + probably indicates that one or more fragments belonging to that queue + have been lost. When ipfrag_max_dist is positive, an additional check + is done on fragments before they are added to a reassembly queue - if + ipfrag_max_dist (or more) fragments have arrived from a particular IP + address between additions to any IP fragment queue using that source + address, it's presumed that one or more fragments in the queue are + lost. The existing fragment queue will be dropped, and a new one started. An ipfrag_max_dist value of zero disables this check. Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can result in unnecessarily dropping fragment queues when normal - reordering of packets occurs, which could lead to poor application - performance. Using a very large value, e.g. 50000, increases the - likelihood of incorrectly reassembling IP fragments that originate + reordering of packets occurs, which could lead to poor application + performance. Using a very large value, e.g. 50000, increases the + likelihood of incorrectly reassembling IP fragments that originate from different IP datagrams, which could result in data corruption. Default: 64 INET peer storage: inet_peer_threshold - INTEGER - The approximate size of the storage. Starting from this threshold + The approximate size of the storage. Starting from this threshold entries will be thrown aggressively. This threshold also determines entries' time-to-live and time intervals between garbage collection passes. More entries, less time-to-live, less GC interval. @@ -105,7 +105,7 @@ inet_peer_gc_maxtime - INTEGER in effect under low (or absent) memory pressure on the pool. Measured in seconds. -TCP variables: +TCP variables: somaxconn - INTEGER Limit of socket listen() backlog, known in userspace as SOMAXCONN. @@ -310,7 +310,7 @@ tcp_orphan_retries - INTEGER tcp_reordering - INTEGER Maximal reordering of packets in a TCP stream. - Default: 3 + Default: 3 tcp_retrans_collapse - BOOLEAN Bug-to-bug compatibility with some broken printers. @@ -521,7 +521,7 @@ IP Variables: ip_local_port_range - 2 INTEGERS Defines the local port range that is used by TCP and UDP to - choose the local port. The first number is the first, the + choose the local port. The first number is the first, the second the last local port number. Default value depends on amount of memory available on the system: > 128Mb 32768-61000 @@ -594,12 +594,12 @@ icmp_errors_use_inbound_ifaddr - BOOLEAN If zero, icmp error messages are sent with the primary address of the exiting interface. - + If non-zero, the message will be sent with the primary address of the interface that received the packet that caused the icmp error. This is the behaviour network many administrators will expect from a router. And it can make debugging complicated network layouts - much easier. + much easier. Note that if no primary address exists for the interface selected, then the primary address of the first non-loopback interface that @@ -611,7 +611,7 @@ igmp_max_memberships - INTEGER Change the maximum number of multicast groups we can subscribe to. Default: 20 -conf/interface/* changes special settings per interface (where "interface" is +conf/interface/* changes special settings per interface (where "interface" is the name of your network interface) conf/all/* is special, changes the settings for all interfaces @@ -625,11 +625,11 @@ log_martians - BOOLEAN accept_redirects - BOOLEAN Accept ICMP redirect messages. accept_redirects for the interface will be enabled if: - - both conf/{all,interface}/accept_redirects are TRUE in the case forwarding - for the interface is enabled + - both conf/{all,interface}/accept_redirects are TRUE in the case + forwarding for the interface is enabled or - - at least one of conf/{all,interface}/accept_redirects is TRUE in the case - forwarding for the interface is disabled + - at least one of conf/{all,interface}/accept_redirects is TRUE in the + case forwarding for the interface is disabled accept_redirects for the interface will be disabled otherwise default TRUE (host) FALSE (router) @@ -640,8 +640,8 @@ forwarding - BOOLEAN mc_forwarding - BOOLEAN Do multicast routing. The kernel needs to be compiled with CONFIG_MROUTE and a multicast routing daemon is required. - conf/all/mc_forwarding must also be set to TRUE to enable multicast routing - for the interface + conf/all/mc_forwarding must also be set to TRUE to enable multicast + routing for the interface medium_id - INTEGER Integer value used to differentiate the devices by the medium they @@ -649,7 +649,7 @@ medium_id - INTEGER the broadcast packets are received only on one of them. The default value 0 means that the device is the only interface to its medium, value of -1 means that medium is not known. - + Currently, it is used to change the proxy_arp behavior: the proxy_arp feature is enabled for packets forwarded between two devices attached to different media. @@ -699,16 +699,22 @@ accept_source_route - BOOLEAN default TRUE (router) FALSE (host) -rp_filter - BOOLEAN - 1 - do source validation by reversed path, as specified in RFC1812 - Recommended option for single homed hosts and stub network - routers. Could cause troubles for complicated (not loop free) - networks running a slow unreliable protocol (sort of RIP), - or using static routes. - +rp_filter - INTEGER 0 - No source validation. - - conf/all/rp_filter must also be set to TRUE to do source validation + 1 - Strict mode as defined in RFC3704 Strict Reverse Path + Each incoming packet is tested against the FIB and if the interface + is not the best reverse path the packet check will fail. + By default failed packets are discarded. + 2 - Loose mode as defined in RFC3704 Loose Reverse Path + Each incoming packet's source address is also tested against the FIB + and if the source address is not reachable via any interface + the packet check will fail. + + Current recommended practice in RFC3704 is to enable strict mode + to prevent IP spoofing from DDos attacks. If using asymmetric routing + or other complicated routing, then loose mode is recommended. + + conf/all/rp_filter must also be set to non-zero to do source validation on the interface Default value is 0. Note that some distributions enable it @@ -782,6 +788,12 @@ arp_ignore - INTEGER The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface} +arp_notify - BOOLEAN + Define mode for notification of address and device changes. + 0 - (default): do nothing + 1 - Generate gratuitous arp replies when device is brought up + or hardware address changes. + arp_accept - BOOLEAN Define behavior when gratuitous arp replies are received: 0 - drop gratuitous arp frames @@ -823,7 +835,7 @@ apply to IPv6 [XXX?]. bindv6only - BOOLEAN Default value for IPV6_V6ONLY socket option, - which restricts use of the IPv6 socket to IPv6 communication + which restricts use of the IPv6 socket to IPv6 communication only. TRUE: disable IPv4-mapped address feature FALSE: enable IPv4-mapped address feature @@ -833,19 +845,19 @@ bindv6only - BOOLEAN IPv6 Fragmentation: ip6frag_high_thresh - INTEGER - Maximum memory used to reassemble IPv6 fragments. When + Maximum memory used to reassemble IPv6 fragments. When ip6frag_high_thresh bytes of memory is allocated for this purpose, the fragment handler will toss packets until ip6frag_low_thresh is reached. - + ip6frag_low_thresh - INTEGER - See ip6frag_high_thresh + See ip6frag_high_thresh ip6frag_time - INTEGER Time in seconds to keep an IPv6 fragment in memory. ip6frag_secret_interval - INTEGER - Regeneration interval (in seconds) of the hash secret (or lifetime + Regeneration interval (in seconds) of the hash secret (or lifetime for the hash secret) for IPv6 fragments. Default: 600 @@ -854,17 +866,17 @@ conf/default/*: conf/all/*: - Change all the interface-specific settings. + Change all the interface-specific settings. [XXX: Other special features than forwarding?] conf/all/forwarding - BOOLEAN - Enable global IPv6 forwarding between all interfaces. + Enable global IPv6 forwarding between all interfaces. - IPv4 and IPv6 work differently here; e.g. netfilter must be used + IPv4 and IPv6 work differently here; e.g. netfilter must be used to control which interfaces may forward packets and which not. - This also sets all interfaces' Host/Router setting + This also sets all interfaces' Host/Router setting 'forwarding' to the specified value. See below for details. This referred to as global forwarding. @@ -875,12 +887,12 @@ proxy_ndp - BOOLEAN conf/interface/*: Change special settings per interface. - The functional behaviour for certain settings is different + The functional behaviour for certain settings is different depending on whether local forwarding is enabled or not. accept_ra - BOOLEAN Accept Router Advertisements; autoconfigure using them. - + Functional default: enabled if local forwarding is disabled. disabled if local forwarding is enabled. @@ -926,7 +938,7 @@ accept_source_route - INTEGER Default: 0 autoconf - BOOLEAN - Autoconfigure addresses using Prefix Information in Router + Autoconfigure addresses using Prefix Information in Router Advertisements. Functional default: enabled if accept_ra_pinfo is enabled. @@ -935,11 +947,11 @@ autoconf - BOOLEAN dad_transmits - INTEGER The amount of Duplicate Address Detection probes to send. Default: 1 - + forwarding - BOOLEAN - Configure interface-specific Host/Router behaviour. + Configure interface-specific Host/Router behaviour. - Note: It is recommended to have the same setting on all + Note: It is recommended to have the same setting on all interfaces; mixed router/host scenarios are rather uncommon. FALSE: @@ -948,13 +960,13 @@ forwarding - BOOLEAN 1. IsRouter flag is not set in Neighbour Advertisements. 2. Router Solicitations are being sent when necessary. - 3. If accept_ra is TRUE (default), accept Router + 3. If accept_ra is TRUE (default), accept Router Advertisements (and do autoconfiguration). 4. If accept_redirects is TRUE (default), accept Redirects. TRUE: - If local forwarding is enabled, Router behaviour is assumed. + If local forwarding is enabled, Router behaviour is assumed. This means exactly the reverse from the above: 1. IsRouter flag is set in Neighbour Advertisements. @@ -989,7 +1001,7 @@ router_solicitation_interval - INTEGER Default: 4 router_solicitations - INTEGER - Number of Router Solicitations to send until assuming no + Number of Router Solicitations to send until assuming no routers are present. Default: 3 @@ -1013,11 +1025,11 @@ temp_prefered_lft - INTEGER max_desync_factor - INTEGER Maximum value for DESYNC_FACTOR, which is a random value - that ensures that clients don't synchronize with each + that ensures that clients don't synchronize with each other and generate new addresses at exactly the same time. value is in seconds. Default: 600 - + regen_max_retry - INTEGER Number of attempts before give up attempting to generate valid temporary addresses. @@ -1025,13 +1037,15 @@ regen_max_retry - INTEGER max_addresses - INTEGER Number of maximum addresses per interface. 0 disables limitation. - It is recommended not set too large value (or 0) because it would - be too easy way to crash kernel to allow to create too much of + It is recommended not set too large value (or 0) because it would + be too easy way to crash kernel to allow to create too much of autoconfigured addresses. Default: 16 disable_ipv6 - BOOLEAN - Disable IPv6 operation. + Disable IPv6 operation. If accept_dad is set to 2, this value + will be dynamically set to TRUE if DAD fails for the link-local + address. Default: FALSE (enable IPv6 operation) accept_dad - INTEGER diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt new file mode 100644 index 000000000000..eeb68685c788 --- /dev/null +++ b/Documentation/networking/ixgbe.txt @@ -0,0 +1,199 @@ +Linux Base Driver for 10 Gigabit PCI Express Intel(R) Network Connection +======================================================================== + +March 10, 2009 + + +Contents +======== + +- In This Release +- Identifying Your Adapter +- Building and Installation +- Additional Configurations +- Support + + + +In This Release +=============== + +This file describes the ixgbe Linux Base Driver for the 10 Gigabit PCI +Express Intel(R) Network Connection. This driver includes support for +Itanium(R)2-based systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + - Generic Receive Offload + - Data Center Bridging + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +Ethtool, lspci, and ifconfig can be used to display device and driver +specific information. + + +Identifying Your Adapter +======================== + +This driver supports devices based on the 82598 controller and the 82599 +controller. + +For specific information on identifying which adapter you have, please visit: + + http://support.intel.com/support/network/sb/CS-008441.htm + + +Building and Installation +========================= + +select m for "Intel(R) 10GbE PCI Express adapters support" located at: + Location: + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) + +1. make modules & make modules_install + +2. Load the module: + +# modprobe ixgbe + + The insmod command can be used if the full + path to the driver module is specified. For example: + + insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/ixgbe/ixgbe.ko + + With 2.6 based kernels also make sure that older ixgbe drivers are + removed from the kernel, before loading the new module: + + rmmod ixgbe; modprobe ixgbe + +3. Assign an IP address to the interface by entering the following, where + x is the interface number: + + ifconfig ethx <IP_address> + +4. Verify that the interface works. Enter the following, where <IP_address> + is the IP address for another machine on the same subnet as the interface + that is being tested: + + ping <IP_address> + + +Additional Configurations +========================= + + Viewing Link Messages + --------------------- + Link messages will not be displayed to the console if the distribution is + restricting system messages. In order to see network driver link messages on + your console, set dmesg to eight by entering the following: + + dmesg -n 8 + + NOTE: This setting is not saved across reboots. + + + Jumbo Frames + ------------ + The driver supports Jumbo Frames for all adapters. Jumbo Frames support is + enabled by changing the MTU to a value larger than the default of 1500. + The maximum value for the MTU is 16110. Use the ifconfig command to + increase the MTU size. For example: + + ifconfig ethx mtu 9000 up + + The maximum MTU setting for Jumbo Frames is 16110. This value coincides + with the maximum Jumbo Frames size of 16128. + + Generic Receive Offload, aka GRO + -------------------------------- + The driver supports the in-kernel software implementation of GRO. GRO has + shown that by coalescing Rx traffic into larger chunks of data, CPU + utilization can be significantly reduced when under large Rx load. GRO is an + evolution of the previously-used LRO interface. GRO is able to coalesce + other protocols besides TCP. It's also safe to use with configurations that + are problematic for LRO, namely bridging and iSCSI. + + GRO is enabled by default in the driver. Future versions of ethtool will + support disabling and re-enabling GRO on the fly. + + + Data Center Bridging, aka DCB + ----------------------------- + + DCB is a configuration Quality of Service implementation in hardware. + It uses the VLAN priority tag (802.1p) to filter traffic. That means + that there are 8 different priorities that traffic can be filtered into. + It also enables priority flow control which can limit or eliminate the + number of dropped packets during network stress. Bandwidth can be + allocated to each of these priorities, which is enforced at the hardware + level. + + To enable DCB support in ixgbe, you must enable the DCB netlink layer to + allow the userspace tools (see below) to communicate with the driver. + This can be found in the kernel configuration here: + + -> Networking support + -> Networking options + -> Data Center Bridging support + + Once this is selected, DCB support must be selected for ixgbe. This can + be found here: + + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) + -> Intel(R) 10GbE PCI Express adapters support + -> Data Center Bridging (DCB) Support + + After these options are selected, you must rebuild your kernel and your + modules. + + In order to use DCB, userspace tools must be downloaded and installed. + The dcbd tools can be found at: + + http://e1000.sf.net + + + Ethtool + ------- + The driver utilizes the ethtool interface for driver configuration and + diagnostics, as well as displaying statistical information. Ethtool + version 3.0 or later is required for this functionality. + + The latest release of ethtool can be found from + http://sourceforge.net/projects/gkernel. + + + NAPI + ---- + + NAPI (Rx polling mode) is supported in the ixgbe driver. NAPI is enabled + by default in the driver. + + See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI. + + +Support +======= + +For general information, go to the Intel support website at: + + http://support.intel.com + +or the Intel Wired Networking project hosted by Sourceforge at: + + http://e1000.sourceforge.net + +If an issue is identified with the released source code on the supported +kernel with a supported adapter, email the specific information related +to the issue to e1000-devel@lists.sf.net diff --git a/Documentation/networking/rds.txt b/Documentation/networking/rds.txt new file mode 100644 index 000000000000..c67077cbeb80 --- /dev/null +++ b/Documentation/networking/rds.txt @@ -0,0 +1,356 @@ + +Overview +======== + +This readme tries to provide some background on the hows and whys of RDS, +and will hopefully help you find your way around the code. + +In addition, please see this email about RDS origins: +http://oss.oracle.com/pipermail/rds-devel/2007-November/000228.html + +RDS Architecture +================ + +RDS provides reliable, ordered datagram delivery by using a single +reliable connection between any two nodes in the cluster. This allows +applications to use a single socket to talk to any other process in the +cluster - so in a cluster with N processes you need N sockets, in contrast +to N*N if you use a connection-oriented socket transport like TCP. + +RDS is not Infiniband-specific; it was designed to support different +transports. The current implementation used to support RDS over TCP as well +as IB. Work is in progress to support RDS over iWARP, and using DCE to +guarantee no dropped packets on Ethernet, it may be possible to use RDS over +UDP in the future. + +The high-level semantics of RDS from the application's point of view are + + * Addressing + RDS uses IPv4 addresses and 16bit port numbers to identify + the end point of a connection. All socket operations that involve + passing addresses between kernel and user space generally + use a struct sockaddr_in. + + The fact that IPv4 addresses are used does not mean the underlying + transport has to be IP-based. In fact, RDS over IB uses a + reliable IB connection; the IP address is used exclusively to + locate the remote node's GID (by ARPing for the given IP). + + The port space is entirely independent of UDP, TCP or any other + protocol. + + * Socket interface + RDS sockets work *mostly* as you would expect from a BSD + socket. The next section will cover the details. At any rate, + all I/O is performed through the standard BSD socket API. + Some additions like zerocopy support are implemented through + control messages, while other extensions use the getsockopt/ + setsockopt calls. + + Sockets must be bound before you can send or receive data. + This is needed because binding also selects a transport and + attaches it to the socket. Once bound, the transport assignment + does not change. RDS will tolerate IPs moving around (eg in + a active-active HA scenario), but only as long as the address + doesn't move to a different transport. + + * sysctls + RDS supports a number of sysctls in /proc/sys/net/rds + + +Socket Interface +================ + + AF_RDS, PF_RDS, SOL_RDS + These constants haven't been assigned yet, because RDS isn't in + mainline yet. Currently, the kernel module assigns some constant + and publishes it to user space through two sysctl files + /proc/sys/net/rds/pf_rds + /proc/sys/net/rds/sol_rds + + fd = socket(PF_RDS, SOCK_SEQPACKET, 0); + This creates a new, unbound RDS socket. + + setsockopt(SOL_SOCKET): send and receive buffer size + RDS honors the send and receive buffer size socket options. + You are not allowed to queue more than SO_SNDSIZE bytes to + a socket. A message is queued when sendmsg is called, and + it leaves the queue when the remote system acknowledges + its arrival. + + The SO_RCVSIZE option controls the maximum receive queue length. + This is a soft limit rather than a hard limit - RDS will + continue to accept and queue incoming messages, even if that + takes the queue length over the limit. However, it will also + mark the port as "congested" and send a congestion update to + the source node. The source node is supposed to throttle any + processes sending to this congested port. + + bind(fd, &sockaddr_in, ...) + This binds the socket to a local IP address and port, and a + transport. + + sendmsg(fd, ...) + Sends a message to the indicated recipient. The kernel will + transparently establish the underlying reliable connection + if it isn't up yet. + + An attempt to send a message that exceeds SO_SNDSIZE will + return with -EMSGSIZE + + An attempt to send a message that would take the total number + of queued bytes over the SO_SNDSIZE threshold will return + EAGAIN. + + An attempt to send a message to a destination that is marked + as "congested" will return ENOBUFS. + + recvmsg(fd, ...) + Receives a message that was queued to this socket. The sockets + recv queue accounting is adjusted, and if the queue length + drops below SO_SNDSIZE, the port is marked uncongested, and + a congestion update is sent to all peers. + + Applications can ask the RDS kernel module to receive + notifications via control messages (for instance, there is a + notification when a congestion update arrived, or when a RDMA + operation completes). These notifications are received through + the msg.msg_control buffer of struct msghdr. The format of the + messages is described in manpages. + + poll(fd) + RDS supports the poll interface to allow the application + to implement async I/O. + + POLLIN handling is pretty straightforward. When there's an + incoming message queued to the socket, or a pending notification, + we signal POLLIN. + + POLLOUT is a little harder. Since you can essentially send + to any destination, RDS will always signal POLLOUT as long as + there's room on the send queue (ie the number of bytes queued + is less than the sendbuf size). + + However, the kernel will refuse to accept messages to + a destination marked congested - in this case you will loop + forever if you rely on poll to tell you what to do. + This isn't a trivial problem, but applications can deal with + this - by using congestion notifications, and by checking for + ENOBUFS errors returned by sendmsg. + + setsockopt(SOL_RDS, RDS_CANCEL_SENT_TO, &sockaddr_in) + This allows the application to discard all messages queued to a + specific destination on this particular socket. + + This allows the application to cancel outstanding messages if + it detects a timeout. For instance, if it tried to send a message, + and the remote host is unreachable, RDS will keep trying forever. + The application may decide it's not worth it, and cancel the + operation. In this case, it would use RDS_CANCEL_SENT_TO to + nuke any pending messages. + + +RDMA for RDS +============ + + see rds-rdma(7) manpage (available in rds-tools) + + +Congestion Notifications +======================== + + see rds(7) manpage + + +RDS Protocol +============ + + Message header + + The message header is a 'struct rds_header' (see rds.h): + Fields: + h_sequence: + per-packet sequence number + h_ack: + piggybacked acknowledgment of last packet received + h_len: + length of data, not including header + h_sport: + source port + h_dport: + destination port + h_flags: + CONG_BITMAP - this is a congestion update bitmap + ACK_REQUIRED - receiver must ack this packet + RETRANSMITTED - packet has previously been sent + h_credit: + indicate to other end of connection that + it has more credits available (i.e. there is + more send room) + h_padding[4]: + unused, for future use + h_csum: + header checksum + h_exthdr: + optional data can be passed here. This is currently used for + passing RDMA-related information. + + ACK and retransmit handling + + One might think that with reliable IB connections you wouldn't need + to ack messages that have been received. The problem is that IB + hardware generates an ack message before it has DMAed the message + into memory. This creates a potential message loss if the HCA is + disabled for any reason between when it sends the ack and before + the message is DMAed and processed. This is only a potential issue + if another HCA is available for fail-over. + + Sending an ack immediately would allow the sender to free the sent + message from their send queue quickly, but could cause excessive + traffic to be used for acks. RDS piggybacks acks on sent data + packets. Ack-only packets are reduced by only allowing one to be + in flight at a time, and by the sender only asking for acks when + its send buffers start to fill up. All retransmissions are also + acked. + + Flow Control + + RDS's IB transport uses a credit-based mechanism to verify that + there is space in the peer's receive buffers for more data. This + eliminates the need for hardware retries on the connection. + + Congestion + + Messages waiting in the receive queue on the receiving socket + are accounted against the sockets SO_RCVBUF option value. Only + the payload bytes in the message are accounted for. If the + number of bytes queued equals or exceeds rcvbuf then the socket + is congested. All sends attempted to this socket's address + should return block or return -EWOULDBLOCK. + + Applications are expected to be reasonably tuned such that this + situation very rarely occurs. An application encountering this + "back-pressure" is considered a bug. + + This is implemented by having each node maintain bitmaps which + indicate which ports on bound addresses are congested. As the + bitmap changes it is sent through all the connections which + terminate in the local address of the bitmap which changed. + + The bitmaps are allocated as connections are brought up. This + avoids allocation in the interrupt handling path which queues + sages on sockets. The dense bitmaps let transports send the + entire bitmap on any bitmap change reasonably efficiently. This + is much easier to implement than some finer-grained + communication of per-port congestion. The sender does a very + inexpensive bit test to test if the port it's about to send to + is congested or not. + + +RDS Transport Layer +================== + + As mentioned above, RDS is not IB-specific. Its code is divided + into a general RDS layer and a transport layer. + + The general layer handles the socket API, congestion handling, + loopback, stats, usermem pinning, and the connection state machine. + + The transport layer handles the details of the transport. The IB + transport, for example, handles all the queue pairs, work requests, + CM event handlers, and other Infiniband details. + + +RDS Kernel Structures +===================== + + struct rds_message + aka possibly "rds_outgoing", the generic RDS layer copies data to + be sent and sets header fields as needed, based on the socket API. + This is then queued for the individual connection and sent by the + connection's transport. + struct rds_incoming + a generic struct referring to incoming data that can be handed from + the transport to the general code and queued by the general code + while the socket is awoken. It is then passed back to the transport + code to handle the actual copy-to-user. + struct rds_socket + per-socket information + struct rds_connection + per-connection information + struct rds_transport + pointers to transport-specific functions + struct rds_statistics + non-transport-specific statistics + struct rds_cong_map + wraps the raw congestion bitmap, contains rbnode, waitq, etc. + +Connection management +===================== + + Connections may be in UP, DOWN, CONNECTING, DISCONNECTING, and + ERROR states. + + The first time an attempt is made by an RDS socket to send data to + a node, a connection is allocated and connected. That connection is + then maintained forever -- if there are transport errors, the + connection will be dropped and re-established. + + Dropping a connection while packets are queued will cause queued or + partially-sent datagrams to be retransmitted when the connection is + re-established. + + +The send path +============= + + rds_sendmsg() + struct rds_message built from incoming data + CMSGs parsed (e.g. RDMA ops) + transport connection alloced and connected if not already + rds_message placed on send queue + send worker awoken + rds_send_worker() + calls rds_send_xmit() until queue is empty + rds_send_xmit() + transmits congestion map if one is pending + may set ACK_REQUIRED + calls transport to send either non-RDMA or RDMA message + (RDMA ops never retransmitted) + rds_ib_xmit() + allocs work requests from send ring + adds any new send credits available to peer (h_credits) + maps the rds_message's sg list + piggybacks ack + populates work requests + post send to connection's queue pair + +The recv path +============= + + rds_ib_recv_cq_comp_handler() + looks at write completions + unmaps recv buffer from device + no errors, call rds_ib_process_recv() + refill recv ring + rds_ib_process_recv() + validate header checksum + copy header to rds_ib_incoming struct if start of a new datagram + add to ibinc's fraglist + if competed datagram: + update cong map if datagram was cong update + call rds_recv_incoming() otherwise + note if ack is required + rds_recv_incoming() + drop duplicate packets + respond to pings + find the sock associated with this datagram + add to sock queue + wake up sock + do some congestion calculations + rds_recvmsg + copy data into user iovec + handle CMSGs + return to application + + diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt new file mode 100644 index 000000000000..0e58b4539176 --- /dev/null +++ b/Documentation/networking/timestamping.txt @@ -0,0 +1,180 @@ +The existing interfaces for getting network packages time stamped are: + +* SO_TIMESTAMP + Generate time stamp for each incoming packet using the (not necessarily + monotonous!) system time. Result is returned via recv_msg() in a + control message as timeval (usec resolution). + +* SO_TIMESTAMPNS + Same time stamping mechanism as SO_TIMESTAMP, but returns result as + timespec (nsec resolution). + +* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS] + Only for multicasts: approximate send time stamp by receiving the looped + packet and using its receive time stamp. + +The following interface complements the existing ones: receive time +stamps can be generated and returned for arbitrary packets and much +closer to the point where the packet is really sent. Time stamps can +be generated in software (as before) or in hardware (if the hardware +has such a feature). + +SO_TIMESTAMPING: + +Instructs the socket layer which kind of information is wanted. The +parameter is an integer with some of the following bits set. Setting +other bits is an error and doesn't change the current state. + +SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware +SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp + as generated by the hardware +SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or + fails, then do it in software +SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp +SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to + the system time base +SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in + software + +SOF_TIMESTAMPING_TX/RX determine how time stamps are generated. +SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the +following control message: + struct scm_timestamping { + struct timespec systime; + struct timespec hwtimetrans; + struct timespec hwtimeraw; + }; + +recvmsg() can be used to get this control message for regular incoming +packets. For send time stamps the outgoing packet is looped back to +the socket's error queue with the send time stamp(s) attached. It can +be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the +original outgoing packet data including all headers preprended down to +and including the link layer, the scm_timestamping control message and +a sock_extended_err control message with ee_errno==ENOMSG and +ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending +bounced packet is ready for reading as far as select() is concerned. +If the outgoing packet has to be fragmented, then only the first +fragment is time stamped and returned to the sending socket. + +All three values correspond to the same event in time, but were +generated in different ways. Each of these values may be empty (= all +zero), in which case no such value was available. If the application +is not interested in some of these values, they can be left blank to +avoid the potential overhead of calculating them. + +systime is the value of the system time at that moment. This +corresponds to the value also returned via SO_TIMESTAMP[NS]. If the +time stamp was generated by hardware, then this field is +empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is +set. + +hwtimeraw is the original hardware time stamp. Filled in if +SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its +relation to system time should be made. + +hwtimetrans is the hardware time stamp transformed so that it +corresponds as good as possible to system time. This correlation is +not perfect; as a consequence, sorting packets received via different +NICs by their hwtimetrans may differ from the order in which they were +received. hwtimetrans may be non-monotonic even for the same NIC. +Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support +by the network device and will be empty without that support. + + +SIOCSHWTSTAMP: + +Hardware time stamping must also be initialized for each device driver +that is expected to do hardware time stamping. The parameter is: + +struct hwtstamp_config { + int flags; /* no flags defined right now, must be zero */ + int tx_type; /* HWTSTAMP_TX_* */ + int rx_filter; /* HWTSTAMP_FILTER_* */ +}; + +Desired behavior is passed into the kernel and to a specific device by +calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose +ifr_data points to a struct hwtstamp_config. The tx_type and +rx_filter are hints to the driver what it is expected to do. If +the requested fine-grained filtering for incoming packets is not +supported, the driver may time stamp more than just the requested types +of packets. + +A driver which supports hardware time stamping shall update the struct +with the actual, possibly more permissive configuration. If the +requested packets cannot be time stamped, then nothing should be +changed and ERANGE shall be returned (in contrast to EINVAL, which +indicates that SIOCSHWTSTAMP is not supported at all). + +Only a processes with admin rights may change the configuration. User +space is responsible to ensure that multiple processes don't interfere +with each other and that the settings are reset. + +/* possible values for hwtstamp_config->tx_type */ +enum { + /* + * no outgoing packet will need hardware time stamping; + * should a packet arrive which asks for it, no hardware + * time stamping will be done + */ + HWTSTAMP_TX_OFF, + + /* + * enables hardware time stamping for outgoing packets; + * the sender of the packet decides which are to be + * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE + * before sending the packet + */ + HWTSTAMP_TX_ON, +}; + +/* possible values for hwtstamp_config->rx_filter */ +enum { + /* time stamp no incoming packet at all */ + HWTSTAMP_FILTER_NONE, + + /* time stamp any incoming packet */ + HWTSTAMP_FILTER_ALL, + + /* return value: time stamp all packets requested plus some others */ + HWTSTAMP_FILTER_SOME, + + /* PTP v1, UDP, any kind of event packet */ + HWTSTAMP_FILTER_PTP_V1_L4_EVENT, + + ... +}; + + +DEVICE IMPLEMENTATION + +A driver which supports hardware time stamping must support the +SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored +in the skb with skb_hwtstamp_set(). + +Time stamps for outgoing packets are to be generated as follows: +- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware() + returns non-zero. If yes, then the driver is expected + to do hardware time stamping. +- If this is possible for the skb and requested, then declare + that the driver is doing the time stamping by calling + skb_hwtstamp_tx_in_progress(). A driver not supporting + hardware time stamping doesn't do that. A driver must never + touch sk_buff::tstamp! It is used to store how time stamping + for an outgoing packets is to be done. +- As soon as the driver has sent the packet and/or obtained a + hardware time stamp for it, it passes the time stamp back by + calling skb_hwtstamp_tx() with the original skb, the raw + hardware time stamp and a handle to the device (necessary + to convert the hardware time stamp to system time). If obtaining + the hardware time stamp somehow fails, then the driver should + not fall back to software time stamping. The rationale is that + this would occur at a later time in the processing pipeline + than other software time stamping and therefore could lead + to unexpected deltas between time stamps. +- If the driver did not call skb_hwtstamp_tx_in_progress(), then + dev_hard_start_xmit() checks whether software time stamping + is wanted as fallback and potentially generates the time stamp. diff --git a/Documentation/networking/timestamping/.gitignore b/Documentation/networking/timestamping/.gitignore new file mode 100644 index 000000000000..71e81eb2e22f --- /dev/null +++ b/Documentation/networking/timestamping/.gitignore @@ -0,0 +1 @@ +timestamping diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile new file mode 100644 index 000000000000..2a1489fdc036 --- /dev/null +++ b/Documentation/networking/timestamping/Makefile @@ -0,0 +1,6 @@ +CPPFLAGS = -I../../../include + +timestamping: timestamping.c + +clean: + rm -f timestamping diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c new file mode 100644 index 000000000000..43d143104210 --- /dev/null +++ b/Documentation/networking/timestamping/timestamping.c @@ -0,0 +1,533 @@ +/* + * This program demonstrates how the various time stamping features in + * the Linux kernel work. It emulates the behavior of a PTP + * implementation in stand-alone master mode by sending PTPv1 Sync + * multicasts once every second. It looks for similar packets, but + * beyond that doesn't actually implement PTP. + * + * Outgoing packets are time stamped with SO_TIMESTAMPING with or + * without hardware support. + * + * Incoming packets are time stamped with SO_TIMESTAMPING with or + * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and + * SO_TIMESTAMP[NS]. + * + * Copyright (C) 2009 Intel Corporation. + * Author: Patrick Ohly <patrick.ohly@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. * See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <errno.h> +#include <string.h> + +#include <sys/time.h> +#include <sys/socket.h> +#include <sys/select.h> +#include <sys/ioctl.h> +#include <arpa/inet.h> +#include <net/if.h> + +#include "asm/types.h" +#include "linux/net_tstamp.h" +#include "linux/errqueue.h" + +#ifndef SO_TIMESTAMPING +# define SO_TIMESTAMPING 37 +# define SCM_TIMESTAMPING SO_TIMESTAMPING +#endif + +#ifndef SO_TIMESTAMPNS +# define SO_TIMESTAMPNS 35 +#endif + +#ifndef SIOCGSTAMPNS +# define SIOCGSTAMPNS 0x8907 +#endif + +#ifndef SIOCSHWTSTAMP +# define SIOCSHWTSTAMP 0x89b0 +#endif + +static void usage(const char *error) +{ + if (error) + printf("invalid option: %s\n", error); + printf("timestamping interface option*\n\n" + "Options:\n" + " IP_MULTICAST_LOOP - looping outgoing multicasts\n" + " SO_TIMESTAMP - normal software time stamping, ms resolution\n" + " SO_TIMESTAMPNS - more accurate software time stamping\n" + " SOF_TIMESTAMPING_TX_HARDWARE - hardware time stamping of outgoing packets\n" + " SOF_TIMESTAMPING_TX_SOFTWARE - software fallback for outgoing packets\n" + " SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n" + " SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n" + " SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n" + " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n" + " SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n" + " SIOCGSTAMP - check last socket time stamp\n" + " SIOCGSTAMPNS - more accurate socket time stamp\n"); + exit(1); +} + +static void bail(const char *error) +{ + printf("%s: %s\n", error, strerror(errno)); + exit(1); +} + +static const unsigned char sync[] = { + 0x00, 0x01, 0x00, 0x01, + 0x5f, 0x44, 0x46, 0x4c, + 0x54, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x01, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x01, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x08, + 0x00, 0x00, 0x00, 0x00, + 0x49, 0x05, 0xcd, 0x01, + 0x29, 0xb1, 0x8d, 0xb0, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x37, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x01, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x01, + 0x00, 0x00, 0xf0, 0x60, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x04, + 0x44, 0x46, 0x4c, 0x54, + 0x00, 0x01, + + /* fake uuid */ + 0x00, 0x01, + 0x02, 0x03, 0x04, 0x05, + + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00 +}; + +static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len) +{ + struct timeval now; + int res; + + res = sendto(sock, sync, sizeof(sync), 0, + addr, addr_len); + gettimeofday(&now, 0); + if (res < 0) + printf("%s: %s\n", "send", strerror(errno)); + else + printf("%ld.%06ld: sent %d bytes\n", + (long)now.tv_sec, (long)now.tv_usec, + res); +} + +static void printpacket(struct msghdr *msg, int res, + char *data, + int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + struct sockaddr_in *from_addr = (struct sockaddr_in *)msg->msg_name; + struct cmsghdr *cmsg; + struct timeval tv; + struct timespec ts; + struct timeval now; + + gettimeofday(&now, 0); + + printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n", + (long)now.tv_sec, (long)now.tv_usec, + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + res, + inet_ntoa(from_addr->sin_addr), + msg->msg_controllen); + for (cmsg = CMSG_FIRSTHDR(msg); + cmsg; + cmsg = CMSG_NXTHDR(msg, cmsg)) { + printf(" cmsg len %d: ", cmsg->cmsg_len); + switch (cmsg->cmsg_level) { + case SOL_SOCKET: + printf("SOL_SOCKET "); + switch (cmsg->cmsg_type) { + case SO_TIMESTAMP: { + struct timeval *stamp = + (struct timeval *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMP %ld.%06ld", + (long)stamp->tv_sec, + (long)stamp->tv_usec); + break; + } + case SO_TIMESTAMPNS: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPNS %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + case SO_TIMESTAMPING: { + struct timespec *stamp = + (struct timespec *)CMSG_DATA(cmsg); + printf("SO_TIMESTAMPING "); + printf("SW %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW transformed %ld.%09ld ", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + stamp++; + printf("HW raw %ld.%09ld", + (long)stamp->tv_sec, + (long)stamp->tv_nsec); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + case IPPROTO_IP: + printf("IPPROTO_IP "); + switch (cmsg->cmsg_type) { + case IP_RECVERR: { + struct sock_extended_err *err = + (struct sock_extended_err *)CMSG_DATA(cmsg); + printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s", + strerror(err->ee_errno), + err->ee_origin, +#ifdef SO_EE_ORIGIN_TIMESTAMPING + err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ? + "bounced packet" : "unexpected origin" +#else + "probably SO_EE_ORIGIN_TIMESTAMPING" +#endif + ); + if (res < sizeof(sync)) + printf(" => truncated data?!"); + else if (!memcmp(sync, data + res - sizeof(sync), + sizeof(sync))) + printf(" => GOT OUR DATA BACK (HURRAY!)"); + break; + } + case IP_PKTINFO: { + struct in_pktinfo *pktinfo = + (struct in_pktinfo *)CMSG_DATA(cmsg); + printf("IP_PKTINFO interface index %u", + pktinfo->ipi_ifindex); + break; + } + default: + printf("type %d", cmsg->cmsg_type); + break; + } + break; + default: + printf("level %d type %d", + cmsg->cmsg_level, + cmsg->cmsg_type); + break; + } + printf("\n"); + } + + if (siocgstamp) { + if (ioctl(sock, SIOCGSTAMP, &tv)) + printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno)); + else + printf("SIOCGSTAMP %ld.%06ld\n", + (long)tv.tv_sec, + (long)tv.tv_usec); + } + if (siocgstampns) { + if (ioctl(sock, SIOCGSTAMPNS, &ts)) + printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno)); + else + printf("SIOCGSTAMPNS %ld.%09ld\n", + (long)ts.tv_sec, + (long)ts.tv_nsec); + } +} + +static void recvpacket(int sock, int recvmsg_flags, + int siocgstamp, int siocgstampns) +{ + char data[256]; + struct msghdr msg; + struct iovec entry; + struct sockaddr_in from_addr; + struct { + struct cmsghdr cm; + char control[512]; + } control; + int res; + + memset(&msg, 0, sizeof(msg)); + msg.msg_iov = &entry; + msg.msg_iovlen = 1; + entry.iov_base = data; + entry.iov_len = sizeof(data); + msg.msg_name = (caddr_t)&from_addr; + msg.msg_namelen = sizeof(from_addr); + msg.msg_control = &control; + msg.msg_controllen = sizeof(control); + + res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT); + if (res < 0) { + printf("%s %s: %s\n", + "recvmsg", + (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular", + strerror(errno)); + } else { + printpacket(&msg, res, data, + sock, recvmsg_flags, + siocgstamp, siocgstampns); + } +} + +int main(int argc, char **argv) +{ + int so_timestamping_flags = 0; + int so_timestamp = 0; + int so_timestampns = 0; + int siocgstamp = 0; + int siocgstampns = 0; + int ip_multicast_loop = 0; + char *interface; + int i; + int enabled = 1; + int sock; + struct ifreq device; + struct ifreq hwtstamp; + struct hwtstamp_config hwconfig, hwconfig_requested; + struct sockaddr_in addr; + struct ip_mreq imr; + struct in_addr iaddr; + int val; + socklen_t len; + struct timeval next; + + if (argc < 2) + usage(0); + interface = argv[1]; + + for (i = 2; i < argc; i++) { + if (!strcasecmp(argv[i], "SO_TIMESTAMP")) + so_timestamp = 1; + else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS")) + so_timestampns = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMP")) + siocgstamp = 1; + else if (!strcasecmp(argv[i], "SIOCGSTAMPNS")) + siocgstampns = 1; + else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP")) + ip_multicast_loop = 1; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE; + else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE")) + so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE; + else + usage(argv[i]); + } + + sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); + if (socket < 0) + bail("socket"); + + memset(&device, 0, sizeof(device)); + strncpy(device.ifr_name, interface, sizeof(device.ifr_name)); + if (ioctl(sock, SIOCGIFADDR, &device) < 0) + bail("getting interface IP address"); + + memset(&hwtstamp, 0, sizeof(hwtstamp)); + strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name)); + hwtstamp.ifr_data = (void *)&hwconfig; + memset(&hwconfig, 0, sizeof(&hwconfig)); + hwconfig.tx_type = + (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ? + HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF; + hwconfig.rx_filter = + (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ? + HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE; + hwconfig_requested = hwconfig; + if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) { + if ((errno == EINVAL || errno == ENOTSUP) && + hwconfig_requested.tx_type == HWTSTAMP_TX_OFF && + hwconfig_requested.rx_filter == HWTSTAMP_FILTER_NONE) + printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n"); + else + bail("SIOCSHWTSTAMP"); + } + printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter %d requested, got %d\n", + hwconfig_requested.tx_type, hwconfig.tx_type, + hwconfig_requested.rx_filter, hwconfig.rx_filter); + + /* bind to PTP port */ + addr.sin_family = AF_INET; + addr.sin_addr.s_addr = htonl(INADDR_ANY); + addr.sin_port = htons(319 /* PTP event port */); + if (bind(sock, + (struct sockaddr *)&addr, + sizeof(struct sockaddr_in)) < 0) + bail("bind"); + + /* set multicast group for outgoing packets */ + inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */ + addr.sin_addr = iaddr; + imr.imr_multiaddr.s_addr = iaddr.s_addr; + imr.imr_interface.s_addr = + ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr; + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, + &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0) + bail("set multicast"); + + /* join multicast group, loop our own packet */ + if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, + &imr, sizeof(struct ip_mreq)) < 0) + bail("join multicast group"); + + if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, + &ip_multicast_loop, sizeof(enabled)) < 0) { + bail("loop multicast"); + } + + /* set socket options for time stamping */ + if (so_timestamp && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMP"); + + if (so_timestampns && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, + &enabled, sizeof(enabled)) < 0) + bail("setsockopt SO_TIMESTAMPNS"); + + if (so_timestamping_flags && + setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, + &so_timestamping_flags, + sizeof(so_timestamping_flags)) < 0) + bail("setsockopt SO_TIMESTAMPING"); + + /* request IP_PKTINFO for debugging purposes */ + if (setsockopt(sock, SOL_IP, IP_PKTINFO, + &enabled, sizeof(enabled)) < 0) + printf("%s: %s\n", "setsockopt IP_PKTINFO", strerror(errno)); + + /* verify socket options */ + len = sizeof(val); + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno)); + else + printf("SO_TIMESTAMP %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0) + printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS", + strerror(errno)); + else + printf("SO_TIMESTAMPNS %d\n", val); + + if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) { + printf("%s: %s\n", "getsockopt SO_TIMESTAMPING", + strerror(errno)); + } else { + printf("SO_TIMESTAMPING %d\n", val); + if (val != so_timestamping_flags) + printf(" not the expected value %d\n", + so_timestamping_flags); + } + + /* send packets forever every five seconds */ + gettimeofday(&next, 0); + next.tv_sec = (next.tv_sec + 1) / 5 * 5; + next.tv_usec = 0; + while (1) { + struct timeval now; + struct timeval delta; + long delta_us; + int res; + fd_set readfs, errorfs; + + gettimeofday(&now, 0); + delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 + + (long)(next.tv_usec - now.tv_usec); + if (delta_us > 0) { + /* continue waiting for timeout or data */ + delta.tv_sec = delta_us / 1000000; + delta.tv_usec = delta_us % 1000000; + + FD_ZERO(&readfs); + FD_ZERO(&errorfs); + FD_SET(sock, &readfs); + FD_SET(sock, &errorfs); + printf("%ld.%06ld: select %ldus\n", + (long)now.tv_sec, (long)now.tv_usec, + delta_us); + res = select(sock + 1, &readfs, 0, &errorfs, &delta); + gettimeofday(&now, 0); + printf("%ld.%06ld: select returned: %d, %s\n", + (long)now.tv_sec, (long)now.tv_usec, + res, + res < 0 ? strerror(errno) : "success"); + if (res > 0) { + if (FD_ISSET(sock, &readfs)) + printf("ready for reading\n"); + if (FD_ISSET(sock, &errorfs)) + printf("has error\n"); + recvpacket(sock, 0, + siocgstamp, + siocgstampns); + recvpacket(sock, MSG_ERRQUEUE, + siocgstamp, + siocgstampns); + } + } else { + /* write one packet */ + sendpacket(sock, + (struct sockaddr *)&addr, + sizeof(addr)); + next.tv_sec += 5; + continue; + } + } + + return 0; +} diff --git a/Documentation/powerpc/dts-bindings/fsl/tsec.txt b/Documentation/powerpc/dts-bindings/fsl/tsec.txt index 7fa4b27574b5..edb7ae19e868 100644 --- a/Documentation/powerpc/dts-bindings/fsl/tsec.txt +++ b/Documentation/powerpc/dts-bindings/fsl/tsec.txt @@ -56,6 +56,12 @@ Properties: hardware. - fsl,magic-packet : If present, indicates that the hardware supports waking up via magic packet. + - bd-stash : If present, indicates that the hardware supports stashing + buffer descriptors in the L2. + - rx-stash-len : Denotes the number of bytes of a received buffer to stash + in the L2. + - rx-stash-idx : Denotes the index of the first byte from the received + buffer to stash in the L2. Example: ethernet@24000 { diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX index aabcc3a089ba..3c00c9c3219e 100644 --- a/Documentation/scheduler/00-INDEX +++ b/Documentation/scheduler/00-INDEX @@ -2,8 +2,6 @@ - this file. sched-arch.txt - CPU Scheduler implementation hints for architecture specific code. -sched-coding.txt - - reference for various scheduler-related methods in the O(1) scheduler. sched-design-CFS.txt - goals, design and implementation of the Complete Fair Scheduler. sched-domains.txt diff --git a/Documentation/scheduler/sched-coding.txt b/Documentation/scheduler/sched-coding.txt deleted file mode 100644 index cbd8db752acf..000000000000 --- a/Documentation/scheduler/sched-coding.txt +++ /dev/null @@ -1,126 +0,0 @@ - Reference for various scheduler-related methods in the O(1) scheduler - Robert Love <rml@tech9.net>, MontaVista Software - - -Note most of these methods are local to kernel/sched.c - this is by design. -The scheduler is meant to be self-contained and abstracted away. This document -is primarily for understanding the scheduler, not interfacing to it. Some of -the discussed interfaces, however, are general process/scheduling methods. -They are typically defined in include/linux/sched.h. - - -Main Scheduling Methods ------------------------ - -void load_balance(runqueue_t *this_rq, int idle) - Attempts to pull tasks from one cpu to another to balance cpu usage, - if needed. This method is called explicitly if the runqueues are - imbalanced or periodically by the timer tick. Prior to calling, - the current runqueue must be locked and interrupts disabled. - -void schedule() - The main scheduling function. Upon return, the highest priority - process will be active. - - -Locking -------- - -Each runqueue has its own lock, rq->lock. When multiple runqueues need -to be locked, lock acquires must be ordered by ascending &runqueue value. - -A specific runqueue is locked via - - task_rq_lock(task_t pid, unsigned long *flags) - -which disables preemption, disables interrupts, and locks the runqueue pid is -running on. Likewise, - - task_rq_unlock(task_t pid, unsigned long *flags) - -unlocks the runqueue pid is running on, restores interrupts to their previous -state, and reenables preemption. - -The routines - - double_rq_lock(runqueue_t *rq1, runqueue_t *rq2) - -and - - double_rq_unlock(runqueue_t *rq1, runqueue_t *rq2) - -safely lock and unlock, respectively, the two specified runqueues. They do -not, however, disable and restore interrupts. Users are required to do so -manually before and after calls. - - -Values ------- - -MAX_PRIO - The maximum priority of the system, stored in the task as task->prio. - Lower priorities are higher. Normal (non-RT) priorities range from - MAX_RT_PRIO to (MAX_PRIO - 1). -MAX_RT_PRIO - The maximum real-time priority of the system. Valid RT priorities - range from 0 to (MAX_RT_PRIO - 1). -MAX_USER_RT_PRIO - The maximum real-time priority that is exported to user-space. Should - always be equal to or less than MAX_RT_PRIO. Setting it less allows - kernel threads to have higher priorities than any user-space task. -MIN_TIMESLICE -MAX_TIMESLICE - Respectively, the minimum and maximum timeslices (quanta) of a process. - -Data ----- - -struct runqueue - The main per-CPU runqueue data structure. -struct task_struct - The main per-process data structure. - - -General Methods ---------------- - -cpu_rq(cpu) - Returns the runqueue of the specified cpu. -this_rq() - Returns the runqueue of the current cpu. -task_rq(pid) - Returns the runqueue which holds the specified pid. -cpu_curr(cpu) - Returns the task currently running on the given cpu. -rt_task(pid) - Returns true if pid is real-time, false if not. - - -Process Control Methods ------------------------ - -void set_user_nice(task_t *p, long nice) - Sets the "nice" value of task p to the given value. -int setscheduler(pid_t pid, int policy, struct sched_param *param) - Sets the scheduling policy and parameters for the given pid. -int set_cpus_allowed(task_t *p, unsigned long new_mask) - Sets a given task's CPU affinity and migrates it to a proper cpu. - Callers must have a valid reference to the task and assure the - task not exit prematurely. No locks can be held during the call. -set_task_state(tsk, state_value) - Sets the given task's state to the given value. -set_current_state(state_value) - Sets the current task's state to the given value. -void set_tsk_need_resched(struct task_struct *tsk) - Sets need_resched in the given task. -void clear_tsk_need_resched(struct task_struct *tsk) - Clears need_resched in the given task. -void set_need_resched() - Sets need_resched in the current task. -void clear_need_resched() - Clears need_resched in the current task. -int need_resched() - Returns true if need_resched is set in the current task, false - otherwise. -yield() - Place the current process at the end of the runqueue and call schedule. diff --git a/Documentation/scsi/osd.txt b/Documentation/scsi/osd.txt new file mode 100644 index 000000000000..da162f7fd5f5 --- /dev/null +++ b/Documentation/scsi/osd.txt @@ -0,0 +1,198 @@ +The OSD Standard +================ +OSD (Object-Based Storage Device) is a T10 SCSI command set that is designed +to provide efficient operation of input/output logical units that manage the +allocation, placement, and accessing of variable-size data-storage containers, +called objects. Objects are intended to contain operating system and application +constructs. Each object has associated attributes attached to it, which are +integral part of the object and provide metadata about the object. The standard +defines some common obligatory attributes, but user attributes can be added as +needed. + +See: http://www.t10.org/ftp/t10/drafts/osd2/ for the latest draft for OSD 2 +or search the web for "OSD SCSI" + +OSD in the Linux Kernel +======================= +osd-initiator: + The main component of OSD in Kernel is the osd-initiator library. Its main +user is intended to be the pNFS-over-objects layout driver, which uses objects +as its back-end data storage. Other clients are the other osd parts listed below. + +osd-uld: + This is a SCSI ULD that registers for OSD type devices and provides a testing +platform, both for the in-kernel initiator as well as connected targets. It +currently has no useful user-mode API, though it could have if need be. + +exofs: + Is an OSD based Linux file system. It uses the osd-initiator and osd-uld, +to export a usable file system for users. +See Documentation/filesystems/exofs.txt for more details + +osd target: + There are no current plans for an OSD target implementation in kernel. For all +needs, a user-mode target that is based on the scsi tgt target framework is +available from Ohio Supercomputer Center (OSC) at: +http://www.open-osd.org/bin/view/Main/OscOsdProject +There are several other target implementations. See http://open-osd.org for more +links. + +Files and Folders +================= +This is the complete list of files included in this work: +include/scsi/ + osd_initiator.h Main API for the initiator library + osd_types.h Common OSD types + osd_sec.h Security Manager API + osd_protocol.h Wire definitions of the OSD standard protocol + osd_attributes.h Wire definitions of OSD attributes + +drivers/scsi/osd/ + osd_initiator.c OSD-Initiator library implementation + osd_uld.c The OSD scsi ULD + osd_ktest.{h,c} In-kernel test suite (called by osd_uld) + osd_debug.h Some printk macros + Makefile For both in-tree and out-of-tree compilation + Kconfig Enables inclusion of the different pieces + osd_test.c User-mode application to call the kernel tests + +The OSD-Initiator Library +========================= +osd_initiator is a low level implementation of an osd initiator encoder. +But even though, it should be intuitive and easy to use. Perhaps over time an +higher lever will form that automates some of the more common recipes. + +init/fini: +- osd_dev_init() associates a scsi_device with an osd_dev structure + and initializes some global pools. This should be done once per scsi_device + (OSD LUN). The osd_dev structure is needed for calling osd_start_request(). + +- osd_dev_fini() cleans up before a osd_dev/scsi_device destruction. + +OSD commands encoding, execution, and decoding of results: + +struct osd_request's is used to iteratively encode an OSD command and carry +its state throughout execution. Each request goes through these stages: + +a. osd_start_request() allocates the request. + +b. Any of the osd_req_* methods is used to encode a request of the specified + type. + +c. osd_req_add_{get,set}_attr_* may be called to add get/set attributes to the + CDB. "List" or "Page" mode can be used exclusively. The attribute-list API + can be called multiple times on the same request. However, only one + attribute-page can be read, as mandated by the OSD standard. + +d. osd_finalize_request() computes offsets into the data-in and data-out buffers + and signs the request using the provided capability key and integrity- + check parameters. + +e. osd_execute_request() may be called to execute the request via the block + layer and wait for its completion. The request can be executed + asynchronously by calling the block layer API directly. + +f. After execution, osd_req_decode_sense() can be called to decode the request's + sense information. + +g. osd_req_decode_get_attr() may be called to retrieve osd_add_get_attr_list() + values. + +h. osd_end_request() must be called to deallocate the request and any resource + associated with it. Note that osd_end_request cleans up the request at any + stage and it must always be called after a successful osd_start_request(). + +osd_request's structure: + +The OSD standard defines a complex structure of IO segments pointed to by +members in the CDB. Up to 3 segments can be deployed in the IN-Buffer and up to +4 in the OUT-Buffer. The ASCII illustration below depicts a secure-read with +associated get+set of attributes-lists. Other combinations very on the same +basic theme. From no-segments-used up to all-segments-used. + +|________OSD-CDB__________| +| | +|read_len (offset=0) -|---------\ +| | | +|get_attrs_list_length | | +|get_attrs_list_offset -|----\ | +| | | | +|retrieved_attrs_alloc_len| | | +|retrieved_attrs_offset -|----|----|-\ +| | | | | +|set_attrs_list_length | | | | +|set_attrs_list_offset -|-\ | | | +| | | | | | +|in_data_integ_offset -|-|--|----|-|-\ +|out_data_integ_offset -|-|--|--\ | | | +\_________________________/ | | | | | | + | | | | | | +|_______OUT-BUFFER________| | | | | | | +| Set attr list |</ | | | | | +| | | | | | | +|-------------------------| | | | | | +| Get attr descriptors |<---/ | | | | +| | | | | | +|-------------------------| | | | | +| Out-data integrity |<------/ | | | +| | | | | +\_________________________/ | | | + | | | +|________IN-BUFFER________| | | | +| In-Data read |<--------/ | | +| | | | +|-------------------------| | | +| Get attr list |<----------/ | +| | | +|-------------------------| | +| In-data integrity |<------------/ +| | +\_________________________/ + +A block device request can carry bidirectional payload by means of associating +a bidi_read request with a main write-request. Each in/out request is described +by a chain of BIOs associated with each request. +The CDB is of a SCSI VARLEN CDB format, as described by OSD standard. +The OSD standard also mandates alignment restrictions at start of each segment. + +In the code, in struct osd_request, there are two _osd_io_info structures to +describe the IN/OUT buffers above, two BIOs for the data payload and up to five +_osd_req_data_segment structures to hold the different segments allocation and +information. + +Important: We have chosen to disregard the assumption that a BIO-chain (and +the resulting sg-list) describes a linear memory buffer. Meaning only first and +last scatter chain can be incomplete and all the middle chains are of PAGE_SIZE. +For us, a scatter-gather-list, as its name implies and as used by the Networking +layer, is to describe a vector of buffers that will be transferred to/from the +wire. It works very well with current iSCSI transport. iSCSI is currently the +only deployed OSD transport. In the future we anticipate SAS and FC attached OSD +devices as well. + +The OSD Testing ULD +=================== +TODO: More user-mode control on tests. + +Authors, Mailing list +===================== +Please communicate with us on any deployment of osd, whether using this code +or not. + +Any problems, questions, bug reports, lonely OSD nights, please email: + OSD Dev List <osd-dev@open-osd.org> + +More up-to-date information can be found on: +http://open-osd.org + +Boaz Harrosh <bharrosh@panasas.com> +Benny Halevy <bhalevy@panasas.com> + +References +========== +Weber, R., "SCSI Object-Based Storage Device Commands", +T10/1355-D ANSI/INCITS 400-2004, +http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf + +Weber, R., "SCSI Object-Based Storage Device Commands -2 (OSD-2)" +T10/1729-D, Working Draft, rev. 3 +http://www.t10.org/ftp/t10/drafts/osd2/osd2r03.pdf diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 841a9365d5fd..012858d2b119 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt @@ -346,6 +346,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. sbirq - IRQ # for CMI8330 chip (SB16) sbdma8 - 8bit DMA # for CMI8330 chip (SB16) sbdma16 - 16bit DMA # for CMI8330 chip (SB16) + fmport - (optional) OPL3 I/O port + mpuport - (optional) MPU401 I/O port + mpuirq - (optional) MPU401 irq # This module supports multiple cards and autoprobe. @@ -388,34 +391,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. The power-management is supported. - Module snd-cs4232 - ----------------- - - Module for sound cards based on CS4232/CS4232A ISA chips. - - isapnp - ISA PnP detection - 0 = disable, 1 = enable (default) - - with isapnp=0, the following options are available: - - port - port # for CS4232 chip (PnP setup - 0x534) - cport - control port # for CS4232 chip (PnP setup - 0x120,0x210,0xf00) - mpu_port - port # for MPU-401 UART (PnP setup - 0x300), -1 = disable - fm_port - FM port # for CS4232 chip (PnP setup - 0x388), -1 = disable - irq - IRQ # for CS4232 chip (5,7,9,11,12,15) - mpu_irq - IRQ # for MPU-401 UART (9,11,12,15) - dma1 - first DMA # for CS4232 chip (0,1,3) - dma2 - second DMA # for Yamaha CS4232 chip (0,1,3), -1 = disable - - This module supports multiple cards. This module does not support autoprobe - (if ISA PnP is not used) thus main port must be specified!!! Other ports are - optional. - - The power-management is supported. - Module snd-cs4236 ----------------- - Module for sound cards based on CS4235/CS4236/CS4236B/CS4237B/ + Module for sound cards based on CS4232/CS4232A, + CS4235/CS4236/CS4236B/CS4237B/ CS4238B/CS4239 ISA chips. isapnp - ISA PnP detection - 0 = disable, 1 = enable (default) @@ -437,6 +417,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. The power-management is supported. + This module is aliased as snd-cs4232 since it provides the old + snd-cs4232 functionality, too. + Module snd-cs4281 ----------------- @@ -606,6 +589,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. Module for ESS AudioDrive ES-1688 and ES-688 sound cards. port - port # for ES-1688 chip (0x220,0x240,0x260) + fm_port - port # for OPL3 (option; share the same port as default) mpu_port - port # for MPU-401 port (0x300,0x310,0x320,0x330), -1 = disable (default) irq - IRQ # for ES-1688 chip (5,7,9,10) mpu_irq - IRQ # for MPU-401 port (5,7,9,10) @@ -757,6 +741,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. model - force the model name position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF) probe_mask - Bitmask to probe codecs (default = -1, meaning all slots) + When the bit 8 (0x100) is set, the lower 8 bits are used + as the "fixed" codec slots; i.e. the driver probes the + slots regardless what hardware reports back probe_only - Only probing and no codec initialization (default=off); Useful to check the initial codec status for debugging bdl_pos_adj - Specifies the DMA IRQ timing delay in samples. @@ -1185,6 +1172,54 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. This module supports multiple devices and PnP. + Module snd-msnd-classic + ----------------------- + + Module for Turtle Beach MultiSound Classic, Tahiti or Monterey + soundcards. + + io - Port # for msnd-classic card + irq - IRQ # for msnd-classic card + mem - Memory address (0xb0000, 0xc8000, 0xd0000, 0xd8000, + 0xe0000 or 0xe8000) + write_ndelay - enable write ndelay (default = 1) + calibrate_signal - calibrate signal (default = 0) + isapnp - ISA PnP detection - 0 = disable, 1 = enable (default) + digital - Digital daughterboard present (default = 0) + cfg - Config port (0x250, 0x260 or 0x270) default = PnP + reset - Reset all devices + mpu_io - MPU401 I/O port + mpu_irq - MPU401 irq# + ide_io0 - IDE port #0 + ide_io1 - IDE port #1 + ide_irq - IDE irq# + joystick_io - Joystick I/O port + + The driver requires firmware files "turtlebeach/msndinit.bin" and + "turtlebeach/msndperm.bin" in the proper firmware directory. + + See Documentation/sound/oss/MultiSound for important information + about this driver. Note that it has been discontinued, but the + Voyetra Turtle Beach knowledge base entry for it is still available + at + http://www.turtlebeach.com/site/kb_ftp/790.asp + + Module snd-msnd-pinnacle + ------------------------ + + Module for Turtle Beach MultiSound Pinnacle/Fiji soundcards. + + io - Port # for pinnacle/fiji card + irq - IRQ # for pinnalce/fiji card + mem - Memory address (0xb0000, 0xc8000, 0xd0000, 0xd8000, + 0xe0000 or 0xe8000) + write_ndelay - enable write ndelay (default = 1) + calibrate_signal - calibrate signal (default = 0) + isapnp - ISA PnP detection - 0 = disable, 1 = enable (default) + + The driver requires firmware files "turtlebeach/pndspini.bin" and + "turtlebeach/pndsperm.bin" in the proper firmware directory. + Module snd-mtpav ---------------- @@ -1824,7 +1859,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. ------------------- Module for sound cards based on the Asus AV100/AV200 chips, - i.e., Xonar D1, DX, D2, D2X and HDAV1.3 (Deluxe). + i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), and Essence STX. This module supports autoprobe and multiple cards. diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index 0f5d26bea80f..8eec05bc079e 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt @@ -56,6 +56,7 @@ ALC262 sony-assamd Sony ASSAMD toshiba-s06 Toshiba S06 toshiba-rx1 Toshiba RX1 + tyan Tyan Thunder n6650W (S2915-E) ultra Samsung Q1 Ultra Vista model lenovo-3000 Lenovo 3000 y410 nec NEC Versa S9100 @@ -261,6 +262,8 @@ Conexant 5051 ============= laptop Basic Laptop config (default) hp HP Spartan laptop + hp-dv6736 HP dv6736 + lenovo-x200 Lenovo X200 laptop STAC9200 ======== @@ -278,6 +281,7 @@ STAC9200 gateway-m4 Gateway laptops with EAPD control gateway-m4-2 Gateway laptops with EAPD control panasonic Panasonic CF-74 + auto BIOS setup (default) STAC9205/9254 ============= @@ -285,6 +289,8 @@ STAC9205/9254 dell-m42 Dell (unknown) dell-m43 Dell Precision dell-m44 Dell Inspiron + eapd Keep EAPD on (e.g. Gateway T1616) + auto BIOS setup (default) STAC9220/9221 ============= @@ -308,6 +314,7 @@ STAC9220/9221 dell-d82 Dell (unknown) dell-m81 Dell (unknown) dell-m82 Dell XPS M1210 + auto BIOS setup (default) STAC9202/9250/9251 ================== @@ -319,6 +326,7 @@ STAC9202/9250/9251 m3 Some Gateway MX series laptops m5 Some Gateway MX series laptops (MP6954) m6 Some Gateway NX series laptops + auto BIOS setup (default) STAC9227/9228/9229/927x ======================= @@ -328,6 +336,7 @@ STAC9227/9228/9229/927x 5stack D965 5stack + SPDIF dell-3stack Dell Dimension E520 dell-bios Fixes with Dell BIOS setup + auto BIOS setup (default) STAC92HD71B* ============ @@ -335,7 +344,10 @@ STAC92HD71B* dell-m4-1 Dell desktops dell-m4-2 Dell desktops dell-m4-3 Dell desktops - hp-m4 HP dv laptops + hp-m4 HP mini 1000 + hp-dv5 HP dv series + hp-hdx HP HDX series + auto BIOS setup (default) STAC92HD73* =========== @@ -345,13 +357,16 @@ STAC92HD73* dell-m6-dmic Dell desktops/laptops with digital mics dell-m6 Dell desktops/laptops with both type of mics dell-eq Dell desktops/laptops + auto BIOS setup (default) STAC92HD83* =========== ref Reference board mic-ref Reference board with power managment for ports + dell-s14 Dell laptop + auto BIOS setup (default) STAC9872 ======== - vaio Setup for VAIO FE550G/SZ110 - vaio-ar Setup for VAIO AR + vaio VAIO laptop without SPDIF + auto BIOS setup (default) diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt index 8d68fff71839..c5948f2f9a25 100644 --- a/Documentation/sound/alsa/HD-Audio.txt +++ b/Documentation/sound/alsa/HD-Audio.txt @@ -109,6 +109,13 @@ slot, pass `probe_mask=1`. For the first and the third slots, pass Since 2.6.29 kernel, the driver has a more robust probing method, so this error might happen rarely, though. +On a machine with a broken BIOS, sometimes you need to force the +driver to probe the codec slots the hardware doesn't report for use. +In such a case, turn the bit 8 (0x100) of `probe_mask` option on. +Then the rest 8 bits are passed as the codec slots to probe +unconditionally. For example, `probe_mask=0x103` will force to probe +the codec slots 0 and 1 no matter what the hardware reports. + Interrupt Handling ~~~~~~~~~~~~~~~~~~ @@ -358,10 +365,26 @@ modelname:: to this file. init_verbs:: The extra verbs to execute at initialization. You can add a verb by - writing to this file. Pass tree numbers, nid, verb and parameter. + writing to this file. Pass three numbers: nid, verb and parameter + (separated with a space). hints:: - Shows hint strings for codec parsers for any use. Right now it's - not used. + Shows / stores hint strings for codec parsers for any use. + Its format is `key = value`. For example, passing `hp_detect = yes` + to IDT/STAC codec parser will result in the disablement of the + headphone detection. +init_pin_configs:: + Shows the initial pin default config values set by BIOS. +driver_pin_configs:: + Shows the pin default values set by the codec parser explicitly. + This doesn't show all pin values but only the changed values by + the parser. That is, if the parser doesn't change the pin default + config values by itself, this will contain nothing. +user_pin_configs:: + Shows the pin default config values to override the BIOS setup. + Writing this (with two numbers, NID and value) appends the new + value. The given will be used instead of the initial BIOS value at + the next reconfiguration time. Note that this config will override + even the driver pin configs, too. reconfig:: Triggers the codec re-configuration. When any value is written to this file, the driver re-initialize and parses the codec tree @@ -371,6 +394,14 @@ clear:: Resets the codec, removes the mixer elements and PCM stuff of the specified codec, and clear all init verbs and hints. +For example, when you want to change the pin default configuration +value of the pin widget 0x14 to 0x9993013f, and let the driver +re-configure based on that state, run like below: +------------------------------------------------------------------------ + # echo 0x14 0x9993013f > /sys/class/sound/hwC0D0/user_pin_configs + # echo 1 > /sys/class/sound/hwC0D0/reconfig +------------------------------------------------------------------------ + Power-Saving ~~~~~~~~~~~~ @@ -461,6 +492,16 @@ run with `--no-upload` option, and attach the generated file. There are some other useful options. See `--help` option output for details. +When a probe error occurs or when the driver obviously assigns a +mismatched model, it'd be helpful to load the driver with +`probe_only=1` option (at best after the cold reboot) and run +alsa-info at this state. With this option, the driver won't configure +the mixer and PCM but just tries to probe the codec slot. After +probing, the proc file is available, so you can get the raw codec +information before modified by the driver. Of course, the driver +isn't usable with `probe_only=1`. But you can continue the +configuration via hwdep sysfs file if hda-reconfig option is enabled. + hda-verb ~~~~~~~~ diff --git a/Documentation/sound/alsa/soc/dapm.txt b/Documentation/sound/alsa/soc/dapm.txt index 46f9684d0b29..9e6763264a2e 100644 --- a/Documentation/sound/alsa/soc/dapm.txt +++ b/Documentation/sound/alsa/soc/dapm.txt @@ -116,6 +116,9 @@ SOC_DAPM_SINGLE("HiFi Playback Switch", WM8731_APANA, 4, 1, 0), SND_SOC_DAPM_MIXER("Output Mixer", WM8731_PWR, 4, 1, wm8731_output_mixer_controls, ARRAY_SIZE(wm8731_output_mixer_controls)), +If you dont want the mixer elements prefixed with the name of the mixer widget, +you can use SND_SOC_DAPM_MIXER_NAMED_CTL instead. the parameters are the same +as for SND_SOC_DAPM_MIXER. 2.3 Platform/Machine domain Widgets ----------------------------------- diff --git a/Documentation/sound/oss/CS4232 b/Documentation/sound/oss/CS4232 deleted file mode 100644 index 7d6af7a5c1c2..000000000000 --- a/Documentation/sound/oss/CS4232 +++ /dev/null @@ -1,23 +0,0 @@ -To configure the Crystal CS423x sound chip and activate its DSP functions, -modules may be loaded in this order: - - modprobe sound - insmod ad1848 - insmod uart401 - insmod cs4232 io=* irq=* dma=* dma2=* - -This is the meaning of the parameters: - - io--I/O address of the Windows Sound System (normally 0x534) - irq--IRQ of this device - dma and dma2--DMA channels (DMA2 may be 0) - -On some cards, the board attempts to do non-PnP setup, and fails. If you -have problems, use Linux' PnP facilities. - -To get MIDI facilities add - - insmod opl3 io=* - -where "io" is the I/O address of the OPL3 synthesizer. This will be shown -in /proc/sys/pnp and is normally 0x388. diff --git a/Documentation/sound/oss/Introduction b/Documentation/sound/oss/Introduction index f04ba6bb7395..75d967ff9266 100644 --- a/Documentation/sound/oss/Introduction +++ b/Documentation/sound/oss/Introduction @@ -80,7 +80,7 @@ Notes: additional features. 2. The commercial OSS driver may be obtained from the site: - http://www/opensound.com. This may be used for cards that + http://www.opensound.com. This may be used for cards that are unsupported by the kernel driver, or may be used by other operating systems. diff --git a/Documentation/usb/usbmon.txt b/Documentation/usb/usbmon.txt index 270481906dc8..6c3c625b7f30 100644 --- a/Documentation/usb/usbmon.txt +++ b/Documentation/usb/usbmon.txt @@ -229,16 +229,26 @@ struct usbmon_packet { int status; /* 28: */ unsigned int length; /* 32: Length of data (submitted or actual) */ unsigned int len_cap; /* 36: Delivered length */ - unsigned char setup[8]; /* 40: Only for Control 'S' */ -}; /* 48 bytes total */ + union { /* 40: */ + unsigned char setup[SETUP_LEN]; /* Only for Control S-type */ + struct iso_rec { /* Only for ISO */ + int error_count; + int numdesc; + } iso; + } s; + int interval; /* 48: Only for Interrupt and ISO */ + int start_frame; /* 52: For ISO */ + unsigned int xfer_flags; /* 56: copy of URB's transfer_flags */ + unsigned int ndesc; /* 60: Actual number of ISO descriptors */ +}; /* 64 total length */ These events can be received from a character device by reading with read(2), -with an ioctl(2), or by accessing the buffer with mmap. +with an ioctl(2), or by accessing the buffer with mmap. However, read(2) +only returns first 48 bytes for compatibility reasons. The character device is usually called /dev/usbmonN, where N is the USB bus number. Number zero (/dev/usbmon0) is special and means "all buses". -However, this feature is not implemented yet. Note that specific naming -policy is set by your Linux distribution. +Note that specific naming policy is set by your Linux distribution. If you create /dev/usbmon0 by hand, make sure that it is owned by root and has mode 0600. Otherwise, unpriviledged users will be able to snoop @@ -279,9 +289,10 @@ size is out of [unspecified] bounds for this kernel, the call fails with This call returns the current size of the buffer in bytes. MON_IOCX_GET, defined as _IOW(MON_IOC_MAGIC, 6, struct mon_get_arg) + MON_IOCX_GETX, defined as _IOW(MON_IOC_MAGIC, 10, struct mon_get_arg) -This call waits for events to arrive if none were in the kernel buffer, -then returns the first event. Its argument is a pointer to the following +These calls wait for events to arrive if none were in the kernel buffer, +then return the first event. The argument is a pointer to the following structure: struct mon_get_arg { @@ -294,6 +305,8 @@ Before the call, hdr, data, and alloc should be filled. Upon return, the area pointed by hdr contains the next event structure, and the data buffer contains the data, if any. The event is removed from the kernel buffer. +The MON_IOCX_GET copies 48 bytes, MON_IOCX_GETX copies 64 bytes. + MON_IOCX_MFETCH, defined as _IOWR(MON_IOC_MAGIC, 7, struct mon_mfetch_arg) This ioctl is primarily used when the application accesses the buffer diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt index 7b4596ac4120..e0203662f9e9 100644 --- a/Documentation/x86/boot.txt +++ b/Documentation/x86/boot.txt @@ -158,7 +158,7 @@ Offset Proto Name Meaning 0202/4 2.00+ header Magic signature "HdrS" 0206/2 2.00+ version Boot protocol version supported 0208/4 2.00+ realmode_swtch Boot loader hook (see below) -020C/2 2.00+ start_sys The load-low segment (0x1000) (obsolete) +020C/2 2.00+ start_sys_seg The load-low segment (0x1000) (obsolete) 020E/2 2.00+ kernel_version Pointer to kernel version string 0210/1 2.00+ type_of_loader Boot loader identifier 0211/1 2.00+ loadflags Boot protocol option flags @@ -170,10 +170,11 @@ Offset Proto Name Meaning 0224/2 2.01+ heap_end_ptr Free memory after setup end 0226/2 N/A pad1 Unused 0228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line -022C/4 2.03+ initrd_addr_max Highest legal initrd address +022C/4 2.03+ ramdisk_max Highest legal initrd address 0230/4 2.05+ kernel_alignment Physical addr alignment required for kernel 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not -0235/3 N/A pad2 Unused +0235/1 N/A pad2 Unused +0236/2 N/A pad3 Unused 0238/4 2.06+ cmdline_size Maximum size of the kernel command line 023C/4 2.07+ hardware_subarch Hardware subarchitecture 0240/8 2.07+ hardware_subarch_data Subarchitecture-specific data @@ -299,14 +300,14 @@ Protocol: 2.00+ e.g. 0x0204 for version 2.04, and 0x0a11 for a hypothetical version 10.17. -Field name: readmode_swtch +Field name: realmode_swtch Type: modify (optional) Offset/size: 0x208/4 Protocol: 2.00+ Boot loader hook (see ADVANCED BOOT LOADER HOOKS below.) -Field name: start_sys +Field name: start_sys_seg Type: read Offset/size: 0x20c/2 Protocol: 2.00+ @@ -468,7 +469,7 @@ Protocol: 2.02+ zero, the kernel will assume that your boot loader does not support the 2.02+ protocol. -Field name: initrd_addr_max +Field name: ramdisk_max Type: read Offset/size: 0x22c/4 Protocol: 2.03+ @@ -542,7 +543,10 @@ Protocol: 2.08+ The payload may be compressed. The format of both the compressed and uncompressed data should be determined using the standard magic - numbers. Currently only gzip compressed ELF is used. + numbers. The currently supported compression formats are gzip + (magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A) and LZMA + (magic number 5D 00). The uncompressed payload is currently always ELF + (magic number 7F 45 4C 46). Field name: payload_length Type: read |